Data Replication
- enhance system reliability
- improve performance
- improve system scalability
- keeping data replicas consistent
- Reliability: if one replica crashes, the system can continue working using another replica
- better protection against corrupted data
The Price of Replication
- having multiple copies may lead to consistency problems
- when and how modifications need to be carried out on the copies determines the price of replication
Replication for Performance
- reduce access time and solve scalability problems
Problems:
- keeping multiple copies up to date requires more network bandwidth
- keeping multiple copies consistent can itself lead to serious scalability problems
Tight consistency
- Provides strong guarantees about the ordering and visibility of updates to shared data
- updates to shared data are immediately visible to all nodes in the system
Relaxing tight consistency
- using a consistency model that allows for some level of inconsistency or latency
- global synchronizations are avoided
- performance improved
Consistency Models - Assumptions
- assume a data store to which multiple processes running on different machines have access
- a consistency model is a “contract” between the processes and this data store
Sequential Consistency Model
- the operations of each individual process are applied in the order specified by its program; operations of different processes may be interleaved in any order
- no involvement of time
- no reference to the “most recent” write operation
- different processes must see the updates to the shared data in the same order
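- all processes must see the same order: if P2 performs a write of b, every process must observe that write at the same position in the order. Two writes may appear in either order, regardless of when they were issued, as long as all reads of the same variable agree on that order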
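As a rough illustration of the points above, the following sketch checks a tiny history by brute force: it searches for one interleaving that keeps every process's program order and in which each read returns the most recently written value. The function name `sequentially_consistent` and the `("W" | "R", variable, value)` tuple encoding are assumptions made for this example; the search is exponential and only meant for toy histories.

```python
def sequentially_consistent(histories, initial=None):
    """Return True if some interleaving that preserves each process's
    program order makes every read return the latest written value.

    histories: list of per-process operation lists (hypothetical encoding),
    each operation being ("W", var, value) or ("R", var, value).
    """
    n = len(histories)

    def search(positions, store):
        # All operations of all processes have been scheduled.
        if all(positions[i] == len(histories[i]) for i in range(n)):
            return True
        for i in range(n):
            if positions[i] == len(histories[i]):
                continue
            kind, var, val = histories[i][positions[i]]
            if kind == "R" and store.get(var, initial) != val:
                continue  # this read cannot be the next operation
            new_store = dict(store)
            if kind == "W":
                new_store[var] = val
            nxt = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
            if search(nxt, new_store):
                return True
        return False

    return search((0,) * n, {})


# P1: W(x)a   P2: W(x)b   P3: R(x)b R(x)a   P4: R(x)b R(x)a
agree = [
    [("W", "x", "a")],
    [("W", "x", "b")],
    [("R", "x", "b"), ("R", "x", "a")],
    [("R", "x", "b"), ("R", "x", "a")],
]
# Same writes, but P4 reads a before b while P3 reads b before a.
disagree = [
    [("W", "x", "a")],
    [("W", "x", "b")],
    [("R", "x", "b"), ("R", "x", "a")],
    [("R", "x", "a"), ("R", "x", "b")],
]
print(sequentially_consistent(agree))     # True
print(sequentially_consistent(disagree))  # False
```

The first history is accepted even though b may have been written later in real time than a, because time plays no role; the second is rejected because P3 and P4 disagree on the order of the two writes.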
Causal Consistency Model
If event b is caused by an earlier event a, causality requires that everyone first sees a and then sees b
Operations that are not causally related are said to be concurrent
P2 performs R(x)a, so its W(x)b may be based on the value of x after W(x)a. The two writes are therefore causally related, and every node must see a before b. P3 sees b first, so this execution is not causally consistent
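The reasoning in this example can be sketched in code. The check below is simplified and rests on stated assumptions: every written value is unique, a write causally follows everything its process read or wrote beforehand, and only the order of reads is inspected. The name `causal_violations` and the tuple encoding are hypothetical.

```python
def causal_violations(histories):
    """Return (process, first_read, later_read) triples where a process
    reads a value and only afterwards reads a value it causally follows.

    histories: dict mapping process name to a list of
    ("W" | "R", var, value) operations (hypothetical encoding).
    """
    # depends_on[v] = values whose writes causally precede the write of v.
    depends_on = {}
    for ops in histories.values():
        seen = set()  # values this process has read or written so far
        for kind, _var, val in ops:
            if kind == "W":
                depends_on[val] = set(seen)
            seen.add(val)

    # Transitive closure: if a -> b and b -> c, then a -> c.
    changed = True
    while changed:
        changed = False
        for deps in depends_on.values():
            extra = set()
            for d in deps:
                extra |= depends_on.get(d, set())
            if not extra <= deps:
                deps |= extra
                changed = True

    violations = []
    for proc, ops in histories.items():
        reads = [val for kind, _var, val in ops if kind == "R"]
        for i, later in enumerate(reads):
            for earlier in reads[:i]:
                # 'later' should have been seen before 'earlier'.
                if later in depends_on.get(earlier, set()):
                    violations.append((proc, earlier, later))
    return violations


related = {
    "P1": [("W", "x", "a")],
    "P2": [("R", "x", "a"), ("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "a"), ("R", "x", "b")],
}
print(causal_violations(related))  # [('P3', 'b', 'a')]

# Without P2's read, the two writes are concurrent and any order is allowed.
concurrent = dict(related, P2=[("W", "x", "b")])
print(causal_violations(concurrent))  # []
```

Dropping R(x)a from P2 removes the causal link, so the same read order in P3 is no longer a violation.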
Replica management
- keeping the replicas consistent
- finding the best locations for replicas is a classical optimisation problem
A Greedy Heuristic to Find Locations
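First find the site with the smallest total cost. Then use that site to update the cost of the other sites relative to it, and repeat (after the first step, the chosen site can be removed from the candidates)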
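A minimal sketch of this greedy procedure is shown below. It assumes a hypothetical matrix `cost[i][j]` giving the cost for node i to access a replica at site j, and it interprets "updating the cost" as keeping, for each node, the cheapest cost to any site chosen so far. The function name `greedy_placement` and the example numbers are illustrative only.

```python
def greedy_placement(cost, k):
    """Pick k replica sites greedily.

    cost[i][j]: assumed cost for node i to access a replica at site j.
    Returns (chosen sites in selection order, resulting total cost).
    """
    n = len(cost)
    best = [float("inf")] * n  # each node's cost to its cheapest chosen site
    chosen = []
    for _ in range(k):
        # Sites already chosen are removed from the candidates.
        candidates = [s for s in range(n) if s not in chosen]

        def total(site):
            # Total cost if 'site' were added to the sites chosen so far.
            return sum(min(best[i], cost[i][site]) for i in range(n))

        site = min(candidates, key=total)
        chosen.append(site)
        # Update every node's cost using the newly chosen site.
        best = [min(best[i], cost[i][site]) for i in range(n)]
    return chosen, sum(best)


cost = [
    [0, 3, 7, 9],
    [2, 0, 6, 8],
    [7, 6, 0, 1],
    [9, 8, 1, 0],
]
print(greedy_placement(cost, 2))  # ([2, 0], 3)
```

With these numbers, site 2 has the smallest total cost (14) and is chosen first; with site 2 fixed, adding site 0 brings the total down to 3, which beats the other candidates. Being greedy, the heuristic does not guarantee an optimal placement in general.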