DS-week7

Data Replication

  • enhance system reliability
  • improve performance
  • improve system scalability
  • the main challenge: keeping the data replicas consistent
  1. Reliability: if one replica crashes, the system can continue working with the remaining replicas
    • better protection against corrupted data

The Price of Replication

  • having multiple copies may lead to consistency problems
  • when and how modifications to the copies need to be carried out determines the price of replication

Replication for Performance

  • reduce access time and solve scalability problems

Problems:

  • keeping multiple copies up to date requires more network bandwidth
  • keeping multiple copies consistent can itself lead to serious scalability problems

Tight consistency

  • Provides strong guarantees about the ordering and visibility of updates to shared data
  • updates to shared data are immediately visible to all nodes in the system

Relaxing tight consistency

  • using a consistency model that allows for some level of inconsistency or latency
  • global synchronizations are avoided
  • performance improved

Consistency Models - Assumptions

  • assume a data store to which multiple processes running on different machines have access
  • a consistency model is a “contract” between the processes and this data store

Sequential Consistency Model

  • operations of all processes appear to execute in one single sequential order, and the operations of each individual process appear in that order as the process issued them
  • no involvement of time
  • no reference to the “most recent” write operation
  • different processes must see the updates to the shared data in the same order
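  • every process must see the same order: once P2's write of b appears in that order, the reads of every node must reflect it at the same point; two writes may appear in either order, as long as the reads of the same variable on all processes agree with that one order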
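
The "same order" requirement above can be sketched as a small check. This is a simplified illustration, not a full sequential consistency checker: it assumes each written value is unique, ignores program order of the writers, and the function name common_order and the input layout (process name mapped to the list of values it read) are made up for this example.

```python
from graphlib import TopologicalSorter, CycleError


def common_order(views):
    """Return one order of writes that every process agrees with, or None.

    views maps a process name to the sequence of written values it observed.
    Assumption: every written value is unique.
    """
    sorter = TopologicalSorter()
    for seen in views.values():
        for value in seen:
            sorter.add(value)
        # Each process demands that what it saw earlier comes before what it saw later.
        for earlier, later in zip(seen, seen[1:]):
            if earlier != later:
                sorter.add(later, earlier)
    try:
        return list(sorter.static_order())
    except CycleError:
        # The demands contradict each other, so no single order exists.
        return None


# P3 and P4 both see b before a: a single order exists.
print(common_order({"P3": ["b", "a"], "P4": ["b", "a"]}))  # ['b', 'a']
# P3 sees b then a, P4 sees a then b: no single order exists.
print(common_order({"P3": ["b", "a"], "P4": ["a", "b"]}))  # None
```

Note that no timestamps are used anywhere: the check only asks whether some common order exists, which matches the "no involvement of time" point above.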

Causal Consistency Model

  • If event b is caused by an earlier event a, causality requires that everyone first sees a and then sees b

  • Operations that are not causally related are said to be concurrent

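    Example: P2 performs R(X)a, which means its W(X)b may be a write based on the value of X after W(X)a, so the two writes are causally related. Every node must therefore see a before b. P3 sees b first, so this execution is not causally consistent.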
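
    The rule can be sketched as a small check. This is an illustration only: the function name respects_causality and the input layout are made up for this example, the causal pairs are given by hand rather than derived from the reads, and each written value is assumed to be unique.

```python
def respects_causality(views, causal_pairs):
    """Check that no process sees an effect before its cause.

    views maps a process name to the sequence of written values it observed.
    causal_pairs is a list of (cause, effect) values, e.g. ("a", "b") when
    the write of b may depend on the write of a.
    Writes that are not in causal_pairs are concurrent and may be seen in any order.
    """
    for seen in views.values():
        position = {value: index for index, value in enumerate(seen)}
        for cause, effect in causal_pairs:
            both_seen = cause in position and effect in position
            if both_seen and position[cause] > position[effect]:
                return False
    return True


views = {"P3": ["b", "a"], "P4": ["a", "b"]}
# P2 read a before writing b, so a -> b is a causal pair: P3 violates it.
print(respects_causality(views, [("a", "b")]))  # False
# If the two writes were concurrent, the same views would be allowed.
print(respects_causality(views, []))  # True
```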

Replica management

  • keeping the replicas consistent
  • classical optimisation problem

A Greedy Heuristic to Find Locations

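First find the site with the smallest total cost. Then use that site to update the cost of every other site: a site's cost becomes its cost to the chosen site whenever that is cheaper. Repeat until enough sites are chosen (after a site is chosen it can be removed from the candidates).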
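
The steps above can be sketched as follows. This is a minimal sketch under assumptions: the function name greedy_placement, the cost matrix layout, and the example sites on a line are invented for illustration, and every site is treated as both a client and a candidate replica location.

```python
def greedy_placement(cost, k):
    """Pick k replica sites greedily.

    cost[i][j] is the cost for client j to reach a replica placed at site i.
    Returns the chosen sites (in pick order) and the resulting total cost.
    """
    n = len(cost)
    chosen = []
    # best[j]: cheapest cost from client j to any replica chosen so far.
    best = [float("inf")] * n

    for _ in range(k):
        # Already chosen sites are removed from the candidates.
        candidates = [i for i in range(n) if i not in chosen]

        def total_with(site):
            # Each client uses whichever is cheaper: its current replica or the new site.
            return sum(min(best[j], cost[site][j]) for j in range(n))

        pick = min(candidates, key=total_with)
        chosen.append(pick)
        # Update every client's cost using the newly chosen site.
        best = [min(best[j], cost[pick][j]) for j in range(n)]

    return chosen, sum(best)


# Hypothetical example: five sites on a line, cost = distance between sites.
positions = [0, 1, 2, 3, 10]
cost = [[abs(a - b) for b in positions] for a in positions]
print(greedy_placement(cost, 2))  # ([2, 4], 4)
```

In this example the first pick is the site at position 2 (total cost 12), and the second pick is the far site at position 10, which brings the total down to 4. The heuristic is greedy, so it is not guaranteed to find the best possible set of k sites.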