Fault Tolerance
- Strongly associated with dependable systems
- availability: the probability that the system operates correctly at any given moment
- reliability: the length of time the system can run continuously without failure
- safety: if and when failures occur, the consequences are not catastrophic for the system
- maintainability: how easily a failed system can be repaired
- building dependable, fault-tolerant systems relates to controlling faults
Types of Failures
- crash failure: the server halts, but works correctly until it halts
- omission failure: the server fails to respond to incoming requests
- receive omission (a kind of omission failure): the server fails to receive incoming messages
- send omission (a kind of omission failure): the server fails to send messages
- timing failure: the response lies outside a specified time interval
- response failure: the response is incorrect
- value failure (a kind of response failure): the value of the response is wrong
- state-transition failure (a kind of response failure): the server deviates from the correct flow of control
- arbitrary failure: the server may produce arbitrary responses at arbitrary times
- when arbitrary failures occur, clients should be prepared for the worst
- arbitrary failures are also known as Byzantine failures
Redundancy in Masking Failures
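- physical redundancy: extra equipment or processes (e.g., replicas) are added so the system can tolerate the loss or malfunction of some components
- time redundancy: an action is performed and, if needed, performed again; helpful when faults are transient or intermittent
- information redundancy: e.g., extra bits are sent so that garbled bits can be detected or recovered
- replication should improve performance, but keeping replicas consistent can hurt performance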
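
The three kinds of redundancy above can be sketched in a few lines each. This is a minimal illustration, not a production design: `TransientError`, `retry`, `make_flaky`, `add_parity`, `parity_ok`, and `majority_vote` are all hypothetical names introduced here, and a single even-parity bit is only one simple form of information redundancy.

```python
from collections import Counter


class TransientError(Exception):
    """Hypothetical error type standing in for a transient fault."""


def retry(action, attempts=3):
    """Time redundancy: perform the action again until it succeeds (attempts >= 1)."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except TransientError as err:
            last_error = err
    raise last_error


def make_flaky(failures):
    """Build an action that fails `failures` times and then succeeds."""
    state = {"left": failures}

    def action():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("temporary fault")
        return "ok"

    return action


def add_parity(bits):
    """Information redundancy: append one even-parity bit to the data bits."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """True if the word (data bits plus parity bit) holds an even number of 1s."""
    return sum(word) % 2 == 0


def majority_vote(replies):
    """Physical redundancy: accept the value returned by a majority of replicas."""
    value, count = Counter(replies).most_common(1)[0]
    if count * 2 <= len(replies):
        raise ValueError("no majority among replicas")
    return value
```

For instance, `retry(make_flaky(2))` succeeds on the third attempt, a single flipped bit makes `parity_ok` return `False`, and `majority_vote([7, 7, 9])` masks the one faulty replica.
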
All-or-Nothing: The Need for Atomicity
- Either ALL operations execute or NONE
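
As a rough sketch of all-or-nothing behavior, the hypothetical `transfer` function below snapshots the balances first and restores them if any step fails. The in-memory dictionary and the snapshot approach are assumptions made for illustration; real systems typically rely on logging rather than copying all state.

```python
def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst; on any error, restore the original balances."""
    snapshot = dict(accounts)          # copy taken before any change
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount        # raises KeyError if dst does not exist
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # undo every partial change
        raise
```

If the credit to `dst` fails after `src` has already been debited, the debit is undone as well, so no caller ever observes a half-finished transfer.
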
Isolated Execution: The Need for Isolation
- ensure that “concurrent” applications do not interfere with each other
Serial Executions
- concurrent executions do not interfere with each other if their execution is equivalent to a serial one
- a strictly serial execution runs one transfer at a time
- this is not scalable and is very slow
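
To make "equivalent to a serial one" concrete, the sketch below replays two hypothetical transactions, `T1` and `T2`, that each add 10 to a shared balance. The `run` helper and the schedule encoding are assumptions for illustration only.

```python
def run(schedule, balance):
    """Execute a schedule of (transaction, operation) steps on one shared balance."""
    local = {}                          # each transaction's private copy of the balance
    for txn, op in schedule:
        if op == "read":
            local[txn] = balance
        elif op == "add10":
            local[txn] += 10
        elif op == "write":
            balance = local[txn]
    return balance


T1 = [("T1", "read"), ("T1", "add10"), ("T1", "write")]
T2 = [("T2", "read"), ("T2", "add10"), ("T2", "write")]

# Serial: T1 runs to completion, then T2.
serial = run(T1 + T2, 100)

# Interleaved: both read the old balance before either one writes.
interleaved = run([T1[0], T2[0], T1[1], T2[1], T1[2], T2[2]], 100)
```

Both serial orders end with a balance of 120, while the interleaved schedule ends with 110 because one update is lost. That interleaving is therefore not equivalent to any serial execution and must be prevented.
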
ACID
- atomicity: all or nothing; a transaction is either completed in its entirety or not at all
- consistency: all constraints, rules, and relationships are maintained (kept up to date) throughout the transaction
- isolation: concurrent transactions execute as if each were the only transaction in the system, so the outcome is not affected by the concurrent execution of other transactions
- durability: once a transaction is committed, its changes must be permanent and survive any subsequent system failures or errors, so the data remains consistent and reliable
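
A small sketch using Python's standard `sqlite3` module shows atomicity and consistency working together. The `account` table, the `transfer` and `balances` helpers, and the non-negative balance rule are assumptions chosen for illustration. The database is in memory, so durability is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # consistency rule: no negative balances
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When a transfer would overdraw `src`, the `CHECK` constraint rejects the debit and the rollback also undoes the credit that was already applied to `dst`.
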
How Transactions are Implemented
- managing multiple simultaneous users: concurrency control algorithms
- durability: recovery algorithms
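
The notes do not say which recovery algorithm is meant. As one possible illustration, the sketch below uses a simple redo log: every write is logged, and after a crash only the writes of committed transactions are replayed. The record layout and the `recover` function are hypothetical.

```python
def recover(log):
    """Rebuild state by replaying only the writes of committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, key, value = rec
            state[key] = value
    return state


log = [
    ("write", "T1", "A", 70),
    ("write", "T1", "B", 80),
    ("commit", "T1"),
    ("write", "T2", "A", 0),    # T2 never committed: the crash happened here
]
```

Replaying this log keeps the effects of `T1` and discards the unfinished write of `T2`, which is the behavior durability and atomicity together require.
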
Concurrency Control
- acquire-locks phase: get a read lock before each read and a write lock before each write
- a read lock conflicts with a write lock held by another transaction
- a write lock conflicts with both read locks and write locks held by other transactions
- release-locks phase: locks are released when the transaction terminates
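
The lock rules above resemble what is commonly called strict two-phase locking. A minimal, single-threaded sketch of the conflict checks might look like the following; `LockManager` and its methods are hypothetical, and a refused request simply returns `False` instead of blocking the caller.

```python
class LockManager:
    """Tracks read and write locks per item and refuses conflicting requests."""

    def __init__(self):
        self.readers = {}   # item -> set of transactions holding a read lock
        self.writer = {}    # item -> transaction holding the write lock

    def acquire_read(self, txn, item):
        """A read lock conflicts only with another transaction's write lock."""
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False
        self.readers.setdefault(item, set()).add(txn)
        return True

    def acquire_write(self, txn, item):
        """A write lock conflicts with other transactions' read and write locks."""
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False
        other_readers = self.readers.get(item, set()) - {txn}
        if other_readers:
            return False
        self.writer[item] = txn
        return True

    def release_all(self, txn):
        """Release phase: drop every lock the transaction holds when it terminates."""
        for holders in self.readers.values():
            holders.discard(txn)
        for item in [i for i, t in self.writer.items() if t == txn]:
            del self.writer[item]
```

Two transactions can share a read lock on the same item, but neither can obtain the write lock until the other has terminated and called `release_all`.
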