Recovery Management and Lock Management

Recovery Management:

Model of Errors:

In order to design a recovery system, it is important to have a clear notion of what kinds of errors can be expected and what their probabilities are. The model of errors below is inspired by the presentation of Lampson and Sturgis in 'Crash Recovery in a Distributed Data Storage System', which may well someday appear in the CACM.

We first postulate that all errors are detectable; that is, if no one complains about a condition, then it is OK.

Model of Storage Errors:

Storage comes in three flavours with independent failure modes and increasing reliability:

a) Volatile storage: paging space and main memory.
b) On-line non-volatile storage: disks, which typically survive crashes. More reliable than volatile storage.
c) Off-line non-volatile storage: tape archive. Even more reliable than disks.

To repeat, we assume that these three kinds of storage have independent failure modes.
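Independence is what makes layering the storage classes worthwhile: the probability that all of them fail together is the product of the individual probabilities. A minimal sketch follows; the per-class failure probabilities and the names `FAILURE_PROB` and `prob_all_fail` are hypothetical, chosen only to respect the ordering "increasing reliability".

```python
import math

# Hypothetical per-period failure probabilities (illustrative assumptions,
# not measurements). Each class is more reliable than the one before it.
FAILURE_PROB = {
    "volatile": 1e-2,             # main memory and paging space
    "online_nonvolatile": 1e-4,   # disk
    "offline_nonvolatile": 1e-6,  # tape archive
}


def prob_all_fail(classes):
    """Probability that every listed storage class fails.

    Assumes the failure modes are independent, so the probabilities multiply.
    """
    return math.prod(FAILURE_PROB[c] for c in classes)
```

For example, `prob_all_fail(["volatile", "online_nonvolatile"])` is about 1e-6: information held both in memory and on disk is lost only when both fail, which under independence is far less likely than either failure alone.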

The storage is blocked into fixed-length units called pages, which are the unit of allocation and transfer.

Any page transfer can have one of three outcomes:

1. Success (target gets new value)
2. Partial failure (target is a mess)
3. Total failure (target is unchanged)

Any page may spontaneously fail. That is, a speck of dust may settle on it or a black hole may pass through it so that it no longer retains its original information. One can always detect whether a transfer failed or a page spontaneously failed by reading the target page at a later time. (This can be made more and more certain by adding redundancy to the page.)
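The three transfer outcomes and the read-back check can be sketched as follows, assuming the redundancy is a CRC-32 stored alongside each page. The tiny `PAGE_SIZE`, the `Outcome` enum, and the helpers `write_page`, `decay`, `page_is_valid`, and `write_succeeded` are all hypothetical; the outcome of a write is passed in explicitly so that each case can be simulated.

```python
import zlib
from enum import Enum

PAGE_SIZE = 16  # hypothetical tiny page size, for illustration only


class Outcome(Enum):
    SUCCESS = 1   # target gets new value
    PARTIAL = 2   # target is a mess
    TOTAL = 3     # target is unchanged


class Page:
    """A page stored together with a CRC-32 of its contents (the redundancy)."""

    def __init__(self):
        self.data = bytes(PAGE_SIZE)
        self.crc = zlib.crc32(self.data)


def write_page(page, new_data, outcome=Outcome.SUCCESS):
    """Simulate one page transfer ending in the given outcome."""
    assert len(new_data) == PAGE_SIZE
    if outcome is Outcome.SUCCESS:
        page.data = bytes(new_data)
        page.crc = zlib.crc32(page.data)
    elif outcome is Outcome.PARTIAL:
        # Only the first half reaches the target and the checksum is not
        # rewritten, so the page is left in a mixed old/new state.
        half = PAGE_SIZE // 2
        page.data = bytes(new_data[:half]) + page.data[half:]
    # Outcome.TOTAL: nothing is written; the target keeps its old value.


def decay(page, bit=0):
    """Simulate a spontaneous failure: one stored bit flips."""
    buf = bytearray(page.data)
    buf[bit // 8] ^= 1 << (bit % 8)
    page.data = bytes(buf)


def page_is_valid(page):
    """Detect a spontaneously failed page by re-checking its redundancy."""
    return zlib.crc32(page.data) == page.crc


def write_succeeded(page, new_data):
    """Detect a failed transfer by reading the target page back later."""
    return page_is_valid(page) and page.data == bytes(new_data)
```

A writer that calls `write_succeeded` after each transfer learns whether the write must be redone, and a reader that calls `page_is_valid` learns whether a stored page has decayed since it was written.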

Finally, the probability that N "independent" archive pages all fail is negligible. Here we choose N=2. (This can be made more and more certain by choosing larger and larger N.)
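With N=2, an archive page is written twice to independent locations, and a read falls back to the second copy when the first fails its check. The sketch below assumes CRC-32 redundancy and represents each copy as a plain dictionary; the functions `write_duplexed`, `read_duplexed`, and `prob_data_lost` are hypothetical names, not part of any particular system.

```python
import zlib


def make_copy(data: bytes) -> dict:
    """One archive copy: the data plus a CRC-32 as redundancy."""
    return {"data": bytes(data), "crc": zlib.crc32(data)}


def write_duplexed(data: bytes, n: int = 2) -> list:
    """Write the same page to n 'independent' archive copies."""
    return [make_copy(data) for _ in range(n)]


def read_duplexed(copies: list) -> bytes:
    """Return the first copy whose checksum still verifies.

    The data is lost only if every one of the n copies has failed.
    """
    for c in copies:
        if zlib.crc32(c["data"]) == c["crc"]:
            return c["data"]
    raise OSError("all archive copies failed")


def prob_data_lost(p_page_fail: float, n: int = 2) -> float:
    """With independent failures, all n copies fail with probability p**n."""
    return p_page_fail ** n
```

If a single archive page fails with (hypothetical) probability 1e-3, a duplexed page is lost with probability about 1e-6, and each additional copy multiplies that by another 1e-3.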

Model of Data Communications Errors:

Communication traffic is broken into units called messages, which travel via sessions.

The transmission of a message has one of three possible outcomes:

1. Successfully received.
2. Incorrectly received.
3. Not received.

The receiver of a message can detect whether he has received a particular message and whether it was correctly received.
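One common way to give the receiver this ability is to number the messages of a session and attach a checksum to each. The sketch below assumes exactly that framing; the `Message` fields, `send`, and the `Receiver` class are hypothetical illustrations, not a description of any specific protocol.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Message:
    seq: int     # position of the message within its session (assumed framing)
    body: bytes
    crc: int     # CRC-32 of body, computed by the sender


def send(seq: int, body: bytes) -> Message:
    return Message(seq, body, zlib.crc32(body))


class Receiver:
    """Classifies each message arriving on one session."""

    def __init__(self):
        self.expected = 0    # sequence number of the next message in order
        self.delivered = []

    def receive(self, msg: Message) -> str:
        if zlib.crc32(msg.body) != msg.crc:
            return "incorrect"   # damaged in transit: discard it
        if msg.seq < self.expected:
            return "duplicate"   # this message was already received
        if msg.seq > self.expected:
            return "gap"         # an earlier message was not received
        self.delivered.append(msg.body)
        self.expected += 1
        return "ok"
```

The checksum tells the receiver whether a message was received correctly, and the sequence number tells him whether a particular message has been received at all (a gap) or more than once (a duplicate).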

For every message transmitted there is a non-zero probability that it will be successfully received. It is the job of the recovery manager to deal with these storage and transmission errors and correct them. This model of errors is implicit in what follows and will appear again in the examples at the end of the section.
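The non-zero success probability is what makes simple retransmission sufficient: if each attempt succeeds with probability p > 0, then k attempts all fail with probability (1 - p)^k, which tends to zero. The following is a minimal simulation under that assumption. The channel model in `transmit` and the helper `send_reliably` are hypothetical, and for simplicity the sketch assumes the receiver's report of correct receipt always reaches the sender.

```python
import random


def transmit(rng: random.Random, p_success: float) -> str:
    """One attempt over a hypothetical lossy channel.

    Returns one of the three outcomes: 'ok', 'incorrect' or 'lost'.
    The failure probability is split evenly between the last two.
    """
    r = rng.random()
    if r < p_success:
        return "ok"
    return "incorrect" if r < (1 + p_success) / 2 else "lost"


def send_reliably(rng: random.Random, p_success: float,
                  max_attempts: int = 10_000) -> int:
    """Retransmit until the message is correctly received.

    Returns the number of attempts used. Because p_success > 0, the loop
    ends with probability 1; max_attempts only bounds the simulation.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    for attempt in range(1, max_attempts + 1):
        if transmit(rng, p_success) == "ok":
            return attempt
    raise TimeoutError("gave up after max_attempts")
```

For instance, with p = 0.3 the chance that ten consecutive attempts all fail is 0.7^10, roughly 0.03, so a handful of retransmissions almost always suffices.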

