Recovery in distributed systems using optimistic message logging and checkpointing. (English) Zbl 0711.68009

The paper presents an algorithm for determining the unique maximum recoverable system state. Using message logging and checkpointing, the algorithm provides fault tolerance in distributed systems.
The main feature of the algorithm is that it always finds the maximum recoverable state. It is based on a procedure, FIND-REC, which takes an initial recoverable system state and a stable state interval \(\sigma\) of some process k, and attempts to find a new recoverable system state in which process k has advanced at least to state interval \(\sigma\).
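The idea behind such a procedure can be illustrated with a small sketch. This is not the paper's FIND-REC; it is a simplified model under assumptions that the review does not state: each state interval carries a dependency vector (the highest interval index of every process it depends on), a system state is consistent when no process depends on an interval beyond the one chosen for another process, and it is recoverable when it is consistent and all chosen intervals are stable. The names `find_rec`, `dep`, and `stable` are hypothetical.

```python
from bisect import bisect_left


def find_rec(current, k, sigma, dep, stable):
    """Try to advance process k to stable interval sigma.

    current : list of interval indices, assumed to be a recoverable state
    dep     : dep[i][s][j] = highest interval of process j that interval s
              of process i depends on (assumed model, dep[i][s][i] == s)
    stable  : stable[i] = sorted list of stable interval indices of process i
    Returns a new recoverable state, or None if none can be found.
    """
    if sigma not in stable[k]:
        raise ValueError("sigma must be a stable state interval of process k")

    candidate = list(current)            # leave the caller's state untouched
    candidate[k] = max(candidate[k], sigma)

    changed = True
    while changed:                       # repeat until no dependency is violated
        changed = False
        for i in range(len(candidate)):
            for j, needed in enumerate(dep[i][candidate[i]]):
                if needed > candidate[j]:
                    # Process j must move forward to the smallest stable
                    # interval that covers the dependency.
                    pos = bisect_left(stable[j], needed)
                    if pos == len(stable[j]):
                        return None      # no stable interval is late enough
                    candidate[j] = stable[j][pos]
                    changed = True
    return candidate


# Hypothetical three-process example; interval 0 is the initial state.
dep = [
    {0: [0, 0, 0], 1: [1, 0, 0], 2: [2, 1, 0]},
    {0: [0, 0, 0], 1: [0, 1, 2]},
    {0: [0, 0, 0], 2: [0, 0, 2]},
]
stable = [[0, 1, 2], [0, 1], [0, 2]]
start = [0, 0, 0]
advanced = find_rec(start, 0, 2, dep, stable)

# Same dependencies, but interval 2 of process 2 has not become stable.
blocked = find_rec(start, 0, 2, dep, [[0, 1, 2], [0, 1], [0]])
```

In the first call, advancing process 0 to interval 2 forces process 1 to interval 1, which in turn forces process 2 to interval 2. In the second call the required interval of process 2 is not stable, so the sketch reports failure and the original state remains the recoverable one. The loop terminates because candidate indices only increase and are bounded by the largest stable interval of each process.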
Finally, some related work is surveyed.
Reviewer: D.Grigoras

MSC:

68M15 Reliability, testing and fault tolerance of networks and computer systems
68N25 Theory of operating systems
Full Text: DOI