Resolving error propagation in distributed systems. (English) Zbl 1338.68024

Summary: This paper investigates error propagation in distributed systems. To resolve it, a state preservation scheme is presented that saves process states in main memory. With the preserved states, processes affected by error propagation can be recovered without involving stable storage, which significantly reduces the recovery overhead. In addition, a well-known single-source-all-destination graph algorithm is used to find the optimal recovery points of the affected processes.
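
The summary does not name the single-source-all-destination algorithm or say how the recovery graph is built, so the following is only a minimal sketch under stated assumptions. It uses Dijkstra's algorithm as one well-known instance of that algorithm class. The node names, the edge weights (read here as a hypothetical rollback cost between processes), and the function name are all illustrative and are not taken from the paper.

```python
import heapq


def single_source_all_destinations(graph, source):
    """Return the cheapest cost from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, non-negative weight) pairs.
    This is plain Dijkstra, used here only as an assumed stand-in for the
    algorithm the summary refers to.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical example: P0 is the process where the error originated;
# weights are an assumed cost of rolling back along each dependency.
graph = {"P0": [("P1", 2), ("P2", 5)], "P1": [("P2", 1)], "P2": []}
costs = single_source_all_destinations(graph, "P0")
```

In this toy graph, the cost reaching P2 through P1 (2 + 1) is lower than the direct edge (5), which is the kind of choice a single run from one source settles for all destinations at once.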

MSC:

68M14 Distributed systems
68R10 Graph theory (including graph drawing) in computer science
