
Optimal checkpointing interval of a communication system with rollback recovery. (English) Zbl 1065.90009

Summary: This paper considers a communication system consisting of many processors and studies how to improve its reliability using checkpointing and rollback recovery. When a processor failure or a communication error occurs, the processors associated with the event are rolled back to their most recent checkpoint, so that a consistent state is maintained across the whole system. A stochastic model of these recovery techniques is formulated using the theory of Markov renewal processes. The mean time to take a checkpoint and the expected numbers of rollback recoveries caused by processor failures and by communication errors are derived. Finally, an optimal checkpointing interval that minimizes the expected cost is discussed analytically.
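
As a rough illustration of what "an optimal checkpointing interval" means, the sketch below minimizes expected time per unit of useful work in a much simpler setting than the paper's. It is not the Markov renewal model of the paper: it assumes a single process, failures arriving as a Poisson process with rate lam, a fixed checkpoint cost c, a fixed recovery time r during which no further failure occurs, and rollback to the start of the current interval. All names (expected_segment_time, overhead_ratio, optimal_interval) and parameter values are hypothetical. Under these assumptions the expected time to complete an interval of work t plus its checkpoint is (1/lam + r)(exp(lam (t + c)) - 1), and the optimum is found numerically by golden-section search.

```python
import math


def expected_segment_time(t, lam, c, r):
    """Expected time to finish t units of work plus one checkpoint of cost c.

    Assumed toy model (not the paper's): Poisson failures with rate lam,
    recovery time r with no failures during recovery, restart of the
    whole segment after each failure.
    """
    return (1.0 / lam + r) * math.expm1(lam * (t + c))


def overhead_ratio(t, lam, c, r):
    """Expected elapsed time per unit of useful work for interval t."""
    return expected_segment_time(t, lam, c, r) / t


def optimal_interval(lam, c, r, lo=1e-6, hi=None, tol=1e-9):
    """Golden-section search for the interval t minimizing overhead_ratio."""
    if hi is None:
        hi = 10.0 / lam  # assumed search bound: ten mean times between failures
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol * max(1.0, b):
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if overhead_ratio(x1, lam, c, r) < overhead_ratio(x2, lam, c, r):
            b = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    lam, c, r = 0.001, 1.0, 2.0  # illustrative values only
    t_opt = optimal_interval(lam, c, r)
    young = math.sqrt(2.0 * c / lam)  # Young's first-order approximation
    print(f"numerical optimum: {t_opt:.2f}")
    print(f"Young's approximation: {young:.2f}")
```

When lam * c is small, the numerical optimum should lie close to Young's first-order approximation sqrt(2c / lam), which gives a quick sanity check on the search. The paper's model additionally accounts for several processors and for communication errors, which this sketch ignores.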

MSC:

90B18 Communication networks in operations research
90B25 Reliability, availability, maintenance, inspection in operations research
Full Text: DOI

References:

[1] Yoneda, K.; Matsubara, T.; Koga, Y., Investigation of multi-processor system with rollback function, Technical Report of IEICE, FTS97-20, 27-33 (1997)
[2] Chandy, K. M.; Lamport, L., Distributed snapshots: Determining global states of distributed systems, ACM Trans. Comput. Syst., 3, 63-75 (1985)
[3] Koo, R.; Toueg, S., Checkpointing and rollback-recovery for distributed systems, IEEE Trans. Software Eng., SE-13, 23-31 (1987) · Zbl 0603.68018
[4] Strom, R. E.; Yemini, S. A., Optimistic recovery in distributed systems, ACM Trans. Comput. Syst., 3, 204-226 (1985)
[5] Randell, B., System structure for software fault tolerance, IEEE Trans. Software Eng., SE-1, 220-232 (1975)
[6] Osaki, S., Applied Stochastic System Modeling (1992), Springer-Verlag · Zbl 0745.60090