×

A new heuristic approach for reliability optimization of distributed computing systems subject to capacity constraints. (English) Zbl 0830.68005

Summary: Distributed Computing Systems (DCS) have become a major trend in computer system design today, because of their high speed and reliable performance. Reliability is an important performance parameter in DCS design. In the reliability analysis of a DCS, the term of \(K\)-Node Reliability (KNR) is defined as the probability that all nodes in \(K\) (a subset of all processing elements) are connected.
In this paper, we propose a simple, easily programmed heuristic method for obtaining the optimal design of a DCS in terms of maximizing reliability subject to a capacity constraint. The first part of this paper presents a heuristic algorithm which selects an optimal set of \(K\)- nodes that maximizes the KNR in a DCS subject to the capacity constraint. The second part of the paper describes a new approach that uses a \(K\)- tree disjoint reduction method to speed up the KNR evaluation. Compared with existing algorithms on various DCS topologies, the proposed algorithm finds a suboptimal design much more efficiently in terms of both execution time and space than an exact and exhaustive method for a large DCS.

MSC:

68M15 Reliability, testing and fault tolerance of networks and computer systems
Full Text: DOI

References:

[1] Kumar, V. K.Prasnna; Hariri, S.; Raghavendra, C. S., Distributed program reliability analysis, IEEE Trans. Software Eng., 12, 1, 42-50 (1986) · Zbl 0575.68028
[2] Kumar, A.; Rai, S.; Agrawal, D. P., Reliability evaluation algorithms for distributed systems, (Proc. IEEE INFOCOM 88 (1988)), 851-860
[3] Satyanarayna, A.; Hagstrom, J. N., New algorithm for reliability analysis of multiterminal networks, IEEE Trans. Reliability, 30, 325-333 (1981) · Zbl 0484.90052
[4] Chen, D. J.; Huang, T. H., Reliability analysis of distributed systems based on a fast reliability algorithm, IEEE Trans. on Parallel and Distributed Systems, 3, 2 (1992)
[5] Fratta, L.; Montanari, U. G., Synthesis of available networks, IEEE Trans. Reliab., R-25, 81-87 (1976) · Zbl 0351.90033
[6] Wilkov, R. S., Design of computer networks based on a reliability measure, (Proceedings of the Symposium on Computer Communication. Networks and Teletraffics (1986)), 371-384
[7] Lawler, E. L.; Bell, M. D., A method for solving discrete optimization problems, Ops. Res., 14, 1098-1111 (1966)
[8] Stankovic, J. A., A perspective on distributed computer systems, IEEE Trans. Comput., 33, 1102-1115 (1984)
[9] Aggarwal, K. K.; Chopra, Y. C.; Bajwa, J. S., Topological layout of links for optimizing the S-T reliability in a computer communication system, Microelectron. Reliab., 22, 341-345 (1982)
[10] Aggarwal, K. K.; Chopra, Y. C.; Bajwa, J. S., Topological layout or links for optimizing the overall reliability in a computer communication system, Microelectron. Reliab., 22, 347-351 (1982)
[11] Chopra, Y. C.; Sohi, B. S.; Aggarwal, K. K., Network topological for maximizing the terminal 1 reliability in a computer communication system, Microelectron. Reliab., 24, 5, 911-913 (1984)
[12] R.-S. Chen, D.J. Chen and Y.S. Yeh, Reliability optimization of distributed computing systems subject to capacity constraint, Computers Math. Applic.; R.-S. Chen, D.J. Chen and Y.S. Yeh, Reliability optimization of distributed computing systems subject to capacity constraint, Computers Math. Applic. · Zbl 0830.68006
[13] Hariri, S.; Raghavendra, C. S., SYREL: A symbolic reliability algorithm based on path and cutset methods, USC Tech. Rep. (1984)
[14] Merwin, R. E.; Mirherkerk, M., Derivation and use of survivability criterion for DDP systems, (Proc. 1980 Nat’l. Computer Conference (1980)), 139-146
[15] Aggrawal, K. K.; Rai, S., Reliability evaluation in computer-communication networks, IEEE Trans. Reliability, 30, 32-35 (1981) · Zbl 0456.90026
[16] Grnarov, A.; Gerla, M., Multiterminal reliability analysis of distributed processing system, (Proc. 1981 Int’l. Conf. on Parallel Processing (1986)), 79-86
[17] Chen, D.-J.; Lin, M. S., New reliability evaluation algorithms for distributed computing system, Journal of Computer (1992), (to appear)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.