Research Article
Fault-tolerant procedures for redundant computer systems
Article first published online: 4 NOV 2008
DOI: 10.1002/qre.949
Copyright © 2008 John Wiley & Sons, Ltd.
Issue
Quality and Reliability Engineering International
Volume 25, Issue 1, pages 41–68, February 2009
Additional Information
How to Cite
Samet, R. (2009), Fault-tolerant procedures for redundant computer systems. Quality and Reliability Engineering International, 25: 41–68. doi: 10.1002/qre.949
Publication History
- Issue published online: 8 JAN 2009
- Article first published online: 4 NOV 2008
Keywords:
- redundant computer system;
- fault-tolerant procedure;
- non-Byzantine and Byzantine fault types;
- classification;
- selection algorithm;
- graphical model
Abstract
Real-time computer systems deployed in life-critical control applications must be designed to meet stringent reliability specifications. The minimum acceptable degree of reliability for systems of this type is ‘7 nines’ (0.9999999), which is not generally achieved. This paper aims to contribute to achieving that degree of reliability. To this end, it proposes a classification scheme of the fault-tolerant procedures for redundant computer systems (RCSs), based on the number of fault types counteracted. Table I relates the characteristics of the RCSs to the characteristics of the fault-tolerant procedures. A selection algorithm is proposed that allows designers to select the optimal type of fault-tolerant procedure according to the system characteristics and capabilities. The fault-tolerant procedure selected by this algorithm provides the required degree of reliability for a given RCS. According to the proposed graphical model, only a part of the fault-tolerant procedure is executed, depending on the absence or presence (type and sort) of faults. The proposed methods allow designers to counteract both Byzantine and non-Byzantine fault types during degradation of RCSs from N to 3, and only the non-Byzantine fault type during degradation from 3 to 1, with an optimal checkpoint time period.
