Fault-tolerant procedures for redundant computer systems

Authors

  • Refik Samet

    Corresponding author
    1. Department of Computer Engineering, Ankara University, Ord.Prof.Dr. Sevket Kansu Binasi, Besevler, Ankara 06100, Turkey

Abstract

Real-time computer systems deployed in life-critical control applications must meet stringent reliability specifications. The minimum acceptable degree of reliability for such systems is ‘7 nines’ (0.9999999), a level that is not generally achieved. This paper aims to contribute to achieving that degree of reliability. To this end, it proposes a classification scheme for the fault-tolerant procedures of redundant computer systems (RCSs), based on the number of fault types that each procedure counteracts. Table I relates the characteristics of the RCSs to the characteristics of the fault-tolerant procedures. A selection algorithm is proposed that allows designers to choose the optimal type of fault-tolerant procedure according to the system characteristics and capabilities; the procedure selected by this algorithm provides the required degree of reliability for a given RCS. According to the proposed graphical model, only a part of the fault-tolerant procedure is executed, depending on the absence or presence (type and sort) of faults. The proposed methods allow designers to counteract both Byzantine and non-Byzantine fault types while an RCS degrades from N to 3, and only the non-Byzantine fault type while it degrades from 3 to 1, with an optimal checkpoint time period. Copyright © 2008 John Wiley & Sons, Ltd.
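
As a small worked illustration of the ‘7 nines’ target, the following Python sketch converts a number of nines into a reliability value and the corresponding maximum tolerable failure probability. It assumes the usual reading of ‘k nines’ as a reliability of 1 - 10^-k; the abstract does not state the time interval over which the figure applies, so none is modelled here. The function names are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def reliability_from_nines(nines: int) -> Fraction:
    """Return the reliability 1 - 10^-nines as an exact fraction.

    Assumption: 'k nines' means k leading nines after the decimal point,
    e.g. 7 nines -> 0.9999999.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    return 1 - Fraction(1, 10 ** nines)


def max_failure_probability(nines: int) -> Fraction:
    """Return the largest failure probability compatible with the target."""
    return 1 - reliability_from_nines(nines)


if __name__ == "__main__":
    # Exact fractions avoid floating-point rounding at this precision.
    print(float(reliability_from_nines(7)))
    print(max_failure_probability(7))
```

Under this reading, a system that meets the ‘7 nines’ target may fail with probability at most one in ten million over whatever interval the specification refers to.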
