Abstract

To realize a highly reliable, large-scale function-distributed system composed of a number of HOST and FEP processors, both the detection of faults in those processors and the switching to the backup processor must be performed at high speed.

This paper considers a hot standby system with N operating processors and 1 standby processor. It describes a method of high-speed recovery from processor faults together with centralized monitoring by the system control processor (SCP). In the proposed method, the SCP detects the occurrence of a fault in a HOST or FEP and instructs the switch to the standby processor. During recovery from a HOST fault, the FEP maintains the communication path to the terminal and holds received messages until the switching of the HOST is completed.
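The recovery flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class names `Scp` and `Fep`, the heartbeat-timeout detection, and all method names are assumptions introduced here for illustration, since the abstract does not state how the SCP detects a fault.

```python
from collections import deque


class Fep:
    """Hypothetical FEP model: holds terminal messages while the HOST is switching."""

    def __init__(self):
        self.host_up = True
        self.pending = deque()   # messages held during HOST switching
        self.delivered = []      # messages passed on to the HOST

    def receive(self, message):
        # The terminal path stays open; only delivery to the HOST is deferred.
        if self.host_up:
            self.delivered.append(message)
        else:
            self.pending.append(message)

    def host_failed(self):
        self.host_up = False

    def host_switched(self):
        # Switching is complete: flush held messages in arrival order.
        self.host_up = True
        while self.pending:
            self.delivered.append(self.pending.popleft())


class Scp:
    """Hypothetical SCP model: monitors N active processors and one standby."""

    def __init__(self, active, standby, timeout):
        self.last_seen = {name: 0.0 for name in active}
        self.standby = standby
        self.timeout = timeout   # assumed heartbeat timeout, in seconds

    def heartbeat(self, name, now):
        if name in self.last_seen:
            self.last_seen[name] = now

    def check(self, now):
        """Return (failed, replacement) for the first timed-out processor, else None."""
        for name, seen in list(self.last_seen.items()):
            if now - seen > self.timeout and self.standby is not None:
                replacement, self.standby = self.standby, None
                del self.last_seen[name]
                self.last_seen[replacement] = now
                return (name, replacement)
        return None
```

In this sketch, a caller would invoke `Fep.host_failed()` when `Scp.check()` reports a failed HOST and `Fep.host_switched()` once the standby has taken over, so that messages received in between are delivered rather than lost.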

It is also shown that, by floating the standby, recovery from an FEP fault is achieved in approximately half the time needed in the conventional system, and the availability of the FEP is improved by one order of magnitude.

With these measures, the time to switch to the standby after a HOST or FEP fault is reduced drastically in a large-scale function-distributed system with several thousand terminals. It is verified that, when a HOST fault occurs, the response message can be returned to most terminals within the response monitoring time, so that the system fault is concealed from most terminal users.