A highly reliable large scale computer complex
Version of Record online: 22 MAR 2007
Copyright © 1991 Wiley Periodicals, Inc., A Wiley Company
Systems and Computers in Japan
Volume 22, Issue 4, pages 10–21, 1991
How to Cite
Turuho, S., Nakatani, H. and Gohara, J. (1991), A highly reliable large scale computer complex. Syst. Comp. Jpn., 22: 10–21. doi: 10.1002/scj.4690220402
- Issue online: 22 MAR 2007
To realize a highly reliable large-scale distributed function system composed of a number of host processors (HOST) and front-end processors (FEP), faults in the HOST and FEP processors must be detected, and processing switched to the back-up processor, at high speed.
This paper considers a hot standby system with N operating processors and one standby processor. It describes a method of high-speed recovery from processor faults and centralized monitoring by the system control processor (SCP). In the proposed method, the SCP detects the occurrence of a fault in HOST or FEP and instructs the switch to the standby processor. During recovery from a HOST fault, the FEP maintains the communication path to the terminal and holds received messages until the HOST switch is completed.
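The recovery flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `Scp` and `Fep`, the heartbeat-timeout style of fault detection, and all method names are assumptions introduced only for illustration, since the abstract does not say how the SCP detects a fault.

```python
from collections import deque


class Fep:
    """Hypothetical FEP model: keeps accepting terminal messages during a HOST switch."""

    def __init__(self):
        self.host_available = True
        self.held = deque()   # messages received while the HOST is being switched
        self.delivered = []   # messages forwarded to the active HOST

    def receive(self, message):
        # The terminal path stays open; only forwarding to the HOST is deferred.
        if self.host_available:
            self.delivered.append(message)
        else:
            self.held.append(message)

    def host_switch_started(self):
        self.host_available = False

    def host_switch_completed(self):
        # Forward held messages to the new HOST in arrival order.
        self.host_available = True
        while self.held:
            self.delivered.append(self.held.popleft())


class Scp:
    """Hypothetical SCP model: monitors N active processors that share one standby."""

    def __init__(self, active, standby, timeout):
        self.active = list(active)
        self.standby = standby
        self.timeout = timeout  # assumed heartbeat timeout, in seconds
        self.last_seen = {name: 0.0 for name in self.active}

    def heartbeat(self, name, now):
        self.last_seen[name] = now

    def check(self, now):
        """Return (failed, replacement) for the first timed-out processor, else None."""
        for i, name in enumerate(self.active):
            if now - self.last_seen[name] > self.timeout:
                if self.standby is None:
                    return None  # the single standby is already in use
                replacement, self.standby = self.standby, None
                self.active[i] = replacement
                del self.last_seen[name]
                self.last_seen[replacement] = now
                return name, replacement
        return None


scp = Scp(active=["HOST-1", "HOST-2"], standby="HOST-S", timeout=1.0)
fep = Fep()
scp.heartbeat("HOST-1", 0.5)
scp.heartbeat("HOST-2", 2.0)
switch = scp.check(now=2.0)       # HOST-1 has been silent for 1.5 s
if switch is not None:
    fep.host_switch_started()
    fep.receive("terminal message")  # held, not lost
    fep.host_switch_completed()      # forwarded to the replacement HOST
```

In this sketch the terminal never sees the switch: its message waits in the FEP queue and is delivered once the standby takes over, which is the behavior the abstract attributes to the FEP.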
It is also shown that, by floating the standby, recovery from an FEP fault is achieved in approximately half the time needed in the conventional system, and the availability of FEP is improved by one order of magnitude.
With these techniques, the time needed to switch to the standby after a HOST or FEP fault is reduced drastically in a large-scale distributed function system with several thousand terminals. It is verified that, when a HOST fault occurs, the response message can be returned to most terminals within the response monitoring time, so the system fault can be concealed from most terminal users.