Standard Article

Coordination and Synchronization: Designing Practical Detectors for Large-Scale Distributed Systems

  1. Indranil Gupta

Published Online: 15 JAN 2008

DOI: 10.1002/9780470050118.ecse085

Wiley Encyclopedia of Computer Science and Engineering

How to Cite

Gupta, I. 2008. Coordination and Synchronization: Designing Practical Detectors for Large-Scale Distributed Systems. Wiley Encyclopedia of Computer Science and Engineering.

Author Information

  1. University of Illinois at Urbana-Champaign, Urbana, Illinois

Abstract

Online detectors are important for achieving coordination and synchronization in large-scale distributed applications that run in peer-to-peer systems, the Grid, PlanetLab, and large-scale, enterprise-like server farms. Detectors can be used to monitor the up/down status of hosts, malicious behavior among processes, and the availability behavior of hosts, and to estimate the number of hosts in a distributed system. We discuss a variety of online detectors for these problems, with an emphasis on practical solutions that satisfy two characteristics: they have been implemented and validated through experimental evaluation or in practice, and they are based on novel ideas and strong theory. The goal of this article is to enable practitioners to understand these protocols so that they can easily implement them or adapt them for various distributed systems. The article also aims to give the beginning researcher an overview of, and a good feel for, the area of detectors, as a basis for further learning in this field.

Keywords:

  • distributed systems;
  • detection;
  • crash-stop;
  • Byzantine;
  • availability;
  • system size estimation;
  • scalability;
  • fault-tolerance