Adaptive fault tolerance mechanisms for opportunistic environments: a mobile agent approach



The mobile agent paradigm has emerged as a promising alternative for overcoming the challenges of building opportunistic grid environments. This model can be used to implement mechanisms, such as those provided by the MAG middleware (Mobile Agents for Grids), that allow application execution to progress even in the presence of failures. MAG includes retrying, replication, and checkpointing as fault tolerance techniques; however, these techniques operate independently of each other and cannot detect changes in resource availability. In this paper, we describe a MAG extension that migrates agents when nodes fail, optimizes application progress by keeping only the most advanced checkpoint, and migrates slow replicas. The proposed approach was evaluated through simulations and experiments, which showed significant improvements.

Copyright © 2011 John Wiley & Sons, Ltd.
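The two adaptive policies summarized in the abstract (keeping only the most advanced checkpoint, and migrating agents away from failed nodes or slow replicas) can be illustrated with a minimal Python sketch. It is not MAG's implementation: the `Replica` record, the functions, the 0.5 slowness threshold, and the assumption that checkpoints are kept in stable storage that survives node failure are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical record; MAG's real agent/replica data model is not shown here.
    replica_id: str
    node: str
    checkpoint_progress: float  # fraction of work saved in the latest checkpoint (0.0 to 1.0)
    progress_rate: float        # observed fraction of work completed per second


def most_advanced_checkpoint(replicas):
    """Return the replica holding the furthest checkpoint; the others can be discarded."""
    return max(replicas, key=lambda r: r.checkpoint_progress)


def slow_replicas(replicas, threshold=0.5):
    """Replicas running slower than `threshold` times the fastest observed rate."""
    if not replicas:
        return []
    fastest = max(r.progress_rate for r in replicas)
    return [r for r in replicas if r.progress_rate < threshold * fastest]


def plan_migrations(replicas, failed_nodes, free_nodes, threshold=0.5):
    """Return (progress of the kept checkpoint, {replica_id: target_node}).

    Every migrated replica would resume from the single kept checkpoint.
    """
    # Assumption: checkpoints live in stable storage, so a replica on a failed
    # node still contributes its last checkpoint.
    best = most_advanced_checkpoint(replicas)
    alive = [r for r in replicas if r.node not in failed_nodes]
    # Replicas on failed nodes must move; among live ones, only the slow ones do.
    to_move = [r for r in replicas if r.node in failed_nodes]
    to_move += slow_replicas(alive, threshold)
    # Pair each replica with a free node; any surplus waits for the next round.
    plan = {r.replica_id: node for r, node in zip(to_move, free_nodes)}
    return best.checkpoint_progress, plan


replicas = [
    Replica("r1", "n1", 0.40, 0.010),
    Replica("r2", "n2", 0.55, 0.012),
    Replica("r3", "n3", 0.30, 0.004),
]
progress, plan = plan_migrations(replicas, failed_nodes={"n2"}, free_nodes=["n4", "n5"])
print(progress, plan)  # 0.55 {'r2': 'n4', 'r3': 'n5'}
```

Here `r2` is moved because its node failed, and `r3` is moved because it runs at less than half the rate of the fastest surviving replica. Measuring slowness relative to the fastest replica, rather than against a fixed rate, is one simple way to react to changing resource availability; the actual criterion used by the extension may differ.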