Full autonomic repair for distributed applications



Grid and cloud environments heighten the need for self-repair systems that can withstand and repair their own failures, a property that existing solutions do not yet ensure. In this paper, we describe the JADE Autonomic Repair System for legacy applications deployed in a grid or cloud environment. JADE is based on three main design principles. First, legacy applications are wrapped with Java objects, which provides a uniform set of management operations over heterogeneous legacy management capabilities. Second, to achieve full autonomy, we adopt a replicated design combined with a recursive approach, so that JADE appears to itself as just another distributed application that it manages and repairs. Finally, to scale, we tile the distributed environment and structure our repair system per tile. To our knowledge, our repair system is the only one that is both designed to scale and fully autonomic, repairing not only the failures of the managed system but also its own. Our repair system has been tested in various realistic scenarios. Copyright © 2013 John Wiley & Sons, Ltd.
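
The first two design principles can be illustrated with a minimal Java sketch. It is not JADE's actual API: the names `Managed`, `LegacyWrapper`, `RepairManager`, and `repairPass` are hypothetical, the legacy application is simulated by a boolean flag, and the way one manager repairs another is only one possible reading of the recursive approach. Replication and tiling are not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class RepairSketch {
    /** Uniform management interface (hypothetical, not JADE's actual API). */
    interface Managed {
        String name();
        void start();
        void stop();
        boolean isAlive();
    }

    /** Wraps a simulated legacy application behind the uniform interface. */
    static class LegacyWrapper implements Managed {
        private final String name;
        private boolean running;
        LegacyWrapper(String name) { this.name = name; }
        public String name() { return name; }
        // A real wrapper would call the legacy start/stop scripts here (assumption).
        public void start() { running = true; }
        public void stop() { running = false; }
        public boolean isAlive() { return running; }
        /** Simulates a failure of the legacy application. */
        void crash() { running = false; }
    }

    /** A repair manager that is itself Managed, so another manager can repair it. */
    static class RepairManager implements Managed {
        private final String name;
        private final List<Managed> components = new ArrayList<>();
        private boolean running;
        RepairManager(String name) { this.name = name; }
        void manage(Managed component) { components.add(component); }
        public String name() { return name; }
        public void start() { running = true; }
        public void stop() { running = false; }
        public boolean isAlive() { return running; }

        /** Restarts every dead component and returns the names of those repaired. */
        List<String> repairPass() {
            List<String> repaired = new ArrayList<>();
            if (!running) {
                return repaired; // a failed manager cannot repair anything
            }
            for (Managed c : components) {
                if (!c.isAlive()) {
                    c.start();
                    repaired.add(c.name());
                }
            }
            return repaired;
        }
    }

    public static void main(String[] args) {
        LegacyWrapper web = new LegacyWrapper("web-server");
        RepairManager inner = new RepairManager("repair-manager-1");
        RepairManager outer = new RepairManager("repair-manager-2");
        inner.manage(web);   // the repair system manages the legacy application
        outer.manage(inner); // and is managed like any other application
        web.start();
        inner.start();
        outer.start();

        web.crash();
        System.out.println(inner.repairPass()); // prints [web-server]

        inner.stop(); // simulate a failure of the repair manager itself
        System.out.println(outer.repairPass()); // prints [repair-manager-1]
    }
}
```

Because `RepairManager` implements the same `Managed` interface as the wrapped application, the repair logic needs no special case for repairing part of the repair system itself, which is the point of the recursive design as summarized above.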