Scripting distributed scientific workflows using Weaver


Peter Bui, Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA.



Weaver is a high-level distributed computing framework that enables researchers to construct scalable scientific data-processing workflows. Instead of developing a new workflow language, we built Weaver as a domain-specific language on top of Python, which takes advantage of users' familiarity with that programming language, minimizes barriers to adoption, and allows for integration with a rich ecosystem of existing software. In this paper, we provide an overview of Weaver's programming model, which allows users to organize and specify scientific workflows using a collection of datasets, functions, and abstractions. We also explain how these workflow specifications are compiled into a directed acyclic graph that the Makeflow workflow manager uses to dispatch work to a variety of distributed execution platforms. To demonstrate the power and benefits of using the framework to construct scientific research applications, the paper examines four distinct real-world applications scripted using Weaver and analyzes the performance, scalability, and impact of the generated distributed scientific workflows. Copyright © 2011 John Wiley & Sons, Ltd.
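
To make the idea of compiling a workflow specification into a directed acyclic graph more concrete, the following is a minimal sketch in plain Python. It is not Weaver's actual API: the names `map_abstraction` and `emit_makeflow`, the file names, and the commands are all hypothetical and chosen only for illustration. The sketch assumes a dataset is a list of file names, a function is a shell command template, and an abstraction is a pattern (here, a map) that applies the function to every item. The output uses the Make-like rule layout that Makeflow reads (targets, a colon, sources, then a tab-indented command).

```python
from typing import Callable, List, Tuple

# One DAG node: (output files, input files, shell command).
Rule = Tuple[List[str], List[str], str]


def map_abstraction(template: str, dataset: List[str],
                    output_name: Callable[[str], str]) -> List[Rule]:
    """Hypothetical map pattern: one rule per input file in the dataset."""
    rules: List[Rule] = []
    for path in dataset:
        out = output_name(path)
        rules.append(([out], [path], template.format(inp=path, out=out)))
    return rules


def emit_makeflow(rules: List[Rule]) -> str:
    """Render each rule as 'targets: sources' plus a tab-indented command."""
    blocks = []
    for outputs, inputs, command in rules:
        blocks.append(f"{' '.join(outputs)}: {' '.join(inputs)}\n\t{command}")
    return "\n\n".join(blocks) + "\n"


# Illustrative dataset and commands (assumptions, not taken from the paper).
dataset = ["a.dat", "b.dat"]
rules = map_abstraction("sort {inp} > {out}", dataset,
                        lambda p: p.replace(".dat", ".out"))

# A final rule that depends on every mapped output, closing the DAG.
mapped = [r[0][0] for r in rules]
rules.append((["all.out"], mapped, f"cat {' '.join(mapped)} > all.out"))

makeflow_text = emit_makeflow(rules)
print(makeflow_text)
```

Because each rule names its inputs and outputs explicitly, a workflow manager can infer that the two `sort` tasks are independent and may run in parallel, while the `cat` task must wait for both.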