Research Article
Scientific workflow management and the Kepler system
Article first published online: 13 DEC 2005
DOI: 10.1002/cpe.994
Copyright © 2005 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Special Issue: Workflow in Grid Systems
Volume 18, Issue 10, pages 1039–1065, 25 August 2006
Additional Information
How to Cite
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E. A., Tao, J. and Zhao, Y. (2006), Scientific workflow management and the Kepler system. Concurrency Computat.: Pract. Exper., 18: 1039–1065. doi: 10.1002/cpe.994
Publication History
- Issue published online: 19 JUL 2006
- Article first published online: 13 DEC 2005
- Manuscript Accepted: 27 APR 2005
- Manuscript Revised: 6 APR 2005
- Manuscript Received: 1 JUN 2004
Funded by
- NSF/ITR. Grant Numbers: 0225676 (SEEK), CCR-00225610 (Chess), 0225673 (GEON), 0325963 (ROADNet)
- DOE SciDAC. Grant Number: DE-FC02-01ER25486 (SDM)
- NIH/NCRR. Grant Number: 1R24 RR019701-01
- Biomedical Informatics Research Network Coordinating Center (BIRN-CC)
- NSF/DBI. Grant Number: 0078296 (Resurgence)
Keywords:
- scientific workflows;
- Grid workflows;
- scientific data management;
- problem-solving environments;
- dataflow networks
Abstract
Many scientific disciplines are now data- and information-driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery ‘pipelines’. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. ‘the Grid’). However, this infrastructure is only a means to an end, and ideally scientists should not be too concerned with its existence. The goal is for scientists to focus on the development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps, including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open-source project, and we always welcome related projects and new contributors. Copyright © 2005 John Wiley & Sons, Ltd.
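To make the idea of a workflow as a "network of analytical steps" concrete, here is a minimal sketch in Python. It is not Kepler's API or execution model. It only assumes that each step is a function whose inputs are the outputs of named upstream steps, and that the network is acyclic, so steps can run in topological order. The step names (`query`, `analyze`, `report`) and the in-memory data standing in for database access are hypothetical.

```python
from graphlib import TopologicalSorter
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

# A step is a function plus the names of the upstream steps whose outputs
# it consumes, in argument order. All step names below are hypothetical.
Step = Tuple[Callable[..., Any], List[str]]


def run_workflow(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run each step once, after all of its upstream steps have produced output."""
    # graphlib expects a mapping from each node to its predecessors.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical three-step pipeline: a "query" step standing in for database
# access, an analysis step, and a report step that combines both outputs.
workflow: Dict[str, Step] = {
    "query": (lambda: [2.0, 4.0, 6.0, 8.0], []),
    "analyze": (mean, ["query"]),
    "report": (lambda rows, avg: f"{len(rows)} records, mean {avg}", ["query", "analyze"]),
}

results = run_workflow(workflow)
print(results["report"])  # prints "4 records, mean 5.0"
```

This sketch deliberately uses the simplest possible scheduling: one sequential pass over an acyclic graph. A real workflow system would also need to handle concerns such as streaming data between concurrently running steps, long-running remote jobs, and failures, which this example does not attempt.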
