An e-Science data infrastructure for simulations within Grid computing environment: methods, approaches and practice


Correspondence to: Xiaoyu Yang, Senior Member, Wolfson College, University of Cambridge, Barton Road, Cambridge, CB3 9BB, UK.



Grid-based simulation usually involves large quantities of data at each stage of the simulation process. These data include simulation input and output files, intermediate result files, log and error files, associated metadata, and information capturing the processes that generate the data. How to store and manage these data files effectively within a Grid computing environment is becoming an increasingly important issue. This paper illustrates how we built a lightweight e-Science infrastructure for data management within a Grid computing environment, including the integration of data curation activities into the entire Grid-based simulation process. Rather than focusing on specific implementation details, we aim to identify the key issues and research challenges and to describe how existing technologies and tools can best be integrated to address them. Although quantum mechanical simulation of materials properties is used as the case study, much of the discussion is kept as generic as possible so that the approaches, methods and practice (e.g. the integrated approach, the workflow taxonomy and development approach, and a simple but useful semantic annotation approach) can be applied to wider domains and disciplines to facilitate digital research. A comparison between our approach and Cloud computing, and lessons learned in data management within the Grid computing environment, are also presented. Copyright © 2012 John Wiley & Sons, Ltd.