Enabling cloud bursting for life sciences within Galaxy


  • Categories and Subject Descriptors

    • H.3.4 [Information Storage and Retrieval]: Systems and Software - distributed systems
  • General Terms

    • Performance, Design, Experimentation


Over the past decade, the capacity to generate data in biomedical research has increased radically, and the field is now constrained by its ability to analyze those data. Galaxy, a Web-based, open-source data integration and analysis platform for life science research, has been democratizing access to data analysis tools. However, the scale of the data and the scope of the tools required have proven to be a significant challenge for any monolithic deployment of the Galaxy application. We have found that a distributed and federated approach to utilizing compute and storage resources is necessary. This paper describes ongoing efforts to create a ubiquitous platform capable of simultaneously utilizing dedicated as well as on-demand cloud resources. Specifically, it details the requirements, the process, and an implementation of a cloud-bursting system. Copyright © 2010 John Wiley & Sons, Ltd.