Adapting bioinformatics applications for heterogeneous systems: a case study


  • Supporting information may be found in the online version of this article.


The advent of new sequencing technologies has generated extremely large amounts of data. To be applied successfully to such large datasets, bioinformatics tools need to exhibit scalability and, ideally, elasticity across diverse computing environments. We apply lessons obtained in previous work to a new workflow, running it both with and without shared file storage. Because the original workflows have intractable sequential running times on large datasets, we present lessons and results from refactoring bioinformatics tools for elastic scaling on personal clouds. Our case studies describe the challenges faced when constructing such a workflow, from detecting failures, to managing dependencies, to handling the quirks of the underlying operating systems. Because scaling bioinformatics tools is increasingly commonplace, this hands-on application of refactoring techniques can serve as a practical guide. Notably, our customized Makeflow framework enabled generalizable deployment on a wider variety of systems while substantially reducing wall-clock runtimes using hundreds of cores. Copyright © 2012 John Wiley & Sons, Ltd.
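To make the idea of refactoring a sequential tool for parallel execution concrete, the following is a minimal, generic scatter/gather sketch. It does not reproduce the authors' workflow or their customized Makeflow framework. It assumes the per-record work is independent, so the input can be split into chunks that run as separate tasks. A final merge step depends on every chunk, and a chunk counts as failed when its task raises an exception. The names `split_records`, `run_with_retries`, `scatter_gather`, and `gc_task` are hypothetical, and a local thread pool stands in for remote workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def split_records(records: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Scatter: partition records into at most n_chunks contiguous chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [list(records[i:i + size]) for i in range(0, len(records), size)]


def run_with_retries(task: Callable[[List[str]], List[R]],
                     chunk: List[str], max_attempts: int) -> List[R]:
    """Failure detection (assumed here to mean 'task raised'): retry the chunk."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(chunk)
        except Exception as exc:  # treat any exception as a transient failure
            last_error = exc
    raise RuntimeError(f"chunk failed after {max_attempts} attempts") from last_error


def scatter_gather(records: Sequence[str],
                   task: Callable[[List[str]], List[R]],
                   n_chunks: int = 4, max_workers: int = 4,
                   max_attempts: int = 3) -> List[R]:
    """Run task on each chunk in parallel, then merge results in input order."""
    chunks = split_records(records, n_chunks)
    results: List[List[R]] = [[] for _ in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(run_with_retries, task, chunk, max_attempts): idx
            for idx, chunk in enumerate(chunks)
        }
        for fut in as_completed(futures):
            # result() re-raises if a chunk exhausted its retries
            results[futures[fut]] = fut.result()
    # Gather: this merge depends on all chunk tasks having completed.
    merged: List[R] = []
    for part in results:
        merged.extend(part)
    return merged


def gc_task(chunk: List[str]) -> List[float]:
    """Hypothetical per-record work: GC fraction of each sequence."""
    return [round((s.count("G") + s.count("C")) / len(s), 3) for s in chunk]


if __name__ == "__main__":
    seqs = ["ACGT", "GGCC", "ATAT", "GCGA", "TTTT"]
    print(scatter_gather(seqs, gc_task, n_chunks=2))
    # prints [0.5, 1.0, 0.0, 0.75, 0.0]
```

Threads keep the sketch self-contained and easy to test. For CPU-bound tools on a cluster or cloud, each chunk would instead run as a separate process or batch job. Workflow engines such as Makeflow express that same structure as Make-like rules whose inputs and outputs define the dependencies.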