
Energy landscape analysis for regulatory RNA finding using scalable distributed cyberinfrastructure

Authors


Joohyun Kim and Shantenu Jha, Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA.

E-mail: jhkim@cct.lsu.edu

E-mail: sjha@cct.lsu.edu

SUMMARY

We investigate the folding energy landscape (EL) of a given RNA sequence through Boltzmann ensemble (BE) sampling of RNA secondary structures. The ensemble of sampled structures is used to derive distributions of energies and of base-pair distances between pairs of configurations, from which we identify structural features that can be utilized for RNA gene finding. Characterizing the EL through BE sampling of secondary structures is computationally demanding and involves multiple heterogeneous stages. To address these computational requirements, we develop the Distributed Adaptive Runtime Environment. The Distributed Adaptive Runtime Environment is built upon an extensible and interoperable pilot-job framework and supports the concurrent execution of tasks of widely varying sizes across a range of infrastructures. It is used to investigate two RNA systems of different sizes: S-adenosyl methionine (SAM)-binding RNA sequences known as SAM-I riboswitches, and the S gene of the bovine coronavirus (BCoV) RNA genome. We demonstrate how the implementation lowers the total time to solution as the RNA length, the number of sequences investigated, and the number of sampled structures increase. The distributions of energies and base-pair distances reveal variations in folding dynamics and pathways among the SAM riboswitch sequences. Our results for the BCoV RNA genome sequences also indicate that folding is sensitive to coding-neutral sequence variations. Among the BE-sampled structures for all 2910 SAM-I sequences identified from Rfam (the curated ncRNA family database), we search for a characteristic motif of the SAM-I consensus structure, a four-way junction. We find that BE sampling provides insight into variations in conformational distribution among sequences of the same ncRNA family. BE sampling of secondary structures is therefore a viable pre-processing or post-processing tool to complement comparative sequence analysis.
The understanding gained shows how appropriately designed cyberinfrastructure can provide new insight into RNA folding and structure formation. Copyright © 2011 John Wiley & Sons, Ltd.
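The base-pair distance used above can be illustrated with a small sketch. It assumes structures are given in dot-bracket notation (as produced by common BE samplers) and uses the common definition of base-pair distance: the number of base pairs present in one structure but not the other. The helper names `base_pairs`, `bp_distance`, and `distance_distribution` are hypothetical, and comparing all pairs of sampled structures is an illustrative choice, not necessarily the exact procedure used in this work.

```python
from collections import Counter
from itertools import combinations


def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)  # open a pair at position j
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))  # close the most recent open pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Base-pair distance: size of the symmetric difference of the pair sets."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))


def distance_distribution(structures: list[str]) -> Counter:
    """Histogram of base-pair distances over all pairs of sampled structures
    (hypothetical analysis step; the samples would come from a BE sampler)."""
    pair_sets = [base_pairs(s) for s in structures]
    return Counter(len(a ^ b) for a, b in combinations(pair_sets, 2))


if __name__ == "__main__":
    samples = ["((..))", "(....)", "......"]  # toy structures, not real data
    print(bp_distance(samples[0], samples[1]))
    print(distance_distribution(samples))
```

Representing each structure as a set of index pairs makes the distance a single set operation, so the same code scales from two structures to a histogram over a sampled ensemble.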
