This article was published online on 26 June 2013. An error was subsequently identified in the author group: half of the authors were listed in the “Additional Authors” section at the end of the article. The full author list has now been moved to the front page. This notice is included in the online and print versions to indicate that both have been corrected [20 July 2013].
Special Issue Paper
Visualizing the Protein Sequence Universe†
Article first published online: 26 JUN 2013
Copyright © 2013 John Wiley & Sons, Ltd.
Concurrency and Computation: Practice and Experience
Special Issue: Combined Special Issues on Emerging Computational Methods for the Life Sciences Workshop (ECMLS 2012) and Multicore Cache Hierarchies: Design And Programmability Issues (MCH 2012)
Volume 26, Issue 6, pages 1313–1325, 25 April 2014
How to Cite
Stanberry, L., Higdon, R., Haynes, W., Kolker, N., Broomall, W., Ekanayake, S., Hughes, A., Ruan, Y., Qiu, J., Kolker, E. and Fox, G. (2014), Visualizing the Protein Sequence Universe. Concurrency Computat.: Pract. Exper., 26: 1313–1325. doi: 10.1002/cpe.3072
- Issue published online: 20 MAR 2014
- Manuscript Accepted: 18 MAY 2013
- Manuscript Revised: 28 JAN 2013
- Manuscript Received: 2 JUL 2012
- NSF. Grant Numbers: 0969929 to E.K., 0910818 to G.F.
- NIH. Grant Number: 5 RC2 HG 005806-02 to G.F.
- NIGMS. Grant Number: R01 GM-076680-04 to E.K.
- NIDDK. Grant Numbers: U01-DK-089571, U01-DK-072473 to E.K.
Keywords
- data-enabled life sciences;
- sequence similarity;
- computational bioinformatics;
- protein annotation;
- protein sequence universe;
- PSU, COG;
- multidimensional scaling;
- data visualization;
Modern biology is experiencing a rapid increase in data volumes that challenges our analytical skills and existing cyberinfrastructure. The exponential expansion of the protein sequence universe (PSU), that is, the protein sequence space, together with the cost and complexity of manual curation, creates a major bottleneck in life sciences research. Existing resources lack the scalable visualization tools that are instrumental to functional annotation. Here, we describe a new visualization tool that uses multidimensional scaling to create a 3D embedding of the protein space. The advantages of the proposed PSU method include the ability to scale to large numbers of sequences, to integrate different similarity measures with other functional and experimental data, and to facilitate protein annotation. We applied the method to visualize the prokaryotic PSU using sequence alignment scores. As an annotation example, we used an interpolation approach to map a set of annotated archaeal proteins into the prokaryotic PSU. Transdisciplinary approaches akin to the one described in this paper are urgently needed to translate the influx of new data quickly and efficiently into tangible innovations and groundbreaking discoveries.
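The two steps named in the abstract, embedding a matrix of pairwise sequence dissimilarities in 3D with multidimensional scaling and then interpolating new sequences into a fixed embedding, can be illustrated with a minimal pure-Python sketch. It assumes unweighted stress, a plain SMACOF (Guttman transform) iteration for the embedding, and a single-point majorization update for interpolation against already-placed points. The helper names `dist`, `stress`, `smacof` and `interpolate` are hypothetical, and the article's own procedure (its weighting, neighbor selection, and large-scale parallel implementation) may differ.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stress(X, D):
    """Unweighted raw stress: squared mismatch between embedded and target distances."""
    n = len(X)
    return sum((dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof(D, dim=3, iters=300, seed=0):
    """Unweighted SMACOF: repeatedly apply the Guttman transform X <- B(X) X / n.

    D is a symmetric matrix of target dissimilarities (assumed here to be
    derived from sequence alignment scores). Stress never increases between
    iterations.
    """
    n = len(D)
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        newX = []
        for i in range(n):
            acc = [0.0] * dim
            for j in range(n):
                if i == j:
                    continue
                d = dist(X[i], X[j])
                ratio = D[i][j] / d if d > 1e-12 else 0.0
                # Row i of B(X) X collapses to sum_j ratio_ij * (x_i - x_j).
                for k in range(dim):
                    acc[k] += ratio * (X[i][k] - X[j][k])
            newX.append([a / n for a in acc])
        X = newX
    return X


def interpolate(P, delta, iters=200):
    """Place one new point against fixed embedded points P.

    delta[i] is the target distance from the new point to P[i]. Each step
    minimizes a majorizer of sum_i (||x - p_i|| - delta_i)^2, giving
    x <- mean(P) + (1/k) * sum_i delta_i * (x - p_i) / ||x - p_i||.
    """
    k, dim = len(P), len(P[0])
    center = [sum(p[c] for p in P) / k for c in range(dim)]
    x = [c + 1e-3 for c in center]  # start just off the centroid
    for _ in range(iters):
        acc = [0.0] * dim
        for p, dlt in zip(P, delta):
            d = dist(x, p)
            if d > 1e-12:
                for c in range(dim):
                    acc[c] += dlt * (x[c] - p[c]) / d
        x = [center[c] + acc[c] / k for c in range(dim)]
    return x
```

For example, `smacof(D)` on a small dissimilarity matrix returns one 3D coordinate per sequence, and `interpolate(P, delta)` positions an extra sequence using only its distances to the embedded ones. Interpolating against a fixed embedding avoids recomputing the full all-pairs MDS each time new sequences arrive, which is the scaling motivation the abstract points to.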