Special Issue Paper
Browsing large-scale cheminformatics data with dimension reduction
Article first published online: 14 JUN 2011
DOI: 10.1002/cpe.1781
Copyright © 2011 John Wiley & Sons, Ltd.

Concurrency and Computation: Practice and Experience
Volume 23, Issue 17, pages 2315–2325, 10 December 2011
Additional Information
How to Cite
Choi, J. Y., Bae, S.-H., Qiu, J., Chen, B. and Wild, D. (2011), Browsing large-scale cheminformatics data with dimension reduction. Concurrency Computat.: Pract. Exper., 23: 2315–2325. doi: 10.1002/cpe.1781
Publication History
- Issue published online: 20 OCT 2011
- Article first published online: 14 JUN 2011
- Manuscript Accepted: 17 APR 2011
- Manuscript Revised: 26 MAR 2011
- Manuscript Received: 6 OCT 2010
Funded by
- Microsoft “CRMC”
- NIH. Grant Number: RC2HG005806-02
- National Science Foundation (NSF). Grant Number: 0910812
Keywords:
- visualization
- MDS
- GTM
- interpolation
- semantic web
SUMMARY
Visualization of large-scale, high-dimensional data is highly valuable for data analysis, facilitating scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in Geographic Information System (GIS) browsers for Earth and environmental data, chemical compounds with similar properties are placed near one another in the browser. PubChemBrowse is built around in-house, high-performance parallel multidimensional scaling (MDS) and generative topographic mapping (GTM) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D-mapped compound space or queried for individual points. We prototype the integration with the Chem2Bio2RDF system, using a SPARQL endpoint to access over 20 publicly accessible bioinformatics databases. We describe the design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies are generally applicable to developing high-performance scientific data browsing systems for other applications.
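To illustrate the idea of placing similar compounds near one another in a 3D space, here is a minimal single-machine sketch. It is not the authors' parallel MDS or GTM service. It swaps in classical (Torgerson) MDS computed from an eigendecomposition, and it assumes that compounds are represented as binary fingerprints compared with Tanimoto distance, a common cheminformatics choice that the summary does not specify. The function names and the random toy fingerprints are hypothetical.

```python
import numpy as np


def tanimoto_distance_matrix(fps):
    """Pairwise Tanimoto (Jaccard) distances between binary fingerprint rows."""
    fps = np.asarray(fps, dtype=bool).astype(int)
    inter = fps @ fps.T                              # shared "on" bits for each pair
    counts = fps.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        # Two all-zero fingerprints are treated as identical (similarity 1).
        sim = np.where(union > 0, inter / union, 1.0)
    return 1.0 - sim


def classical_mds(D, dim=3):
    """Classical (Torgerson) MDS: embed n points in `dim` dimensions from an (n, n) distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim]            # indices of the `dim` largest
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))  # negative eigenvalues clipped to 0
    return eigvecs[:, idx] * scale                   # (n, dim) coordinates


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    fps = rng.random((50, 128)) < 0.2                # 50 hypothetical compounds, 128-bit toy fingerprints
    coords = classical_mds(tanimoto_distance_matrix(fps), dim=3)
    print(coords.shape)                              # (50, 3)
```

Classical MDS recovers distances exactly only when they are Euclidean. Tanimoto distances generally are not, so clipping negative eigenvalues yields an approximation. The dense O(n²) distance matrix and the O(n³) eigendecomposition also limit this sketch to small data sets, and that cost is the motivation for parallel services at the scale the paper targets.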
