Research Article
Automatic target selection for structural genomics on eukaryotes
Article first published online: 5 MAR 2004
DOI: 10.1002/prot.20012
Copyright © 2004 Wiley-Liss, Inc.
Proteins: Structure, Function, and Bioinformatics
Volume 56, Issue 2, pages 188–200, 1 August 2004
Additional Information
How to Cite
Liu, J., Hegyi, H., Acton, T. B., Montelione, G. T. and Rost, B. (2004), Automatic target selection for structural genomics on eukaryotes. Proteins, 56: 188–200. doi: 10.1002/prot.20012
Publication History
- Issue published online: 11 JUN 2004
- Article first published online: 5 MAR 2004
- Manuscript Accepted: 23 SEP 2003
- Manuscript Received: 3 JUL 2003
Funded by
- Protein Structure Initiative of National Institutes of Health. Grant Number: P50 GM52413
Keywords:
- structural genomics
- target selection
- protein structure family
- cluster
- domains
- proteome analysis
Abstract
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of “all families” on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify, based on sequence homology, all protein domain families in the five entirely sequenced eukaryotic organisms currently targeted; 2) discard those families that can be modeled on the basis of structural information already present in the PDB; and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common fold-like region, we had to dissect proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, which uses homology to PrISM, Pfam-A, and SWISS-PROT, chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared to be suitable targets; they were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes. Proteins 2004;56:188–200. © 2004 Wiley-Liss, Inc.
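The hierarchical chopping step can be illustrated with a minimal Python sketch. It is not the authors' CHOP implementation. It assumes that homology hits are already available as residue ranges, that the three databases are consulted in the order the abstract lists them (PrISM, then Pfam-A, then SWISS-PROT), that a lower-priority hit is simply skipped when it overlaps residues already claimed, and that leftover stretches of at least a hypothetical `MIN_FRAGMENT_LEN` residues become fragments of their own. The names `Hit`, `chop`, `SOURCE_PRIORITY`, and `MIN_FRAGMENT_LEN` are illustrative only.

```python
from dataclasses import dataclass

# Assumed priority order: the order in which the abstract lists the databases.
SOURCE_PRIORITY = ["PrISM", "Pfam-A", "SWISS-PROT"]
MIN_FRAGMENT_LEN = 30  # hypothetical cutoff for keeping an unassigned stretch


@dataclass(frozen=True)
class Hit:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    source: str  # database that produced the homology hit


def chop(length: int, hits: list[Hit]) -> list[tuple[int, int, str]]:
    """Split a protein of `length` residues into domain-like fragments.

    Hits are accepted source by source in SOURCE_PRIORITY order; a hit is
    skipped if it overlaps any residue already claimed by an accepted hit.
    Unclaimed stretches of at least MIN_FRAGMENT_LEN residues are kept as
    'unassigned' fragments; shorter gaps are dropped.
    """
    claimed = [False] * length
    fragments = []
    for source in SOURCE_PRIORITY:
        # Within one source, try longer hits first
        for hit in sorted((h for h in hits if h.source == source),
                          key=lambda h: h.end - h.start, reverse=True):
            if any(claimed[hit.start:hit.end]):
                continue  # overlaps a higher-priority region
            for i in range(hit.start, hit.end):
                claimed[i] = True
            fragments.append((hit.start, hit.end, source))
    # Scan for maximal runs of unclaimed residues (i == length closes a run)
    start = None
    for i in range(length + 1):
        free = i < length and not claimed[i]
        if free and start is None:
            start = i
        elif not free and start is not None:
            if i - start >= MIN_FRAGMENT_LEN:
                fragments.append((start, i, "unassigned"))
            start = None
    return sorted(fragments)
```

For a 300-residue protein with hits `Hit(10, 120, "PrISM")`, `Hit(100, 200, "Pfam-A")`, and `Hit(130, 260, "SWISS-PROT")`, the Pfam-A hit is skipped because it overlaps the PrISM region, and the sketch returns `[(10, 120, 'PrISM'), (130, 260, 'SWISS-PROT'), (260, 300, 'unassigned')]`; the two 10-residue gaps fall below the assumed cutoff and are dropped.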

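Aims 2 and 3 can likewise be sketched in Python, again under stated assumptions rather than as the NESG pipeline. The sketch swaps in single-linkage clustering (connected components over a union-find forest) as a stand-in for the paper's clustering, takes the homologous fragment pairs and the set of fragments with a usable PDB template as given inputs, and uses a hypothetical "longest fragment" rule to pick each family's representative. The function names `find` and `select_targets` are illustrative. Because every surviving family needs at least one structure, the minimal number of targets equals singletons plus multifragment clusters.

```python
from collections import defaultdict


def find(parent: dict[str, str], x: str) -> str:
    """Return the root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def select_targets(lengths: dict[str, int],
                   homolog_pairs: list[tuple[str, str]],
                   has_pdb_homolog: set[str]) -> tuple[list[str], int, int]:
    """Cluster fragments, drop families with PDB coverage, pick targets.

    lengths: fragment id -> length in residues
    homolog_pairs: fragment id pairs judged homologous (assumed input)
    has_pdb_homolog: ids that could be modeled from an existing structure
    Returns (targets, n_singletons, n_multifragment_clusters).
    """
    # Single-linkage clustering: merge the components of each homologous pair
    parent = {f: f for f in lengths}
    for a, b in homolog_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    families = defaultdict(list)
    for f in lengths:
        families[find(parent, f)].append(f)

    targets, n_single, n_multi = [], 0, 0
    for members in families.values():
        if any(m in has_pdb_homolog for m in members):
            continue  # aim 2: family can already be modeled from the PDB
        # Aim 3 (hypothetical rule): the longest fragment represents the family
        targets.append(max(members, key=lambda m: (lengths[m], m)))
        if len(members) == 1:
            n_single += 1
        else:
            n_multi += 1
    return sorted(targets), n_single, n_multi
```

With `lengths = {"a": 120, "b": 150, "c": 90, "d": 200, "e": 60, "f": 80}`, pairs `[("a", "b"), ("b", "c"), ("d", "e")]`, and `{"e"}` covered by the PDB, the family {d, e} is discarded and the call returns `(['b', 'f'], 1, 1)`. Applied to the abstract's counts, >27,000 singletons plus >18,000 multifragment clusters gives more than 45,000 minimal targets, consistent with the estimate of >40,000 structures.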