
The structural genomics experimental pipeline: Insights from global target lists

Authors

  • Nicholas O'Toole,

    1. Department of Biochemistry, McGill University, Montréal, Québec H3G 1Y6, Canada
    2. Montreal Joint Centre for Structural Biology, Montréal, Québec, Canada
  • Marek Grabowski,

    1. Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia
  • Zbyszek Otwinowski,

    1. Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
  • Wladek Minor,

    Corresponding author
    1. Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia
    • Department of Molecular Physiology, University of Virginia, 1300 Jefferson Park Avenue, Charlottesville, Virginia 22908, USA
  • Miroslaw Cygler

    Corresponding author
    1. Department of Biochemistry, McGill University, Montréal, Québec H3G 1Y6, Canada
    2. Montreal Joint Centre for Structural Biology, Montréal, Québec, Canada
    3. Biotechnology Research Institute, National Research Council, Montréal, Québec, Canada
    • Biotechnology Research Institute, National Research Council, 6100 Royalmount Avenue, Montréal, Québec H4P 2R2, Canada

Abstract

Structural genomics (SG) initiatives aim to determine protein structures at high throughput on a genome-wide scale. Here we analyze the SG target data publicly released over a period of 16 months to assess the potential of these initiatives. We use statistical techniques most commonly applied in epidemiology to describe how targets move through the experimental SG pipeline. There is no clear bottleneck among the key stages of cloning, expression, purification and crystallization: an SG target progresses through each of these steps with a probability of approximately 45%. Around 80% of targets with diffraction data yield a crystal structure, and 20% of targets with HSQC spectra yield an NMR structure. We also quantify the overlap among SG targets: 61% of SG protein sequences share at least 30% sequence identity with one or more other SG targets. There is no significant difference in average structure quality between SG structures and other PDB structures determined by "traditional" methods, but on average SG structures are deposited to the PDB twice as quickly after X-ray data collection. Proteins 2004. © 2004 Wiley-Liss, Inc.
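The stage-by-stage success rates in the abstract compound along the pipeline. The sketch below shows the arithmetic. It assumes a hypothetical input format: a dict mapping each stage name to the number of targets that reached it. It also computes a naive ratio between consecutive stages. This is not the authors' epidemiological method, and it ignores targets that are still in progress when a target list is released. The stage names and function names are illustrative only.

```python
from math import prod

# Hypothetical stage order; the four key stages are those named in the abstract.
STAGES = ["selected", "cloned", "expressed", "purified", "crystallized"]


def transition_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Naive fraction of targets at one stage that reached the next stage.

    `counts` maps each stage name to the number of targets that reached it
    (hypothetical input format). Targets still in progress are not handled.
    """
    probs = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        if counts[prev] == 0:
            raise ValueError(f"no targets reached stage {prev!r}")
        probs[f"{prev}->{nxt}"] = counts[nxt] / counts[prev]
    return probs


def cumulative_probability(step_probs) -> float:
    """Probability of clearing every step, assuming per-step rates multiply."""
    return prod(step_probs)


if __name__ == "__main__":
    # Made-up counts, chosen only to give step rates near 0.45.
    counts = {"selected": 1000, "cloned": 450, "expressed": 200,
              "purified": 90, "crystallized": 40}
    steps = transition_probabilities(counts)
    for name, p in steps.items():
        print(f"{name}: {p:.3f}")
    print(f"cumulative: {cumulative_probability(steps.values()):.3f}")
```

Suppose four consecutive steps each succeed with probability 0.45. Then only 0.45^4 ≈ 0.041, or about 4% of targets, clear all four. A uniform per-step rate with no single bottleneck therefore still implies heavy overall attrition.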
