An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms

Authors

  • Samuel A. Lee,

    Corresponding author
    1. Infectious Diseases Section, Department of Medicine, Yale University School of Medicine, New Haven, CT, USA
    2. Infectious Diseases Section, Department of Medicine, VA Connecticut Healthcare System, West Haven, CT, USA
    • Infectious Diseases Section, VA Connecticut Healthcare System, 950 Campbell Avenue, Building 8 (111-I), West Haven, CT 06516, USA.
    • These authors contributed equally to this work.

  • Steven Wormsley,

    1. Infectious Diseases Section, Department of Medicine, Yale University School of Medicine, New Haven, CT, USA
    • These authors contributed equally to this work.

  • Sophien Kamoun,

    1. Department of Plant Pathology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, OH, USA
  • Austin F. S. Lee,

    1. Department of Mathematics and Statistics, Boston University, Boston, MA, USA
    2. Center for Health Quality, Outcomes, and Economic Research, Bedford VA Hospital, Bedford, MA, USA
  • Keith Joiner,

    1. Infectious Diseases Section, Department of Medicine, Yale University School of Medicine, New Haven, CT, USA
  • Brian Wong

    1. Infectious Diseases Section, Department of Medicine, Yale University School of Medicine, New Haven, CT, USA
    2. Infectious Diseases Section, Department of Medicine, VA Connecticut Healthcare System, West Haven, CT, USA

Abstract

We sought to identify all genes in the Candida albicans genome database whose deduced proteins are likely to be soluble secreted proteins (the secretome). Although certain C. albicans secretory proteins have been studied in detail, more data on the entire secretome are needed. One approach to rapidly predicting the functions of an entire proteome is to combine genomic database information with prediction algorithms. We therefore used a set of prediction algorithms to computationally define a potential C. albicans secretome. We first assembled a validation set of 47 C. albicans proteins that are known to be secreted and 47 that are known not to be secreted. SignalP version 2.0 correctly predicted the presence or absence of an N-terminal signal peptide in 47 of 47 known secreted proteins and in 47 of 47 known non-secreted proteins. When all 6165 C. albicans ORFs from CandidaDB were analysed with SignalP, 495 ORFs were predicted to encode proteins with N-terminal signal peptides. Of these 495 deduced proteins, 350 were predicted to have no transmembrane domains (or a single transmembrane domain at the extreme N-terminus), and 300 of these were predicted not to be GPI-anchored. TargetP was then used to eliminate proteins with mitochondrial targeting signals, and the final computationally predicted C. albicans secretome was estimated to consist of up to 283 ORFs. The C. albicans secretome database is available at http://info.med.yale.edu/intmed/infdis/candida/

Copyright © 2003 John Wiley & Sons, Ltd.
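The successive filters described in the abstract (signal peptide present, no disqualifying transmembrane domain, not GPI-anchored, no mitochondrial targeting signal) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call SignalP or TargetP. It assumes that each predictor's output has already been parsed into a per-ORF record. The record type `OrfPrediction`, its field names, the function names, and the 40-residue cutoff used here to approximate "extreme N-terminus" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OrfPrediction:
    """Hypothetical per-ORF summary of already-parsed predictor outputs."""
    orf_id: str
    has_signal_peptide: bool   # e.g. parsed from a signal-peptide predictor
    tm_domains: int            # number of predicted transmembrane domains
    tm_start: Optional[int]    # 1-based start of the TM domain when there is exactly one
    gpi_anchored: bool         # predicted GPI anchor
    mitochondrial: bool        # predicted mitochondrial targeting signal


# Assumption: "extreme N-terminus" is approximated by this residue cutoff.
N_TERMINAL_CUTOFF = 40


def tm_compatible(p: OrfPrediction) -> bool:
    """Keep ORFs with no TM domain, or a single TM domain near the N-terminus."""
    if p.tm_domains == 0:
        return True
    return (
        p.tm_domains == 1
        and p.tm_start is not None
        and p.tm_start <= N_TERMINAL_CUTOFF
    )


def predict_secretome(
    orfs: List[OrfPrediction],
) -> Tuple[List[OrfPrediction], Dict[str, int]]:
    """Apply the four filters in order and record how many ORFs survive each."""
    stages = [
        ("signal_peptide", lambda p: p.has_signal_peptide),
        ("no_disqualifying_tm", tm_compatible),
        ("not_gpi_anchored", lambda p: not p.gpi_anchored),
        ("not_mitochondrial", lambda p: not p.mitochondrial),
    ]
    remaining = list(orfs)
    counts: Dict[str, int] = {}
    for name, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        counts[name] = len(remaining)
    return remaining, counts


# Toy input (invented records, not data from the study).
toy_orfs = [
    OrfPrediction("orfA", True, 0, None, False, False),   # passes every filter
    OrfPrediction("orfB", False, 0, None, False, False),  # no signal peptide
    OrfPrediction("orfC", True, 3, None, False, False),   # multiple TM domains
    OrfPrediction("orfD", True, 1, 5, False, False),      # single N-terminal TM domain
    OrfPrediction("orfE", True, 0, None, True, False),    # GPI-anchored
    OrfPrediction("orfF", True, 0, None, False, True),    # mitochondrial signal
    OrfPrediction("orfG", True, 1, 200, False, False),    # single TM domain far from N-terminus
]

secretome, counts = predict_secretome(toy_orfs)
print([p.orf_id for p in secretome])
print(counts)
```

Recording the surviving count after each stage mirrors how the abstract reports its results as a narrowing sequence (6165 ORFs, then 495, 350, 300, and finally up to 283).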
