A general model of G protein-coupled receptor sequences and its application to detect remote homologs

Authors

  • Markus Wistrand,

    1. Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
    • These authors contributed equally to this work.

  • Lukas Käll,

    1. Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
    • These authors contributed equally to this work.

  • Erik L.L. Sonnhammer

    Corresponding author
    1. Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
    • Fax: 46-8-337983.

Abstract

G protein-coupled receptors (GPCRs) constitute a large superfamily involved in various types of signal transduction pathways triggered by hormones, odorants, peptides, proteins, and other types of ligands. The superfamily is so diverse that many members lack detectable sequence similarity, although they all span the cell membrane seven times, with an extracellular N terminus and a cytosolic C terminus. We analyzed a divergent set of GPCRs and found distinct loop length patterns and differences in amino acid composition between cytosolic loops, extracellular loops, and membrane regions. We configured GPCRHMM, a hidden Markov model, to fit those features and trained it on a large dataset representing the entire superfamily. GPCRHMM was benchmarked against profile HMMs and generic transmembrane detectors on sets of known GPCRs and non-GPCRs. In a cross-validation procedure, profile HMMs produced an error rate nearly twice as high as that of GPCRHMM. In a sensitivity-selectivity test, GPCRHMM's sensitivity was about 15% higher than that of the best transmembrane predictors, at comparable false positive rates. We used GPCRHMM to search for novel members of the GPCR superfamily in five proteomes. In total, we detected 120 sequences that lacked annotation and are potentially novel GPCRs. Of these, 102 were found in Caenorhabditis elegans, four in human, and seven in mouse. Many predictions (65) belonged to Pfam domains of unknown function. GPCRHMM strongly rejected a family of arthropod-specific odorant receptors believed to be GPCRs. A detailed analysis showed that these sequences are indeed very different from other GPCRs. GPCRHMM is available at http://gpcrhmm.cgb.ki.se.
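The abstract describes GPCRHMM only at a high level. As a rough illustration of how a topology HMM can score a sequence, the following Python sketch runs the forward algorithm over a hypothetical four-state model. The states are extracellular loop, membrane crossing inward, cytosolic loop, and membrane crossing outward. The model starts extracellular, must end cytosolic, and reports a log-odds score against an i.i.d. background model. The three residue classes, all probabilities, and the names `log_odds`, `EMIT`, `TRANS` and `NULL` are assumptions made for illustration. They are not GPCRHMM's states or trained parameters. GPCRHMM is configured to fit observed loop-length patterns, whereas this toy uses plain geometric self-loops.

```python
import math

# Hypothetical toy topology HMM (NOT GPCRHMM's architecture or parameters).
# E = extracellular loop, Moi = membrane crossing outside->inside,
# C = cytosolic loop, Mio = membrane crossing inside->outside.
STATES = ("E", "Moi", "C", "Mio")
START = {"E": 1.0}  # N terminus is extracellular

# Geometric self-loops; each row sums to 1 and enforces E -> Moi -> C -> Mio -> E.
TRANS = {
    "E":   {"E": 0.9, "Moi": 0.1},
    "Moi": {"Moi": 0.9, "C": 0.1},
    "C":   {"C": 0.9, "Mio": 0.1},
    "Mio": {"Mio": 0.9, "E": 0.1},
}

# Assumed residue classes: h = hydrophobic, p = polar, c = charged.
LOOP_EXT = {"h": 0.3, "p": 0.5, "c": 0.2}
LOOP_CYT = {"h": 0.2, "p": 0.4, "c": 0.4}
MEMBRANE = {"h": 0.8, "p": 0.15, "c": 0.05}
EMIT = {"E": LOOP_EXT, "Moi": MEMBRANE, "C": LOOP_CYT, "Mio": MEMBRANE}

# Background (null) model: independent residues with these frequencies.
NULL = {"h": 0.4, "p": 0.35, "c": 0.25}


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def log_odds(seq):
    """Forward log P(seq | toy model, ending in C) minus log P(seq | null).

    `seq` is a non-empty string over the classes 'h', 'p', 'c'.
    """
    # Initialise with the first residue; only START states are reachable.
    alpha = {}
    for s in STATES:
        if s in START:
            alpha[s] = math.log(START[s]) + math.log(EMIT[s][seq[0]])
        else:
            alpha[s] = -math.inf
    # Forward recursion in log space.
    for x in seq[1:]:
        new_alpha = {}
        for t in STATES:
            incoming = [alpha[s] + math.log(TRANS[s][t])
                        for s in STATES if t in TRANS[s]]
            new_alpha[t] = logsumexp(incoming) + math.log(EMIT[t][x])
        alpha = new_alpha
    # The C terminus must be cytosolic, so only paths ending in C count.
    log_model = alpha["C"]
    log_null = sum(math.log(NULL[x]) for x in seq)
    return log_model - log_null


tm_like = "ppp" + "h" * 20 + "cccc"   # loop, hydrophobic stretch, charged tail
soluble = "pc" * 13 + "p"             # same length, no hydrophobic stretch
print(log_odds(tm_like), log_odds(soluble))
```

With these made-up parameters, the membrane-like stretch flanked by polar and charged residues scores clearly higher than the alternating polar/charged sequence of the same length. Any decision threshold on this score would be a further assumption.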
