Automated learning of generative models for subcellular location: Building blocks for systems biology

Authors

  • Ting Zhao

    1. Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    2. Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
  • Robert F. Murphy

    Corresponding author
    1. Molecular Biosensor and Imaging Center, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    2. Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    3. Department of Machine Learning, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    • Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA

  • This work was presented at the XXIII Congress of the International Society for Analytical Cytology, Quebec City, Canada, 20–24 May 2006

Abstract

The goal of location proteomics is the systematic and comprehensive study of protein subcellular location. We have previously developed automated, quantitative methods to identify protein subcellular location families, but there has been no effective means of communicating the resulting patterns or of integrating them with other information to build cell models. We built generative models of subcellular location that are learned from a collection of images, so that they not only represent the pattern but also capture its variation from cell to cell. Our models contain three components: a nuclear model, a cell shape model, and a protein-containing object model. We built models for six patterns that consist primarily of discrete structures. To validate the generated images, we showed that they are recognized with reasonable accuracy by a classifier trained on real images. We also showed that the model parameters themselves can be used as features to discriminate the classes. The models allow the synthesis of images with the expectation that they are drawn from the same underlying statistical distribution as the images used to train them. They can potentially be combined for many proteins to yield a high-resolution location map in support of systems biology. © 2007 International Society for Analytical Cytology
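
The central idea of the abstract, learning a distribution over per-cell parameters and then sampling from it to synthesize new instances, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter names (`nuclear_radius`, `cell_radius`, `object_count`), the made-up training values, and the choice of an independent Gaussian for each parameter. It is not the authors' model, which has separate nuclear, cell shape, and object components whose details are not given in the abstract.

```python
import random
import statistics


def fit_parameter_model(per_cell_params):
    """Fit an independent Gaussian (mean, sample std) to each parameter.

    Assumption: parameters are treated as independent and normally
    distributed. This is a simplification chosen for the sketch only.
    """
    names = per_cell_params[0].keys()
    model = {}
    for name in names:
        values = [cell[name] for cell in per_cell_params]
        model[name] = (statistics.mean(values), statistics.stdev(values))
    return model


def sample_cell(model, rng):
    """Draw one synthetic parameter set from the fitted model."""
    return {name: rng.gauss(mu, sigma) for name, (mu, sigma) in model.items()}


# Hypothetical per-cell measurements; the numbers are invented for
# illustration and do not come from the paper.
training_cells = [
    {"nuclear_radius": 5.0, "cell_radius": 11.0, "object_count": 38.0},
    {"nuclear_radius": 6.0, "cell_radius": 12.0, "object_count": 40.0},
    {"nuclear_radius": 7.0, "cell_radius": 13.0, "object_count": 42.0},
]

model = fit_parameter_model(training_cells)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic_cells = [sample_cell(model, rng) for _ in range(3)]
```

Because the fitted model stores a spread as well as a mean for each parameter, the sampled cells differ from one another, which is the sense in which a generative model captures cell-to-cell variation rather than a single representative pattern. The fitted `(mean, std)` pairs could also serve as a compact feature vector for a pattern, in the spirit of the abstract's remark that model parameters can discriminate the classes.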
