Deep phenotyping for precision medicine


  • Peter N. Robinson

    Corresponding author
    Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany

  • For the Deep Phenotyping Special Issue


In medical contexts, the word “phenotype” is used to refer to some deviation from normal morphology, physiology, or behavior. The analysis of phenotype plays a key role in clinical practice and medical research, and yet phenotypic descriptions in clinical notes and medical publications are often imprecise. Deep phenotyping can be defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The emerging field of precision medicine aims to provide the best available care for each patient based on stratification into disease subclasses with a common biological basis of disease. The comprehensive discovery of such subclasses, as well as the translation of this knowledge into clinical care, will depend critically upon computational resources to capture, store, and exchange phenotypic data, and upon sophisticated algorithms to integrate it with genomic variation, omics profiles, and other clinical information. This special issue of Human Mutation offers a number of articles describing computational solutions for current challenges in deep phenotyping, including semantic and technical standards for phenotype and disease data, digital imaging for facial phenotype analysis, model organism phenotypes, and databases for correlating phenotypes with genomic variation. Hum Mutat 33:777–780, 2012. © 2012 Wiley Periodicals, Inc.

What is a Phenotype?

The word phenotype is used with many different meanings. In biology, the most widely accepted definition of phenotype is “the observable traits of an organism.” Although at least five definitions of phenotype are in use in the literature on biology [Mahner and Kary, 1997], we will here take the definition of phenotype in biology to mean the collection of observable traits of an organism, comprising its morphology, its physiology at the level of the cell, the organ, and the body, and its behavior, and extending even to characteristics such as the gene expression profiles in response to environmental cues [Nachtomy et al., 2007]. In medical contexts, however, the word “phenotype” is more often used to refer to some deviation from normal morphology, physiology, or behavior, and this is the definition that we will use in this article. Perhaps the single most important responsibility of physicians is to observe the phenotype of their patients, be it by taking a medical history or by means of a physical examination, diagnostic imaging, blood tests, psychological testing, and so on, in order to make the diagnosis.

This is one of the areas where medicine comes close to being a science: we make observations about the phenotype, derive a hypothesis (called “diagnosis”), and test our hypothesis by prescribing a certain regimen of treatment, which may or may not be the optimal one for our patient. Inasmuch as medicine is a science, our goal should be to make our diagnostic hypotheses in as accurate and timely a fashion as possible. However, making the diagnosis can be one of the most challenging tasks for the physician, and this is especially the case for rare diseases, with currently over 8,000 named diseases and presumably many thousands more waiting to be discovered and classified. In a survey of eight relatively common rare diseases such as Marfan syndrome, it was found that 25% of patients waited from 5 to 30 years for a diagnosis and that the initial diagnosis was wrong in 40% of the cases [EURORDIS, 2007]. Although exact statistics are not available, it is safe to assume that the situation is even worse for most of the other, even rarer diseases. Major clinical problems result from delayed or inaccurate diagnosis, including delayed treatment, unnecessary diagnostic procedures, and a psychological burden on patients and families because of persistent uncertainty about the cause and prognosis of their clinical problems.

The study of phenotype in medicine also comprises the complete and detailed understanding of the spectrum of phenotypic abnormalities associated with each disease entity. With this knowledge in hand, physicians are able to decide whether any given sign or symptom with which a patient presents is related to some underlying disease or is an isolated feature, a decision which may help to stratify treatments and make the correct prognosis. Finally, knowledge about the full spectrum of phenotypic abnormalities associated with a disease may help to prevent complications or at least recognize them at a sufficiently early stage that effective treatments are available.

Why “Deep” Phenotyping?

Many phenotypic descriptions in medical publications describe the phenotype in sloppy or imprecise ways. For instance, a description such as “myopathic electromyography” is used instead of describing the reasons for this diagnosis (which can include reduced duration and reduced amplitude of the action potentials, increased spontaneous activity with fibrillations, positive sharp waves, or a reduced number of motor units in the muscle). There is no way of knowing what exactly the authors observed in their patients, and it may be extremely difficult or impossible to compare studies from different centers on the basis of such descriptions. Likewise, descriptions such as “still walking 25 years after onset” to describe a neurological phenotype may be evocative in a certain sense, but are likely to evoke a different picture in different readers, depending on their experiences, knowledge, and imagination.

Similarly, many current gene mutation databases record little or no phenotype information beyond the fact that a disease was diagnosed in the person carrying the mutation. This sort of information is useful for a diagnostician who is writing a report on a mutation, and finds a report in the database stating that some mutation has been previously found in an unrelated patient, and thus has at least some suggestive evidence for pathogenicity. However, it does not help much at all in understanding the natural history of the disease, the spectrum of complications of the disease, or genotype–phenotype correlations—all very useful clinical information.

In this article, deep phenotyping will be defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described, often for the purposes of scientific examination of human disease.
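The contrast between a summary label and its individual components can be made concrete with a small data-structure sketch. The following is only an illustration under stated assumptions: the class names `Observation` and `PhenotypeRecord`, the helper `shared_features`, and both patient records are hypothetical and invented for this example. The feature labels are taken from the electromyography example above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Observation:
    """One individually observed component of the phenotype (hypothetical model)."""
    feature: str   # what was examined
    present: bool  # absence is recorded explicitly, not left implicit


@dataclass
class PhenotypeRecord:
    """A patient's phenotype as a list of component observations."""
    patient_id: str
    observations: List[Observation] = field(default_factory=list)

    def present_features(self) -> Set[str]:
        # Only the features that were actually observed in this patient.
        return {o.feature for o in self.observations if o.present}


def shared_features(a: PhenotypeRecord, b: PhenotypeRecord) -> Set[str]:
    """Features observed in both patients; impossible to compute from a bare label."""
    return a.present_features() & b.present_features()


# Invented example data: both patients might have been summarized as
# "myopathic electromyography", yet the underlying findings differ.
center_a = PhenotypeRecord("A-1", [
    Observation("reduced duration of action potentials", True),
    Observation("reduced amplitude of action potentials", True),
    Observation("fibrillations", True),
    Observation("positive sharp waves", False),
])
center_b = PhenotypeRecord("B-1", [
    Observation("reduced amplitude of action potentials", True),
    Observation("fibrillations", False),
    Observation("positive sharp waves", True),
    Observation("reduced number of motor units", True),
])

print(sorted(shared_features(center_a, center_b)))
# prints ['reduced amplitude of action potentials']
```

Recording each component, including explicit absences, is what makes reports from different centers comparable. The single label would have made the two invented patients look identical.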

Why Should We Care About Precise Analysis of Human Phenotype?

The analysis of the medical phenotype as defined above is quite simply the daily work of practicing physicians: take the medical history, perform a physical examination, and order blood and other laboratory tests. Routine, you might say. So what is to be gained by a deeper and more precise analysis of the phenotype? One of the goals of personalized medicine is to classify patients into subpopulations that differ with respect to disease susceptibility, phenotypic or molecular subclass of a disease, or the likelihood of positive or adverse response to a specific therapy. The related concept of “precision medicine,” whose goal is to provide the best available care for each individual, refers to the stratification of patients into subsets with a common biological basis of disease, such that stratified medical management is most likely to benefit the patients [Committee on the Framework for Developing a New Taxonomy of Disease, 2011]. All medically relevant disease subclassifications can be said to have a distinct phenotype, with the understanding that a medical phenotype comprises not only the abnormalities described above, but also the response of a patient to a certain type of treatment (e.g., responsiveness of seizures to valproic acid can be considered to be a phenotype of certain forms of epilepsy). However, responsiveness to treatment is by no means the only or even the most important characteristic by which clinically relevant subclassifications can be identified. There are countless examples of clinically actionable items that were discovered by a precise phenotypic analysis (Table 1).

Table 1. A Selective List of Clinically Actionable Findings That Were Identified by Careful Phenotypic Analysis of Groups of Patients with the Indicated Rare or Common Diseases
Disease | Key Phenotype or Finding | Clinical Action
Familial thoracic aortic aneurysms and dissections (ACTA2 mutation) | Premature onset of coronary artery disease (CAD) and premature ischemic strokes | Monitoring for CAD and occlusive cerebrovascular disease [Guo et al., 2009]
Psoriasis | Increased risk of myocardial infarction | Clinical management of metabolic and cardiovascular health [Davidovici et al., 2010]
Gastric ulcer | Helicobacter infection | Antibiotic treatment [Tan and Wong, 2011]
Marfan syndrome | The prevalence of obstructive sleep apnea (OSA) is considerably increased in subjects with Marfan syndrome and may be a risk factor for aortic root dilatation in Marfan syndrome. | Polysomnography or related studies to screen for OSA [Kohler et al., 2009; Rybczynski et al., 2010]
Bardet Biedl syndrome | Highly penetrant early kidney failure associated with subtype of BBS (BBS16) caused by mutations in SDCCAG8 | Intensified screening for renal disease in children with BBS16 [Schaefer et al., 2011]
Asthma | Phenotypic features such as sputum eosinophilia or an interleukin-13 signature (increased serum periostin) | Sputum eosinophilia may predict response to corticosteroids [Lemiere et al., 2006]. Presence of high pretreatment levels of serum periostin predicts response to lebrikizumab in patients not responsive to inhaled glucocorticoids [Corren et al., 2011]
Diabetes | Various atypical clinical features may suggest a monogenic form of diabetes [Slingerland, 2006] | Nonstandard treatment may be available for some monogenic forms of diabetes. For instance, sulfonylurea therapy may be superior to insulin for patients with diabetes caused by KCNJ11 mutations [Pearson et al., 2006]

Additionally, we would like to know if individual mutations are associated with particular clinical courses. For instance, for mutations in cancer genes such as BRCA1 or BRCA2, we would like to know what percentage of patients with a given mutation can be expected to develop breast or ovarian cancer by a certain age. We would also like to understand whether the phenotype of responsiveness to treatment in cancer can be related to certain underlying molecular pathologies in the way that activating mutations in EGFR can predict response to tyrosine kinase inhibitors in non-small-cell lung cancer [Gately et al., 2012]. Answers to questions such as these are not likely to be forthcoming without detailed analysis of phenotypes and clinical course. Today, it is generally impossible to predict whether patients with particular mutations associated with some rare disease will have a particularly mild or severe course. We are even further away from understanding what role genetic modifiers play in determining clinical course and response to treatments.

On the other hand, better and more detailed descriptions of clinical manifestations will be essential in future efforts to understand the morbid anatomy of the human genome. For instance, there is ample evidence to suggest that, in general, the location of a mutation within the affected protein is an important determinant of the effect of the mutation. Missense mutations are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and the disease specificity for different mutations of the same gene can be partially explained by their location within an interface [Wang et al., 2012]. Therefore, deep phenotype data, combined with ever increasing amounts of genomic data, appear to have an enormous potential to accelerate the identification of clinically actionable complications, of disease subtypes with prognostic or therapeutic implications, and in general to improve our understanding of human health and disease.

Are We Ready for a Human Phenome Project?

The Mouse Phenome Project was launched a decade ago to promote new phenotyping initiatives under standardized conditions and to collect the data in a central public database [Maddatu et al., 2012], and similar efforts are underway for the zebrafish [Cheng et al., 2011], the rat [Mashimo et al., 2005; Serikawa et al., 2009], and other organisms such as Toxoplasma gondii [Meissner and Klaus, 2009]. Driving all these projects is the conviction that a precise analysis of the phenotype is needed to take full advantage of findings from genetics and genomics in order to understand biology.

Calls for a Human Phenotype Project go back at least as far as 2003, when Nelson Freimer and Chiara Sabatti proposed an international effort to create human phenomic databases in a commentary in Nature Genetics [Freimer and Sabatti, 2003]. However, in the years between the publication of that commentary and this one, no truly comprehensive efforts toward understanding the human phenome have emerged that use standardized measures for capturing phenotypic manifestations of disease and correlating them to genotypes, environmental exposures, and response to treatment. With the current revolutions in DNA sequencing technologies likely to bring us previously unimaginable amounts of genotype data for patients with rare and common disease in the next decade, the need for a Human Phenotype Project has become all the more pressing. In fact, adequate methods for capturing and analyzing human phenotype data now appear as one of the major bottlenecks toward progress in our understanding of human genome biology.

In contrast to the situation a decade ago, computational tools are now being developed that are in a position to provide a foundation for a Human Phenotype Project. The authors of this special issue on the human phenome present approaches and solutions to many of the outstanding problems that will need to be tackled to enable a Human Phenotype Project. Many international groups in Human Genetics are beginning to realize the importance of the topic. For instance, the Human Variome Project [Cotton et al., 2008; Cotton et al., 2009], to which this author and many of the contributors to this special issue belong, has founded a Phenotype Interest Group that aims to capture international opinions on the topic and to develop recommendations and standards for capturing human phenotypic data and recording it in medical databases.

Standards are essential for data exchange in science and medicine; semantic standards ensure that descriptors (terms) with agreed-upon meanings are used to describe characteristics of patients or diseases in a consistent fashion. Ana Rath and Ségolène Aymé present recent work on developing a standardized rare disease classification that will become part of the International Classification of Diseases 11th Revision (ICD-11) [Rath et al., 2012]. The Elements of Morphology Consortium has developed standard terms and definitions for many aspects of dysmorphology in human genetics [Allanson et al., 2009; Carey et al., 2012]. The Human Phenotype Ontology (HPO), which was developed by the group of the guest editor, now has over 10,000 terms, each of which describes a phenotypic abnormality seen in human disease [Robinson et al., 2008]. The HPO includes cross-links to the Elements of Morphology definitions and provides a basis for computational analysis of the human phenotype by means of semantic links to disease genes and ontologies representing anatomy, biochemistry, pathology, small molecules, and other entities [Gkoutos et al., 2009]. Additionally, common data elements provide a standardized catalog of measurements, observations, and algorithms and disease descriptions (general and specific traits) that should be recorded in studies on the natural history or treatment of certain classes of disease. The PhenX Project is developing such measures in an effort to promote the ability to perform cross-study analysis [Pan et al., 2012]. Three-dimensional digital imaging and shape analysis of the face offer an innovative and promising approach to study phenotypes systematically, especially in genetic conditions involving facial dysmorphism [Hammond and Suttie, 2012].
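To illustrate why an ontology of phenotypic terms provides a basis for computational analysis, the following minimal sketch builds a toy is-a hierarchy and compares two sets of terms by the overlap of their ancestor sets. Everything in it is an assumption made for illustration: the `EX:` identifiers and the hierarchy are invented and are not real HPO content, and the Jaccard overlap is a deliberately simple stand-in measure, not the method used by the HPO or by any article in this issue.

```python
from typing import Dict, Iterable, List, Set

# Toy ontology: each term maps to its direct is-a parents.
# Identifiers and structure are hypothetical, not real HPO terms.
IS_A: Dict[str, List[str]] = {
    "EX:0001": [],           # Phenotypic abnormality (root)
    "EX:0002": ["EX:0001"],  # Abnormality of the eye
    "EX:0003": ["EX:0002"],  # Myopia
    "EX:0004": ["EX:0002"],  # Ectopia lentis
    "EX:0005": ["EX:0001"],  # Abnormality of the skeletal system
    "EX:0006": ["EX:0005"],  # Arachnodactyly
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable via is-a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def closure(terms: Iterable[str]) -> Set[str]:
    """Union of the ancestor sets of all given terms."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term)
    return result


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Overlap of two term sets after propagating each up the hierarchy."""
    ca, cb = closure(a), closure(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


print(jaccard({"EX:0003"}, {"EX:0004"}))  # two eye findings: prints 0.5
print(jaccard({"EX:0003"}, {"EX:0006"}))  # eye vs. skeletal finding: prints 0.2
```

Because both eye findings share the parent "Abnormality of the eye", they score as more similar than an eye finding and a skeletal finding, even though none of the three terms is identical. Free-text descriptions cannot support this kind of graded comparison.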

Technical standards are required to link heterogeneous clinical data from different sources including clinical records, project databases, and public repositories. Consistent and intelligent structuring of this data enables efficient data access, exchange, and analysis, and approaches such as Observ-OM, an object model for any kind of phenotypic data, enable the access, sharing, integration, and archiving of complex phenotypic data [Adamusiak et al., 2012]. Much phenotypic information reflects abnormalities in anatomical structures, yet it can be quite difficult to visualize the interrelationships of anatomical structures to one another computationally. ApiNATOMY draws upon the topology of anatomy ontology graphs to automatically lay out treemaps representing body parts as well as linked phenotype data [de Bono et al., 2012].

Current projects, such as the International Mouse Phenotyping Consortium, are generating phenotypic data and resources on an unprecedented scale, which will be extremely valuable to human genetics and medicine [Schofield et al., 2012]. This special issue includes two fascinating examples of how this kind of data can be used to address problems in human medicine, viz., for the prioritization of human disease genes [Chen et al., 2012] and for improving the understanding of human copy number variation diseases [Boulding and Webber, 2012].

The International Standards for Cytogenomic Arrays Consortium has promoted standards for chromosomal microarray analysis and phenotypes and has so far collected data on over 28,500 cytogenomic array investigations [Riggs et al., 2012], and is thus one of the first examples of a Human Phenome Project covering a specific area of genetic medicine. The Personal Genome Project is aiming to enroll 100,000 informed participants from the general public who are willing to share their genome sequence and some personal and phenotypic information. Here, a prototype project involving metabolomic phenotyping coupled to the targeted analysis of a set of genes known to be involved in metabolic disturbances is presented [Thakuria et al., 2012]. Major efforts will be required to integrate global resources from research-oriented projects such as the International Mouse Phenotyping Consortium together with clinical genotype and phenotype data in order to achieve the full promise of precision medicine [Schofield and Hancock, 2012]. The new field of Knowledge Engineering for Health intends to bridge the gap between research and healthcare; the use of deep phenotype data is foreseen both to enable research based upon the practice and outcomes of clinical medicine, and also to guide decision making in stratified and personalized medicine contexts [Beck et al., 2012].

Finally, the changes brought about by new approaches to genomic diagnostics and precision medicine are likely to alter the way many fields of medicine are practiced. This is nowhere more true than in the field of human genetics and genomics. Although it is true that genomic diagnostics including exome and genome sequencing will make it much easier to make the correct diagnosis for patients affected by rare diseases, the many new challenges and opportunities offered by these technologies will probably make the field even more rewarding for clinicians [Hennekam and Biesecker, 2012].