Large-scale objective phenotyping of 3D facial morphology


  • For the Deep Phenotyping Special Issue


Abnormal phenotypes have played significant roles in the discovery of gene function, but organized collection of phenotype data has been overshadowed by developments in sequencing technology. In order to study phenotypes systematically, large-scale projects with standardized objective assessment across populations are considered necessary. The report of the 2006 Human Variome Project meeting (Cotton et al., 2007) recommended documentation of phenotypes through electronic means by collaborative groups of computational scientists and clinicians using standard, structured descriptions of disease-specific phenotypes. In this report, we describe progress over the past decade in three-dimensional (3D) digital imaging and shape analysis of the face, and future prospects for large-scale facial phenotyping. Illustrative examples are given throughout using a collection of 1,107 3D face images of healthy controls and individuals with a range of genetic conditions involving facial dysmorphism. Hum Mutat 33:817–825, 2012. © 2012 Wiley Periodicals, Inc.


Facial dysmorphism plays an important role in the diagnosis of genetic conditions and is often the first clue to a diagnosis. The development of the face is intricately linked to that of the brain [Cordero et al., 2011; Marcucio et al., 2011] and also the heart [Hutson and Kirby, 2003; Tzahor, 2009] through the migration of neural crest cells. Hence, phenotypic descriptions of the face feature prominently in case reports of children with developmental delay, in genetics textbooks, and in electronic databases such as OMIM and the London Dysmorphology Database. In order to address issues of imprecision and inconsistency in recording a phenotype, international experts in dysmorphology have recently developed standardized terminology [Biesecker and Carey, 2011]. Manual techniques for recording facial morphology as pioneered by Farkas are low cost, simple to undertake, and relatively noninvasive [Allanson, 1997; Farkas, 1998]. However, they require substantial patient cooperation and recalls for missed measurements, and they take longer than is compatible with large-scale facial phenotyping. In contrast, two-dimensional (2D) and three-dimensional (3D) digital photography offer rapid capture of facial images, almost permanent retention, and the opportunity for repeated measurement without the subject's presence. Photography is less invasive than manual anthropometry but still requires some patient cooperation and operator skill to capture high-quality images consistently. In anticipation of anthropometry, cooperative subjects have sometimes been annotated with anatomical landmarks prior to imaging, especially where palpation is needed to guarantee accuracy of landmark placement [Aynechi et al., 2011; Sforza et al., 2005]. Unfortunately, measurements derived directly from single 2D images are adversely affected by projection distortion and pose. 3D images compensate for such inadequacies because they are essentially independent of pose and can be inspected from any viewpoint.
In this report, we describe progress over the past decade or so in 3D digital imaging and morphometric analysis of the face, and future prospects for large-scale facial phenotyping.

3D Image Capture and Preparation before Analysis

Photogrammetric and laser scanning devices, the two most commonly used technologies, capture meshes of tens to hundreds of thousands of 3D points on a human face. The triangulated mesh of points constitutes a 3D surface. The fineness of the mesh, speed of image capture, surface coverage, and ease of use depend on the underlying technology employed and the features of the individual device. Early laser scanners required the subject to be rotated or the scanner to be moved relative to the subject. The associated motion introduced serious artifacts, for example, ridges on the captured surface. Current laser devices still require the subject to hold a pose for longer than is suitable for subjects with communication difficulties or impulsive behavior affecting cooperation, pose, and facial expression. Photogrammetric devices, in contrast, capture images in a few thousandths of a second and, with practice, operators can even anticipate when a moving face is within view of the camera. Accuracy and consistency of different devices, practical guides to capture, and the reproducibility of manual landmarking are now well documented [Aldridge et al., 2005; Gwilliam et al., 2006; Heike et al., 2010; Kau et al., 2005; Weinberg et al., 2006; Wong et al., 2008].

Excellent face and skeletal surfaces can be segmented straightforwardly from computed tomography (CT) images of the head. In theory, magnetic resonance (MR) images should also be a good source of face surfaces. Unfortunately, head MR capture protocols are typically focused on the brain and much of the face surface is omitted unless specifically requested. Head restraints used in both MR and CT imaging distort soft facial tissues, as does respiratory apparatus if anesthesia has been necessary for difficult subjects. However, for human face–brain studies, for example, the underlying surface of a 3D face photograph can be registered with an MR-derived face patch and hence be coregistered with segmented brain components (Fig. 1A). Cone-beam CT images have been used to coregister skull morphology with 3D facial photographs in a similar fashion [Cheung et al., 2011]. 3D ultrasound images need much more sophisticated preprocessing before usable craniofacial features and measurements are retrievable [Tsai et al., 2011].

Figure 1.

A: 3D face surface registered with brain segmented from MR image of same individual. B: Face surface annotated with 22 landmarks. C: Heat map comparison of average nose shape for Wolf–Hirschhorn syndrome and controls showing lateral expansion (blue, left; red, right) of nasal cartilages as depicted in D. D: Cartoons of nasal bone and cartilages. E: Face signatures of female individuals with Williams syndrome (WS). F: Scatter plot of face signature weights of females with WS normalized against both controls and against WS. G: Histogram of face signature weights of females with WS and female controls. H: Face signatures of outliers normalized against controls (row 1) and WS (row 2).

Whatever the imaging modality, captured face surfaces will typically require annotation with anatomical landmarks. This can be semiautomated for homologous anatomical locations but will be less accurate near poorly defined features or when faces are extremely dysmorphic [Asthana et al., 2011]. From landmarks, straightforward linear and angular measurements can be derived, but for more subtle curvilinear or surface-based shape analysis, it is necessary to map larger numbers of densely corresponded points as quasi-landmarks across a set of face surfaces. Dense surface modeling techniques [Hutton et al., 2003] induce tens of thousands of corresponded points from as few as 22 individual landmarks (Fig. 1B). During this induction, nonrigid registration of the face surfaces pulls them close together, rather like rubber masks, using a technique called thin-plate spline warping. Image registration of one form or another will be an essential component of any phenotyping based on collections of digital images [Rueckert and Schnabel, 2011]. Atlas-based approaches, typically used in human brain MR or animal model MR/micro-CT studies, employ deformable registration of a 3D volume. They avoid landmarks altogether and can automatically derive densely corresponded points across surfaces of interest or, of course, throughout an entire MR/CT volume [Kippenhan et al., 2005; Leung et al., 2011].

The paucity of large collections of normative facial morphology, especially in non-Caucasian populations, is beginning to be addressed but not in a systematic or coordinated fashion. The FaceBase consortium, funded in the USA by National Institutes of Health/National Institute of Dental and Craniofacial Research, has established a central data management and integrated bioinformatics hub to support craniofacial research [Hochheiser et al., 2011]. One of the 10 FaceBase constituent projects will construct a normative repository of 3D human face images and DNA samples for 3,500 healthy Caucasian individuals between 3 and 40 years. The landmark-derived measurements and single nucleotide polymorphism (SNP) data will be focused on mid-facial morphology and along with the raw images will be made available through the FaceBase repository. The UK “Avon Longitudinal Study of Parents and Children” has recently published an assessment of laser scans of the faces of 4,747 British Caucasian school children imaged at the age of 15.5 years [Toma et al., 2011]. Normative values exist for relatively few ethnic control groups and for a small number of dysmorphic syndromes [Farkas, 1998; Ferrario et al., 2005]. Recently published collections of normative facial dimensions for non-Caucasian populations using landmarks or direct anthropometry include a longitudinal study of facial growth of 458 Colombian mestizos from age 6 to 17 years [Arboleda et al., 2011] and smaller studies of Asian populations [Cheung et al., 2011; Ngeow and Aljunid, 2009]. A large survey of the facial form of 3,000 Chinese adults was motivated by the ergonomic design of respiration masks [Du et al., 2008; Luximon et al., 2011].

The security and criminal investigation industries have stimulated private and public collections of face images, both 2D and 3D, for use in research and product development of face recognition systems. In medically oriented phenotyping, the aim is to classify individual faces to a group with a homogeneous condition in support of diagnosis or to identify subgroups with similar characteristics in pursuit of genotype–phenotype correlations. In industrial face recognition, the primary aim is to find a match for a target among a database of known individuals, for example, to vet access or to identify felons. Until relatively recently, much research and certainly most commercial face recognition systems used 2D images, given the absence of cheap and compact 3D cameras or the requirement in crime detection to capture images serendipitously or covertly. Pose variation and lighting in 2D images proved problematic and performance has been disappointing. With the advent of 3D cameras, recognition accuracy has improved considerably with specific benefits for medical use of automated conversion of 2D images into 3D [Blanz and Vetter, 2003] to aid diagnosis [Learned-Miller et al., 2006] and automated landmarking to support large-scale phenotyping [Asthana et al., 2011]. Anthroface 3D [Gupta et al., 2010] combined classical anthropometry and sophisticated face recognition algorithms to perform impressively on the publicly available Texas 3D Face Recognition Database (http://live.ece.utexas/research/texas3dfr) of 1,149 2D and 3D facial images of 118 adults.

Average Facial Dysmorphism in Homogeneous Groups

Using a dense set of corresponded points, the average surface of a collection of faces is easily computed. If the surface texture or appearance of faces is available, as with 3D photographs, the visualization is very realistic but will reflect variation in lighting. Such shape-only and combined-shape-appearance visualizations can also be produced for smaller face patches covering perinasal, perioral, or periorbital regions. A particularly informative curvilinear shape is the mid-line profile, which, although a tiny fraction of the face surface, simultaneously captures upper face features (potentially linking to forebrain development), outward growth of the mid-face (reflecting influence of neural crest cell migration on bizygomatic arch, nasal bone, and cartilage growth), and perioral structures (philtrum, lips, and mandible orientation). Figure 2 (columns 1 and 2) contains portrait and profile views of the average surface shape of several homogeneous groups of faces. Columns 3 and 4 include combined shape and appearance average faces for the same conditions, and an age- and sex-matched control average face is shown in Column 5. Provided there are sufficient 3D images, the average face is not dominated by an individual face of the original set and is an excellent visualization of facial phenotypes of homogeneous groups such as microdeletion syndromes [Bhuiyan et al., 2006; Cox-Brinkman et al., 2007; Hammond et al., 2004, 2005; Kau et al., 2006; Shaweesh et al., 2004].

Figure 2.

Columns 1 and 2: Portrait and profile views of average face shape. Columns 3 and 4: Portrait and profile views of average faces combining appearance and shape. Column 5: Portrait of control mean matching ethnicity, sex, and age of means in columns 1 and 2. Columns 6 and 7: Heat map comparison of affected mean to control mean matched as in column 5. Each row corresponds to a different homogeneous affected group: ASD, autism spectrum disorder; BBS, Bardet–Biedl syndrome; FX, Fragile X syndrome; SMS, Smith–Magenis syndrome; WHS, Wolf–Hirschhorn syndrome; WBS, Williams syndrome.

Another commonly used quantitative shape comparison of face surfaces is a color heat map reflecting location difference of the densely corresponded points, typically normal to the face surface but also parallel to lateral, vertical, and depth-wise axes. Columns 6 and 7 of Figure 2 show heat maps for average faces of affected group means using a scale unique to each comparison. Red/green/blue coloring indicates where the affected average face shape is contracted/coincident/expanded with respect to an age–sex–ethnicity-matched average control. Although 3D photogrammetry is noninvasive, Figure 1C and D demonstrates that subtle differences in soft tissues of the face due to underlying cartilaginous or bony shape differences are sometimes detectable in such heat maps [Hammond et al., 2012]. The heat map comparison in Figure 1C required one of the means to be scaled to the other because individuals with Wolf–Hirschhorn syndrome (WHS) have significantly reduced growth. Otherwise, the heat map comparison would have been predominately red, reflecting the diminutive size in WHS. In landmark-based studies using classical morphometric methods, geometric scale is taken out in the application of generalized Procrustes analysis. In dense surface modeling analysis, size is retained for discriminatory studies, and scaling is used appropriately when the focus is on shape, for example, when trying to understand differences in embryological development. In a recent study of fetal alcohol spectrum disorder, size was retained in discrimination studies aimed at supporting clinical diagnosis, as there is no secure marker for the condition and growth delay is an important feature of classical fetal alcohol syndrome. Scaling was applied in the comparison of subgroup means to demonstrate similarity and subtlety of shape difference across the fetal alcohol spectrum. Thus, it is important to consider both shape-only and size-and-shape analyses in facial phenotyping.

Perhaps the most revealing visualization of shape difference is a morph, or rapidly interpolated image sequence, between two surfaces. A collection of morphs between affected and control means is available online (∼sejjmfj). The morph between the Williams syndrome (WS) and control means demonstrates periorbital fullness, a shorter nose, temporal narrowing, fullness of the lips, and backward rotation of the mandible. For 22q11 deletion syndrome (22Q11DS), the observable differences are malar flattening, hypertelorism, smaller nares, smaller ears, backward rotation of the mandible, and slight upward and outward arching of the upper lip. Morphs comparing average faces of different groups are likely to be very useful in clinical training. A 3D morph of an individual's face to an average ethnicity–sex–age-matched control can be truly revelatory of subtle features that are undetectable in a static 2D image viewed with the unaided eye.

Face Signature—Normalized Facial Dysmorphism

Previously, a minimum of 60 individuals with a homogeneous disorder were used to compute an average face so as to avoid undue influence of constituent faces. Such a number may not be easily recruited for rare disorders. Moreover, in some conditions, the facial phenotype might be more heterogeneous than in microdeletions such as WS and WHS and so an average face is less meaningful. Both of these situations arose in a recent study of the very rare condition, fibrodysplasia ossificans progressiva (FOP). There, instead of computing an average FOP face, the notion of face signature was introduced where each face was normalized against healthy controls [Hammond et al., 2012]. Running means of 50 contiguously aged control faces were computed to provide (commonly aligned) ethnicity-matched, same-sex, and approximately same-aged reference faces. For each of 25,000 points on a subject's face, its displacement along the surface normal from the corresponding point on the average of the matched controls was normalized with respect to analogous displacements at the same point on faces of the matched controls. Thus, a face signature delineates regions of an individual's face where difference from the matched comparison group is statistically significant. When computing the face signature of an individual who belongs to the matched group, that individual's face is omitted from the group used in the normalization.

The signature weight of an individual face, the square root of the sum of the squared normalized differences for all densely corresponded points, defines a relatively crude but useful estimate of the facial dysmorphism of an individual. Obviously, unusual facial expression will have a deleterious effect on this measure so it may be necessary to compute an average signature weight for several images of an individual where this is suspected.
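The normalization and the signature weight described above can be illustrated with a minimal Python sketch. It assumes that each face has already been reduced to a list of per-point displacements along the surface normal, and that normalization means dividing by the sample standard deviation of the matched controls at each point. The function names `face_signature` and `signature_weight` and the toy numbers are hypothetical and are not taken from the original study.

```python
import math
from statistics import mean, stdev

def face_signature(subject, controls):
    """Normalize a subject's per-point displacements against matched controls.

    subject  -- displacements along the surface normal, one per corresponded point
    controls -- list of such lists, one per matched control face
    Returns one z-score-like value per point (illustrative helper only).
    """
    signature = []
    for p, d in enumerate(subject):
        ref = [c[p] for c in controls]  # analogous displacements at the same point
        signature.append((d - mean(ref)) / stdev(ref))
    return signature

def signature_weight(signature):
    """Square root of the sum of the squared normalized differences."""
    return math.sqrt(sum(z * z for z in signature))

# Toy data: 3 corresponded points and 4 matched controls
# (the models discussed in the text use about 25,000 points).
controls = [[0.0, 1.0, -1.0], [1.0, -1.0, 0.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
subject = [2.0, 0.0, -3.0]

sig = face_signature(subject, controls)
weight = signature_weight(sig)
```

Where unusual facial expression is suspected, the same functions could be applied to several images of one individual and the resulting weights averaged.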

Figure 1E shows face signature heat maps for 48 females with WS. The extremes of the red-blue heat map scale are “less than −3” and “greater than +3” standard deviations. A visual inspection of these face signatures demonstrates some of the well-documented facial characteristics of WS: bitemporal narrowing (shown as red on the temples), periorbital fullness (blue around the eyes), malar flattening (red in the mid-face), full lips (blue), and retrognathia (red on the chin). The distribution of signature weights in Figure 1G predictably demonstrates the greater facial dysmorphism in WS compared with healthy controls.

Large-Scale Face Phenotyping Using Signature Graphs

Visual inspection of face signatures detects dysmorphic features in individual faces and delineates trait variation and relative occurrence in a group of faces. Of course, visual inspection of a large number of face signatures would be prohibitively time consuming without some automated method for picking out similarities and differences. An automated partial ordering of face signatures is possible using a simple metric, face signature difference (FSD), defined as the Euclidean distance between the vectors representing the normalized differences across the densely corresponded points. A face signature graph can then be constructed with a set of face signatures as its vertices. A directed edge is drawn from each signature to another, possibly not unique, signature with the smallest FSD from the first, or equivalently the most similar dysmorphism to the first. The length of an edge between two vertices is the FSD between them. The shorter the edge, the more similar is the nature of the facial dysmorphism of the linked faces. The existence of an edge between signatures A and B does not guarantee significant similarity between the underlying faces, only that B's dysmorphism (normalized difference from its matched comparison mean) is more similar to that of A than any other face in the set. This single linkage mechanism generates a set of disjoint subtrees or clusters of linked face signatures that partitions the dataset into sets of faces with similar dysmorphism relative to the comparison cohort. Note that faces of different ages and/or sex but with similar signature are potentially part of the same connected subtree or cluster. Finally, it is natural to link two signature clusters by an edge linking the two closest signatures in the different clusters. If applied recursively, this aggregates clusters into superclusters and eventually superclusters into a fully connected face signature graph. 
A more formal description of face signature graphs is given in the Supporting Information.
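The first stage of the construction, nearest-neighbor linkage by FSD followed by extraction of the resulting clusters, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' implementation: signatures are plain lists of normalized differences, ties are broken by taking the first candidate found, and the function names are hypothetical. The recursive linking of clusters into superclusters is omitted.

```python
import math

def fsd(a, b):
    """Face signature difference: Euclidean distance between two signatures."""
    return math.dist(a, b)

def nearest_neighbour_edges(signatures):
    """Directed edge from each signature to the other signature with smallest FSD."""
    edges = {}
    for i, s in enumerate(signatures):
        others = [j for j in range(len(signatures)) if j != i]
        # On ties, min() keeps the first candidate; the choice need not be unique.
        edges[i] = min(others, key=lambda j: fsd(s, signatures[j]))
    return edges

def clusters(edges):
    """Connected components of the edge set, ignoring direction (union-find)."""
    parent = {i: i for i in edges}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges.items():
        parent[find(i)] = find(j)
    groups = {}
    for i in edges:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)

# Toy signatures over 2 points; real signatures have tens of thousands of entries.
signatures = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.3], [5.0, 5.0], [5.2, 5.0]]
edges = nearest_neighbour_edges(signatures)
groups = clusters(edges)
```

In this toy example the five signatures fall into two clusters. Joining the closest pair of signatures across the two clusters would then produce a single connected graph.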

Figure 4A shows a face signature graph for a small group of female individuals with WS. The graph has six clusters, which are interlinked as shown in the small inset graph. The signatures in the upper part of the graph have a predominantly red-green hue reflecting bitemporal narrowing and malar flattening. In cluster 4, a blue upper lip is more evident, reflecting the longer philtrum and prominent upper lip, the latter due to a more open bite and retrognathia. Signatures that end up as leaf nodes of the graph will typically be outliers with the strongest facial dysmorphism. For example, the signatures in clusters 5/6 reflect the generally much larger/smaller nature of the associated faces compared with others (see later section on identifying atypical signatures).

Figure 3.

Binary coloring of face signature graphs for the same set of controls and individuals with one of four syndromes. Bootstrap estimates of the mean dispersions ± 95% margin of error (1.96 × SD/√n) are expressed as a quotient: BBS:CTRL = 0.713 ± 0.004:0.051 ± 0.001; 22Q11DS:CTRL = 0.875 ± 0.003:0.334 ± 0.006; WHS:CTRL = 0.356 ± 0.004:0.004 ± 0.000; WS:CTRL = 0.571 ± 0.008:0.086 ± 0.002. BBS, Bardet–Biedl syndrome; 22Q11DS, 22q11 deletion syndrome; WHS, Wolf–Hirschhorn syndrome; WS, Williams syndrome; CTRL, control. A: BBS; B: 22Q11DS; C: WHS; D: WS.

Figure 4.

Face signature graphs for female individuals with Williams syndrome normalized against controls (A) and female individuals with Williams syndrome (B).

In a signature graph containing multiple subgroups (such as WS and controls), it is difficult to identify subgroup connectivity. A simplified version of a signature graph uses color-filled circles to denote membership of subgroups. Figure 3D shows a signature graph for a combined group of controls (empty circles) and individuals with WS (filled circles), all normalized against healthy controls. The way affected signatures are dispersed among controls reflects both homogeneity and severity of their facial dysmorphism. The greater the homogeneity, the more the affected group forms large sets of same colored, linked vertices. The more severe the dysmorphism, the more likely affected signatures will be located peripherally as leaf nodes. Figure 3 shows analogous color-coded signature graphs for a fixed set of controls combined with different homogeneous genotypes. 22Q11DS (Fig. 3B) shows the greatest mixing of control and affected individuals, reflecting the milder nature and greater heterogeneity of its associated facial dysmorphism. At the other extreme, WHS shows little intermingling of control and affected signatures, reflecting the much reduced facial growth and the greater degree and homogeneity of the associated facial dysmorphism. Besides growth delay, obesity and undernourishment can cause an individual to be an outlier in a signature graph. With current trends in obesity in developing countries and poor nutrition in others, body mass index (BMI) should be considered as a confounding variable. This can be visualized by varying the shading density of the filled vertices representing signatures according to BMI. The same technique would also enable cognitive scores such as IQ to be overlaid.

Comparing the Homogeneity and Degree of Dysmorphism of Facial Phenotypes

In a binary colored form of a signature graph where there are two underlying categories, for example, background and target, the “pockets” of connected signatures of the same color form two separate partitions, one for each category. The size and number of these pockets are influenced by how separate or intermingled the background and target datasets are in the graph. This is influenced by the homogeneity and degree of difference of the target facial phenotype from that of the background used in the normalization. The “disorder” associated with such partitions can be quantified by defining dispersion in the form of a Shannon-like entropy measure [Sethna, 2007]:

$$\mathrm{dispersion}(\{P_1,\ldots,P_k\}) = -\sum_{i=1}^{k} \frac{|P_i|}{n}\,\log_n \frac{|P_i|}{n}$$

where P1,…,Pk partition the control (or the target) dataset of size n. If the target or control subgroup stays totally connected in a signature graph then its associated partition has one member, namely the subgroup itself. Hence, k = 1, |P1| = n, and its dispersion, dispersion({P1}), is 0. This almost happens, for example, to the control subset in the control–WHS signature graph when only one control is drawn into the WHS subset, very much reflecting the extreme nature of the WHS facial phenotype. On the other hand, if in the signature graph, each individual in a subgroup is only ever connected to an individual of a different color, that is, the subgroup is partitioned into a set of singletons, k = n, each |Pi| is 1 and its dispersion, namely dispersion({{s1},…,{sn}}), is 1. This would suggest that the facial phenotype is very heterogeneous.
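The two boundary cases can be checked numerically. The Python sketch below assumes that dispersion is the Shannon entropy of the pocket-size proportions |Pi|/n with the logarithm taken to base n. That choice is an assumption made here because it yields 0 for a single connected pocket and 1 for a partition into singletons, as described above. The function name is hypothetical.

```python
import math

def dispersion(partition):
    """Shannon-like entropy of a partition, using a base-n logarithm.

    partition -- list of 'pockets' (sets of same-colored, connected signatures)
    The base-n logarithm is an assumption chosen so that one pocket gives 0
    and n singleton pockets give 1.
    """
    n = sum(len(p) for p in partition)
    if n <= 1:
        return 0.0
    return -sum((len(p) / n) * math.log(len(p) / n, n) for p in partition)

connected = dispersion([{1, 2, 3, 4}])          # subgroup stays totally connected
singletons = dispersion([{1}, {2}, {3}, {4}])   # every signature isolated
mixed = dispersion([{1, 2}, {3}, {4}])          # intermediate case
```

Intermediate partitions give values strictly between the two extremes, so a lower dispersion indicates a subgroup that holds together more strongly in the graph.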

The dispersion of a background control group reflects the dysmorphism of a target-affected group in that the more it is dispersed in the signature graph, the more background-like the target subset must be. Similarly, the dispersion of a target-affected group reflects the homogeneity of the constituent signatures in that lower dispersion indicates greater intraconnectivity in the target group in the signature graph. It is informative, therefore, to report the disorder and control dispersions together in the form of a quotient. For the examples of Figure 3, dispersion quotients and 95% CIs are estimated from iterative random resampling (bootstrap):

BBS:CTRL = 0.713 ± 0.004:0.051 ± 0.001
22Q11DS:CTRL = 0.875 ± 0.003:0.334 ± 0.006
WHS:CTRL = 0.356 ± 0.004:0.004 ± 0.000
WS:CTRL = 0.571 ± 0.008:0.086 ± 0.002

The dispersion quotient for 22Q11DS, for example, confirms that its facial phenotype is more heterogeneous and less dysmorphic than both WS and WHS.
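The ± values attached to each dispersion follow the margin-of-error formula given in the caption of Figure 3, 1.96 × SD/√n. A minimal Python sketch of that summary step is shown below. It assumes that n is the number of resampling iterations and that SD is the sample standard deviation of the dispersion values they produce. The resampling procedure itself is not reproduced here, and the replicate values are invented purely for illustration.

```python
import math
from statistics import mean, stdev

def mean_with_margin(values, z=1.96):
    """Return (mean, z * SD / sqrt(n)) for a list of replicate estimates."""
    n = len(values)
    return mean(values), z * stdev(values) / math.sqrt(n)

# Hypothetical dispersion values from five resampling iterations.
replicates = [0.70, 0.72, 0.71, 0.73, 0.69]
centre, margin = mean_with_margin(replicates)
```

A quotient such as those listed above would then pair the (mean, margin) result for the affected group with the corresponding result for the controls.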

All face signature graphs are drawn using the GraphViz software [Ellson et al., 2002].

Identifying Atypical Facial Phenotypes Using Face Signatures

In a previous 3D face study, a child with an atypical facial phenotype provided important evidence to support the hypothesis that the gene GTF2IRD1 played a role in the facial dysmorphism of WS [Tassabehji et al., 2006]. Normalizing the face shape of affected individuals against a collection of individuals with the same genotype can identify those with an atypical phenotype. For example, corresponding to the signature graph in Figure 4A where controls form the comparison group is another (Fig. 4B) where affected individuals form the comparison group for normalization. This signature graph places individuals with a typical WS facial phenotype at its core (e.g., cluster 3) and those with more atypical or exaggerated WS features at its periphery (e.g., cluster 6).

A scatter plot of signature weights for WS females normalized against both controls and a large set of WS females highlights the same six atypical individuals (Fig. 1F) who find themselves at the periphery of both signature graphs. Figure 1H shows both forms of their face signatures. Two individuals are highlighted as control outliers because they have WS facial features and have very small faces. They are only just on the periphery of the WS normalization. In contrast, the other four individuals are not considered as control outliers but are very definitely WS outliers. Therefore, they may be of interest in phenotype–genotype studies.

Classification of Facial Phenotypes Using Landmark-Based Anthropometry

For dysmorphic syndromes with known genetic causes, molecular analysis is the appropriate way to confirm a clinical diagnosis. Where definitive testing is not available, a patient's facial appearance may suggest multiple possibilities for a diagnosis or perhaps none at all. How might digital models of face shape help to classify an individual's facial phenotype? In general, single linear facial measures are unlikely to discriminate well between controls and a syndrome or between different syndromes. Multiple measurements, following normalization, can be combined to determine a craniofacial index of dysmorphology to give an average profile for each syndrome against which an individual can be compared [Ward et al., 1998]. Combining measures provides a richer description of dysmorphology but the loss of the associated 3D geometry ultimately limits their potential. For example, philtrum length and inner canthal separation might be useful discriminators in isolation or in tandem. It is likely, however, that greater discrimination is achievable using the local geometry, the 3D juxtaposition, of the landmarks delimiting the measures.

One study using landmarks on 3D face surfaces and derived measurements found no significant difference in facial asymmetry between controls and syndrome-affected individuals [Shaner et al., 2000]. This is not surprising because the 30 syndrome-affected subjects were of mixed ethnicity and affected by one of 18 different conditions. Landmark-based analyses have established strong discriminating features in a series of elegant studies of male–female and control–schizophrenia face shape differences [Buckley et al., 2005; Hennessy et al., 2004]. These morphometric studies employ a statistical analysis technique, principal component analysis (PCA), in order to compact the number of variables defining landmark positions to a smaller set of principal components (PCs) or modes of shape variation. For example, 24 3D landmarks use 72 coordinate values to record a face. PCA revealed that just three modes explained the most discriminating face shape differences [Hennessy et al., 2006].

Euclidean distance matrix analysis (EDMA) uses interlandmark distances to compare the mean shape of groups [Lele and Richtsmeier, 2001; Richtsmeier et al., 2002]. An object's shape is represented by the matrix of linear distances between all possible landmark pairs, resulting in n(n − 1)/2 linear measurements for n landmarks. The matrix of measurements for each individual is scaled, and an average shape matrix for a group is calculated. A comparison of shape is then based on arithmetic differences between the average shape matrices of two groups. EDMA has been used successfully in a wide range of studies: shape analysis of human and animal skulls [Du et al., 2010; Perlyn et al., 2006], a brain study of families affected by cleft lip and palate [Weinberg et al., 2009], and in a recent facial phenotyping study of children with autism [Aldridge et al., 2011].
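The steps of this comparison can be sketched in Python as follows. This is a simplified illustration of the description above rather than a full EDMA implementation: the distance matrix is stored as a flat list of its n(n − 1)/2 distinct entries, scaling by the mean interlandmark distance is one possible choice of size measure and is an assumption, and no statistical testing of the differences is included. All function names and landmark coordinates are hypothetical.

```python
import math
from itertools import combinations

def distance_vector(landmarks):
    """All n(n-1)/2 interlandmark distances for one individual."""
    return [math.dist(a, b) for a, b in combinations(landmarks, 2)]

def scaled(distances):
    """Scale by the mean distance (an assumed size measure)."""
    m = sum(distances) / len(distances)
    return [d / m for d in distances]

def mean_form(group):
    """Average scaled distance vector for a group of individuals."""
    vectors = [scaled(distance_vector(lm)) for lm in group]
    return [sum(col) / len(col) for col in zip(*vectors)]

def form_difference(group_a, group_b):
    """Element-wise arithmetic difference between the two group averages."""
    return [a - b for a, b in zip(mean_form(group_a), mean_form(group_b))]

# Toy individuals with 4 landmarks each (6 interlandmark distances).
face = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 12)]
doubled = [tuple(2 * c for c in p) for p in face]   # same shape, twice the size
wider = [(0, 0, 0), (6, 0, 0), (0, 4, 0), (0, 0, 12)]  # genuinely different shape

same_shape = form_difference([face], [doubled])
changed = form_difference([face], [wider])
```

Because each individual is scaled before averaging, the uniformly enlarged face yields differences of zero, whereas the widened face yields nonzero entries for the distances involving the displaced landmark.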

An advantage of studies that use only landmarks to derive linear or angular measurements is that the associated statistical analysis is straightforward. However, there are two major drawbacks. First, the detection of shape difference is restricted by the positioning of landmarks. There may be subtle curvilinear or surface-based shape differences in regions where reliable landmark placement is impossible. Second, the visualization of comparisons of the different juxtapositions of landmarks is limited to wireframe-like diagrams that are difficult to relate to anatomical features.

Classification of Facial Phenotypes Using Densely Corresponded Points

As with landmark-based studies, differences between positions of densely corresponded points on a set of surfaces and those on their overall mean can be subjected to PCA. The term dense surface model (DSM) refers to the resulting set of PCs or PCA modes accounting for the shape variation in the surfaces included [Hutton et al., 2003]. Thus, a DSM is a form of point distribution model where a large number of densely corresponded surface points are induced or interpolated using a sparse set of manually placed landmarks. Each surface in a DSM can be reconstructed as a weighted linear sum of the PCs. The averages of the corresponding DSM weightings of any subset of the surfaces used to build the DSM synthesize the average of that subset. Similarity between two surfaces, or between a surface and subgroup average, can be computed as the square root of the sum of squares of differences of the DSM weightings.
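The three properties just described (reconstruction as a weighted sum of modes, synthesis of a subset average from averaged weightings, and similarity as a distance between weightings) can be illustrated with a toy Python sketch. The PCA itself is assumed to have been carried out already, the two modes and all weights are invented for illustration, the modes are left unnormalized for readability, and the function names are hypothetical.

```python
import math

def reconstruct(mean_shape, modes, weights):
    """Surface = overall mean + weighted linear sum of the PCA modes."""
    shape = list(mean_shape)
    for w, mode in zip(weights, modes):
        for k, v in enumerate(mode):
            shape[k] += w * v
    return shape

def average_weights(weight_vectors):
    """Mode-by-mode average of the DSM weightings of a subset."""
    return [sum(col) / len(col) for col in zip(*weight_vectors)]

def dsm_distance(w1, w2):
    """Square root of the sum of squared differences of DSM weightings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))

# Toy model: 2 surface points (6 coordinates) and 2 modes.
mean_shape = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0],    # mode 1: both points shift along x
         [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]]   # mode 2: points separate along y
weights = {"face_a": [2.0, 0.5], "face_b": [4.0, -0.5]}

subset_mean_weights = average_weights(list(weights.values()))
subset_mean_surface = reconstruct(mean_shape, modes, subset_mean_weights)
distance_ab = dsm_distance(weights["face_a"], weights["face_b"])
```

Since reconstruction is linear in the weights, the surface built from the averaged weightings coincides with the point-wise average of the two reconstructed surfaces.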

It is possible to compute the proportion of face shape variation covered by a single DSM mode, and typically the modes are ordered in terms of decreasing coverage. In a DSM of individuals with a wide age range, by far the greatest amount of variation, often over 80%, reflects overall size or growth of the face. Subsequent modes may correspond to oval/round face shape variation (e.g., 5%) or differences in ear and mandible position (e.g., 2%). Depending on the mix of faces, the amount of coverage varies and additional shape complexities will be involved.
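
The coverage of each mode can be computed directly from the singular values of the centered data, as in the following sketch. The function name `mode_coverage` is hypothetical, and the input is assumed to be an array of already corresponded surfaces.

```python
import numpy as np

def mode_coverage(surfaces):
    """Proportion of total shape variance covered by each PCA mode.

    surfaces: array of shape (n_faces, n_points, 3) of corresponded points.
    NumPy returns singular values in descending order, so the first
    entry of the result is the mode with the largest coverage.
    """
    X = surfaces.reshape(len(surfaces), -1)
    S = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    variance = S ** 2            # proportional to the variance along each mode
    return variance / variance.sum()
```

For a set of faces that differ mainly in overall size, the first entry of the returned array would be expected to dominate, in line with the size-related leading mode described above.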

The later modes resulting from the PCA, those corresponding to extremely small amounts of shape variation, can be ignored and typically only those leading modes covering in total 95% to 99% are retained in a DSM. Frequently, as few as 50–100 modes are required to cover 99% of shape variation in a set of faces. Thus a face can be represented by an ordered sequence of 50 or so numbers. This is a huge data compaction, reducing the representation of a face surface from 75,000 parameters (25,000 3D points) down to 50 or so DSM mode values. A simple and intuitively appealing way to compare an individual face with two sets of faces is to calculate how close, in terms of the 50 or so mode values, that face surface is to the average face surfaces of each set. Whichever of the average faces is closest determines the classification of the individual. This so-called closest mean classification algorithm has achieved control-syndrome discrimination rates of between 85% and 95% for Cornelia de Lange [Bhuiyan et al., 2006], Noonan, Smith–Magenis, Velocardiofacial, and Williams syndromes. By considering face patches, it is also possible to identify regions of the face that are the most discriminating [Hammond et al., 2004, 2005]. Other pattern recognition techniques such as linear discriminant analysis and support vector machines are also employed in discrimination studies [Dalal and Phadke, 2007]. Once a DSM has been generated, the set of PCA representations of faces can be combined with behavioral, cognitive, and physiological data to investigate phenotype–genotype correlations.
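
Mode retention and closest mean classification can be sketched as follows. This is a simplified illustration rather than the code used in the cited studies: the function names and the default 99% target are assumptions, and in practice classification rates would be estimated on faces not used to compute the group averages.

```python
import numpy as np

def modes_needed(coverage, target=0.99):
    """Smallest number of leading modes whose cumulative coverage reaches target.

    coverage: per-mode proportions of variance, in decreasing order.
    """
    k = int(np.searchsorted(np.cumsum(coverage), target)) + 1
    return min(k, len(coverage))   # guard against rounding at target = 1.0

def closest_mean_classify(w, weights_a, weights_b, labels=("A", "B")):
    """Assign a face to the group whose average face is closer.

    w: mode values of the face to classify (1D array).
    weights_a, weights_b: mode values of the two groups, one face per row.
    Distance is Euclidean in the space of retained mode values.
    """
    dist_a = np.linalg.norm(w - weights_a.mean(axis=0))
    dist_b = np.linalg.norm(w - weights_b.mean(axis=0))
    return labels[0] if dist_a <= dist_b else labels[1]
```

A face would first be truncated to its leading `modes_needed(coverage)` mode values and then passed, together with the equally truncated mode values of the control and syndrome groups, to `closest_mean_classify`.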


Photogrammetric cameras capable of capturing accurate 3D face surfaces became commercially available in the late 1990s. Laser-based devices have a longer history. Over the past decade, techniques for analyzing 3D face surfaces have matured considerably. However, large-scale facial phenotyping cannot be fully effective without the systematic collection of substantially more normative data, especially in non-Caucasian ethnicities. Some individual research programs are addressing this issue in a direct fashion but there is little coordination internationally or across the wide range of interested parties in anthropology, orthodontics, developmental biology, ergonomics, genetics, maxillofacial surgery, molecular biology, and pediatrics. Truly portable, cheap, and compact devices would obviously encourage more data capture. Cost significantly inhibits medical genetics departments from having their own device to image patients routinely as part of clinical examination. Fortunately, 3D photogrammetric cameras are increasingly used by orthodontists, and by maxillofacial and plastic surgeons, for treatment planning and for posttreatment evaluation. Clinical geneticists could benefit if dental and surgical colleagues agreed to house their 3D cameras in a shared medical photographic facility.

There are a variety of methods and software tools available now for shape analysis, for example, Klingenberg's MorphoJ, and there are useful links at SUNY Stony Brook. Most of the techniques described in this article are applicable to shapes other than that of the face as well as to species other than human. With the availability of tomographic data, more mixed studies of morphology of face, skull, and brain should be possible with coanalysis of behavioral, cognitive, and physiological phenotypic data, as well as paired studies of homogeneous patient groups and associated animal models [Tobin et al., 2008].

Where image acquisition is limited by rarity or recency of discovery of a genotype, normalization against appropriate comparison groups is effective in delineating small group and individual facial dysmorphism with a view to identifying endophenotypes and atypical individuals. If an affected cohort contains a facial endophenotype then it should be identifiable in the vertex partitioning induced by a signature graph and thus could help identify individuals who are phenotypically similar and with a common genetic etiology. Dual normalization against control and affected groups isolates atypical phenotypes as outliers in terms of signature weight or as leaf nodes of signature graphs. More generally, face signature graphs have the potential for organizing large numbers of individuals into a more coherent and ordered panorama of face shape variation from which traits and their relative occurrence can be determined. In studies of genotype–phenotype correlations, phenotypic or genotypic data can be overlaid as different, or variations in intensity of, vertex colorings of a signature graph. Individual clusters or larger hyperclusters of face signatures may highlight individuals with similar facial dysmorphism who could have a common genetic etiology.

The recognition of syndromes is not usually based on the presence of major malformations such as a cleft palate or heart defect, but on combinations of minor malformations and minor variants. Therefore, clinical experience and knowledge of normal ranges of morphological features continue to be essential for evaluating dysmorphic features. Furthermore, the identification of atypical individuals for phenotype–genotype correlation studies cannot succeed without the involvement of vigilant clinicians able to identify affected children who are inconsistent with expected behavioral or morphological phenotypes [Carey, 2006]. There is increasing awareness of the value of facial phenotyping even among disciplines where dysmorphology has not previously featured. For example, 3D facial phenotyping played a role recently in the detection of novel structural variants in patients with epilepsy [Kasperavičiūtė et al., 2011; Poduri and Lowenstein, 2011] and has recently been proposed as a tool for use alongside epigenetic investigations into twinning and developmental asymmetries [Baynam et al., 2011]. This is yet further motivation for stimulating the use of facial phenotyping beyond its natural home territory of clinical genetics and pediatrics.


The families and volunteers who agreed to have their faces imaged and the support groups and clinicians who provided face-scanning opportunities are gratefully acknowledged. The face signature graph techniques, developed by P.H. during a sabbatical hosted by Professor Andrew Wilkie (Weatherall Institute of Molecular Medicine, University of Oxford), have benefited from stimulating comments by Raoul Hennekam, Louisa Petchey, and Sanjay Sisodiya.

Disclosure Statement: The authors declare no conflict of interest and no financial interest in this study.