Dr. Bookstein is professor of statistics, professor of psychiatry and behavioral sciences (PBS), and scientific director of the Fetal Alcohol and Drug Unit (FADU), University of Washington (UW), Seattle, Washington, and professor of anthropology at the University of Vienna, Vienna, Austria. For a quarter of a century he has been identified with the development of methods for statistical analysis of data from medical images such as these, methods that seem to be scientifically more fruitful in application to issues of fetal alcohol medicine than anywhere else.
Dr. Streissguth, professor emerita of PBS, UW, was one author of the original 1973 paper in Lancet announcing the discovery of fetal alcohol syndrome. Ever since then, as founding director of FADU, she has been involved in research and clinical work on patients with fetal alcohol syndrome and fetal alcohol spectrum disorders and hopes that brain studies such as this will eventually enlighten clinical practice.
Dr. Connor is assistant professor of PBS, UW, and clinical director of FADU. He is a neuropsychologist with interests in cognitive, mental health, and neuroimaging assessment of individuals with prenatal alcohol damage.
Dr. Sampson is research professor of statistics and director of statistical consulting for the Department of Statistics, UW. He is a long-time collaborator with Professor Bookstein on applications of multivariate statistical methods in fetal alcohol research; also, he leads a research group on spatiotemporal modeling in environmetrics, focusing on air quality data.
Since 1973, it has become clear that exposure of otherwise normal human fetuses to high levels of alcohol damages a substantial number of the exposed brains in a wide variety of ways nowadays referred to collectively as the fetal alcohol spectrum disorders (FASDs). Often this damage can be seen directly in brain images obtained much later in life. This assertion actually summarizes two of the literature's claims at the same time: first, that there are differences between groups of damaged and undamaged people in the statistical distribution of single or multiple measurements of brain images (e.g., Riley et al., 2004); second, that decision rules based on these measurements can discriminate between people with and without FASD diagnoses to some clinically useful extent (e.g., Bookstein et al., 2002a). For the first kind of argument, we look at differences of average or covariance structure of vectors of variables between our groups, with the associated conventional statistical significance tests. To demonstrate the possibility of classification, we examine sensitivity, specificity, and accuracy instead. Behind either class of numerical tactics lies the broader domain of explanation. Over the same 30 years, reductionistic animal experiments have steadily elucidated the pathways that link the brain damage to the behavioral deficits that are so much easier to observe in a medical context.
In order to study the damage done by alcohol to one particular part of the human brain, the cerebellum, this article weaves together the two biometric themes, effect estimation and detection, with the third theme, the explanatory one. We are working with the same data resource we have mined for several years: conventional magnetic resonance (MR) brain images of 120 diagnosed FASD patients compared to those of 60 unexposed peers of the same age range and sex. From the images of the 180 cerebellums, computer-aided hand digitization extracted a biometric surface representation that can be exploited in several ways. In the most conventional two-dimensional view, the midsagittal section through vermis, there is hardly any discernible effect of alcohol at all. By contrast, statistical analysis of a full three-dimensional (3D) surface finds clear differences between unexposed and patient samples in both the size and the shape of the cerebellum. Closer examination of this 3D signal suggests the construction of a new 2D representation that conveys most of the information about the alcohol effect without the complex visualizations entailed in the three-dimensional analysis. This helpful simplification takes the form of a visual horizon or silhouette view of the cerebellum, a tactic that was popular a century ago in anthropometrics but that has not been used much since the onset of 3D computer vision applications in medical imaging.
The statistics of either the 3D surface or the 2D silhouette, analyzed by simple and powerful new morphometric methods, show that the biometrics of the cerebellum in FASDs are strongly consistent with the core finding, hypoplasia, from the animal experiments. An exploratory statistical analysis of allometry (the effect of size changes on shape) confirms that we have captured nearly all of the biometric signal by one simple explanation based on the typical FASD size deficit.
Thus, the argument of this article intermixes some science (the neuroteratology of alcohol) with some methodology (advances in the biometrics of size and shape, as well as innovations in the simplification of surfaces). The sections to follow alternate between these ingredients. We begin with a brief review of what is known about the effect of fetal alcohol exposure on the brain in general and the cerebellum in particular. After that, we show a method of digitizing the cerebellar surface with respect to an axis along the aqueduct that simplifies the statistical analyses on which we will be relying, without weakening them. Some preliminary findings about size measures of this structure suggest that its shape should be expressed using silhouettes instead of surfaces. When we do so, using the silhouette in a plane perpendicular to the organ's aqueductal axis, we see that the analysis of alcohol's effect on the size and shape of the cerebellum reduces to a simple and familiar statistical model for that view that can be interpreted directly in simple biological terms and simple two-dimensional diagrams. Returning to the biology of the problem, we identify the biometrical factor as a direct empirical operationalization of the primary observable biological effect of the alcohol teratogenesis (to wit, hypoplasia of brain tissue). If we pretend we did not know about the alcohol diagnosis, the FASD cerebellum appears to be just an inexplicably small cerebellum having the appropriate shape for its small size.
In the context of fetal alcohol studies, this interpretation of our finding bears implications for the nosology of FASDs, for prenatal monitoring, and for endophrenological studies of the way behavioral deficits arise from the insults to form. Our concluding discussion draws together some of these implications for detection of FASDs with more speculative comments about the implications of the finding for studies in other domains.
In 2002, this journal published our study (Bookstein et al., 2002a) of a different part of the brains of this same 180-subject sample. Those findings have had a substantial impact in domains rather far from the teratological context, including, for instance, the American system of capital punishment (Bookstein, 2006). Nevertheless, that earlier article does not much overlap with this one (except in the conclusion that fetal alcohol exposure can be detected in the brain at much later ages), as the cerebellar findings here differ from the callosal findings of 2002 in several respects. The 2002 study found a difference only of variance; here there is a difference of average form as well. The 2002 study did not pursue any hypothesis of allometry, and the data set there was in effect two-dimensional (a nearly flat curve) to begin with. The present report, based on data from a different part of these brains, exploits a newly three-dimensionalized computer-aided digitizing technique, a recently invented multivariate statistical tactic, and a new respect for the virtues of simplification not only in the communication of quantitative anatomical findings but in the course of their generation.
NEUROTERATOLOGY OF ALCOHOL IN BRIEF
It is well known that alcohol is teratogenic and that the brain is the organ most vulnerable to its effects (for an overview of what is known in both animals and humans, see Streissguth, 1997). Documented sites of damage, besides the cerebellum we study here, include hippocampus, basal ganglia, cortex, and corpus callosum. Animal studies show specific damage to glial cells and myelin, neurons, apoptotic mechanisms, and dopamine circuits, and clinical studies, both autopsy and in vivo, show many of the same regions and forms of damage in humans. Clearly the effects are widespread and can be demonstrated from early embryonic days. For a good current summary of the work on specific neurotoxic mechanisms, including the implications for prevention or intervention, see Chen et al. (2003), and for a general review of the domain, up-to-date except in respect of ignoring the publications from our group, see Riley et al. (2004) or Riley and McGee (2005). The majority of teams studying humans with prenatal alcohol damage are using MR images, but the only group attempting to convert these into a detection protocol seems to be our own (Bookstein et al., 2002a). Other modalities of imaging are under active study at this time (fMRI, SPECT, PET, diffusion tensor imaging), but the grouped effects have likewise not yet been converted into useful diagnostic protocols.
Effects on brain function can be detected at lower levels of exposure than effects on structure. A wide variety of these neurobehavioral effects, too, are known from experimental animal models, from human studies of diagnosed patients with FASDs, and from longitudinal prospective studies of human subjects at a range of exposure levels. The effects on the typical course of a human life are primarily consequences of the behavioral problems; in modern societies, these effects are devastating. The linkage between the alcohol-induced brain anomalies and the behavioral deficits has been demonstrated via correlational analysis in animals, and MRI studies in humans are beginning to explore these possibilities as well (Sowell et al., 2001; Bookstein et al., 2002b).
The Cerebellum per se
One may choose to investigate the cerebellum in FASDs for a variety of reasons: because it is known to be entailed in a steadily increasing number of cortical pathways, because it is high on the list of brain parts known to be affected by prenatal alcohol exposure (e.g., West, 1986), or just because it is so easy to see in MR images. Within the literature of animal experimentation, the cerebellum is unusual in the concentration of reports on one single mechanism of teratological damage, namely, death of Purkinje neurons (e.g., Dikranian et al., 2005) with consequent disruption of cerebellar cortical structure in ways that could explicate the delayed motor development and other behavioral sequelae that are seen in humans with FASD (Connor et al., 2006). The principal theme of all the experimental results is specific cell loss, consistent with the human findings regarding brain size shortfall and also with the findings of the present analysis regarding the simple statistical structure of the group differences. Indeed, the current animal research studies, which all begin by assuming the facts of cell death and consequent loss of mass, generally aim at exploring a variety of reasonable mechanisms by which that damage obtains: for instance, disruption of surface neurotrophin regulators (Light et al., 2002), disruption of the cyclin-dependent kinase system (Li et al., 2002), disruption of the nitric oxide pathway (Bonthius et al., 2004), glial cell maturation (González-Burgos and Alejandre-Gómez, 2005), or disruption of apoptosis-regulating proteins (Siler-Marsiglio et al., 2005). While our work cannot speak to any of these pathway models, we will confirm the simplicity of the principal experimental finding, loss of cerebellar mass.
SUBJECTS AND IMAGE-DERIVED DATA
Our earlier publication in this journal (Bookstein et al., 2002a) has already reviewed the sample of 180 adolescents and adults and the brain images by which we assess the neuroanatomical damage to the 120 FASD subjects, and so we can be relatively brief here. From a large registry of persons of all ages diagnosed with fetal alcohol syndrome (FAS) or fetal alcohol effects (FAEs) through the year 1998, we selected 15 of each diagnosis for each sex separately in two age ranges: adolescents (aged 14 through 17 at the time of our data collection) and adults (aged 18 and older). Each group of 30 diagnosed patients was matched for age range, sex, and ethnicity to a comparison group of 15 unexposed volunteers recruited from the general Seattle community. The study combined two major data collection activities: a fairly conventional 3D MR brain image, mildly T1-weighted, at voxel size 0.85 × 0.85 × 1.5 mm³, and 5 hours' worth of laboratory testing and interviews tapping a very broad list of neuropsychological domains known or suspected to be damaged in the fetal alcohol spectrum disorders. Prior analyses of this data record have dealt with subcortical landmark point configurations, with the size and shape of the midline corpus callosum, with the prevalence of executive function deficits in the patient population, and with endophrenology (the correlation between behavioral deficits and details of the shape of the brain). The present report is our first that concerns data from the cerebellum.
Data Collection From Cerebellar Surface: An Intentional Simplification
The human brain, somebody said recently, is “the most complex system in the known universe,” and so the sciences trying to understand it have quite a steep wall to climb. Studies of congenital insults play the same role in these pursuits that experimental studies of disordered development played in the rise of classic experimental embryology: understanding the effects of certain specific causes may make it easier to describe the origins of other patterns of variability that we happen to find interesting. The same excessive complexity that applies to theories of the brain applies as well to their images, which are the most complex structures that medical image analysis ever deals with. Such complexity, alas, necessarily interferes with the far simpler sorts of statistical processing (mean differences, detection rules) through which our theories must be filtered if they are to be of any medical utility.
Our work here therefore does not draw on any of the currently cutting-edge approaches to nonrigid registration of brain images, a topic on which an entire bibliographic essay could well be written (see the special issue on biomedical imaging published by this journal: http://www.wiley.com/legacy/products/subject/life/anatomy/bioimage_toc.html). Specifically, the data to be analyzed below were gathered by computer-aided hand digitization of the visually identifiable cerebellar surfaces of each of the 180 study cerebellums in turn, under the guidance of a carefully designed 328-point template. Except for one single landmark point (Tip4V, tip of the fourth ventricle), all of the points collected are what the literature calls “semilandmarks.” Semilandmarks are locations that convey the biometric variation of a curve or surface by a selection of several points lying on the curve or surface that are individually nearest to the predetermined points of a template under one or another reasonable definition of what “nearest” means. This project uses the characterization of semilandmarks as jointly minimizing bending energy with respect to the template: the relaxation reviewed by Gunz et al. (2004) (for applications to craniofacial bony surfaces). This approach approximately preserves the relative spacing of the points among themselves while greatly improving the efficiency of subsequent statistical analyses and the efficacy of visualizations. Our specific morphometric algorithm is a generalization for surfaces of a technique for curves first published for a realistic 2D data analysis right here in this journal (Bookstein et al., 1999), and the detailed technique to be introduced presently is the 3D analogue of the manually managed semilandmark relaxation procedure used for the callosal midline analysis in an earlier phase of this project.
To ease both the construction of a template and the interpretation of our results, we designed a convenient standardized cerebellar Cartesian coordinate system (Figs. 1 and 2) having a familiar landmark (Tip4V, tip of the fourth ventricle, intersection of the crosshairs in Fig. 1) as origin and aligned with the midsagittal tangent line to the posterior brain stem as direction of the craniocaudal axis. This direction, called the axial direction in the following discussion, is the direction that will be projected out of the surface in order to arrive at the curve called “equator” below. The left-right direction is taken along the obvious direction of symmetry for the head as a whole, and the third coordinate, the anteroposterior, is computed as the perpendicular to the craniocaudal direction in the midsagittal plane. (Evidently this is a different direction from the more common gravitational anteroposterior taken parallel to the floor or to the Frankfurt horizontal axis of the skull.) These axes are the horizontal and vertical of the panels in Figure 1.
For analysis of the midsagittal section per se, two semilandmarks are set on the two intersections of the cerebellar surface with the craniocaudal axis of this construction, and a fourth point (Tip4V being the first) at the most posterior point of the midsagittal cerebellar outline. Twelve further points are placed around the curve of intersection at roughly even spacing, for 16 points in all. See the right panel of Figure 1, which is in this midsagittal plane. We are not claiming that the polygon delineated here approximates the cerebellar surface in-between at all accurately, nor that these points are anatomically homologous in terms of any principled cerebellar atlas (cf. the discussion of the monograph of Schmahmann et al. [2000] below); neither assumption is required for the analyses to follow. Likewise, choice of the case used as template is immaterial, as long as it is typical in its topology.
We constructed a decimated template in the form of intersections of the same exemplar surface with a pencil of 16 regularly spaced planes through the axis just introduced. (Here the word “pencil” is a delightful technical term in geometry. In this context, it connotes a one-parameter family of planes, a set all having the same equation that involves one free parameter you may take as the angle that the plane through the axis makes with the anteroposterior direction of the head.) The meridional nature of these locations (a total of 627 triangles on 328 points) is hinted at in the view at left in Figure 1, in which the craniocaudal (axial) direction of the midplane image is now perpendicular to the paper. Notice that the surface here is incomplete in the acute angle between the cerebral peduncles; we are not digitizing that part of the surface. All three points on the axis of this construction (Tip4V and the intersections with the cerebellar surface axially above and below it) are present in all of these sections, but need not be redigitized. Except at the cut edges, every meridian is bridged to its neighbors by a system of triangles making up the “ribbon strips” of Koenderink (1990), a useful way of representing simply curved surface patches. There results the polyhedral surface viewed from above in Figure 2.
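As a worked illustration of the pencil just described, and under notation that is ours rather than the original text's (x for the left-right coordinate, y for the anteroposterior, z for the axial, origin at Tip4V, and an assumed even angular spacing of π/16), the 16 planes can be written as one equation with a single free parameter:

```latex
\Pi_j:\quad x\cos\theta_j \;-\; y\sin\theta_j \;=\; 0,
\qquad \theta_j = \frac{j\,\pi}{16}, \quad j = 0, 1, \dots, 15 .
```

Every plane in the family contains the z axis, because z does not appear in the equation. The plane with θ = 0 is x = 0, the midsagittal plane, and θ is the angle each plane makes with the anteroposterior direction.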
The locating (digitizing) of these 328 points per cerebellar surface was carried out interactively via Edgewarp, a large package in C++ and OpenGL coded over the last decade by Dr. W.D.K. Green of the Statistics Department at the University of Washington. The current public release of Edgewarp, which carries out all the computations here using mouse-driven operations, is available for free download as object modules for Linux and Mac systems (ftp://brainmap.stat.washington.edu/edgewarp3d/). Instructions for using the pulldown menus for this specific application are available from the senior author.
Once the template was constructed, by carefully scripted digitizing and triangulation of a template form, one of the authors (P.D.C.) manually warped the lofted surface onto every other subject in our sample of 180, according to the following four steps. One, Tip4V is located, and the direction through it parallel to the posterior brain stem is specified. (Along with the visually obvious axis of bilateral symmetry, this direction sets the coordinate system we are using.) Then five more points are located: the posteriormost point of the midsagittal cerebellar section (typically, as in the case here, it is craniad to Tip4V in our system), the most lateral points left and right in the axial plane through Tip4V, and the two places where the axis of the construction pierces the cerebellar surface above and below the registration point. Two, the template surface just figured above is warped into the new subject's coordinate system by thin-plate spline using the six points just specified. Three, in the midplane and every other plane of the warped pencil, the points of the warped template are projected perpendicularly onto the apparent cerebellar outline by the operator's hand. Projection does not go into the depths of sulci, but remains on the surface (the locally convex completion) of the visually apparent cerebellar cortical tissue edge. Four, at the conclusion of this operation, when all of the semilandmarks appear to be lying on the visible image surface in the meridional sections through them, they are all relaxed (slid jointly) along the surface of the tessellation of the faceted surface they themselves span to positions of minimum bending energy with respect to the template form.
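Step two of this procedure, the thin-plate spline warp driven by six points, can be sketched in a few lines of Python with NumPy. This is an illustration only and not the Edgewarp implementation: the function names `tps_fit` and `tps_apply` are hypothetical, and we assume the standard 3D spline kernel U(r) = r together with six control points that are not all coplanar.

```python
import numpy as np

def tps_fit(src, dst):
    """Fit a 3D thin-plate spline that carries each src point exactly onto dst."""
    src = np.asarray(src, dtype=float)          # (k, 3) control points on the template
    dst = np.asarray(dst, dtype=float)          # (k, 3) the same points on the subject
    k = len(src)
    # Kernel matrix: U(r) = r is the 3D thin-plate kernel (assumed here).
    K = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    P = np.hstack([np.ones((k, 1)), src])       # affine part: 1, x, y, z
    L = np.zeros((k + 4, k + 4))
    L[:k, :k] = K
    L[:k, k:] = P
    L[k:, :k] = P.T
    rhs = np.vstack([dst, np.zeros((4, 3))])    # side conditions on the weights
    params = np.linalg.solve(L, rhs)
    return src, params[:k], params[k:]          # control points, weights, affine

def tps_apply(model, pts):
    """Carry arbitrary points (for example all template vertices) through the warp."""
    src, weights, affine = model
    pts = np.asarray(pts, dtype=float)
    U = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2)
    return U @ weights + np.hstack([np.ones((len(pts), 1)), pts]) @ affine
```

In use, `tps_fit` would receive the six control points on the template and on the new subject, and `tps_apply` would then carry all 328 template points into the subject's coordinate system before the manual projection of step three.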
MORPHOMETRIC STATISTICAL METHODS
We analyzed the 328-point data resource just sketched, and also two of its interesting subsets (the 16 semilandmarks of the midsagittal section, and the 23 semilandmarks of the equator), by the Procrustes methods that have recently become standard for analysis of labeled Cartesian locations like these. These methods were reviewed in our earlier articles (Bookstein et al., 1999, 2002a) in this journal, in several textbooks (e.g., Dryden and Mardia, 1998), and in many other articles strewn over the recent literature of quantitative morphology. (A Google search on the phrase “Procrustes analysis” now brings up about 33,000 retrievals.) The general strategy is to reduce all the forms to a common coordinate system of deviations from the average shape, by standardization of centroid location, orientation, and scale in any order. The standardized landmark coordinates turn into an otherwise conventional vector of variables that can be analyzed by standard multivariate methods, and any patterns unearthed can be visualized (most conveniently, by thin-plate spline deformation grids) back in the space in which they were originally digitized. Statistical significance tests of shape difference are by permutation testing (Good, 2000): the Procrustes distance between the average shapes of the actual diagnostic groups is compared with the distribution of the same distance computed between groups that divide the original collection of shapes into two parts at random, without regard to the actual diagnosis in the data.
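The two computations named in this paragraph, Procrustes standardization and the permutation test on the distance between group mean shapes, can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for the analyses reported here: the function names are hypothetical, the superimposition is a simple iterative fit to the running mean, and Procrustes distance is approximated by the Euclidean distance between mean aligned coordinates.

```python
import numpy as np

def _rotate_onto(shape, ref):
    """Best rotation (no reflection) of a centered shape onto ref, via SVD."""
    u, _, vt = np.linalg.svd(shape.T @ ref)
    if np.linalg.det(u @ vt) < 0:            # exclude reflections
        u[:, -1] *= -1
    return shape @ u @ vt

def procrustes_align(shapes, iters=10):
    """Standardize centroid location, scale, and orientation of n configurations."""
    X = np.asarray(shapes, dtype=float)      # (n, k, d)
    X = X - X.mean(axis=1, keepdims=True)    # centroid to the origin
    X = X / np.sqrt((X ** 2).sum(axis=(1, 2), keepdims=True))  # unit centroid size
    mean = X[0]
    for _ in range(iters):                   # rotate every form onto the running mean
        X = np.stack([_rotate_onto(s, mean) for s in X])
        mean = X.mean(axis=0)
        mean /= np.sqrt((mean ** 2).sum())
    return X

def permutation_test(aligned, is_case, n_perm=999, seed=0):
    """Permutation p-value for the distance between the two group mean shapes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(is_case, dtype=bool)

    def mean_distance(lab):
        return float(np.linalg.norm(aligned[lab].mean(axis=0) - aligned[~lab].mean(axis=0)))

    observed = mean_distance(labels)
    hits = sum(mean_distance(rng.permutation(labels)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

Counting the observed labelling as one of the permutations, as in the last line, keeps the reported p-value strictly positive.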
We attend carefully to the notion of geometrical scaling, centroid size, that is built into this toolkit (Bookstein, 1991). It is this variable that will ultimately embody the finding of hypoplasia that links our analyses to the experimental animal literature. Centroid size is the square root of the total squared distance of all the landmark or semilandmark points from their common centroid case by case, hence the name. This particular measure is strongly correlated with most other candidates for a general size measure, such as two-dimensional area or three-dimensional surface area or volume, but it has powerful formal (mathematical) properties as well that we will rely on in the course of the multivariate computations here. In particular, centroid size plays a unique role in multivariate models of allometry.
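The definition of centroid size translates directly into code. The following Python sketch (NumPy assumed; the function name `centroid_size` and the example square are ours, for illustration only) computes it for one configuration of points in any number of dimensions.

```python
import numpy as np

def centroid_size(points):
    """Square root of the summed squared distances of the points from their centroid."""
    pts = np.asarray(points, dtype=float)    # shape (k, d): k points in d dimensions
    centered = pts - pts.mean(axis=0)        # remove the common centroid
    return float(np.sqrt((centered ** 2).sum()))

# Hypothetical example: the four corners of a square of side 2.
square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
print(centroid_size(square))                 # the square root of 8
```

The measure is unchanged by translation of the configuration and scales linearly with it, which is what makes it a natural divisor in the Procrustes standardization.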
The reader accustomed to multivariate Gaussian (normal) models may be less familiar with the allometric models that are the single most powerful explanatory tool in the biometric style of explanation of form. These models have a long history, beginning with Karl Pearson's work of the 19th century on correlations among suites of multiple length measures. Good reviews include Huxley (1932), Gould (1965), Blackith and Reyment (1971), Reyment and Jöreskog (1993), and chapter 4 of Bookstein (1991). Allometry is the statistical study of one specific biological process, the determination of shape by growth or size change. When shape is measured by many different variables, as it is in our tradition of geometric morphometrics, all those dependencies are usually modeled jointly in the conventional form of parallel regressions with different predictor ranges (Fig. 3). These analyses used to proceed by extraction of relative warps (principal components of shape) followed by group-by-group regressions on size in this manner.
Only recently (indeed, well after publication of our previous article in this journal) was it recognized that there is a better way to carry out the same modeling: by restoring centroid size to the shape representations underlying the Procrustes method. In this new approach (Mitteroecker et al., 2004), allometry is visualized via the principal components of an extended morphometric representation that includes size information and shape information simultaneously and commensurately. The trick is to add a column to the Procrustes shape coordinate matrix for the (natural) logarithm of centroid size, and then carry out otherwise conventional principal component analyses in this newly augmented size-shape space. This approach incorporates every classic approach to allometry as one or another graphical analysis deriving from the one single computation.
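The augmentation described here is a one-line change to an ordinary principal component analysis. The Python sketch below (NumPy assumed; the function names `size_shape_pca` and `size_share` are hypothetical, and this is not the S-PLUS code used for the analyses) appends log centroid size as an extra column to the flattened Procrustes shape coordinates and extracts components by singular value decomposition. The second function reports how much of the variance of log centroid size falls within the first few components.

```python
import numpy as np

def size_shape_pca(shape_coords, centroid_sizes):
    """PCA of Procrustes shape coordinates augmented by log centroid size."""
    shapes = np.asarray(shape_coords, dtype=float)         # (n, k, d), already aligned
    n = shapes.shape[0]
    log_cs = np.log(np.asarray(centroid_sizes, dtype=float))
    X = np.column_stack([shapes.reshape(n, -1), log_cs])   # size is the last column
    Xc = X - X.mean(axis=0)                                # center each variable
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = u * s                                         # size-shape relative warp scores
    return scores, vt, s                                   # scores, components, singular values

def size_share(components, singular_values, n_keep=2):
    """Fraction of log-size variance lying in the first n_keep components."""
    load = components[:, -1] ** 2 * singular_values ** 2   # size variance per component
    return float(load[:n_keep].sum() / load.sum())
```

Plotting the first two columns of `scores` by group gives a display of the kind the text calls size-shape relative warps, and `size_share(components, singular_values, 2)` is the quantity one would quote when saying that size lies almost entirely within the plane of the first two components.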
Software for geometric morphometrics is available from a wide range of purveyors (see, again, the Google retrieval mentioned above). As the size-shape computations were still somewhat experimental when we carried out our analyses, we used Splus, a commercial statistical programming package, as our computing environment.
Cerebellar vermis (midsagittal section)
Analysis of the cerebellar vermis was by geometric morphometric analysis of the 16-point configuration shown at right in Figure 1. There were no findings here of any shape effect of FASDs within age-sex classes. (Examination of the raw data indicates that this owes to substantial variability of the posterior-inferior part of the outline in this plane.) The size comparisons involving vermis, shown in Figure 4, are barely statistically significant for the two adult samples (for the male, t = 2.4; for the female, t = 2.2), and not at all for the two adolescent samples (|t| < 1). All t values are of n = 15 unexposed against n = 30 FASDs. (Because of differences in variance, the significance tests associated with the t ratios would need to be computed by resampling methods, as in Figure 11 and its associated discussion.)
Full 328-point surface
The situation is entirely different for the full 328-semilandmark surface data set. The clinical three-group design now shows substantial size signals in all four of our design quadrants (male or female, adults or adolescents), as seen in Figure 5. Note the excellent separation now, especially for the adult females; note also that the size distribution for the FAE subsamples always lies intermediate between that for the unexposed and that for the FAS subsamples. The comparisons show nearly double the signal-to-noise ratios of those in Figure 4: t ratios are 3.97, 4.89, 3.20, and 2.79 for the adult males, adult females, adolescent males, and adolescent females, respectively.
There is no paradox here. There is plenty of opportunity for a strong nonmidsagittal morphometric signal to materialize in the size and shape of left and right half-cerebellums. In fact, we find (in inelegantly detailed displays not shown here) strong signals in every coordinate direction for the data off the midplane, with many t ratios exceeding 5 (corresponding to effect sizes of greater than 1 standard deviation). These measures are not diameters of the form, but specific components of the centroid size formula itself in the three Cartesian directions of our coordinate system. The dimension of largest size effect variability is the dimension of bilateral symmetry. A simplification would be in order if it does not do too much damage to the signal strength we are seeking to exploit. As part of such a simplification, we would want to minimize redundancy of the required data representation. Axial centroid size (e.g., within-subject variance of the axial coordinate) is correlated with left-right centroid size for our age-sex groups, but anteroposterior centroid size is not thus correlated. If we wished to focus our attention on just a single pair of Cartesian dimensions, it would be the combination of the left-right axis with the anteroposterior (in other words, omitting the direction of the aqueductal axis itself).
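The directional components mentioned in this paragraph follow from the definition of centroid size by separating the squared distance from the centroid into its three coordinate parts. In notation that is ours for illustration (k = 328 points with coordinates x, y, z for the left-right, anteroposterior, and axial directions, and bars denoting within-subject means):

```latex
\mathrm{CS}^2 \;=\; \sum_{i=1}^{k} \lVert p_i - \bar{p} \rVert^2
\;=\; \underbrace{\sum_{i=1}^{k} (x_i - \bar{x})^2}_{\text{left-right}}
\;+\; \underbrace{\sum_{i=1}^{k} (y_i - \bar{y})^2}_{\text{anteroposterior}}
\;+\; \underbrace{\sum_{i=1}^{k} (z_i - \bar{z})^2}_{\text{axial}}
```

Each term is k times the within-subject variance of one coordinate, so dropping the axial direction amounts to dropping the third term.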
Reducing the complexity of the data
Hints can be gleaned from the classic anatomical monographs on the cerebellum. We have recourse to that of Schmahmann et al. (2000), which notes a horizontal sulcus running quite near to the widest circuit of these organs perpendicular to the axis. The horizontal sulcus is an approximately planar curve that could well convey much of the information available from the cerebellar surface as a whole, were it only visible in our images. Alas, typically it is not visible in images having the 1 mm³ voxels that characterize these MRs. But there is a visual resource near the horizontal sulcus: the apparent outline of these organs as viewed from a great distance out along the axis we have assigned them. The curve in question has the mathematical characterization of a curve on the surface where the surface tangent plane passes through the point of viewing. By analogy with what you see from the top of a mountain, it is also called the visual horizon of the surface with respect to locations a long way away along the aqueductal axis. We can refer to it less technically as a silhouette of the cerebellum in the direction of the axis or, even more conveniently, as its equator.
We thereby recognize in this construction a very old method indeed, dating back to the dawn of scientific photography. Curves like these lay at the foundation of the most well-articulated classical biometric application, to the craniofacial skeleton. Volume 2 of Rudolf Martin's magisterial Lehrbuch der Anthropologie (1928) incorporates an enormous range of clever extractions of quantities from skulls viewed in just this way: as outlines of photographs from a sufficient distance along aligned coordinate directions. The familiar “three ladies of the anthropology laboratory” (norma lateralis, norma frontalis, and norma verticalis) produce their outlines as just such visual horizons. The curve we seek, the one we are calling the equator of the cerebellum, has in fact already been shown in Figure 2; it is the outline of the green-tinted surface there. (In this geodetic metaphor, the “north and south poles” corresponding to the equator lie on the axis of the coordinate system, the points above and below Tip4V in Fig. 1.)
Furthermore, the sizes and shapes of these curves are already embedded in our 3D surface representation just as much as were those of the midsagittal vermis. To produce the equator from an aqueduct-registered cerebellar form, we just discard the axial coordinate of the points originally digitized in 3D at points of their meridians that are, on average, farthest from the axis. There are 23 of these points, including the original semilandmark that was most posterior in the midsagittal plane (the unpaired landmark in the figures to follow). Properly speaking, these are not plane curves (i.e., outlines from flat slices) but instead places on the surface where the implied normal to the (crudely) digitized surface lies perpendicular to the axis of the coordinate system.
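The extraction just described can be sketched in Python with NumPy. The function name `equator_from_surface` and the array `meridian_id`, which records the meridian to which each template point belongs, are hypothetical, and we assume the axis of the construction is one coordinate axis through the origin at Tip4V. For each meridian the sketch keeps the template point that is, on average over subjects, farthest from the axis, and then discards the axial coordinate.

```python
import numpy as np

def equator_from_surface(surfaces, meridian_id, axial=2):
    """Pick, per meridian, the point farthest on average from the axis; drop the axial coordinate."""
    S = np.asarray(surfaces, dtype=float)            # (n subjects, k points, 3)
    meridian_id = np.asarray(meridian_id)            # (k,) meridian label of each point
    planar = np.delete(S, axial, axis=2)             # (n, k, 2): axial coordinate discarded
    radius = np.linalg.norm(planar, axis=2).mean(axis=0)  # mean distance from the axis per point
    picks = []
    for m in np.unique(meridian_id):
        idx = np.flatnonzero(meridian_id == m)       # template points on this meridian
        picks.append(idx[np.argmax(radius[idx])])    # the one farthest out on average
    picks = np.array(picks)
    return planar[:, picks, :], picks                # equatorial outlines and chosen indices
```

Because the choice of point is made once, from the sample averages, every subject contributes the same template indices, so the resulting 2D configurations remain labeled point for point as the Procrustes analysis requires.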
Findings for the equator
Turning to the 23 points on the equatorial outline, we compute their size and shape by the usual Procrustes methods. The centroid sizes for this simplified data set show almost as strong a signal as those of the full 328-point surface representation (Fig. 6); t values are 2.75, 4.54, 2.31, and 2.74 for the same four quadrant-specific comparisons, and the remarkable discriminatory power of size for the adult females persists.
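For readers new to these quantities, the two basic ingredients can be sketched as follows. Centroid size is the square root of the summed squared distances of the points from their centroid; the alignment step shown is an ordinary (pairwise) Procrustes fit of one outline to a reference, which is the building block of the generalized fit over a whole sample. The function names and the toy square are hypothetical, and the full generalized iteration is omitted.

```python
import numpy as np

def centroid_size(config):
    """Square root of the summed squared distances from the centroid."""
    centered = config - config.mean(axis=0)
    return np.sqrt((centered ** 2).sum())

def procrustes_align(config, reference):
    """Center both forms, scale to unit centroid size, and rotate config
    onto reference (rotations only; reflections are excluded)."""
    a = config - config.mean(axis=0)
    a = a / np.sqrt((a ** 2).sum())
    b = reference - reference.mean(axis=0)
    b = b / np.sqrt((b ** 2).sum())
    u, _, vt = np.linalg.svd(a.T @ b)
    rot = u @ vt
    if np.linalg.det(rot) < 0:        # flip the weakest axis to avoid a reflection
        u[:, -1] *= -1
        rot = u @ vt
    return a @ rot, b

# Toy check: a rotated, enlarged, shifted square aligns back onto the original.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * square @ turn + np.array([5.0, -2.0])
aligned, target = procrustes_align(moved, square)
size_square = centroid_size(square)   # sqrt(2) for the unit square
size_moved = centroid_size(moved)     # three times larger
```

Size and shape are thereby separated: the aligned coordinates carry the shape, and the centroid sizes removed in the scaling step are kept as a variable in their own right.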
Beyond the size measure for the equatorial curves lie the customary Procrustes shape scatters. Figure 7 shows these for the adult subsample of our study, those who will ultimately show the greater signal magnitude in the final analyses that follow. We see here, in the distribution of the × symbols on both sides of the central bundle of ● symbols, the same hypervariance of shape in the FASD subgroup that we reported for their callosal outlines in 2002. When we draw the average shapes at their average sizes by age-sex group (Fig. 8), it is gratifying to see the same display for all four of these design quadrants. The average effect of having an FASD diagnosis is invariant across our four age-sex groupings.
Explanatory Finding: Allometry
The information about shape in Figure 8 should be considered in light of the size differences in Figure 6. While ordinary shape comparisons do not find significant differences, there is an alternative approach available from the geometric toolkit that considers the possibility of correlations between shape and size: the allometric model already introduced. The method of size-shape relative warps described above can test whether there is information in the shape domain that can complement classifications based merely on measured size. Figure 9 shows the plot of the first two size-shape relative warps for the full surface data set of 328 semilandmarks. In the panel at upper left, we show (by the dashed line) the projection of the geometric size axis on these two components. Size aligns almost equally with these first two components and is almost entirely located within this plane (more than 92% of its variance is explained by just these two factors). We see that there is indeed shape information available as well as size difference. Comparing the separate panels of Figure 9 to the corresponding ranges in Figure 5, we see that there are more points to the left of the ● symbols in Figure 9 than there were in Figure 5. The shape analysis sharpens the size-only discrimination in three of the four age-sex subgroups, excepting only the adolescent female.
The same approach applies in the simplification of the data to the equator curve alone. Figure 10 shows the size-shape relative warp analysis corresponding to the size-only displays in Figure 6. Again, size projects almost totally onto the first two dimensions of size-shape space (95% of its variation is captured there), but now the clustering of the unexposed in the adult subsamples is quite pronounced. The clustering of the adult females in the upper right panel of Figure 10, in particular, is as tight as what we published in 2002 (for adult males) by combining behavioral and anatomical data. Here we have approached that degree of concentration of the signal of normality by manipulation of anatomical observations alone.
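The computation behind a statement such as "95% of its variation is captured there" can be sketched as follows. The sketch assumes one common construction of size-shape space, namely a principal component analysis of the Procrustes shape coordinates augmented by a column of log centroid size; the definition actually used in this paper is the one given in its methods section. Function names are hypothetical, and the toy numbers merely stand in for Procrustes coordinates.

```python
import numpy as np

def size_shape_pca(shape_coords, centroid_sizes):
    """PCA of shape coordinates with log centroid size appended as a last column.
    Returns component scores, singular values, and loadings (rows of vt)."""
    augmented = np.column_stack([shape_coords, np.log(centroid_sizes)])
    centered = augmented - augmented.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u * s, s, vt

def size_share(s, vt, n_components=2):
    """Fraction of the variance of log size carried by the leading components."""
    contrib = (s * vt[:, -1]) ** 2    # per-component sum of squares of the size column
    return contrib[:n_components].sum() / contrib.sum()

# Toy data: 40 forms whose "shape" varies almost purely allometrically.
rng = np.random.default_rng(0)
log_size = rng.normal(0.0, 0.1, size=40)
direction = np.ones(6) / np.sqrt(6)
toy_shape = 0.5 * log_size[:, None] * direction + rng.normal(0.0, 0.001, size=(40, 6))
scores, s, vt = size_shape_pca(toy_shape, np.exp(log_size))
share_two = size_share(s, vt, 2)
share_all = size_share(s, vt, len(s))
```

Because the components are orthogonal, the variance of the size column decomposes exactly into per-component contributions, which is what licenses quoting a single percentage for the first two.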
As Figure 10 shows, the main effect of the prenatal alcohol damage on the equator of the cerebellum is to shift not only size, as we already noted, but also the size-correlated parts of shape; and size here lies mostly on relative warp 1 (the one plotted horizontally). Temporarily restricting ourselves to just the adult subsamples, we consider the effect of alcohol on every single principal component of the equatorial shape: all 43 of them (an approach that one would never use for any other statistical purpose). Bear in mind that the computation here makes no reference to the grouping structure of the study design. Figure 11 indicates that when appropriately corrected for the multiple comparisons, there is no difference between the unexposed and the FASD subgroups, by nonparametric Wilcoxon test, on any principal component after the first. The FASD subsample (the + and × symbols) is vertically centered in the same place as the unexposed (the ● symbols) not only in the top row of Figure 10 but in every other figure corresponding to the other possible vertical axes. In other words, the morphological effect of the size difference is not supplemented by any other systematic effect of the alcohol exposure.
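A component-by-component screen of this kind might look like the following sketch. Two choices here are assumptions rather than statements of the paper's procedure: the Wilcoxon rank-sum p value is computed by the large-sample normal approximation without a tie correction, and the multiple-comparison correction is a plain Bonferroni division of the significance level by the number of components. The function names and the toy scores are hypothetical.

```python
import math
import numpy as np

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, no ties assumed)."""
    n1, n2 = len(a), len(b)
    pooled = np.concatenate([a, b])
    ranks = np.empty(n1 + n2)
    ranks[np.argsort(pooled)] = np.arange(1, n1 + n2 + 1)
    w = ranks[:n1].sum()                              # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))

def screen_components(scores, is_fasd, alpha=0.05):
    """Test every component for a group shift; Bonferroni threshold (assumed)."""
    n_components = scores.shape[1]
    p_values = np.array([rank_sum_p(scores[~is_fasd, j], scores[is_fasd, j])
                         for j in range(n_components)])
    return p_values, p_values < alpha / n_components

# Toy scores: a clear group shift on the first component only.
is_fasd = np.tile([False, True, True, False], 22)
order = np.arange(88, dtype=float)
toy_scores = np.column_stack([
    np.where(is_fasd, -1.0, 1.0) + 0.001 * order,     # groups fully separated
    order,                                            # ranks balanced between groups
    order[::-1],                                      # ranks balanced between groups
])
p_values, flagged = screen_components(toy_scores, is_fasd)
```

In the toy example only the first component is flagged, which is the pattern the paragraph above reports for the real data.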
In the context of our graphical models (Fig. 3), we see that the allometric finding does not take on the conventional form of the diagram at left, but instead manifests the less familiar form shown at right. The FASD cerebellum differs from the unexposed by a variable extent of the effect of pure size difference (the horizontal axis here), together with additional variation showing no mean tendency to differ between groups in any direction perpendicular to the long axis of the ellipse shown in the figure.
It is instructive to display this first size-shape relative warp in the form of a thin-plate spline deformation grid (another component of the standard geometric toolkit). Figure 12 shows the effect of changes (of somewhat greater amplitude than the data actually afford) in both directions from the grand mean form over all 180 cerebellums of this study. The average here corresponds to the size and the shape of either a small unexposed brain or a relatively large alcohol-exposed one. The form on the left then sketches the equatorial form expected for an unrealistically small FASD case, while the form on the right corresponds to the expected form of an unrealistically large cerebellum from either diagnostic group. Inside either deformed outline is the mathematically smoothest estimate of the field of tissue deformations (growth excesses or deficiencies) consistent with the shift to the outline in question.
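The interpolation that underlies such a deformation grid can be sketched as follows. This is a generic two-dimensional thin-plate spline using the kernel U(r) = r² log r², offered as an illustration and not as the Edgewarp implementation; the function names and the five-point toy configuration are hypothetical.

```python
import numpy as np

def tps_kernel(r2):
    """U(r) = r^2 log r^2 evaluated on squared distances, with U(0) = 0."""
    out = np.zeros_like(r2)
    nonzero = r2 > 0
    out[nonzero] = r2[nonzero] * np.log(r2[nonzero])
    return out

def fit_tps(source, target):
    """Solve for the spline that maps each source landmark onto its target."""
    k = len(source)
    d2 = ((source[:, None, :] - source[None, :, :]) ** 2).sum(-1)
    P = np.column_stack([np.ones(k), source])
    L = np.zeros((k + 3, k + 3))
    L[:k, :k] = tps_kernel(d2)
    L[:k, k:] = P
    L[k:, :k] = P.T
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = target
    coef = np.linalg.solve(L, rhs)
    return coef[:k], coef[k:]          # nonaffine weights, affine part

def apply_tps(points, source, weights, affine):
    """Map arbitrary points (for example, grid nodes) through the fitted spline."""
    d2 = ((points[:, None, :] - source[None, :, :]) ** 2).sum(-1)
    P = np.column_stack([np.ones(len(points)), points])
    return tps_kernel(d2) @ weights + P @ affine

# Toy landmarks: a square with its center pushed to the right.
source = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
target = source.copy()
target[4] += np.array([0.1, 0.0])
weights, affine = fit_tps(source, target)
gx, gy = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 3))
grid = np.column_stack([gx.ravel(), gy.ravel()])
warped_grid = apply_tps(grid, source, weights, affine)
```

Drawing the lines of `warped_grid` in place of those of `grid` yields the familiar deformed graph paper; among all maps that carry the landmarks correctly, the spline is the one of least bending energy.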
Corresponding to each of these setups (two size-shape relative warps, plus an original centroid size variable) is a discriminant function analysis. The discrimination is quadratic because our FASD cerebellums show differences of variance as well as of averages compared to those of the unexposed subjects. The equator-only analysis just diagrammed classifies adult males as unexposed or FASD with three errors each among the unexposed, the FAE, and the FAS, and classifies the adult females with the same count of nine errors now distributed as 4, 3, 2. (Errors are from the full cross-validated analysis in which each subject is classified according to the classifier derived from the 44 others in the corresponding study quadrant. The predictors for the analysis included centroid size and those two dimensions of correlated shape, SSRW1 and SSRW2, separately by quadrant.) Analysis for the full surface data set of 328 points shows about the same accuracy. The role of the shape representations is to add some redundancy and also some biological verisimilitude and plausibility to the size classification as regards FASD versus unexposed.
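The cross-validation scheme described in the parenthesis can be sketched as follows: each subject in turn is set aside, a quadratic (separate-covariance Gaussian) rule is fitted to the remaining subjects of the quadrant, and the held-out subject is classified. The use of empirical group frequencies as prior probabilities is an assumption of this sketch, as are the function names and the toy data, whose group sizes of 15 and 30 were chosen only for illustration.

```python
import numpy as np

def qda_fit(X, y):
    """Per-group mean, inverse covariance, log determinant, and log prior."""
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        cov = np.cov(Xk, rowvar=False)
        params[label] = (Xk.mean(axis=0), np.linalg.inv(cov),
                         np.linalg.slogdet(cov)[1], np.log(len(Xk) / len(X)))
    return params

def qda_predict(params, x):
    """Assign x to the group with the largest Gaussian log posterior score."""
    best_label, best_score = None, -np.inf
    for label, (mean, precision, logdet, logprior) in params.items():
        d = x - mean
        score = -0.5 * (d @ precision @ d) - 0.5 * logdet + logprior
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_errors(X, y):
    """Count subjects misclassified by the rule fitted to all the others."""
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        if qda_predict(qda_fit(X[keep], y[keep]), X[i]) != y[i]:
            errors += 1
    return errors

# Toy quadrant: three predictors (standing in for centroid size, SSRW1, SSRW2).
rng = np.random.default_rng(0)
unexposed = rng.normal(0.0, 0.1, size=(15, 3))   # tight cluster
fasd = rng.normal(-8.0, 1.0, size=(30, 3))       # shifted and more variable
X = np.vstack([unexposed, fasd])
y = np.array([0] * 15 + [1] * 30)
n_errors = leave_one_out_errors(X, y)
```

The rule is quadratic because each group keeps its own covariance matrix, which is what lets the tight cluster of the unexposed and the greater scatter of the FASD group both inform the boundary.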
If we restrict ourselves to just one quantity (the horizontal axis in Figure 10, which, recall, was computed without reference to diagnostic groupings), we can distinguish the unexposed from the FASD adult cerebellums with a total of only 22 errors out of the total sample of 90. (The cut corresponding to this discrimination falls at about 0.05 on the horizontal axis of the panels in the figure; it misclassifies the 9 unexposed subjects to its left, along with the 13 exposed subjects to its right.) This performance, while not as solid as the formal quadratic discriminant analysis, is quite respectable for a single size-shape score (the deformation pattern depicted in Fig. 12). The corresponding separation purely by equatorial centroid size, as in Figure 6, can be set to a nonsignificantly lower total count of errors (20, versus the 22 for the analysis including shape), but omitting mention of the shape information has the rhetorical effect of blocking the tie to the explanatory setting of this finding. Correction of cerebellar size for net brain size is not likely to improve this accuracy, as net brain size is itself known to be reduced in FASD.
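As a matter of arithmetic, 22 errors out of 90 correspond to 68 correct classifications, an accuracy of about 76%. The error count for any single cut, and the search for the cut with the fewest errors, can be sketched as follows; the function names and the eight toy scores are hypothetical, and the convention (FASD expected to the left of the cut, unexposed to the right) follows the description in the text.

```python
import numpy as np

def cut_errors(scores, is_fasd, cut):
    """Errors of the rule 'score below cut means FASD':
    (unexposed falling left of the cut, FASD falling at or right of it)."""
    unexposed_left = int(np.sum(~is_fasd & (scores < cut)))
    fasd_right = int(np.sum(is_fasd & (scores >= cut)))
    return unexposed_left, fasd_right

def best_cut(scores, is_fasd):
    """Try every midpoint between adjacent sorted scores; keep the first minimum."""
    ordered = np.sort(scores)
    candidates = (ordered[:-1] + ordered[1:]) / 2
    totals = [sum(cut_errors(scores, is_fasd, c)) for c in candidates]
    i = int(np.argmin(totals))
    return float(candidates[i]), int(totals[i])

# Toy scores on a single size-shape axis.
toy_scores = np.array([-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4])
toy_is_fasd = np.array([True, True, True, False, True, False, False, False])
cut, total = best_cut(toy_scores, toy_is_fasd)
split = cut_errors(toy_scores, toy_is_fasd, cut)
accuracy_reported = (90 - 22) / 90    # accuracy implied by the counts in the text
```

A cut chosen by inspecting the same data that it is then scored on is optimistic; the cross-validated discriminant analysis above is the fairer benchmark.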
In this data set, the effect of the FASD diagnosis is the same as the effect of size allometry itself. The graphics are clearest in the two-dimensional equatorial view of Figure 13. At left is shown the net effect of small centroid size via a threefold extrapolation of the mean difference between the forms larger than the unexposed adult minimum centroid size (about 49; see the horizontal axis in Fig. 6) and the others. This grid is indistinguishable from the small end of the first size-shape relative warp in the preceding figure. But, also, it is indistinguishable from the comparison of the FASD cerebellar equators of small size to the FASD equators of large size (center panel). By comparison (right panel), there is no shape difference at all between the equatorial form of the unexposed cerebellums and that of the FASD cerebellums that are large for their diagnosis.
In other words, as far as these data can show, the entire effect of the prenatal alcohol exposure (as encoded in the diagnostic grouping) is carried by one single mediating variable, the centroid size of the equatorial silhouette. This finding is wholly in keeping with the simplicity of the animal literature, which finds only one ultimate expression of the effect of prenatal alcohol exposure on the cerebellum that might be observed at the gross imaging level: the phenomenon of reduced cerebellar mass. The different morphogenetic mechanisms reviewed in the current experimental literature survey an assortment of pathways that result in that outcome, with an eye toward possible interventions that might intercept those pathways; but it remains the case that net cerebellar size mediates the effect of alcohol equivalently in the experimental and the human observational domains. While animal studies of behavioral deficit are strongly consistent with observational studies of analogous deficits in humans (Goodlett and West, 1992), except for the pioneering study of Sulik et al. (1981) it is much less common to encounter any consilience between morphological investigations in the two domains. The finding here of a one-dimensional mediation of the alcohol effect on the cerebellum in both animals and humans is simple and gratifying.
Corresponding to so simple a decision rule is an equally simple data resource, the equatorial silhouette. Intentionally to discard so much of the complexity of the available medical image data, as we have done here, is an unusual tactic. We know of no other research finding beginning with a 3D data analysis that has ended up recommending a 2D analysis instead. In compensation, so simple a quantitative finding articulates very effectively with reductionist explanations (such as apoptosis) of that loss of cerebellar cortical mass. The experimental literature, in its centering on one single trauma (hypoplasia), is a good match to a statistical exegesis centering on just one factor (that size shortfall).
Our finding of a shortfall in size that is steeper for the FAS group than the FAE group is in keeping with several earlier publications (cf. Riley et al., 2004; Riley and McGee, 2005), although we do not confirm earlier findings of specific differences in the vermis (Sowell et al., 1996; Autti-Ramo et al., 2002; O'Hare et al., 2005). The issue remains open as to why the humans with the less extensive facial signs seem to show, on average, the less extensive cerebellar damage as well; they did not do so for our earlier analysis of just the callosal midcurve (Bookstein et al., 2002a).
We did not locate or trace any of the cerebellar gyri and sulci (Schmahmann et al., 2000). Any additional information they bear could only add power to the analysis here, but the strength of the mapping-free finding sets a high standard against which to judge claims of a more supple, adaptive, or accurate digitization in view of the considerably greater cost entailed. Our standardized coordinate system was introduced purely for convenience of digitizing: an axis to rotate around. The direction of the central axis, evidently set with reference to the brain stem, need not be involved in cerebellar architecture or ontogeny in any characteristic way. Still, there is a clear map for future histological or cytoarchitectonic research in the grid patterns of Figures 12 or 13.
There is also the matter of the equatorial curve data acquisition per se, data that in principle could be acquired quite independently of the remaining information about cerebellar surface form. Wholly automatic algorithms for analyzing the cerebellar surface, while likely respecting the landmark properties of Tip4V, might well use a quite different heuristic for assigning preliminary correspondences than the one here, which eased the manual closest-point relaxation described under “Image-Derived Data” above. We encourage our community to add the possibility of silhouettes to its usual toolkit of medical image strategies, and to construct tools for the more widely adopted software packages. More conventional would be an intensified effort to capture the information in the horizontal sulcus, which lies so conveniently near to the curve we have used. But our equator is always visible and easy to locate in real images once the coordinate system is set, whereas sulci are often unclear, discontinuous, or incomplete. There is no analogous sulcus going “over the top” of the cerebellum, or we would be recommending more work on digitization of such a structure as well.
The finding here simplifies not only the report of the neuroanatomical dysmorphology but also the subsequent correlational studies of structural damage against functional and behavioral deficit that are thereby suggested. If cerebellar size is sufficient to capture the extent of dysmorphology in the cerebellum, it ought to correlate substantially with deficits in behaviors that involve cerebellar tissue. Such examinations can then be univariate, not multivariate, which affords them much greater statistical power (or, equivalently, grants them adequate power in much smaller sample sizes). We have such measures in our current study and have correlated them with corpus callosum dysmorphology in earlier publications (Bookstein et al., 2002b). Correlations of cerebellar size with the same measures will be the topic of a later publication, but we cannot resist hinting here that the variability of size deficit within the FASD subgroup does indeed correspond to strongly correlated patterns of variability in behavioral deficits. It is worth noting in this connection that an earlier publication of ours on brain-behavior correlations in this sample (Bookstein et al., 2002b) found a dimension for motor shortfall, variable in extent, within the FASD group that was not itself correlated with diagnosis (FAS vs. FAE) within the FASD category.
Size is the conceptually simplest quantity to measure in a medical image, and equatorial size is even simpler than most, as the equator falls so close to a plane curve. We have recently had considerable success (Bookstein et al., 2005) in extending our adult callosal methodology into the delivery suite by analysis of perinatal intracranial ultrasound. Given the strength of the simple cerebellar size detection rule shown here, it becomes equally important to establish a reliable mode of measuring perinatal cerebellar size. Assessments of this quantity during the immediate postnatal period could offer both pediatrician and parent information about likely brain damage in offspring who are known to have been alcohol-exposed, information strongly relevant to the child's later development. Green (2004) argues that classical eyeblink conditioning will be a particularly rich source of findings and mechanisms that relate cerebellar damage to behavioral deficits. Also, information gathered before birth might sustain potential interventions, some of which, such as maternal counseling, show great promise in other venues.
Fetal alcohol spectrum disorder is not, of course, the only developmental disease that involves a growth deficiency. The role of size shortfall as a unitary mediating variable should certainly be considered in animal studies, which in averaging size differences over experimental groups evidently must lose power, and in observational human studies as well of such problems as iron deficiency, prenatal malnutrition, or the intrauterine growth retardation due to such causes as maternal smoking. Causes of variation of the alcohol-induced size deficit within experimental or naturally observed groups are manifold and are currently being studied by a variety of methods (among them, genetic investigations of the dependence of metabolic pathways on alleles and correlational analyses of the timing and mode of exposure). The methodology demonstrated here could add information from images to the biometric reasoning underlying findings in most studies like those.
That methodology is conceptually simple, and simple in one version of a verbal summary (“the animal finding of cerebellar hypoplasia is confirmed in humans”), but it is nevertheless a good deal more nuanced in its pattern analyses, and the simple qualitative sentence is not a fair representation of what has been accomplished here. Instead, a change of rhetoric is necessary, a change that is generally necessary whenever a finding derived from an experimental design in an animal study is to be matched against a nonexperimental set of observations from a human study. To be as persuasive as an animal study, an observational data set needs to be a great deal more aware of the details of variation. In the present context, where neither dose nor diagnosis ever was under our control, it is much more important than it would be in the analogous animal study that we can assert all of the following as fact: cerebellar size is decreased in about three-quarters of a sample of human patients with FASD, preferentially in one view geometry more than in others, more in some regions than in others, and in a way correlated with shape differences (and, it turns out, with behavioral deficits as well). The verification of the experimental finding in the nonexperimental human context thus relies on a pattern analysis that is much more quantitative than what is required for the animal studies themselves: a sophisticated technical toolkit combining morphometrics and multivariate statistical analysis, to replace the machinery of controlled studies randomizing animals over calibrated conditions of differential dose. We do not so much confirm the consensus of the animal literature, in fact, as we ramify its specification in various ways that, together, compensate for the absence of experimental control (see also Bookstein and Sampson, 2005).
By adopting this explicitly quantitative pattern-oriented rhetoric, we modify the description of the phenomenon in a genuinely useful way, one that may lead to the generation of many additional hypotheses for new studies in animals as well as in humans. In a spirit of reductionism, these might include investigations of morphogenetic dysregulation as a function of position (cf. O'Hare et al., 2005); in the other direction, one might turn to hypotheses concerning regional or even global cerebellum-correlated neuropsychological function and dysfunction. The logic of morphometric pattern detection applies equally to experimental and to observational studies.
The implications of the present finding are thus potentially quite broad, not only for research into the effects of fetal alcohol exposure but also for the methodology of medical image measurement in disease-specific applications. Medical images, especially brain images, are enormously complex, but some of the statistical methods that apply to them can result in simple findings together with confirmation of the appropriateness of that simplicity. Other fields are accustomed to drastically simplifying the high complexity of image data in this way whenever the simplification is helpful to the translational purpose of verifying causal hypotheses (see, for instance, Sachse, 2004, on the corresponding strategy for representations of the geometry of the heart muscle in cardiac biomechanics). This sort of principled simplification is worth considering routinely for medical image-derived geometrical representations as well, especially representations of the brain.
When animal experimentation likewise converges on a simple underlying narrative, as it has for the cerebellar studies in our field, the possibility that simple statistics correspond to a simple mechanism should not be overlooked. In the dysmorphologies, the modified model of allometry at right in Figure 3, translated into the language of ordinary laboratory statistics (organ size as a covariate of other measured effects), can help bridge the gap between the domain of laboratory findings and the domain of observational findings about humans, with immediate benefits for cogency and translational urgency in both domains.
The research reported here and the preparation of this manuscript were supported in part by United States Public Health Service (USPHS) grant AA-10836 to the University of Washington and DA-021519 to the University of Michigan. Development of our Edgewarp software was supported by those grants, by National Institutes of Health grant EB-001957 to the University of Michigan, by Defense Advanced Research Projects Agency (DARPA) contract W81XH-04-2-0012 to the University of Michigan, and by grant P200.093/1-VI/2004 from the Austrian Council for Science and Technology to the Department of Anthropology, University of Vienna, Vienna, Austria. An earlier version of these remarks was presented at the 2006 annual meeting of the American Association for Physical Anthropology, Anchorage, Alaska, on 10 March 2006.