Using CRANID to test the population affinity of known crania
Dr. Varsha Pilbrow, Department of Anatomy and Neuroscience, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Vic. 3010, Australia. E: email@example.com
CRANID is a statistical program used to infer the source population of a cranium of unknown origin by comparing its cranial dimensions with a worldwide craniometric database. It has great potential for estimating ancestry in archaeological, forensic and repatriation cases. In this paper we test the validity of CRANID in classifying crania of known geographic origin. Twenty-three crania of known geographic origin but unknown sex were selected from the osteological collections of the University of Melbourne. Only 18 crania showed good statistical match with the CRANID database. Without considering accuracy of sex allocation, 11 crania were accurately classified into major geographic regions and nine were correctly classified to geographically closest available reference populations. Four of the five crania with poor statistical match were nonetheless correctly allocated to major geographical regions, although none was accurately assigned to geographically closest reference samples. We conclude that if sex allocations are overlooked, CRANID can accurately assign 39% of specimens to geographically closest matching reference samples and 48% to major geographic regions. Better source population representation may improve goodness of fit, but known sex-differentiated samples are needed to further test the utility of CRANID.
CRANID (Wright, 2010) is a statistical program developed by Richard Wright of the University of Sydney to infer the source population of unprovenienced crania in forensic, archaeological and repatriation cases. The potential validity of the program is justified by the consistent finding that cranial measurements when treated multivariately accurately reflect the broad geographic patterning of human populations (Howells, 1973, 1989; Relethford, 1994, 2009; Ousley et al. 2009). This provides confidence in the premise underlying the development of CRANID that posterior probabilities from post hoc discriminant analyses of a global craniometric dataset should allow us to infer the geographic origin of crania of unknown provenience. Howells' (1996) freely downloadable cranial database forms the basic comparative resource for CRANID. This is augmented by samples from the UK, Italy, Denmark, West Asia, India, Patagonia, and indigenous Australia to make a total of 3163 crania from 39 populations differentiated into 74 male and female geographic samples (Wright, 2010).
Another similar program called FORDISC, developed by Richard Jantz and Stephen Ousley of the University of Tennessee (Jantz & Ousley, 1993), also uses discriminant analyses to classify skulls of unknown origin. Like CRANID, FORDISC uses Howells' dataset as a reference sample but with additional samples from the American Forensic Data Bank and the Terry and Hamann-Todd Collection. FORDISC is used widely internationally but it has particular relevance to the American context because the American Forensic Data Bank forms a large proportion of the reference materials (Ubelaker et al. 2002). CRANID has greater validity in Australia and Europe because of greater representation of indigenous Australian and European reference crania.
Studies evaluating the validity of FORDISC have reported mixed success. Classification accuracy was poor when contemporary and archaeological samples were used in the analyses (Ubelaker et al. 2002; Williams et al. 2005), but supporters of FORDISC argue that mismatch between test samples and the samples represented in the FORDISC database, incorrect measurements or insufficient variables could impede accurate attribution (Ubelaker et al. 2002; Hubbe & Neves, 2007; Ousley et al. 2009). When samples were selected from Howells' dataset, and variable numbers were altered to include larger and smaller subsets of variables, FORDISC's classification was still largely inaccurate (Elliott & Collard, 2009). There are no published studies empirically testing the validity of using CRANID to classify crania of known origin. The purpose of this paper is to test whether CRANID can provide accurate attributions for crania of known geographic origins from the collections of the University of Melbourne.
Materials and methods
Twenty-three skulls of known geographic origin were selected from the Berry collection of the University of Melbourne. Most were obtained during the tenure of Richard Berry, Chair and Professor of Anatomy (1905–1929), through trade with collecting institutions or other collectors (Pardoe, 2004; Jones, 2006). The geographic origin of each skull is known (Table 1) and penned on the skull, but sex is not indicated. We chose adult, undamaged and non-deformed skulls. We used complete eruption of the third molar to indicate adult status, as this takes place at around 20.5 years of age (AlQahtani et al. 2010).
Table 1. Results of CRANID analyses showing original locality of specimen, closest reference samples in the database, LDA and NNDA attributions
| No. | Specimen (accession no. and original locality) | Geographically closest reference samples | LDA attribution (probability) | NNDA attribution (weighted score) |
|---|---|---|---|---|
| 1 | 516-200584 St. Mary's Abbey, York (a) | London Medieval, Poundbury UK Rom | | Zulu S Africa F 481; Zulu S Africa M 403 |
| 2 | 516-200266 Scotch (a) | London Medieval, Poundbury UK Rom | Norse Norway M 0.93 | Norse Norway M 690; Norse Norway F 345; San Cruz I Calif M 310 |
| 3 | 516-200585 Scotch-Culloden (a) | London Medieval, Poundbury UK Rom | Norse Norway M 0.67 | London Medieval M 547; Norse Norway M 460 |
| 4 | 516-200581 Laplander (a) | Norse, Norway; Denmark, Neolithic | | |
| 5 | 516-200269 Egyptian (a) | Egypt 26–30th Dynasty | Beduin W Asia MF 0.78 | Beduin W Asia MF 527; Peru Youyos F 518 |
| 6 | 516-200576 York Castle | London Medieval, Poundbury UK Rom | | |
| 7 | 516-200587 British | London Medieval, Poundbury UK Rom | | Maori New Zealand M 633 |
| 8 | 516-200645 British | London Medieval, Poundbury UK Rom | | |
| 9 | 516-200620 Assyrian | Lachish, Beduin | Norse Norway M 0.98 | Norse Norway M 978; Berg Austria M 452 |
| 10 | 516-200293 Solomons | Tolai New Britain, Guam | | |
| 11 | 516-200646 Admiralty Islands | Tolai New Britain, Guam | Philippines M 0.40 | Philippines M 380; Tolai New Britain F 351 |
| 12 | 516-200699 Isabel Solomon Islands | Tolai New Britain, Guam | San Cruz I Calif M 0.41 | |
| 13 | 516-200577 Nothingham Abbey | London Medieval, Poundbury UK Rom | Poundbury UK Rom M 0.71 | Norse Norway M 403 |
| 14 | 516-200277 Irish | London Medieval, Poundbury UK Rom | | Peru Youyos F 633; San Cruz I Calif F 620 |
| 15 | 516-200582 St. Mary's Abbey, York | London Medieval, Poundbury UK Rom | Poundbury UK Rom M 0.40 | |
| 16 | 516-200586 British Tiegnmouth | London Medieval, Poundbury UK Rom | Poundbury UK Rom M 0.75 | Norse Norway M 460 |
| 17 | 516-200677 Egyptian | Egypt 26–30th Dynasty | | |
| 18 | 516-200705 Ocean Island | Tolai New Britain, Guam | Sydney M 0.53 | Sydney M 633 |
| 19 | 516-200275 Kingunan, Rabaul New Britain | Tolai New Britain, Guam | Tolai New Britain M 0.64 | Tolai New Britain F 761; Tolai New Britain M 395 |
| 20 | 516-200642 Central Division Papua | Tolai New Britain, Guam | Tolai New Britain M 0.97 | Tolai New Britain M 734 |
| 21 | 516-200678 New Ireland | Tolai New Britain, Guam | Tolai New Britain M 0.99 | Tolai New Britain M 1073; Tolai New Britain F 469 |
| 22 | 516-200702 Nauru | Tolai New Britain, Guam | | Guam Latte Period F 937; Guam Latte Period M 422 |
| 23 | 516-200706 New Guinea | Tolai New Britain, Guam | Tolai New Britain F 0.71; Beduin W Asia MF 0.12 | Tolai New Britain F 644 |

(a) Specimen with a poor statistical fit with the CRANID database (rows 1–5).
We took 29 measurements on each skull following the directions in the CRANID manual (Wright, 2010) and Howells (1989). To ensure accuracy and reliability of measurements, the second author (V.P.) tested the landmark recognition and measurement definitions of the first author (L.K.) 6 months after data collection. All landmarks were recognized and measurements taken as defined by Howells (1989) and the CRANID manual. V.P. also re-measured the skulls previously measured by L.K. We used an independent samples t-test to compare the two sets of measurements. The differences were not statistically significant (P > 0.05). The mean measurement error between the two sets of measurements ranged between 0.0 and 2.3 mm (between 0.0 and 4.4%), with standard errors of the mean difference ranging from 0.74 to 2.47.
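As a rough illustration of how an inter-observer error of this kind can be summarised, the following Python sketch computes the mean absolute difference (in mm) and the mean percentage difference for one measurement taken by two observers. The function name `measurement_error`, the choice of the two-observer average as the percentage base, and all numeric values are assumptions made for the example; they are not the study's data or necessarily its exact formula.

```python
from statistics import mean


def measurement_error(obs_a, obs_b):
    """Return (mean absolute difference in mm, mean percentage difference).

    obs_a, obs_b: the same measurement taken on the same skulls by two observers.
    The percentage is taken relative to the average of the two readings (an assumption).
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must measure the same skulls")
    abs_diffs = [abs(a - b) for a, b in zip(obs_a, obs_b)]
    pct_diffs = [100 * d / ((a + b) / 2)
                 for d, a, b in zip(abs_diffs, obs_a, obs_b)]
    return mean(abs_diffs), mean(pct_diffs)


# Hypothetical readings (mm) of one cranial measurement on four skulls.
first_observer = [180.0, 175.0, 190.0, 185.0]
second_observer = [181.0, 175.0, 189.0, 185.0]

err_mm, err_pct = measurement_error(first_observer, second_observer)
print(f"mean error: {err_mm:.2f} mm ({err_pct:.2f}%)")
```

Repeating the calculation for each of the 29 measurements would give a range of errors comparable in form to the one reported above.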
The specimens came from the UK, Lapland, Assyria, Egypt and Papua New Guinea (Table 1). The test specimens did not have exact geographic matches with the samples represented in CRANID. To test classification accuracy we did not prescribe ideal matches, but reviewed the population attributions provided by CRANID. If the CRANID attributions were the geographically closest reference samples available in CRANID, we accepted them as accurate local population attributions. If the CRANID attributions were further away than the geographically closest reference samples, but still within the wider geographic region from which the test sample came, we accepted them as accurate wider regional attributions. Thus, Berg, Austria and Poundbury, UK were acceptable as broad geographical matches for the Laplander skull; any European populations were acceptable as matches for the skulls from the UK; West Asian populations of Beduin and Egypt provided acceptable matches for the Egyptian and Assyrian skulls; and Australasian populations were accepted as wider regional matches for the skulls from Papua New Guinea. In the absence of known sex we disregarded sex attributions provided by CRANID, potentially allowing for greater classification accuracy.
CRANID uses two statistical methods, linear discriminant analysis (LDA) and nearest neighbour discriminant analysis (NNDA) to infer ancestry. LDA is a parametric test that uses the weighted sum of the values of the cranial measurements of an individual and compares these with the mean values of the populations in CRANID. Probabilities of group membership are used to estimate the most likely source population. As suggested by the manual we made note of all populations with probabilities rounded to 0.1 (or those with at least 10% membership probability), but to be considered accurate we used a summed probability of > 0.5. That is, if the analysis returned several populations within close geographical proximity of the test specimen, each with low attribution probabilities, we considered the classification to be accurate if the sum of the attribution probability was > 50%. This provides a conservative likelihood that the accuracy of attribution is greater than would have occurred by chance (Jantz & Ousley, 2005 suggest using probabilities > 0.7 or 0.9).
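The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the summed-probability criterion, not CRANID's own code: the function name `lda_accepted`, the dictionary layout and the set of acceptable labels are assumptions made for the example, while the probabilities are those listed for specimens 23 and 11 in Table 1.

```python
REPORT_MIN = 0.1   # populations below this probability are not noted
ACCEPT_MIN = 0.5   # the summed probability must exceed this to count as accurate


def lda_accepted(posteriors, acceptable):
    """Apply the summed-probability rule to one specimen.

    posteriors: dict mapping a reference-sample label to its LDA probability.
    acceptable: set of labels treated as correct for this specimen
                (male and female samples pooled, since sex is disregarded).
    """
    summed = sum(p for label, p in posteriors.items()
                 if p >= REPORT_MIN and label in acceptable)
    return summed > ACCEPT_MIN


# Assumed label set: geographically closest samples for the Papua New Guinea skulls.
closest = {"Tolai New Britain M", "Tolai New Britain F",
           "Guam Latte Period M", "Guam Latte Period F"}

specimen_23 = {"Tolai New Britain F": 0.71, "Beduin W Asia MF": 0.12}
specimen_11 = {"Philippines M": 0.40}

print(lda_accepted(specimen_23, closest))  # True
print(lda_accepted(specimen_11, closest))  # False
```

Passing a wider set of labels (for example all Australasian samples) as `acceptable` gives the regional rather than the local verdict under the same rule.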
The non-parametric NNDA compares the unstandardized canonical variate scores of the cranial dimensions to those in the database, identifying the closest matching crania as nearest neighbours. As the number of matches is dependent on the sample size of the populations in the database (the higher the sample, the greater the probability of matches), a weighted score is computed for each nearest neighbour. As with the reporting of LDA, CRANID suggests 300 as a minimum cut-off point for reporting weighted scores, based on available sample sizes. We followed this, but to be considered accurate, we used a summed weighted score of > 500. This score falls in the middle of the reported range for weighted scores and provides a conservative score for accepting classification accuracy. Mismatches in the population attributions from LDA and NNDA suggest that the parametric assumptions of LDA are violated and the LDA results are less reliable.
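A comparable sketch for the NNDA criterion, again only an illustration under stated assumptions: the function names `nnda_accepted` and `top_label` and the label sets are hypothetical, CRANID's internal weighting of nearest neighbours is not reproduced, and the weighted scores and probability are those listed for specimens 19 and 13 in Table 1.

```python
REPORT_MIN = 300   # minimum weighted score noted, as suggested by CRANID
ACCEPT_MIN = 500   # the summed weighted score must exceed this to count as accurate


def nnda_accepted(weighted_scores, acceptable):
    """Sum the reported weighted scores of acceptable samples and apply the 500 rule."""
    summed = sum(s for label, s in weighted_scores.items()
                 if s >= REPORT_MIN and label in acceptable)
    return summed > ACCEPT_MIN


def top_label(scores):
    """Return the label with the highest probability or weighted score."""
    return max(scores, key=scores.get)


# Assumed label sets for the geographically closest reference samples.
closest_png = {"Tolai New Britain M", "Tolai New Britain F",
               "Guam Latte Period M", "Guam Latte Period F"}
closest_uk = {"London Medieval M", "London Medieval F",
              "Poundbury UK Rom M", "Poundbury UK Rom F"}

specimen_19_nnda = {"Tolai New Britain F": 761, "Tolai New Britain M": 395}
specimen_13_nnda = {"Norse Norway M": 403}
specimen_13_lda = {"Poundbury UK Rom M": 0.71}

print(nnda_accepted(specimen_19_nnda, closest_png))  # True
print(nnda_accepted(specimen_13_nnda, closest_uk))   # False

# A mismatch between the leading LDA and NNDA attributions is the warning sign
# that the LDA result may be less reliable.
print(top_label(specimen_13_lda) != top_label(specimen_13_nnda))  # True
```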
CRANID also computes the mean distance of a skull from the centroid of the database, and the mean distance from its nearest neighbour. If these distances are beyond two standard deviations from the means for the database, it is suggested that the skull does not have a good statistical fit with the database. We reported evidence for lack of goodness of fit. We used the distributable version of CRANID, which does not correct for overall size, but takes size and shape into account.
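The goodness-of-fit check can be illustrated as follows. This is a minimal sketch, assuming that a skull is flagged when either distance lies more than two standard deviations above the corresponding database mean; the function names and all numeric values are hypothetical and are not taken from CRANID.

```python
def z_score(distance, db_mean, db_sd):
    """Number of standard deviations by which a distance exceeds the database mean."""
    return (distance - db_mean) / db_sd


def poor_fit(centroid_dist, neighbour_dist, centroid_stats, neighbour_stats, limit=2.0):
    """Flag a skull whose centroid or nearest-neighbour distance is beyond the limit.

    centroid_stats, neighbour_stats: (mean, standard deviation) for the database.
    """
    z_centroid = z_score(centroid_dist, *centroid_stats)
    z_neighbour = z_score(neighbour_dist, *neighbour_stats)
    return z_centroid > limit or z_neighbour > limit


# Hypothetical database statistics: (mean, standard deviation).
centroid_stats = (10.0, 2.0)
neighbour_stats = (4.0, 1.0)

# Within 2 SD of the centroid but 2.5 SD from the nearest neighbour: poor fit.
print(poor_fit(12.0, 6.5, centroid_stats, neighbour_stats))  # True
# Both distances within 2 SD: acceptable fit.
print(poor_fit(11.0, 5.0, centroid_stats, neighbour_stats))  # False
```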
Results
Table 1 shows the original locality of the test specimens, the geographically closest matching reference samples available in CRANID, and the results of the LDA and NNDA. The first five specimens (rows 1–5) have a poor statistical fit because the distances from their nearest neighbours fell between 2 and 3 standard deviations from the mean for the database. CRANID's attribution for the first specimen (row 1) is largely inaccurate. The next four specimens have > 0.5 summed probability of being accurately attributed to the broad geographic regions of Europe and West Asia, but not to geographically closest local populations. The NNDA attributions for specimen 3 (Scotch-Culloden) are accurate for general and local populations. Specimens 6–23 have a good fit with the CRANID database. The first seven of these (rows 6–12) have < 0.5 summed probability of being accurately classified into the geographically closest local population or wider geographic region by LDA. The NNDA attributions are similar, except for specimen 12, which is accurately classified as having populations from New Britain and Australasia as nearest neighbours. The next five specimens (rows 13–17) have a mismatch between the accuracy of the LDA and NNDA attributions. The first four specimens, 13–16, have > 0.5 probability of being accurately classified into geographically closest local population and wide geographic region by LDA, but not according to the NNDA attributions. The NNDA attribution provides accurate classification of specimen 16 into general population, but the geographically closest local population attribution is weak. Specimens 17 and 18 have < 0.5 probability of accurate attribution to geographically closest local populations but they are accurately classified into regional populations. Specimen 17 from Egypt has Egyptian and Beduin populations as nearest neighbours, thus it is accurately classified by NNDA.
Specimens 19–23 have probabilities of > 0.5 of being accurately classified into geographically closest local population and wide geographic region. The NNDA attributions are also largely accurate. In summary, disregarding sex attributions, of the specimens with good statistical fit, nine specimens (13–16, and 19–23) are accurately classified into geographically closest local populations and 11 specimens (13–23) are accurately classified into major geographical region by LDA, providing a classification accuracy of 39 and 48%, respectively. The corresponding NNDA classification accuracy is 26% for geographically closest local populations, with six specimens (17, 19–23) accurately classified, and 39% for major region, with nine specimens (12, 16–23). To consider a regional comparison, four of the 11 European specimens, one of the three West Asian specimens, and six of the nine Australasian specimens are accurately classified while also showing good statistical fit with CRANID.
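The percentages quoted above follow directly from the specimen counts. The short Python check below reproduces the arithmetic using the specimen numbers given in the text; the variable and function names are illustrative only.

```python
TOTAL_SPECIMENS = 23


def percent(count, total=TOTAL_SPECIMENS):
    """Percentage of all test specimens, rounded to the nearest whole number."""
    return round(100 * count / total)


# Specimen numbers reported as accurately classified (sex disregarded).
lda_local = list(range(13, 17)) + list(range(19, 24))   # 13-16 and 19-23
lda_regional = list(range(13, 24))                      # 13-23
nnda_local = [17] + list(range(19, 24))                 # 17 and 19-23
nnda_regional = [12] + list(range(16, 24))              # 12 and 16-23

for name, group in [("LDA local", lda_local), ("LDA regional", lda_regional),
                    ("NNDA local", nnda_local), ("NNDA regional", nnda_regional)]:
    print(f"{name}: {len(group)} of {TOTAL_SPECIMENS} = {percent(len(group))}%")
```

Note that the denominators are all 23 test specimens, including the five with poor statistical fit, which is why nine accurate local attributions give 39% rather than 50% of the 18 well-fitting crania.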
Discussion
Wright (2010) reports an LDA classification accuracy of 68.2% for the 74 sex-differentiated reference samples in CRANID. In contrast, the classification accuracy in our study is no more than 39% for local groups and no more than 48% for regional groups. Accuracy rates could have been even lower if sex attributions were taken into account because male and female group attributions were summed if needed to provide the 0.5 summed attribution probability.
Five crania showed lack of goodness of fit with the database. Several possibilities are cited in the manual to account for lack of fit and incorrect attribution: incorrect measurements, deformed or extreme cranium, poor representation of the source population in the database and mixed ancestry. It is worth considering each of these possibilities in turn. As outlined above, we used stringent inter-observer repeatability tests to ensure that measurements were taken accurately. We are confident that lack of goodness of fit was not due to errors in measuring. We also ensured that none of the crania in our study was intentionally or pathologically deformed.
Poor representation of the source population is a likely reason for poor statistical fit and incorrect attribution. All test specimens fell within 2 standard deviations from the centroid for the database, but for the specimens with poor statistical fit, the distances from the nearest neighbour were 2–3 standard deviations from the mean for the database. They had high probabilities of attribution (between 0.5 and 0.9), although not to the geographically closest available reference samples. This suggests that the exact source populations were not represented in CRANID and the variability in the available samples did not accommodate that of the test specimen. Many of the geographically closest reference samples for the specimens in this study came from ancient populations, e.g. Iron Age Lachish, Neolithic Denmark, Medieval London, Roman Poundbury, 26–30th Dynasty Egypt and Latte Period Guam. Although in some cases CRANID still selected these as the geographically closest attributions, secular changes in the contemporary test specimens could have precluded them from being assigned to ancient populations (Jantz & Ousley, 2005).
The question of mixed ancestry is also pertinent and could impede CRANID accuracy. However, this possibility needs to be considered against inherent high levels of polymorphism and within-group variation in human populations. On average, roughly 90% of global human craniometric variation occurs within local populations (Relethford, 1994, 2009), leaving 10% to be apportioned among larger geographic regions. This reflects a historical pattern of gene flow among humans and could make it difficult to assign a skull accurately to a particular local population. Discriminant function analysis is designed to minimize within-group variation and maximize among-group variation to provide group separation (Manly, 1994). This allows a large sample of human crania to be classified into predetermined geographic regions with high accuracy, as reported (Relethford, 2009), but in post hoc analyses a single cranium may not be assigned to a population with great confidence. It is also known that regions such as Australasia have had a complicated settlement history and mixed ancestry (Melton et al. 1995; Kayser et al. 2008; Wollstein et al. 2010). This could make it difficult for CRANID to assign a skull accurately to its source population.
Adaptation and natural selection could also confound attempts at determining ancestry through craniometrics. Aspects of facial shape are known to be affected by selection due to climate, especially in people living in extremely cold northern latitudes (Roseman, 2004; Harvati & Weaver, 2006; Smith et al. 2007; von Cramon-Taubadel & Lycett, 2008). There is also a relationship between cranial size and climate, suggestive of Bergmann's thermoregulatory rule (Harvati & Weaver, 2006; Hubbe et al. 2009). Similarly, cranial and mandibular shape is affected by masticatory stress (Paschetta et al. 2010; von Cramon-Taubadel, 2011).
At the same time there is evidence for population structure in human cranial morphological diversity that fits with expectations of neutral genetic variation. Strong positive correlations between craniometric and geographic distances among populations suggest that isolation by distance models (Relethford, 1994, 2004) with iterative bottleneck dispersals out of Africa (Manica et al. 2007; von Cramon-Taubadel & Lycett, 2008; Betti et al. 2009) can explain modern human patterns of craniometric diversity. Focusing on regions of the cranium that reflect population structure and history may provide better resolution of ancestry (Lockwood et al. 2004; Harvati & Weaver, 2006; von Cramon-Taubadel, 2009; Hubbe et al. 2009; Smith, 2009). Another approach would be to undertake shape-based analyses, as cranial size is influenced by adaptive responses to climatic variables (Harvati & Weaver, 2006; Smith et al. 2007) and masticatory functions (Paschetta et al. 2010; von Cramon-Taubadel, 2011). We used the freely distributable version of CRANID, which does not correct for overall size. Wright (2010) provides shape-based discriminant analyses for a fee, which may offer better resolution.
Our study sample mimics a forensic or archaeological situation where CRANID may be called upon to estimate ancestry. It suggests that if the test crania fall outside the range of variation of the reference samples, belong to contemporary populations, come from mixed ancestry or are affected by adaptation and natural selection, CRANID may not be able to provide an accurate estimate of geographic origin. These caveats severely restrict the utility of CRANID. Improving source population representation and focusing on regions of the cranium that reflect population structure and history may provide better resolution. Finally, because sex was unknown we were not able to test the utility of CRANID to classify skulls into sex-differentiated groups. Future studies may benefit from using samples of known sex.
Acknowledgements
We thank Richard Wright for providing the program CRANID, making available several updates and being extremely prompt and helpful in responding to queries. The interpretations and conclusions are entirely ours. We thank Melbourne University's Department of Anatomy and Neuroscience for providing access to the cranial material. We thank Colin Pardoe, Peter Brown, and Chris Briggs for discussions on the provenience of the skulls, and Jason Ivanusic for helpful suggestions on the paper.