A link between urban living and disease is seen in recent and historical records, but the presence of this association in prehistory has been difficult to assess. If the transition to urban living does result in an increase in disease-based mortality, we might expect to see evidence of increased disease resistance in longer-term urbanized populations, as the result of natural selection. To test this, we determined the frequency of an allele (SLC11A1 1729 + 55del4) associated with natural resistance to intracellular pathogens such as tuberculosis and leprosy. We found a highly significant correlation with duration of urban settlement: populations with a long history of living in towns are better adapted to resisting these infections. This correlation remains strong when we correct for autocorrelation in allele frequencies due to shared population history. Our results therefore support the interpretation that infectious disease loads became an increasingly important cause of human mortality after the advent of urbanization, highlighting the importance of population density in determining human health and the genetic structure of human populations.
Infectious disease has played a key role in determining human population health in recorded history, but the extent of disease-based mortality and morbidity in prehistory remains uncertain. An increase in infectious disease load has been proposed to follow the transition to urban living as the result of increases in population density, pathogen mobility through long-distance trade, and pathogen exposure through animal husbandry and irrigation (Roberts and Buikstra 2003; Barnes 2005). However, support for this link remains elusive as most infections do not cause skeletal lesions or remodeling, and the beginnings of urbanization typically pre-date writing in a given region. Should an association between urban and disease histories exist, we would expect that populations living in regions with a long history of urban settlement would have evolved disease resistance to a greater extent than those without.
Polymorphisms in the SLC11A1 gene (Liu et al. 1996) (formerly Natural Resistance-Associated Macrophage Protein 1, NRAMP1) have been shown by meta-analysis to associate with susceptibility to TB in humans (Bellamy et al. 1998; Li et al. 2006) and may also associate with susceptibility to other infectious diseases such as leprosy, leishmaniasis, and Kawasaki disease (Govoni and Gros 1998). Conversely, disease resistance alleles have been proposed to associate with autoimmune diseases, raising the possibility that balancing selection has maintained polymorphism in SLC11A1 (Searle and Blackwell 1999). To investigate the possibility that the process of urbanization has, through increased disease load, shaped the distribution of a well-attested infectious disease resistance polymorphism, we determined the frequency of the protective 1729 + 55del4 variant of SLC11A1 (Liu et al. 1996) in 17 populations with a range of urbanization histories (Table 1). To control for general genetic relationships due to shared demographic history among populations, only populations where classical marker data (Cavalli-Sforza et al. 1994) are available, for either the populations typed or for a neighboring group, were examined. We considered only Old World populations, primarily because the extent and epidemiology of many pathogens, and in particular TB, are unclear prior to European colonization of the New World. The pattern of infection observed in the archaeological record suggests that TB was rare but present, or that native New World TB strains were differently pathogenic (Gomez i Prat and de Souza 2003). In addition, the New World regions with a longer history of urban settlement are often those with the greatest European settlement, and thus admixture (Wang et al. 2007), and classical marker data for the Americas are limited to only five population groups (Cavalli-Sforza et al. 1994), preventing us from making an extensive study of the history of disease resistance in these regions using our current methodology.
Table 1. Population data used in this study. For the equivalent populations used in the FST analysis, three letter codes are as given in Cavalli-Sforza et al. (1994). Frequency values are from this article unless otherwise referenced.
Regional histories of population acculturation to urban settlement are complex and individual, but as a single quantitative proxy measure of the historical extent of urbanization, we identified the oldest recorded date of the first city or other significant urban settlement in the region of the population sampled (Table 1). In determining these dates, we comprehensively searched the archaeological and historical literature for the different study regions. We identified the oldest date for a site that was either described as a major town or city, or for which there was alternative evidence for high-density settlement. Some settlements have been described as towns, but lack evidence of permanent occupancy at high density and were thus ignored. For example, the native oppida of Iron Age Britain have been interpreted as regional centers for administration, with a defensive or seasonal role. In such centers the population size would have fluctuated appreciably. Others have permanent settlement at high population density, yet lack evidence for the existence of craft specialists or centralized administration (e.g., Çatal Höyük, see Mellaart 1967). Dates for these sites were included, as we are interested in population density, rather than evidence of social structure. The dates are taken from the founding of the settlement, rather than any subsequent increase in size or elevation to superior status. We recognize that the proxy employed may be an inaccurate measure of the extent of exposure to urbanization under a variety of conditions, including: (1) if urbanization is discontinuous after the founding of the first major settlement, (2) if archaeological and historical records are insufficiently intact to provide accurate data on early urbanization, (3) if the establishment of urban settlements did not impact on all components of the sampled population within a region, or (4) if population replacement has occurred since the origins of urbanization. However, none of these conditions should systematically bias our analysis in favor of a positive correlation between length of urban settlement and allele frequency; rather they will serve to weaken any apparent correlation, if present.
Materials and Methods
DNA samples from 12 populations were typed, and these data were combined with previously published results to define the global distribution of the 1729 + 55del4 alleles (Table 1).
A 140/144 base pair section of the SLC11A1 gene, which included the 1729 + 55del4 marker, was PCR-amplified using one HEX-labeled primer (ATGCCTTGGGAATGGATGAG) and one unlabeled primer (GGTTGGCTGGTCTCAGGAAC) in a volume of 10 μl containing 200 μM dNTPs, 10 mM Tris-HCl (pH 9.0), 0.1% Triton X-100, 0.01% gelatin, 50 mM KCl, 1.5 mM MgCl2, 0.13 U Taq polymerase (HT Biotech, Cambridge, UK), and 0.15 μM of each primer. Cycling parameters were a pre-incubation step at 92°C for 3 min followed by 40 cycles of 92°C for 45 sec, 60°C for 45 sec, and 72°C for 45 sec. An aliquot of 1 μl of the PCR product was mixed with 10 μl of deionized formamide and a molecular size standard manufactured in-house, containing an FAM-labeled 142 base pair fragment. PCR product sizes were resolved on an ABI 3700.
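As an illustration of how fragment sizes of this kind might be turned into genotypes and an allele frequency, the following is a minimal Python sketch. It assumes that the 140 base pair product corresponds to the 4-bp deletion allele and the 144 base pair product to the undeleted allele; the size tolerance, the genotype labels ("del", "full"), the function names, and the example peak sizes are all hypothetical and are not taken from the study.

```python
from collections import Counter

DEL_BP = 140.0   # assumed amplicon size for the 4-bp deletion allele
FULL_BP = 144.0  # assumed amplicon size for the undeleted allele


def call_genotype(peak_sizes, tol=1.0):
    """Map observed fragment sizes (bp) to a genotype label, or None if unclear."""
    alleles = set()
    for size in peak_sizes:
        if abs(size - DEL_BP) <= tol:
            alleles.add("del")
        elif abs(size - FULL_BP) <= tol:
            alleles.add("full")
        else:
            return None  # unexpected peak: leave the sample uncalled
    if alleles == {"del"}:
        return "del/del"
    if alleles == {"full"}:
        return "full/full"
    if alleles == {"del", "full"}:
        return "del/full"
    return None  # no peaks were supplied


def del_allele_frequency(genotypes):
    """Frequency of the deletion allele among called (non-None) genotypes."""
    counts = Counter(g for g in genotypes if g is not None)
    n_called = sum(counts.values())
    if n_called == 0:
        raise ValueError("no called genotypes")
    return (2 * counts["del/del"] + counts["del/full"]) / (2 * n_called)


# Made-up peak sizes for five samples; the last one has an off-target peak.
samples = [[140.1], [139.8, 144.2], [140.0], [143.9], [151.0]]
genotypes = [call_genotype(s) for s in samples]
freq = del_allele_frequency(genotypes)
```

Uncalled samples are dropped from the denominator rather than guessed, which keeps the frequency estimate tied to unambiguous calls only.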
All statistical analyses were carried out using the statistical package “R” (URL: http://www.R-project.org/). Logistic and linear regressions were performed using the “glm” and “lm” functions, respectively, and support for model fit was assessed using the “drop1” function. This returns the probability of obtaining at least as large an increase in the apparent fit of the model after adding the trend term (urbanization time), under the null hypothesis that the trend term makes no difference to the fit. Partial Mantel tests were performed using the “vegan” library within “R.”
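The model comparison described above can be sketched outside R as follows. This is a minimal, standard-library Python illustration, not the analysis code used in the study: it fits a logistic regression with binomial errors by Newton-Raphson and compares it with an intercept-only model by a likelihood-ratio test on one degree of freedom. The function names are hypothetical, the data are invented, and urbanization time is expressed in thousands of years purely to keep the toy example well scaled.

```python
import math


def fit_binomial_logit(x, k, n, max_iter=100, tol=1e-10):
    """Fit logit(p_i) = a + b * x_i to k_i successes out of n_i trials."""
    pooled = sum(k) / sum(n)
    a, b = math.log(pooled / (1.0 - pooled)), 0.0  # start at the null model
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            resid = ki - ni * p         # contribution to the score vector
            w = ni * p * (1.0 - p)      # contribution to the information matrix
            g_a += resid
            g_b += resid * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def log_likelihood(a, b, x, k, n):
    """Binomial log-likelihood, omitting the constant binomial coefficients."""
    total = 0.0
    for xi, ki, ni in zip(x, k, n):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        total += ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
    return total


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented data: urbanization time (thousands of years), allele counts, chromosomes.
time_ky = [0.0, 1.0, 2.0, 4.0, 6.0, 8.0]
del_counts = [80, 84, 88, 92, 96, 98]
chromosomes = [100] * 6

a, b = fit_binomial_logit(time_ky, del_counts, chromosomes)
pooled = sum(del_counts) / sum(chromosomes)
ll_null = log_likelihood(math.log(pooled / (1.0 - pooled)), 0.0,
                         time_ky, del_counts, chromosomes)
ll_full = log_likelihood(a, b, time_ky, del_counts, chromosomes)
lrt = 2.0 * (ll_full - ll_null)        # change in deviance from adding the trend
p_value = chi2_sf_1df(lrt)
intercept_freq = 1.0 / (1.0 + math.exp(-a))  # fitted frequency at time zero
```

The statistic `lrt` plays the role of the change in deviance when the trend term is added, and `intercept_freq` is the fitted allele frequency at zero urbanization time.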
Results and Discussion
Proxy urbanization dates and SLC11A1 allele frequencies were ranked and found to be strongly correlated (Spearman's rank test P < 0.0013, ρ = 0.715; Fig. 1). Because allele frequencies are proportions, the most appropriate null model for these data is a logistic regression with a binomial error distribution (Fig. 2A), which gives a highly significant fit (P = 2.2 × 10^−16, χ² = 80.76, df = 1). To test for the presence of potentially influential points, we calculated influence (Cook's distance, see Cook 1977) and leverage scores (Fig. 2B) and identified three potential outliers: India, Cambodia, and Gambia (Fig. 2A, B). Removal of these points did not greatly affect the significance of the fit to a logistic model (P = 2.2 × 10^−16, χ² = 70.33, df = 1). Crucially, the two estimated logistic model parameters represent important evolutionary variables. First, the y-axis intercept of the fitted logistic model (0.812) equals the expected frequency of the SLC11A1 1729 + 55del4 allele when the duration of urbanization is zero, and therefore estimates the pre-urbanization equilibrium allele frequency. Second, under a classic deterministic model of positive selection, the logistic shape parameter (3.06 × 10^−4) equals the selection coefficient acting on the SLC11A1 locus since urbanization multiplied by the number of human generations per year. Assuming an average generation time of 25 years, this gives an estimated selection coefficient of 0.0075. We further note that a straightforward linear regression (Fig. 2A) between allele frequency and urbanization time is also significant (P = 0.00106; R² = 0.47).
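The correspondence between the logistic shape parameter and the selection coefficient can be made explicit with a short derivation. The following is a sketch under an assumed model of genic selection with a constant coefficient s per generation favoring the resistance allele; the symbols p_t, s, g, τ, α, and β are introduced here for illustration and do not appear in the original analysis.

```latex
% Assumed model: the resistance allele has relative fitness 1 + s per generation.
\begin{align*}
p_{t+1} &= \frac{(1+s)\,p_t}{1 + s\,p_t}
\quad\Longrightarrow\quad
\frac{p_{t+1}}{1-p_{t+1}} = (1+s)\,\frac{p_t}{1-p_t}, \\
\operatorname{logit}(p_t) &= \operatorname{logit}(p_0) + t\,\ln(1+s)
\approx \operatorname{logit}(p_0) + s\,t \qquad (s \ll 1), \\
\operatorname{logit}\bigl(p(\tau)\bigr) &\approx \alpha + \beta\,\tau,
\qquad \alpha = \operatorname{logit}(p_0), \quad \beta = \frac{s}{g}, \quad t = \frac{\tau}{g}.
\end{align*}
```

Here τ is the duration of urban settlement in years and g is the generation time in years, so 1/g is the number of generations per year and s ≈ βg. The intercept α corresponds to the frequency p_0 = 1/(1 + e^{−α}) at τ = 0.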
The correlation between proxy urbanization dates and SLC11A1 allele frequencies could be attributed to shared demographic histories among populations. To control for this possibility, we used classical marker FST values as measures of among-population genetic variation, then employed the partial Mantel procedure (Smouse et al. 1986) to test if differences in urbanization time predict differences in SLC11A1 frequency after background genetic relationships among populations (FST) have been accounted for. The effect of urbanization time was significant (P < 0.003). We note that although it is possible that some classical markers may be involved in resistance to infectious disease, this would have a conservative effect on our analysis because it would serve to reduce the variation in SLC11A1 frequency differences that can be solely explained by variation in urbanization times.
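The logic of a partial Mantel test can be sketched in a few lines of standard-library Python. This is an illustrative permutation scheme in the spirit of Smouse et al. (1986), not a reimplementation of the “vegan” routine used in the study: the partial correlation between two distance matrices, controlling for a third, is compared with the values obtained when the rows and columns of the first matrix are permuted together. All function names and data values below are invented, and the test shown is one-sided.

```python
import math
import random


def lower_triangle(m):
    """Flatten the strict lower triangle of a square matrix."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_r(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def partial_mantel(A, B, C, n_perm=999, seed=1):
    """One-sided partial Mantel test of A against B, controlling for C."""
    rng = random.Random(seed)
    b, c = lower_triangle(B), lower_triangle(C)
    r_obs = partial_r(lower_triangle(A), b, c)
    n = len(A)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # relabel populations in matrix A only
        A_perm = [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if partial_r(lower_triangle(A_perm), b, c) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


def abs_diff_matrix(values, scale=1.0):
    """Pairwise absolute differences, optionally rescaled."""
    return [[scale * abs(x - y) for y in values] for x in values]


# Invented values for six populations (not data from the study).
freq = [0.80, 0.83, 0.85, 0.90, 0.93, 0.98]      # allele frequencies
urban_ky = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0]        # urbanization time (ky)
background = [0.3, 0.9, 0.1, 0.7, 0.2, 0.6]      # stand-in for genetic background

A = abs_diff_matrix(freq)
B = abs_diff_matrix(urban_ky)
C = abs_diff_matrix(background, scale=0.1)       # stand-in for an FST matrix
r_obs, p_value = partial_mantel(A, B, C)
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that share a population, which is the reason a permutation test is used for matrices of this kind.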
Alternative methods for detecting signatures of directional selection on SLC11A1 are unlikely to be as powerful as the one employed here (Sabeti et al. 2006). The global distribution of the 1729 + 55del4 allele and the possibility of balancing selection through a proposed association with autoimmune disease (Searle and Blackwell 1999) would weaken the power of population differentiation based tests, as well as suggest that this polymorphism may be too ancient for selection to be detected by methods based on haplotype conservation. Furthermore, because the SLC11A1 1729 + 55del4 allele was already the major allele (globally) prior to the proposed selection in response to urbanization, selection detection methods based upon reduction in genetic diversity or a high frequency of derived alleles would be ineffective (Sabeti et al. 2006). However, long-term balancing selection on SLC11A1 would increase the probability of detecting directional selection using the methods employed here because it would result in similar allele frequencies across global populations, prior to urbanization. To examine this possibility further, we compared estimates of Tajima's D statistic (Tajima 1989) for the SLC11A1 gene, and 100 kbp flanking each end of the coding region, to an empirical null-distribution of Tajima's D, both based on the Perlegen dataset (Carlson et al. 2005). Although this dataset is biased toward common variation, leading to a positive bias in the Tajima's D statistic (Tajima 1989), we note that, consistent with balancing selection in the East Asians and Europeans, Tajima's D for the SLC11A1 gene region was typically high (in the top 31% and 7% of the empirical null-distribution, respectively).
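For readers unfamiliar with the statistic, the following minimal Python sketch computes Tajima's D from the sample size, the number of segregating sites, and the mean number of pairwise differences, using the standard formulas of Tajima (1989), and then locates an observed value within an empirical null distribution. The function names and the null values are hypothetical; this is not the pipeline applied to the Perlegen data.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites, and mean
    pairwise difference pi (all measured over the same region)."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two estimators of theta, scaled by its std. dev.
    return (pi - seg_sites / a1) / math.sqrt(variance)


def upper_tail_fraction(observed, null_values):
    """Fraction of the empirical null distribution at or above the observed value."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)


# Numbers of the kind used in textbook illustrations of the statistic.
d_example = tajimas_d(n=10, seg_sites=16, pi=3.888)

# Invented null distribution, e.g. D values from other genomic windows.
null_d = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, 1.5, 2.0]
tail = upper_tail_fraction(1.3, null_d)
```

A positive D indicates an excess of intermediate-frequency variants relative to the neutral expectation, which is the pattern expected under balancing selection; a small `tail` value means the observed D is high relative to the empirical null.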
Various pathogens may have played a role in shaping the global SLC11A1 1729 + 55del4 allele distribution but it is likely that TB is the single most important selective agent. This view is based on the strength of evidence of a role for SLC11A1 in TB resistance (Li et al. 2006), as well as the worldwide distribution and relative ease of transmission of the disease. TB has traditionally also been associated with living in proximity to domestic cattle, as cattle are one of many organisms that act as a host for Mycobacterium bovis, a host-generalist species that can cause a pathologically indistinguishable form of tuberculosis in humans. Cattle are unusual among M. bovis hosts in acting as a transmissible reservoir for the bacteria. This is of particular importance when humans and cattle live in close proximity, a likely scenario in many early farming or pastoralist communities. We therefore repeated the tests performed above, but substituted the timing of the first urban settlement with the time since the appearance of domestic cattle in the different regions (Davis 1987; Meadows 1993; Payne and Hodges 1997; Peters et al. 1999; Hassan 2000; MacDonald and MacDonald 2000; Marshall 2000; Cymbron et al. 2005; Bellwood 2006) (see Table 1). Although significant correlations were found (Spearman's rank test P < 0.003, ρ = 0.659; logistic regression with a binomial error distribution P = 2.6 × 10^−11, χ² = 44.45; linear regression P = 0.0123; partial Mantel test on differences in cattle domestication dates P < 0.0139), we note that all correlations were weaker than for urbanization time. It should also be noted that data on the timing of the first regional use of domestic cattle are, at present, poorly resolved. There are a variety of reasons for this: wild and domestic cattle are not always readily differentiable (Davis 1995), routine collection and analysis of animal bone from archaeological sites is a relatively recent development, and organic remains do not survive as well as most construction materials. Although key differences can be identified in the relative timings of urbanization and cattle exploitation in sub-Saharan Africa, where domestic cattle remains are found well before evidence of large-scale settlement (Blench and MacDonald 2000), the pattern of the spread of animal domestication in much of the Old World appears similar to that of urban settlement. Consequently, although we do not reject a role for zoonotic transmission of infectious disease in establishing present patterns of resistance, it is probably best interpreted as a correlate of the urbanization process rather than a determinant in its own right.
Although our study is concerned with the effects of time since urbanization on SLC11A1 1729 + 55del4 allele frequency over broad regions, it remains possible that systematic differences between long-term urban and long-term rural populations persist within those broad regions today. Testing this is not possible using the samples analyzed here. The Greek, English, Italian, Chinese, Ethiopian, Moroccan Berber, and Sudanese samples were each collected from a single town or region (Athens, Southwell, the Tyrol, Maoxian, Addis Ababa, Ifrane, and Khartoum, respectively). The Malawi sample did include individuals from three regions (Lilongwe, Kanengo, and Mzuzu) but we found no differences in their allele frequencies (using Fisher's exact test). Although the Anatolian Turk sample was collected in Istanbul, we did have information on the individuals' places of birth and found that diverse geographic origins (within Turkey) are represented. However, we note that only one heterozygote was observed (from western Turkey), the rest being del/del homozygotes. A similar situation occurred with the Iranian sample; although all individuals sampled were resident in Tehran, we did have information on their place of birth. However, all Iranians sampled were del/del homozygotes. Thus for both the Turkish and Iranian samples, a lack of variation precludes detection of regional differentiation. For the Yakut sample, individuals came from a large number of geographic locations, each of which was represented by too few individuals to permit meaningful tests of regional differentiation. For the Swedish Saami sample, we had little geographic information other than that the samples were collected from the north of Sweden. Nonetheless, it would be interesting to test if modern urban and rural populations differ in SLC11A1 1729 + 55del4 allele frequencies within the same regions.
In summary, our findings indicate that the distribution of SLC11A1 polymorphism in the Old World can be seen as a previously unrecognized example of selection in response to the process of human niche construction (Odling-Smee et al. 2003). Further, the method we employ here makes novel use of historical data to explain the distribution of allele frequency and proposed selective forces, an approach that may prove to be useful in inferring selection in other situations.
Associate Editor: A. Read
We thank V. Jansen and C. Roberts for comment and discussion, M. Weale for advice on data analysis, and K. Veeramah and E. Caldwell for laboratory assistance. We also thank N. Bradman, E. Bekele, D. Bolnick, M. Ibrahim, T. Loukidis, P. P. Pramstaller, T. Parfit, F. Berrada, P. Nasanen-Gilmore, A. Tarekegn, T. Helenius, F. Osman, A. Götherström, L. Goldfarb, E. Jelinek, and X. Zhou for help with sample collection and DNA extraction. Funding for the research was provided by the Natural Environment Research Council and the Arts and Humanities Research Council (Centre for the Evolution of Cultural Diversity).