• Ian Barnes,

    1. School of Biological Sciences, Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 9BE, United Kingdom
  • Anna Duda,

    1. Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom
  • Oliver G. Pybus,

    1. Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, United Kingdom
  • Mark G. Thomas

    1. Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom
    2. Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvagen 18D, SE-752 36 Uppsala, Sweden
    3. E-mail:


A link between urban living and disease is seen in recent and historical records, but the presence of this association in prehistory has been difficult to assess. If the transition to urban living does result in an increase in disease-based mortality, we might expect to see evidence of increased disease resistance in longer-term urbanized populations, as the result of natural selection. To test this, we determined the frequency of an allele (SLC11A1 1729 + 55del4) associated with natural resistance to intracellular pathogens such as tuberculosis and leprosy. We found a highly significant correlation with duration of urban settlement: populations with a long history of living in towns are better adapted to resisting these infections. This correlation remains strong when we correct for autocorrelation in allele frequencies due to shared population history. Our results therefore support the interpretation that infectious disease loads became an increasingly important cause of human mortality after the advent of urbanization, highlighting the importance of population density in determining human health and the genetic structure of human populations.

Infectious disease has played a key role in determining human population health in recorded history, but the extent of disease-based mortality and morbidity in prehistory remains uncertain. An increase in infectious disease load has been proposed to follow the transition to urban living as the result of increases in population density, pathogen mobility through long-distance trade, and pathogen exposure through animal husbandry and irrigation (Roberts and Buikstra 2003; Barnes 2005). However, support for this link remains elusive as most infections do not cause skeletal lesions or remodeling, and the beginnings of urbanization typically pre-date writing in a given region. Should an association between urban and disease histories exist, we would expect that populations living in regions with a long history of urban settlement would have evolved disease resistance to a greater extent than those without.

Polymorphisms in the SLC11A1 gene (Liu et al. 1996) (formerly Natural Resistance-Associated Macrophage Protein 1, NRAMP1) have been shown by meta-analysis to associate with susceptibility to tuberculosis (TB) in humans (Bellamy et al. 1998; Li et al. 2006) and may also associate with susceptibility to other infectious diseases such as leprosy, leishmaniasis, and Kawasaki disease (Govoni and Gros 1998). Conversely, disease resistance alleles have been proposed to associate with autoimmune diseases, raising the possibility that balancing selection has maintained polymorphism in SLC11A1 (Searle and Blackwell 1999). To investigate the possibility that the process of urbanization has, through increased disease load, shaped the distribution of a well-attested infectious disease resistance polymorphism, we determined the frequency of the protective 1729 + 55del4 variant of SLC11A1 (Liu et al. 1996) in 17 populations with a range of urbanization histories (Table 1). To control for general genetic relationships due to shared demographic history among populations, only populations for which classical marker data (Cavalli-Sforza et al. 1994) are available, for either the populations typed or for a neighboring group, were examined. We considered only Old World populations, primarily because the extent and epidemiology of many pathogens, and in particular TB, are unclear prior to European colonization of the New World. The pattern of infection observed in the archaeological record suggests that TB was rare but present, or that native New World TB strains were differently pathogenic (Gomez i Prat and de Souza 2003). In addition, the New World regions with a longer history of urban settlement are often those with the greatest European settlement, and thus admixture (Wang et al. 2007), and classical marker data for the Americas are limited to only five population groups (Cavalli-Sforza et al. 1994), preventing us from making an extensive study of the history of disease resistance in these regions using our current methodology.

Table 1. Population data used in this study. For the equivalent populations used in the FST analysis, three-letter codes are as given in Cavalli-Sforza et al. (1994). Frequency values are from this article unless otherwise referenced. Columns are grouped as: populations used; SLC11A1 1729+55del4 insertion; urbanization; domestic cattle.

| Population | Equivalent for FST comparison | Insertion: number of chromosomes | Insertion: frequency | Insertion: rank | Insertion: reference | Urbanization: settlement site used | Urbanization: date | Urbanization: rank | Urbanization: reference | Domestic cattle: date (BP) | Domestic cattle: rank | Domestic cattle: reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Iranians | IRA | 96 | 1.00 | =1 | This study | Susa, others | 3250 BC | 2 | Nicholas 1991 | 10,000 | 2 | Peters et al. 1999 |
| Italians | ITA | 94 | 1.00 | =1 | This study | Tarquinia, others | 720 BC | 7 | Haynes 2000 | 9900 | =3 | Cymbron et al. 2005 |
| Anatolian Turks | NEA | 96 | 0.990 | 3 | This study | Çatal Höyük | 6000 BC | 1 | Mellaart 1967 | 10,400 | 1 | Peters et al. 1999 |
| English | ENG | 96 | 0.979 | 4 | This study | Colchester | 55 AD | 10 | Ottaway 1992 | 1800 | =14 | Davis 1987 |
| Koreans | KOR | 90 | 0.956 | 5 | Kim et al. 2003 | Tong’gorou | 100 BC | 9 | Portal 2000 | 5500 | 8 | Payne and Hodges 1997 |
| Indians | IND | 308 | 0.948 | 6 | Roy et al. 1999 | Harappa | 2725 BC | 3 | Kenoyer 1991 | 9900 | =3 | Meadows 1993 |
| Greeks | GRK | 94 | 0.947 | 7 | This study | Knossos, others | 1700 BC | 5 | Dickinson 2002 | 9000 | 5 | Cymbron et al. 2005 |
| Japanese | JPN | 180 | 0.922 | 8 | Abe et al. 2003 | Nara | 710 AD | 12 | Befu 1971 | 1800 | =14 | Payne and Hodges 1997 |
| Sichuanese | SCH | 96 | 0.885 | 9 | This study | Sanxingdui | 2000 BC | 4 | Xu 2001 | 7000 | 6 | Bellwood 2006 |
| Ethiopians | EAF | 96 | 0.865 | 10 | This study | Aksum | 100 AD | 11 | Philipson 2001 | 5000 | =9 | Marshall 2000 |
| Berber | BER | 96 | 0.854 | 11 | This study | Carthage | 800 BC | 6 | Lancel 1994 | 5000 | =9 | Hassan 2000 |
| Gambians | WAF | 834 | 0.842 | 12 | This study | Bathurst | 1816 AD | 14 | Wright 2004 | 6000 | 7 | MacDonald and MacDonald 2000 |
| Yakuts | NTU | 84 | 0.821 | 13 | This study | Yakutsk | 1632 AD | 13 | Lantzeff 1943 | 0 | =16 | NA |
| S Sudanese | NIL | 92 | 0.772 | 14 | This study | Juba | 1919 AD | 17 | Holt and Daly 1988 | 5000 | =9 | Marshall 2000 |
| Cambodians | MNK | 212 | 0.769 | 15 | Delgado et al. 2002 | Angkor Borei | 300 BC | 8 | Stark 1998 | 4000 | 13 | Bellwood 2006 |
| Saami | LAP | 94 | 0.766 | 16 | This study | Kiruna | 1900 AD | 16 | Brunnsjö et al. 1975 | 0 | =16 | NA |
| Malawians | BAN | 94 | 0.734 | 17 | This study | Blantyre | 1880 AD | 15 | Pike and Rimmington 1965 | 5000 | =9 | MacDonald and MacDonald 2000 |

Regional histories of population acculturation to urban settlement are complex and individual, but as a single quantitative proxy measure of the historical extent of urbanization, we identified the oldest recorded date of the first city or other significant urban settlement in the region of the population sampled (Table 1). In determining these dates, we comprehensively searched the archaeological and historical literature for the different study regions. We identified the oldest date for a site that was either described as a major town or city, or for which there was alternative evidence for high-density settlement. Some settlements have been described as towns, but lack evidence of permanent occupancy at high density and were thus ignored. For example, the native oppida of Iron Age Britain have been interpreted as regional centers for administration, with a defensive or seasonal role. In such centers the population size would have fluctuated appreciably. Others have permanent settlement at high population density, yet lack evidence for the existence of craft specialists or centralized administration (e.g., Çatal Höyük, see Mellaart 1967). Dates for these sites were included, as we are interested in population density, rather than evidence of social structure. The dates are taken from the founding of the settlement, rather than any subsequent increase in size or elevation to superior status. We recognize that the proxy employed may be an inaccurate measure of the extent of exposure to urbanization under a variety of conditions, including: (1) if urbanization is discontinuous after the founding of the first major settlement, (2) if archaeological and historical records are insufficiently intact to provide accurate data on early urbanization, (3) if the establishment of urban settlements did not impact on all components of the sampled population within a region, or (4) if population replacement has occurred since the origins of urbanization. However, none of these conditions should systematically bias our analysis in favor of a positive correlation between length of urban settlement and allele frequency; rather they will serve to weaken any apparent correlation, if present.

Materials and Methods

DNA samples from 12 populations were typed, and these data were combined with previously published results to define the global distribution of the 1729 + 55del4 alleles (Table 1).

A 140/144 base pair section of the SLC11A1 gene, which included the 1729 + 55del4 marker, was PCR-amplified using one HEX-labeled primer (ATGCCTTGGGAATGGATGAG) and one unlabeled primer (GGTTGGCTGGTCTCAGGAAC) in a volume of 10 μl containing 200 μM dNTPs, 10 mM Tris-HCl (pH 9.0), 0.1% Triton X-100, 0.01% gelatin, 50 mM KCl, 1.5 mM MgCl₂, 0.13 U Taq polymerase (HT Biotech, Cambridge, UK), and 0.15 μM of each primer. Cycling parameters were a pre-incubation step at 92°C for 3 min followed by 40 cycles of 92°C for 45 sec, 60°C for 45 sec, and 72°C for 45 sec. An aliquot of 1 μl of the PCR product was mixed with 10 μl of deionized formamide and a molecular size standard manufactured in-house, containing an FAM-labeled 142 base pair fragment. PCR product sizes were resolved on an ABI 3700.

All statistical analysis was carried out using the statistical package “R” (URL: ). Logistic and linear regressions were performed using the “glm” and “lm” functions, respectively, and support for model fit was assessed using the “drop1” function. This returns the probability of obtaining as large an increase in the apparent fit of the model after adding the trend term (urbanization time), under the null hypothesis that the trend term makes no difference to the fit. Partial Mantel tests were performed using the “vegan” library within “R.”
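The model comparison described above can also be sketched outside R. The following is a minimal pure-Python illustration under stated assumptions, not the authors' script: it fits a two-parameter binomial logistic regression by Newton-Raphson and computes the likelihood-ratio statistic for adding the trend term, which is the kind of quantity a chi-squared comparison of nested models reports. The function names and the counts at the bottom are hypothetical and chosen only for illustration.

```python
import math

def log_lik(b0, b1, xs, succ, tot):
    """Binomial log-likelihood (dropping constants) for logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, k, n in zip(xs, succ, tot):
        eta = b0 + b1 * x
        total += k * eta - n * math.log1p(math.exp(eta))
    return total

def fit_logistic(xs, succ, tot, max_iter=50):
    """Newton-Raphson fit with step halving; returns (intercept, slope)."""
    pbar = sum(succ) / sum(tot)
    b0, b1 = math.log(pbar / (1 - pbar)), 0.0  # start at the intercept-only estimate
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, succ, tot):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            resid, w = k - n * p, n * p * (1 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step, current = 1.0, log_lik(b0, b1, xs, succ, tot)
        # Halve the step until the log-likelihood does not decrease.
        while step > 1e-8 and log_lik(b0 + step * d0, b1 + step * d1, xs, succ, tot) < current:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) < 1e-10 and abs(step * d1) < 1e-10:
            break
    return b0, b1

def trend_lr_test(xs, succ, tot):
    """Likelihood-ratio statistic and P value (1 df) for adding the trend term."""
    pbar = sum(succ) / sum(tot)
    null_ll = log_lik(math.log(pbar / (1 - pbar)), 0.0, xs, succ, tot)
    b0, b1 = fit_logistic(xs, succ, tot)
    lr = 2 * (log_lik(b0, b1, xs, succ, tot) - null_ll)
    # For 1 df the chi-squared upper tail equals erfc(sqrt(x / 2)).
    return b0, b1, lr, math.erfc(math.sqrt(max(lr, 0.0) / 2))

# Hypothetical data: years since first urban settlement, chromosomes typed,
# and copies of the protective allele observed (NOT the study's data).
years = [0, 1000, 2000, 4000, 8000]
chromosomes = [100, 100, 100, 100, 100]
protective = [75, 80, 85, 92, 99]

b0, b1, lr, p = trend_lr_test(years, protective, chromosomes)
print(f"intercept frequency = {1 / (1 + math.exp(-b0)):.3f}, slope per year = {b1:.2e}")
print(f"LR chi-square = {lr:.2f}, P = {p:.2g}")
```

Fitting on counts rather than on raw proportions lets populations with more typed chromosomes carry more weight, which is the reason a binomial error model suits allele-frequency data.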

Results and Discussion

Proxy urbanization dates and SLC11A1 allele frequencies were ranked and found to be strongly correlated (Spearman's rank test P < 0.0013, ρ = 0.715; Fig. 1). Because allele frequencies are proportions, the most appropriate null model for these data is a logistic regression with a binomial error distribution (Fig. 2A), which gives a highly significant fit (P = 2.2 × 10⁻¹⁶, χ² = 80.76, df = 1). To test for the presence of potentially influential points, we calculated influence (Cook's distance, see Cook 1977) and leverage scores (Fig. 2B) and identified three potential outliers: India, Cambodia, and Gambia (Fig. 2A, B). Removal of these points did not greatly affect the significance of the fit to a logistic model (P = 2.2 × 10⁻¹⁶, χ² = 70.33, df = 1). Crucially, the two estimated logistic model parameters represent important evolutionary variables. First, the y-axis intercept of the fitted logistic model (0.812) equals the expected frequency of the SLC11A1 1729 + 55del4 allele when the duration of urbanization is zero, and therefore estimates the pre-urbanization equilibrium allele frequency. Second, under a classic deterministic model of positive selection, the logistic shape parameter (3.06 × 10⁻⁴) equals the selection coefficient acting on the SLC11A1 locus since urbanization multiplied by the number of human generations per year. Assuming an average generation time of 25 years, this gives an estimated selection coefficient of 0.0075. We further note that a straightforward linear regression (Fig. 2A) between allele frequency and urbanization time is also significant (P = 0.00106; R² = 0.47).
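As a rough check on the rank correlation, the allele-frequency and urbanization ranks listed in Table 1 can be fed into a Spearman calculation. The sketch below is an illustration rather than the authors' code: it computes Spearman's ρ as the Pearson correlation of the ranks, with the tied first place of Iranians and Italians entered as the mid-rank 1.5 (an assumption about how the tie was handled). The variable and function names are hypothetical.

```python
import math

# Ranks in the row order of Table 1 (Iranians ... Malawians).
# The tie for first place in allele frequency is entered as mid-rank 1.5.
freq_rank = [1.5, 1.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
urban_rank = [2, 7, 1, 10, 9, 3, 5, 12, 4, 11, 6, 14, 13, 17, 8, 16, 15]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Spearman's rho is the Pearson correlation applied to ranks.
rho = pearson(freq_rank, urban_rank)
print(round(rho, 3))  # 0.715
```

Because rank 1 denotes both the highest allele frequency and the oldest settlement date, a positive ρ means that populations with longer urban histories tend to carry the protective allele at higher frequency.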

Figure 1.

Geographical distribution of populations. Right-hand squares indicate the age of regional history of urban settlement; left-hand squares indicate the frequency of the protective insertion allele of the SLC11A1 1729 + 55del4 locus.

Figure 2.

(A) Logistic (solid line) and linear (dashed line) regressions of SLC11A1 1729 + 55del4 allele frequency against time since first urban settlement. The three most influential sampled populations (Cambodia, Gambia, and India) are labeled. The dot-dashed line indicates the logistic regression fit when the three most influential sampled populations are removed from the analysis. (B) Leverage scores plotted against influence scores (measured using Cook's distance, see Cook 1977). The three most influential sampled populations are labeled.
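The evolutionary reading of the two logistic parameters given in the Results can be made concrete with a small worked example. The sketch below is an illustration under the stated deterministic model, not part of the original analysis: on the logit scale the allele frequency rises linearly with time, so a rate per year multiplied by the generation time gives a per-generation selection coefficient. The constants are the values quoted in the text; the function names are hypothetical.

```python
import math

P0 = 0.812                 # fitted intercept quoted in the text (frequency at time zero)
RATE_PER_YEAR = 3.06e-4    # fitted logistic shape parameter quoted in the text
GENERATION_YEARS = 25      # generation time assumed in the text

def selection_coefficient(rate_per_year, generation_years):
    """Per-generation selection coefficient implied by a per-year logistic rate."""
    return rate_per_year * generation_years

def expected_frequency(years, p0=P0, rate=RATE_PER_YEAR):
    """Allele frequency after `years` of selection under the logistic model."""
    logit0 = math.log(p0 / (1 - p0))
    return 1 / (1 + math.exp(-(logit0 + rate * years)))

s = selection_coefficient(RATE_PER_YEAR, GENERATION_YEARS)
print(f"s per generation = {s:.5f}")
for t in (0, 2000, 4000, 8000):
    print(t, round(expected_frequency(t), 3))
```

With the quoted inputs the product is about 0.0077 per generation, close to the 0.0075 given in the text; the small difference presumably reflects rounding of the reported parameters.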

The correlation between proxy urbanization dates and SLC11A1 allele frequencies could be attributed to shared demographic histories among populations. To control for this possibility, we used classical marker FST values as measures of among-population genetic variation, then employed the partial Mantel procedure (Smouse et al. 1986) to test if differences in urbanization time predict differences in SLC11A1 frequency after background genetic relationships among populations (FST) have been accounted for. The effect of urbanization time was significant (P < 0.003). We note that although it is possible that some classical markers may be involved in resistance to infectious disease, this would have a conservative effect on our analysis because it would serve to reduce the variation in SLC11A1 frequency differences that can be solely explained by variation in urbanization times.
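The logic of the partial Mantel procedure can be sketched in a few lines. The example below is a simplified pure-Python illustration, not the vegan implementation: it takes the partial correlation between two distance matrices after controlling for a third, and builds a null distribution by permuting the rows and columns of the first matrix together. All input values are hypothetical stand-ins for allele-frequency differences, urbanization-time differences, and background genetic distances.

```python
import math
import random

def upper(m):
    """Upper-triangle entries of a square matrix as a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    rab, rac, rbc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

def partial_mantel(A, B, C, perms=999, seed=1):
    """Return (partial r, one-sided permutation P value)."""
    rng = random.Random(seed)
    n = len(A)
    b, c = upper(B), upper(C)
    observed = partial_r(upper(A), b, c)
    hits = 0
    for _ in range(perms):
        order = list(range(n))
        rng.shuffle(order)
        # Permute population labels of A, keeping rows and columns aligned.
        permuted = [[A[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(upper(permuted), b, c) >= observed:
            hits += 1
    return observed, (hits + 1) / (perms + 1)

def dist(values):
    """Absolute-difference distance matrix from one value per population."""
    return [[abs(x - y) for y in values] for x in values]

# Hypothetical per-population values (NOT the study's data).
freq = [0.75, 0.78, 0.80, 0.86, 0.90, 0.95, 0.96, 0.99]
urban_kyr = [0.1, 0.5, 1.0, 2.0, 3.5, 5.0, 6.0, 8.0]
background = [0.3, 0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.5]

r, p = partial_mantel(dist(freq), dist(urban_kyr), dist(background))
print(f"partial r = {r:.3f}, P = {p:.3f}")
```

In the study the third matrix would hold pairwise FST values rather than distances derived from a single coordinate; the toy `background` vector is used here only to keep the example self-contained.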

Alternative methods for detecting signatures of directional selection on SLC11A1 are unlikely to be as powerful as the one employed here (Sabeti et al. 2006). The global distribution of the 1729 + 55del4 allele and the possibility of balancing selection through a proposed association with autoimmune disease (Searle and Blackwell 1999) would weaken the power of tests based on population differentiation, and also suggest that this polymorphism may be too ancient for selection to be detected by methods based on haplotype conservation. Furthermore, because the SLC11A1 1729 + 55del4 allele was already the major allele (globally) prior to the proposed selection in response to urbanization, selection detection methods based upon reduction in genetic diversity or a high frequency of derived alleles would be ineffective (Sabeti et al. 2006). However, long-term balancing selection on SLC11A1 would increase the probability of detecting directional selection using the methods employed here because it would result in similar allele frequencies across global populations, prior to urbanization. To examine this possibility further, we compared estimates of Tajima's D statistic (Tajima 1989) for the SLC11A1 gene, and 100 kbp flanking each end of the coding region, to an empirical null-distribution of Tajima's D, both based on the Perlegen dataset (Carlson et al. 2005). Although this dataset is biased toward common variation, leading to a positive bias in the Tajima's D statistic (Tajima 1989), we note that, consistent with balancing selection in the East Asians and Europeans, Tajima's D for the SLC11A1 gene region was typically high (in the top 31% and 7% of the empirical null-distribution, respectively).
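For readers unfamiliar with the statistic, the comparison described above can be illustrated as follows. This is a minimal sketch using the standard formula for Tajima's D from the sample size, the number of segregating sites, and the mean pairwise difference, followed by a simple upper-tail rank against an empirical null distribution. The function names and the null values are hypothetical, and the sketch does not reproduce the handling of the Perlegen data.

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise difference."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)

def upper_tail_fraction(value, null_values):
    """Fraction of the empirical null distribution at or above `value`."""
    return sum(v >= value for v in null_values) / len(null_values)

# Hypothetical inputs for illustration only.
d_region = tajimas_d(n=48, seg_sites=30, pi=9.5)
null_d = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.3, 1.8, 2.1, 2.4, 2.9]
print(f"D = {d_region:.2f}, upper-tail fraction = {upper_tail_fraction(d_region, null_d):.2f}")
```

Ranking against an empirical distribution drawn from the same dataset, rather than against a theoretical expectation, is what allows a comparison to remain informative when the dataset is biased toward common variants.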

Various pathogens may have played a role in shaping the global SLC11A1 1729 + 55del4 allele distribution but it is likely that TB is the single most important selective agent. This view is based on the strength of evidence of a role for SLC11A1 in TB resistance (Li et al. 2006), as well as the worldwide distribution and relative ease of transmission of the disease. TB has traditionally also been associated with living in proximity to domestic cattle, as cattle are one of many organisms that act as a host for Mycobacterium bovis, a host-generalist species that can cause a pathologically indistinguishable form of tuberculosis in humans. Cattle are unusual among M. bovis hosts in acting as a transmissible reservoir for the bacteria. This is of particular importance when humans and cattle live in close proximity, a likely scenario in many early farming or pastoralist communities. We therefore repeated the tests performed above, but substituted the timing of the first urban settlement with the time since the appearance of domestic cattle in the different regions (Davis 1987; Meadows 1993; Payne and Hodges 1997; Peters et al. 1999; Hassan 2000; MacDonald and MacDonald 2000; Marshall 2000; Cymbron et al. 2005; Bellwood 2006) (see Table 1). Although significant correlations were found (Spearman's rank test P < 0.003, ρ = 0.659; logistic regression with a binomial error distribution P = 2.6 × 10⁻¹¹, χ² = 44.45; linear regression P = 0.0123; partial Mantel test on differences in cattle domestication dates P < 0.0139), we note that all correlations were weaker than for urbanization time. It should also be noted that data on the timing of the first regional use of domestic cattle are, at present, poorly resolved. There are several reasons for this: wild and domestic cattle are not always readily differentiable (Davis 1995), routine collection and analysis of animal bone from archaeological sites is a relatively recent development, and organic remains do not survive as well as most construction materials. Although key differences can be identified in the relative timings of urbanization and cattle exploitation in sub-Saharan Africa, where domestic cattle remains are found well before evidence of large-scale settlement (Blench and MacDonald 2000), the pattern of the spread of animal domestication in much of the Old World appears similar to that of urban settlement. Consequently, although we do not reject a role for zoonotic transmission of infectious disease in establishing present patterns of resistance, it is probably best interpreted as a correlate of the urbanization process rather than a determinant in its own right.

Although our study is concerned with the effects of time since urbanization on SLC11A1 1729 + 55del4 allele frequency over broad regions, it remains possible that systematic differences between long-term urban and long-term rural populations persist within those broad regions today. Testing this is not possible using the samples analyzed here. The samples from Greece, England, Italy, China, Ethiopia, Morocco (Berber), and Sudan were all collected from single towns/regions (Athens, Southwell, the Tyrol, Maoxian, Addis Ababa, Ifrane, and Khartoum, respectively). The Malawi sample did include individuals from three regions (Lilongwe, Kanengo, and Mzuzu) but we found no differences in their allele frequencies (using Fisher's exact test). Although the Anatolian Turk sample was collected in Istanbul, we did have information on place of birth and found that a diversity of geographic origins (within Turkey) is represented. However, we note that only one heterozygote was observed (from western Turkey), the rest having the del/del genotype. A similar situation occurred with the Iranian sample; although all individuals sampled were resident in Tehran, we did have information on their place of birth. However, all Iranians sampled were del/del homozygotes. Thus for both the Turkish and Iranian samples, a lack of variation precludes detection of regional differentiation. For the Yakut sample, individuals came from a large number of geographic locations, each of which was represented by too few individuals to permit meaningful tests of regional differentiation. For the Swedish Saami sample, we had little geographic information other than that they were collected from the north of Sweden. Nonetheless, it would be interesting to test if modern urban and rural populations differ in SLC11A1 1729 + 55del4 allele frequencies within the same regions.
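A test of regional differences in allele counts, such as the one mentioned for the Malawi sample, can be sketched as an exact test on a 2 × k table of allele counts by region. The code below is one possible illustration (an exact test that sums the probabilities of all tables no more likely than the observed one, with margins fixed); it is an assumption that this matches the variant of Fisher's exact test used in the study. The regional counts are made up for the example.

```python
from itertools import product
from math import comb

def exact_test_2xk(row1, row2):
    """Two-sided exact P value for a 2 x k table with fixed margins."""
    cols = [a + b for a, b in zip(row1, row2)]
    r1, total = sum(row1), sum(cols)
    denom = comb(total, r1)

    def prob(counts):
        # Multivariate hypergeometric probability of the first row.
        num = 1
        for c, k in zip(cols, counts):
            num *= comb(c, k)
        return num / denom

    observed = prob(row1)
    p_value = 0.0
    for counts in product(*(range(c + 1) for c in cols)):
        if sum(counts) == r1:
            pr = prob(counts)
            # Small relative tolerance guards against floating-point ties.
            if pr <= observed * (1 + 1e-9):
                p_value += pr
    return min(p_value, 1.0)

# Hypothetical allele counts in three regions (NOT the study's data):
# first row = protective allele, second row = alternative allele.
protective = [20, 22, 21]
alternative = [10, 8, 9]
print(f"P = {exact_test_2xk(protective, alternative):.3f}")
```

Enumerating every table is practical only for small samples; with three regions of about thirty chromosomes each, the loop visits a few tens of thousands of candidate tables.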

In summary, our findings indicate that the distribution of SLC11A1 polymorphism in the Old World can be seen as a previously unrecognized example of selection in response to the process of human niche construction (Odling-Smee et al. 2003). Further, the method we employ here makes novel use of historical data to explain the distribution of allele frequency and proposed selective forces, an approach that may prove to be useful in inferring selection in other situations.

Associate Editor: A. Read


We thank V. Jansen and C. Roberts for comment and discussion, M. Weale for advice on data analysis, and K. Veeramah and E. Caldwell for laboratory assistance. We also thank N. Bradman, E. Bekele, D. Bolnick, M. Ibrahim, T. Loukidis, P. P. Pramstaller, T. Parfit, F. Berrada, P. Nasanen-Gilmore, A. Tarekegn, T. Helenius, F. Osman, A. Götherström, L. Goldfarb, E. Jelinek, and X. Zhou for help with sample collection and DNA extraction. Funding for the research was provided by the Natural Environment Research Council and the Arts and Humanities Research Council (Centre for the Evolution of Cultural Diversity).