Keywords:

  • human pigmentation;
  • skin color;
  • positive selection;
  • genetic adaptation;
  • Perlegen database;
  • SNP;
  • EHH test

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Material and Methods
  5. Results
  6. Discussion
  7. Conclusions
  8. Acknowledgements
  9. Web Resources
  10. References

Phenotypic variation between human populations in skin pigmentation correlates with latitude at the continental level. A large number of hypotheses involving genetic adaptation have been proposed to explain human variation in skin colour, but only limited genetic evidence for positive selection has been presented. To shed light on the evolutionary genetic history of human variation in skin colour we inspected 118 genes associated with skin pigmentation in the Perlegen dataset, studying single nucleotide polymorphisms (SNPs), and analyzed 55 genes in detail. We identified eight genes that are associated with the melanin pathway (SLC45A2, OCA2, TYRP1, DCT, KITLG, EGFR, DRD2 and PPARD) and that showed significant differences in genetic variation between Europeans, Africans and Asians. In six of these genes we detected, by means of the EHH test, variability patterns that are compatible with the hypothesis of local positive selection in Europeans (OCA2, TYRP1 and KITLG) and in Asians (OCA2, DCT, KITLG, EGFR and DRD2), whereas signals were scarce in Africans (DCT, EGFR and DRD2). Furthermore, a statistically significant correlation between genotypic variation in four pigmentation candidate genes and phenotypic variation of skin colour in 51 worldwide human populations was revealed. Overall, our data also suggest that light skin colour is the derived state and is of independent origin in Europeans and Asians, whereas dark skin colour seems to be of unique origin, reflecting the ancestral state in humans.


Introduction

The skin is the largest organ in the human body and serves as a barrier between the organism and the environment (Jablonski, 2004). It plays a wide range of critical roles in maintaining body integrity, including defense against pathogens, homeostasis, thermoregulation, and protection against the harmful effects of UVB radiation. Skin pigmentation, which is the key factor in protecting against UVB radiation, is determined both by the interaction of genetic factors, including pigmentation genes and hormones, and by environmental factors such as the individual's age, the region of skin (Jablonski, 2004) and the amount of UV exposure (Tadokoro et al. 2005). The pattern of melanosome distribution within the epidermis, together with the quantity and type of melanin, is the primary determinant of colour (Thong et al. 2003).

Differences in skin pigmentation are observed within and between human populations (Jablonski, 2004). The geographic distribution of phenotypic variation in skin pigmentation tends to show sharp gradients between populations of different continents (Parra et al. 2004). Almost 85% of the total variance of skin colour is explained when human populations are grouped at the continental level (Relethford, 2002). Therefore, skin colour has been traditionally used to classify human individuals into groups (Romualdi et al. 2002), despite the fact that the evolutionary and functional mechanisms shaping pigmentation differences are not sufficiently understood.

A large number of hypotheses involving genetic adaptation have been suggested to explain the phenotypic variation of human skin pigmentation, including protection against the harmful effects of UVB radiation, heat load, concealment, resistance against pathogens and resistance against cold injury (for a review see Robins, 1991). In addition, sexual selection has been proposed to explain the lighter constitutive pigmentation of females relative to males (Aoki, 2002). Many genes were previously suggested as candidates for human skin pigmentation due to their involvement in human pigmentation disorders such as oculocutaneous albinism (OCA), or in the pigmentation of model organisms, such as the OCA2/‘p’ gene, KITLG and SLC45A2 (for a review see Slominski et al. 2004). However, only in a few cases have the patterns of genetic variation in genes associated with skin pigmentation been correlated with changes in melanin content (e.g. Akey et al. 2001; Shriver et al. 2003) and, to the best of our knowledge, only a few genes associated with human skin pigmentation [MC1R (Rana et al. 1999; Harding et al. 2000), SLC45A2 (Soejima et al. 2006) and SLC24A5 (Lamason et al. 2005)] have been specifically tested for evidence of positive selection. A recent study based on a full genome scan for signatures of positive selection using the database of the International HapMap project found evidence for only four genes associated with skin pigmentation: OCA2, MYO5A, DTNBP1 and TYRP1 (Voight et al. 2006).

Limited knowledge is available on the evolutionary history of human skin pigmentation. Dark skin pigmentation has been suggested as the ancestral trait in humans (Jablonski & Chaplin, 2000). If this is true, light skin pigmented populations could have arisen after humans spread from Africa into the rest of the world, according to the “Out of Africa” hypothesis about 100,000-150,000 years ago (Cavalli-Sforza & Feldman, 2003). On the other hand, it could be imagined that light skin had already arisen in Africa, for instance in the Khoisan who appear in the most basal branch of a tree of worldwide Y chromosome diversity (Underhill et al. 2000), and who have somewhat lighter skin colour than other African groups (Jablonski & Chaplin, 2000). What also remains unclear is whether light skin pigmentation arose independently and more than once in different populations (i.e. Europeans and Asians), as well as whether some dark skinned populations (e.g. New Guineans) derived secondarily from already lightly pigmented populations and acquired dark pigmentation as a secondary trait (Diamond, 2005).

To shed light on the evolutionary genetic history of human skin pigmentation we performed a survey of genes putatively associated with pigmentation in humans to search for evidence of positive selection. We inspected a set of 118 putative pigmentation genes in the Perlegen SNP dataset (Hinds et al. 2005), comprising Europeans, Africans and Asians, from which 55 genes contained suitable SNP information for detailed analysis.

Material and Methods

Gene Ascertainment

We ascertained putative mammalian genes involved in the skin pigmentation pathway from the literature prior to June 2005 (Slominski et al. 2004; Imokawa, 2004) and from gene expression databases (Hill et al. 2004; UniGene Build #188), while focusing on genes related to skin pathologies with a modifying effect on skin pigmentation. The recently described SLC24A5 gene (Lamason et al. 2005) was not included, since the gene ascertainment was carried out prior to the publication by Lamason et al. We then used the Perlegen database (Hinds et al. 2005), comprising Africans, Europeans, and Asians, to study the genetic variability of SNPs within each gene, as well as in the respective surrounding regions of the human genome. This database was chosen over the HapMap database because the SNP ascertainment bias is higher in the HapMap dataset than in the Perlegen dataset (Clark et al. 2005).

Testing for Interpopulation Differentiation

We applied a sliding window approach in order to detect regions of unusual patterns of genetic variation between populations within the putative skin pigmentation genes.

This approach is based on comparing the observed value of a particular statistic in one of the windows with the observed value in windows of the same size in the rest of the genome. Regions showing an excess of statistically significant empirical p values are interesting candidates for further statistical analyses (Marques-Bonet et al. 2005). Since the average SNP density in the Perlegen dataset is one SNP every 2 kb (Hinds et al. 2005), each window was centered on a SNP and extended 2.5 kb on both sides of that SNP, thus including (on average) three SNPs per window. An alternative approach, based on the number of markers within the window, was not applied because the distribution of the Perlegen SNPs is not homogeneous (Hinds et al. 2005).

In addition, since each window centres on one SNP, different sliding windows could refer to the same set of SNPs. To avoid this, only non-overlapping sliding windows were considered informative. Furthermore, if the number of SNPs considered was small, all windows would contain either one SNP or the same set of SNPs, and would thus not be informative. We therefore applied a cutoff of ten SNPs for considering a gene for further analyses. This minimum number of SNPs was chosen because, even if the SNPs are homogeneously distributed along the chromosomes, the number of non-overlapping windows encompassing 10 SNPs is still extremely small (i.e. four non-overlapping windows).

For the sliding window analysis we applied the informativeness of assignment (In) index (Rosenberg et al. 2003), computed for each SNP. In is a genetic distance measure based on the information content a locus contains to differentiate groups of individuals (Lao et al. 2006). This measure ranges from zero to the natural logarithm of the number of groups of populations considered (three in this case) (Rosenberg et al. 2003). An interesting feature of this statistic over other statistics such as Fst is that it is more informative with respect to the number of populations that are differentiated by a particular marker. However, it can be considered equivalent to Fst when this statistic is computed between pairs of populations (Rosenberg et al. 2003). The mean In was computed considering all SNPs within each window, and this value was compared with an empirical distribution of mean In obtained in windows of the same size and with the same number of markers from 10,145 genes (>10 SNPs) from the entire genome and described in the Perlegen database (genes on the X chromosome were excluded because of the different effective population size for this chromosome (Schaffner, 2004)).
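As an illustration of the In statistic described above, the following minimal Python sketch computes In for a single locus from population allele frequencies, following the definition given by Rosenberg et al. (2003), and averages it over the SNPs that fall within 2.5 kb on either side of a central SNP. The function names, argument layout and toy values are assumptions made for this example and are not taken from the original analysis.

```python
import math


def informativeness_of_assignment(freqs):
    """In for one locus.

    freqs[i][j] is the frequency of allele j in population i
    (K populations, each row summing to 1).
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        # Mean frequency of allele j across the K populations.
        p_j = sum(pop[j] for pop in freqs) / k
        if p_j > 0:
            total -= p_j * math.log(p_j)
        for pop in freqs:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total


def mean_in_window(positions, in_values, center, half_width=2500):
    """Mean In of all SNPs within half_width bp of the central SNP."""
    inside = [v for pos, v in zip(positions, in_values)
              if abs(pos - center) <= half_width]
    return sum(inside) / len(inside)


# Toy biallelic SNP in three populations (hypothetical frequencies).
toy = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
toy_in = informativeness_of_assignment(toy)
```

With a locus that is fixed for a different allele in each of two populations, the function returns ln(2), the maximum for a pairwise comparison; with identical frequencies in all populations it returns zero.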

We computed how often the number of statistically significant sliding windows found for the candidate genes could be found for other genes in the Perlegen database. This allowed us to obtain estimates of the statistical significance of the number of significant sliding windows for each gene. We defined two stringent cutoffs for considering a sliding window statistically significant: α = 0.02 and α = 0.01. For each gene we computed the probability that a window is statistically significant at cutoff α as:

  \[ P_{\alpha} = \frac{W_s}{W_N} \]

where Ws is the number of non-overlapping statistically significant windows and WN is the total number of windows. We then applied the empirical distribution of this statistic based on the 10145 Perlegen genes, to compute the probability of observing an equal or greater frequency of statistically significant windows for each of the 55 candidate genes, and excluded genes not showing statistical significance (p>0.05).
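The two quantities described above can be sketched in a few lines of Python: the fraction of significant non-overlapping windows for a gene (Ws divided by WN), and the empirical probability of seeing an equal or greater fraction among reference genes. The function names are hypothetical, and treating a window as significant when its p value is strictly below α is an assumption of this sketch.

```python
def significant_window_fraction(window_pvalues, alpha):
    """Ws / WN for one gene, given one p value per non-overlapping window."""
    w_n = len(window_pvalues)
    w_s = sum(1 for p in window_pvalues if p < alpha)  # assumed: strict '<'
    return w_s / w_n


def empirical_p(observed_fraction, reference_fractions):
    """Share of reference genes with an equal or greater fraction."""
    hits = sum(1 for f in reference_fractions if f >= observed_fraction)
    return hits / len(reference_fractions)


# Toy example with made-up window p values for one candidate gene.
candidate = significant_window_fraction([0.005, 0.5, 0.015, 0.3], alpha=0.02)
```

In the study the reference distribution came from the 10,145 Perlegen genes; here `reference_fractions` would simply be the list of fractions computed for those genes in the same way.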

EHH Test

The Extended Haplotype Homozygosity (EHH) test is based on the expectation that a neutral variant requires a long time to increase in frequency, which allows the length of haplotype sharing to decay due to recombination; this is of special interest in the case of recent (∼10,000 years) moderate selective sweeps (Sabeti et al. 2002). We applied the EHH test to the significant candidate genes from the sliding window analysis. For every gene, core haplotypes were defined centered on the region(s) with the largest number of significant sliding windows. The EHH statistic was computed for the surrounding SNPs until it reached a value < 0.1. We corrected for the fact that patterns of recombination are not equally distributed throughout the genome by using the recombination estimates computed for the Perlegen dataset (Myers et al. 2005). Both the phase of the haplotypes and the recombination units between markers were obtained from the Perlegen website (http://www.perlegen.com). To quantify how often a particular EHH value was found in the genome we used the following approach: i) we selected core haplotypes from the genome that were of similar length (at least 90% of the length of the observed core haplotype) and similar frequency (frequency of the observed core haplotype ± 0.1), ii) the EHH value was computed for SNPs at the same genetic distance in cM from the core haplotype as the observed EHH, and iii) we counted how often the computed EHH in other regions of the genome was larger than the EHH observed for the gene. The empirical p value was then computed by dividing this count by the total number of EHH statistics computed in the rest of the genome. Since the EHH depends not only on genomic factors but also on the number of markers considered, we only used regions with a number of SNPs less than or equal to that in the region including the core haplotype and its surroundings.
It should be noted that this correction is extremely conservative, but ensures that statistically significant values are not influenced by the SNP distribution. Thus, we obtained an empirical distribution based on the rest of the genome. Bifurcation diagrams as well as EHH plots were computed with the Sweep software (Sabeti et al. 2005).
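To make the procedure concrete, the sketch below computes EHH for one core haplotype in the sense of Sabeti et al. (2002): among the chromosomes carrying the core haplotype, it is the probability that two randomly chosen chromosomes are identical over the whole interval from the core to a given SNP. It also shows the empirical p value as the share of genome-wide EHH values that exceed the observed one. Representing haplotypes as strings, the index conventions and the function names are assumptions of this example; matching by genetic distance, core length and core frequency is not shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, core_allele, extend_to):
    """EHH of a core haplotype out to the SNP at index extend_to.

    haplotypes: phased chromosomes as strings, one character per SNP.
    The core occupies indices core_start .. core_end - 1.
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_allele]
    if len(carriers) < 2:
        return 0.0
    lo = min(core_start, extend_to)
    hi = max(core_end, extend_to + 1)
    # Count identical extended haplotypes among the core carriers.
    counts = Counter(h[lo:hi] for h in carriers)
    return sum(comb(n, 2) for n in counts.values()) / comb(len(carriers), 2)


def empirical_p_value(observed_ehh, genome_ehh_values):
    """Fraction of matched genome-wide EHH values larger than the observed."""
    larger = sum(1 for v in genome_ehh_values if v > observed_ehh)
    return larger / len(genome_ehh_values)


# Toy data: five chromosomes, four SNPs, core = first two SNPs ("AA").
toy_haplotypes = ["AAGT", "AAGT", "AAGC", "AACT", "TTGT"]
toy_ehh = ehh(toy_haplotypes, 0, 2, "AA", extend_to=2)
```

In the toy data four chromosomes carry the core, and three of them remain identical after extending by one SNP, so three of the six possible pairs are still homozygous.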

Additional SNP Ascertainment and Genotyping

We analysed the biological processes of each significant candidate gene from the EHH test in the Gene Ontology Database (Ashburner et al. 2000) and the NCBI UniGene expression database (Wheeler et al. 2006). We selected genes for additional SNP ascertainment where our uncertainty about the biological process that had driven the positive selection was smallest; that is, we selected genes whose genetic variation can mainly be explained by their association with pigmentation and not by other biological processes in humans. SNPs for genotyping were selected according to the following criteria: i) the SNP lay within the gene and the core haplotype, and ii) it showed large genetic distances between Europeans, Africans and Asians in the Perlegen dataset. Where possible we concentrated on non-synonymous SNPs that changed the amino acid in the respective protein; however, some of the ascertained SNPs were within introns. Each SNP was typed in the CEPH-HGDP panel, comprising 1064 samples from 51 globally distributed human populations and including all continental regions: America, Central and East Asia, Europe, Middle East, North Africa, Sub-Saharan Africa and Oceania (with the exception of Australia). We excluded 16 individuals from the statistical analysis because of previously identified labelling errors or sample duplications in the CEPH-HGDP (Rosenberg et al. 2002, 2003; Mountain & Ramakrishnan, 2005).
SNP genotyping was performed using TaqMan technology: 3 ng of DNA were dried in the open air in 384-well plates (Applied Biosystems clear optical reaction plate) and amplified in a total volume of 2 μl using 1 μl ABsolute QPCR rox mix (ABgene, Epsom, UK), and either 0.1 μl of 20x Custom TaqMan® SNP Genotyping Assays (Applied Biosystems) or 0.05 μl of 40x TaqMan® SNP Genotyping Assays (Applied Biosystems), initially for 15 min at 95°C followed by 40 cycles of 15 sec at 95°C and 1 min at 60°C in a GeneAmp 9700 PCR machine (Applied Biosystems) with a subsequent end-point read on an Applied Biosystems 7900HT Fast Real-Time PCR System as recommended by the manufacturer. The SNPs rs2762464, rs1800414, rs1448484, rs16891982 were typed using the commercially available TaqMan® SNP Genotyping Assays C__15931130_10, C___8866240_10, C___8866152_20 and C___2842665_10 (Applied Biosystems), respectively. Primer and probe sequences for rs3782974, typed using Custom TaqMan® SNP Genotyping Assays, are provided in Table 1S in the supplementary material.

MDS, STRUCTURE and Multiple Regression Analyses

In was computed for each pair of populations considering all SNPs at the same time, and the genetic distance matrix between pairs of populations was plotted by means of Multidimensional Scaling (MDS, Kruskal & Wish, 1990). Additionally, a STRUCTURE analysis (Pritchard et al. 2000) was performed considering 3, 4 and 5 groups. We tested the amount of variation explained among groups of populations by means of AMOVA (Excoffier et al. 1992) using the Arlequin 3.0 software (Excoffier et al. 2005). The degree of skin pigmentation was computed for each of the 51 HGDP populations used for SNP typing from the index map of R. Biasutti (Parra et al. 2004). This map interpolates the skin pigmentation index for regions without available data (Jablonski, 2004), and therefore the results should be considered with care. We performed a multiple regression analysis between the level of skin pigmentation and the frequency of each of the five SNPs in the CEPH-HGDP populations using STATISTICA software (StatSoft, 2001) by assuming an additive model. In addition, we used published data on skin reflectance obtained at a wavelength of 685 nm, available for only a subset of 19 CEPH-HGDP or neighbouring populations (Jablonski & Chaplin, 2000). This measure provides the most reliable means of determining skin melanin concentration but does not summarize the skin colour, which is related to other factors (Robins, 1991). With this second dataset we also performed a multiple regression analysis assuming an additive model.
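The additive multiple regression described above can be sketched as an ordinary least-squares fit of a pigmentation measure on SNP allele frequencies, with an intercept, followed by the adjusted proportion of variance explained. This is only an illustration in plain Python: the original analysis used STATISTICA, and the function names and toy numbers below are assumptions of this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_additive_model(freqs, pigmentation):
    """Least-squares fit of pigmentation on allele frequencies.

    freqs[i] holds the allele frequencies of each SNP in population i.
    Returns (coefficients with intercept first, R^2, adjusted R^2).
    """
    x = [[1.0] + list(row) for row in freqs]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, pigmentation)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    mean_y = sum(pigmentation) / len(pigmentation)
    ss_res = sum((y - f) ** 2 for y, f in zip(pigmentation, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in pigmentation)
    r2 = 1 - ss_res / ss_tot
    n, p = len(x), k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2


# Toy data: two SNPs in five populations, pigmentation built to be linear.
toy_freqs = [[0.1, 0.9], [0.5, 0.2], [0.8, 0.4], [0.3, 0.6], [0.9, 0.1]]
toy_pigmentation = [1 + 2 * a - 3 * b for a, b in toy_freqs]
toy_coef, toy_r2, toy_adj_r2 = fit_additive_model(toy_freqs, toy_pigmentation)
```

The sketch does not handle collinear predictors and does not compute standardized slopes or p values.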

Correlation Between Genetic, Phenetic and Geographic Distances

We compared two distance matrices by means of a Mantel test using the PASSAGE 1.1 software (Rosenberg, 2001). The first matrix was based on the In value between pairs of populations and the second was based on the difference of skin colour measures. Each correlation was performed keeping geography constant by using a geodesic distance matrix assuming an Out of Africa model (Ramachandran et al. 2005) in all the Mantel tests. Two different models for the origin of skin pigmentation were considered when computing the skin pigmentation matrix. In the first it was assumed that the phenotypic similarity in the degree of skin pigmentation between each pair of populations was due to the presence of the same genetic variants. The skin pigmentation distance (D) between each pair of populations was then computed as the absolute difference of the pigmentation measure (P) of each population:

  \[ D_{ij} = |P_i - P_j| \]

The second model assumed that dark skin pigmentation is the ancestral state and is of unique origin, and that the similar phenotype between Europeans and East Asians is due to the presence of different genetic variants (that is, the same phenotype has appeared independently in both continents). In order to account for this hypothetically independent origin, the total phenotypic distance was computed as the sum of the phenotypic distances of each population to the African population, which (according to this model) would represent the ancestral skin pigmentation colour. Given the skin colour diversity that exists within the African continent (Relethford, 2000), we computed the mean P value of sub-Saharan African populations and used it in the comparisons between the European and East Asian populations:

  \[ D_{ij} = |P_i - \bar{P}_{A}| + |P_j - \bar{P}_{A}| \]

where \(\bar{P}_{A}\) refers to the mean P of the sub-Saharan African populations. Since American populations are more related to Asian populations than to Europeans (Cavalli-Sforza & Feldman, 2003), the same phenotypic distance was computed when comparing American populations with European populations.

The phenotypic distance matrix was computed twice, once using the Biasutti measure of skin pigmentation and once using the Jablonski et al. 685 nm reflectance measure (see above). These data are available as Supplementary Table 1.
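The two phenotypic distance models can be sketched as follows. Under the first model the distance between two populations is the absolute difference of their pigmentation measures; under the second, comparisons between European and East Asian (or American) populations use the sum of each population's distance to the sub-Saharan African mean. The region labels, the function name and the toy values are hypothetical, and using the plain absolute difference for all other pairs under the second model is an assumption of this example.

```python
# Region pairs treated as having independently derived light pigmentation.
INDEPENDENT_PAIRS = {
    frozenset(("Europe", "East Asia")),
    frozenset(("Europe", "America")),
}


def pigmentation_distance(pop_a, pop_b, africa_mean, independent_origin):
    """Phenotypic distance between two (region, pigmentation) pairs."""
    region_a, p_a = pop_a
    region_b, p_b = pop_b
    pair = frozenset((region_a, region_b))
    if independent_origin and pair in INDEPENDENT_PAIRS:
        # Model 2: route the distance through the assumed ancestral state.
        return abs(p_a - africa_mean) + abs(p_b - africa_mean)
    # Model 1 (and all other pairs): plain absolute difference.
    return abs(p_a - p_b)


# Made-up pigmentation values, for illustration only.
european = ("Europe", 12.0)
east_asian = ("East Asia", 14.0)
model_1 = pigmentation_distance(european, east_asian, 28.0, False)
model_2 = pigmentation_distance(european, east_asian, 28.0, True)
```

With these toy values the two populations are close under the first model but far apart under the second, which is the contrast the two models are designed to express.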

Results

We applied a six-step hierarchical approach to identify candidate pigmentation genes with signatures of local positive selection that can most plausibly explain global phenotypic variation of human skin colour.

In Test for Population Differentiation in Genes Putatively Involved in Skin Pigmentation

First, we identified 118 genes putatively involved in skin pigmentation (see Supplementary Information for complete list of genes) and recovered their SNP information from the Perlegen database. Second, we excluded 63 candidate genes with non-informative SNP content from the Perlegen database (<10 SNPs per gene; see Materials and Methods). Third, we applied a sliding window approach based on the informativeness of assignment (In) index (Rosenberg et al. 2003) to the 55 candidate genes that contained suitable SNP information, to search for candidate pigmentation genes with large differences between the three continental populations. We computed the probability of observing an equal or higher frequency of statistically significant sliding windows for all other genes in the genome. Eight genes showed statistical significance at two stringent sliding window p values (0.02 and/or 0.01; see Table 1): SLC45A2, TYRP1, DCT, PPARD, DRD2, EGFR, OCA2 and KITLG. The first three genes were statistically significant using both cutoffs, and were strongly statistically significant when the α threshold for considering a sliding window statistically significant was lowered to 0.01.

Table 1.  Genes putatively involved in human skin pigmentation, their chromosomal position, evidence of association to skin pigmentation and the empirical p value when comparing the frequency of statistically significant non-overlapping sliding windows (with p value <0.02 and p value< 0.01; see Material and Methods)
| Gene | Location | Pigmentation evidence | Reference | p value (sliding window significance < 0.02) | p value (sliding window significance < 0.01) |
|---|---|---|---|---|---|
| SLC45A2 | 5p13.3 | OCA4 | (Newton et al. 2001) | 0.004 | 0.002 |
| TYRP1 | 9p23 | OCA3 | (Boissy et al. 1996) | 0.002 | 0.017 |
| DCT | 13q32 | Slaty mutation | (Jackson et al. 1992) | 0.02 | 0.009 |
| OCA2 | 15q11.2-q12 | OCA2 | (Brilliant, 2001) | 0.052 | 0.041 |
| KITLG | 12q22 | Steel phenotype in mice | (Bennett & Lamoreux, 2003) | 0.049 | >0.1 |
| EGFR | 7p12 | Dsk5 mutation in mice. Restricted to footpads. Increased number of melanocytes | (Fitch et al. 2003) | 0.035 | 0.041 |
| PPARD | 6p21.2-p21.1 | Expressed in vitro in melanocytes. | (Kang et al. 2004) | 0.041 | 0.02 |
| DRD2 | 11q23 | Melanogenesis. Knockout mice with dark coat | (Slominski et al. 2004) | 0.056 | 0.028 |

EHH Test in Candidate Pigmentation Genes with Significant Population Difference

Fourth, to search for traces of positive selection the EHH test was applied to the eight significant genes from the sliding window analysis using the respective region with the largest statistically significant sliding window as the core haplotype (for p-value computation see Methods). Statistically significant EHH values were observed for six of the eight genes analyzed (OCA2, TYRP1, DCT, KITLG, DRD2 and EGFR) at both sides of the respective core haplotype, and at least in one of the populations analysed (Figures 1 and 2; Supplementary Figure 1). SLC45A2 showed only three SNPs with statistically significant EHH (Supplementary Figure 1a–c), namely in the European population in a core haplotype with a frequency >93% of all the chromosomes. In the case of PPARD, only the Asian population had a marginal p value (p = 0.01) in one SNP 0.006 cM downstream of the core haplotype (Supplementary Figure 1m–o). For OCA2, statistically significant EHH values from the core haplotypes were found at 0.1 cM in Europeans in the first region (around rs1800414), as well as in Europeans and Asians at >0.2 cM in the second region (around rs1448484). Interestingly, in the latter case the main core haplotype reached frequencies >70% in both populations but with completely different haplotypes (see Supplementary Figure 1d-f and Figure 1g–i). For TYRP1, one core haplotype with a frequency of 56% in the European population showed a statistically significant larger EHH than expected, both upstream (0.03 cM) and downstream (0.2 cM; see Figure 1a–c). Both the African and Asian populations showed a highly frequent core haplotype (70% and 100%, respectively), that was marginally statistically significant for only short distances (0.006 cM) in the case of the African population when compared with the rest of the genome. 
For DCT, two frequent core haplotypes were detected (see Figure 2a–c): the first one was highly frequent in the Asian population (comprising almost 75% of all the chromosomes) whereas the second one was present in the European and the African populations. However, only in the case of Asia (at 0.19 cM) and Africa (at 0.19 cM) could we find statistically significant EHH values (see Figure 2a–c). The EHH analysis performed in the defined core haplotype of the KITLG gene showed one large and frequent core haplotype for European and Asian populations (representing 77% and 83% respectively) that was highly statistically significant at large distances (0.1 cM in the case of Europeans and 0.21 cM in the case of the Asian population, see Supplementary Figure 1j–l). For EGFR, both the Asian population and the African population showed statistically significant EHH values. In the case of the Asian population the statistically significant EHH values were found at 0.027 cM downstream and 0.03 cM upstream (see Supplementary Figure 1p–r). In the case of the African population, statistically significant EHH values were found at 0.067 cM upstream (see Supplementary Figure 1p–r). These two populations also showed statistically significant EHH values in the case of DRD2, at 0.02 cM upstream in the case of Asia and 0.01 cM upstream in the case of Africa (see Supplementary Figure 1s–u).

Figure 1. Sliding window and haplotype analyses performed in the genomic region of the TYRP1 gene. Extended homozygosity versus genomic distance to the core haplotype defined from rs2733834 to rs17346748 in the core haplotypes with a frequency >50% (a), statistical significance of the EHH values obtained in the case of core haplotypes with a frequency >50% (see text for details) (b), and haplotype bifurcation plot of core haplotypes with a frequency >50% in at least one population (c) with the putative positively selected haplotype highlighted in red.

Figure 2. Sliding window and haplotype analyses performed in the genomic region of the DCT gene. Extended homozygosity versus genomic distance to the core haplotype defined from rs3782974 to rs7987802 in the core haplotypes with a frequency >50% (a), statistical significance of the EHH values obtained in the case of core haplotypes with a frequency >50% (see text for details) (b), and haplotype bifurcation plot of core haplotypes with a frequency >50% in at least one population (c) with the putative positively selected haplotype highlighted in red.

Population Relationships Based on Core Haplotype SNPs from Pigmentation Genes

Fifth, in order to study the geographic distribution of genetic variants most likely associated with skin pigmentation, we checked the expression patterns of these eight genes in the NCBI UniGene Database (Wheeler et al. 2006) and reviewed the current knowledge of their functional involvement in the Gene Ontology Database (Ashburner et al. 2000). Four of the genes (SLC45A2, TYRP1, DCT and OCA2) are mainly associated with pigmentation and are highly preferentially or exclusively expressed in skin tissue. KITLG plays important roles in organ morphogenesis and cell proliferation, and is widely expressed in a large number of tissues including skin. PPARD is also involved in a large number of biological processes, including apoptosis, embryo implantation and glucose and lipid metabolism, and is widely expressed in tissues including skin. DRD2 plays an important role in nervous system development and in synaptic transmission, and is mainly expressed in nerve tissues and in skin, whereas EGFR has a large number of different cellular functions and is expressed in a large number of tissues including skin. We therefore ascertained SNPs from SLC45A2, TYRP1, DCT and OCA2 (one SNP from each of the identified significant regions, and in the case of OCA2 two SNPs; see Methods for procedure) and typed them in the CEPH-HGDP samples, which comprise 51 different worldwide populations (Table 2).

Table 2.  Five SNPs ascertained from the four best candidate genes involved in human skin pigmentation, with their informativeness of ancestry (In) from the three Perlegen populations, percentage of the maximal obtainable In (ln(2)), information on differentiated populations by means of genetic variation found in the gene, the ancestral state of each SNP based on NCBI data, and the position of the SNPs within each gene.
| Gene | SNP Id | In | % max. In | Genetic differentiation (frequent allele) | Ancestral allele | SNP position |
|---|---|---|---|---|---|---|
| DCT | rs3782974 | 0.214 | 30.9 | Asian (A) vs. Rest | T | Intron |
| OCA2 region 1 | rs1800414 | 0.331 | 47.78 | Asian (G) vs. Rest | A | Exon, non-synonymous |
| OCA2 region 2 | rs1448484 | 0.337 | 48.57 | European and Asian (T) vs. Rest | C | Intron |
| TYRP1 | rs2762464 | 0.191 | 27.6 | European (T) vs. Rest | A | Intron |
| SLC45A2 | rs16891982 | 0.556 | 80.25 | European (G) vs. Rest | C | Exon, non-synonymous |

Figure 3 shows an MDS plot (with a stress of 0.082) of the genetic distance matrix computed between pairs of populations, considering all five SNPs at the same time and all 51 populations from the CEPH-HGDP. Sub-Saharan African populations cluster together, being rather close to the “Melanesian” population from the Bougainville Island of Papua New Guinea (PNG). Europeans, Middle Easterners, North Africans, Asians, the second sample from PNG, and Native Americans cluster in a second group. Within the latter group, the distribution of populations in the plot follows a West to East gradient, with some separation between European populations, Pakistani/S-Asian/Middle Eastern/N-African populations, East Asians and Native Americans. Results from the individual-based STRUCTURE analysis confirmed the visual groupings from the population-based MDS plot in respect of the two major clusters when considering 3 pre-defined groups (Figure 4), and with further separation of Native Americans, Europeans, East Asians, and C/S-Asians/Middle Eastern/N-Africans, while keeping the separate Africa-Bougainville cluster when using 5 pre-defined groups (data not shown). The amount of genetic variation explained among the two clusters was FCT = 0.33, p < 0.00005; for three clusters (considering Native Americans separately) it was FCT = 0.28, p < 0.00005; and for all five clusters it was FCT = 0.4, p < 0.00005. These values are extremely high compared with those observed in neutral variants (Romualdi et al. 2002).

Figure 3. MDS representation of a matrix of In values between pairs of populations using the genetic variation observed with five SNPs ascertained from four genes putatively involved in human skin pigmentation and analysed in the CEPH-HGDP samples (see text for details). Stress = 0.082. The Biasutti skin pigmentation index of each population is included.

Figure 4. Ternary plot based on the proportion of membership of CEPH-HGDP populations computed with the STRUCTURE program using five SNPs ascertained from four genes putatively involved in skin pigmentation (for details see Material and Methods). An admixture model was assumed. The clustering shown is based on 3 pre-defined groups. Burn-in period = 50,000; number of MCMC iterations after burn-in = 100,000.

Correlation of Skin Color and SNP Variation of Pigmentation Genes

In a sixth step, to examine whether the genotypic variation we observed in these five pigmentation SNPs correlates with the phenotypic variation in skin colour, two different measures of the degree of pigmentation were used (see Materials and Methods and Table 3). Multiple regression analysis, assuming an additive model, between the Biasutti skin colour dataset (Parra et al. 2004) and the genetic diversity in the five SNPs (considering all 51 populations) showed a statistically significant and strongly positive correlation (r = 0.9, p < 0.000005), implying that the genetic diversity of these five SNPs was able to explain 82% of the total adjusted variance observed. Similar results were obtained when considering the measures of skin reflectance observed at 685 nm (Jablonski, 2004) (r = 0.88, p = 0.0016; 70% of total adjusted variance explained, see Supplementary Figure 2), although only 19 groups could be used due to reflectance data limitations.

Table 3.  Beta standardized slopes of each SNP and associated p value when performing a multiple regression between the frequencies of one of the alleles of each SNP in the CEPH-HGDP populations, and two different measures of skin pigmentation for the same populations (see Materials and Methods for details). The percentage of adjusted variance was 0.80 in the case of the Biasutti dataset, and 0.69 in the case of the 685 nm dataset. In both cases the multiple regression was statistically significant (p < 0.01).
SNP marker   Biasutti dataset         685 nm dataset
             Beta      p value        Beta        p value
rs3782974    0.147     0.124          0.0407      0.899
rs1448484    0.48      0.000037       −0.369      0.192
rs1800414    0.362     0.0016         −0.27       0.208
rs2762464    0.016     0.901          −0.502      0.283
rs1689192    −0.562    0.00036        1.317915    0.068
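
The relationship between a multiple correlation coefficient and the adjusted variance explained can be illustrated with a short sketch. It assumes the standard adjusted R² formula with five predictors (one per SNP); the function name `adjusted_r2` is hypothetical and the inputs are the rounded r values quoted in the text, so this is a consistency check rather than a reconstruction of the original analysis.

```python
def adjusted_r2(r, n_obs, n_predictors):
    """Adjusted R^2 from a multiple correlation coefficient r.

    Standard formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Rounded values from the text; five SNPs are assumed to be the predictors.
biasutti = adjusted_r2(0.90, n_obs=51, n_predictors=5)
reflectance = adjusted_r2(0.88, n_obs=19, n_predictors=5)
print(f"Biasutti dataset: {biasutti:.2f}")
print(f"685 nm dataset:   {reflectance:.2f}")
```

With these rounded inputs the formula gives about 0.79 and 0.69, close to the adjusted values of 0.80 and 0.69 given in the caption of Table 3. The smaller sample of 19 groups is penalised more heavily by the adjustment than the 51-population sample.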

Furthermore, a Mantel test was used to test which evolutionary scenario was the most plausible, given the genetic variation observed in these five SNPs and the geographic distribution of skin pigmentation in the human populations analyzed. The first evolutionary scenario assumed that phenotypic similarities are due to the same genetic variants and did not make assumptions about the ancestral state of skin pigmentation, whereas the second assumed that dark skin pigmentation is the ancestral state and that the skin pigmentation phenotype in Europeans and East Asians is due to the presence of different genetic variants (see Material and Methods). The correlations between the genetic and phenotypic distance matrices were positive and statistically significant in both evolutionary scenarios, independent of the skin pigmentation dataset used (see Table 4). However, the observed correlation in the first model strongly decreased when the sub-Saharan African populations were excluded from the analysis (from r = 0.38 to r = 0.02 in the case of the Jablonski dataset, and from r = 0.52 to r = 0.2 when using the Biasutti dataset), but such a decrease was not observed when applying the second evolutionary model.

Table 4.  Mantel test correlation values obtained by comparing a distance matrix between pairs of populations based on the In value obtained from five SNPs, and distance matrix based on a measure of skin pigmentation (see Materials and Methods) for two evolutionary scenarios (see text).
Evolutionary scenario   World populations, r (p value)         Without sub-Saharan populations, r (p value)
                        685 nm dataset,    Biasutti dataset,   685 nm dataset     Biasutti dataset
                        19 populations     51 populations
Scenario 1              0.38 (p = 0.002)   0.52 (p = 0.001)    0.03 (p = 0.848)   0.2 (p = 0.002)
Scenario 2              0.45 (p = 0.001)   0.67 (p = 0.001)    0.67 (p = 0.001)   0.76 (p = 0.001)
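
A Mantel test of the kind summarised in Table 4 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the one-sided permutation p value, the number of permutations and the toy matrices are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Correlation between two distance matrices and a one-sided p value.

    Population labels of dist_b are permuted jointly over rows and columns,
    which keeps the dependence structure inside each matrix intact.
    """
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    a = upper_triangle(dist_a, identity)
    r_obs = pearson(a, upper_triangle(dist_b, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(a, upper_triangle(dist_b, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: five hypothetical populations placed on a line.
positions = [0.0, 1.0, 3.0, 7.0, 12.0]
genetic = [[abs(p - q) for q in positions] for p in positions]
phenotypic = [[2.0 * d for d in row] for row in genetic]
r, p = mantel(genetic, phenotypic)
print(f"r = {r:.2f}, p = {p:.3f}")
```

In the two scenarios compared in the text, only the phenotypic distance matrix changes; the genetic matrix and the permutation scheme stay the same.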

Discussion

We surveyed the literature for genes putatively associated with skin pigmentation in mammals and used the Perlegen SNP database in order to i) identify major genes carrying genetic variation associated with phenotypic differences in skin pigmentation between populations, ii) test whether the observed patterns of variability of candidate genes within each population are compatible with a model of neutral evolution, and iii) test whether the origin of the phenotypic variation of skin pigmentation is unique in human populations.

Genetic Covariance with the Skin Pigmentation Phenotype

It is already known that the large skin pigmentation differences between continental groups of human populations do not resemble neutral variation in these populations (Relethford, 2002). Thus, pigmentation genes with significant differences between the three continental Perlegen populations studied here are the best candidates to explain the phenotypic variance in pigmentation between these populations. Testing for interpopulation differentiation was performed by means of a sliding window approach using the In statistic. The In statistic was preferred to other widely used statistics (such as Fst values between pairs of populations) because it reduces the number of comparisons per SNP, and it is not biased against any evolutionary model of population differentiation. This is of particular importance in the case of skin pigmentation, where similar phenotypes between particular populations, i.e. some European and East Asian populations, could suggest the same evolutionary origin, whereas a different distribution pattern of melanosomes within the keratinocytes has been suggested for Europeans and Asians (Thong et al. 2003).
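
The sliding window idea can be sketched in a few lines. The example assumes that In refers to the informativeness-for-assignment statistic of Rosenberg et al. (2003), computed from allele frequencies in K populations; the function names, the window size and the toy frequencies are hypothetical and do not reproduce the settings used in this study.

```python
import math


def informativeness(freqs_by_pop):
    """I_n for one marker.

    freqs_by_pop holds one list of allele frequencies per population.
    I_n = sum_j [ -p_j * ln(p_j) + (1/K) * sum_i p_ij * ln(p_ij) ],
    where p_j is the mean frequency of allele j over the K populations.
    """
    k = len(freqs_by_pop)
    total = 0.0
    for j in range(len(freqs_by_pop[0])):
        mean_p = sum(pop[j] for pop in freqs_by_pop) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        for pop in freqs_by_pop:
            if pop[j] > 0:  # 0 * ln(0) is taken as 0
                total += pop[j] * math.log(pop[j]) / k
    return total


def sliding_mean(values, window):
    """Mean of each run of `window` consecutive per-SNP values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy data: four biallelic SNPs in three populations (frequencies invented).
snps = [
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # no differentiation
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],   # strongly differentiated
    [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]],
]
per_snp = [informativeness(s) for s in snps]
print([round(v, 3) for v in sliding_mean(per_snp, window=2)])
```

A single value per SNP summarises differentiation across all populations at once, which is why no pairwise comparisons are needed.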

Using the In-based sliding window approach we identified eight candidate genes for explaining phenotypic differences between the three populations: TYRP1, DCT, KITLG, SLC45A2, OCA2, EGFR, PPARD and DRD2. TYRP1 is involved in the stabilization and maintenance of tyrosinase protein levels and could be involved in regulating melanosome maturation (Alaluf et al. 2003); mutations in TYRP1 cause oculocutaneous albinism in humans (Sturm et al. 2001). DCT is essential in the conversion of DOPAchrome to DHICA in the eumelanin pathway (Wang & Hebert, 2006). In both cases the constitutive levels of protein expression are typically higher in African individuals, and increase in the skin of Europeans, East Asians and particularly African populations after UV exposure (Tadokoro et al. 2005; Alaluf et al. 2003). KITLG is essential to the normal development and migration of melanocyte lineages (Wehrle-Haller, 2003) and plays an important role in UVB-induced pigmentation (Imokawa, 2004); furthermore, mutations in this gene produce the “steel” phenotype in mice (Bennett & Lamoreux, 2003). Both the SLC45A2 and OCA2 proteins are putative melanocyte membrane transporters, and appear to be critical for the proper maturation, processing and trafficking of tyrosinase to post-Golgi melanosomes (Wang & Hebert, 2006); furthermore, both have been associated with oculocutaneous albinism (Sturm et al. 2001). In addition, OCA2 is thought to be involved in iris pigmentation (Frudakis et al. 2003). EGFR plays an essential role in the control of both normal and malignant cell growth (Ji et al. 2006), and the mutation Dsk5 in the mouse Egfr gene is associated with dark skin in the footpads due to increase in the number of melanocytes (Hill et al. 2004). PPARD is expressed in melanocytes (Kang et al. 2004) as well as in brain and colon, and might play a role in cholesterol efflux (Oliver et al. 2001), colon cancer (He et al. 1999), embryo implantation (Lim et al. 1999), preadipocyte proliferation (Hansen et al. 2001) and epidermal maturation (Matsuura et al. 1999). DRD2 is associated with physiological functions related to locomotion, hormone production and drug abuse (Usiello et al. 2000), and knockout Drd2 mice have darker coats than wild type mice (Yamaguchi et al. 1996).

Presence of Positive Selection in the Ascertained Pigmentation Candidate Genes and the Evolutionary History of Skin Pigmentation

It has been suggested that the phenotypic variation in skin pigmentation in human populations is mainly driven by positive selection (Robins, 1991). We formally tested for the presence of signatures of positive selection by means of the EHH test in the subset of genes described above. Only TYRP1, DCT, KITLG and OCA2 showed strong signatures of positive selection, mainly in Europeans (OCA2, TYRP1 and KITLG) and East Asians (OCA2, DCT, KITLG, EGFR and DRD2), but not in Africans. Of the two genes (OCA2 and KITLG) showing evidence for positive selection in Europeans and Asians, only for KITLG was this based on the same core haplotype in both populations, whereas different haplotypes suggesting different selection episodes were found for OCA2. Surprisingly, we found no convincing evidence of positive selection for SLC45A2, despite a recent study that revealed signatures of positive selection based on sequence data (Soejima et al. 2006). One possible explanation is that the power to detect positive selection by means of EHH is low when the frequency of the core haplotype is extremely high (Sabeti et al. 2002), as in the case of SLC45A2.
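
The quantity underlying the EHH test can be illustrated with a small sketch. It follows the definition of extended haplotype homozygosity in Sabeti et al. (2002): the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the whole interval from the core to a given marker. The function name, the string encoding of haplotypes and the toy data are assumptions for illustration; significance testing against empirical or simulated distributions is not shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, core, extend_to):
    """EHH of `core` from the core region out to marker index `extend_to`.

    haplotypes: equal-length strings, one character per marker.
    core_start, core_end: slice bounds of the core region (end exclusive).
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core")
    lo = min(core_start, extend_to)
    hi = max(core_end, extend_to + 1)
    # Group carriers by their extended haplotype and count identical pairs.
    groups = Counter(h[lo:hi] for h in carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy sample: core haplotype "AA" at markers 0-1, extended to the right.
sample = ["AAGT", "AAGT", "AAGC", "ACGT"]
near = ehh(sample, 0, 2, "AA", extend_to=2)
far = ehh(sample, 0, 2, "AA", extend_to=3)
print(near, round(far, 3))
```

EHH starts at 1 next to the core and decays with distance as recombination and mutation break up the haplotype. A core haplotype that is both common and retains high EHH over long distances is the pattern interpreted as compatible with recent positive selection.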

Our results support the hypothesis that skin pigmentation has not evolved neutrally in the human species, but that populations out of Africa have undergone positive selection for skin pigmentation. Furthermore, these recent selective events would have occurred (at least partially) independently in Europeans and Asians. We cannot assess from the genetic data whether the environmental factors that drove the positive selection in the two continental regions were the same or not. In addition, the presence of weak traces of positive selection in Africans (in contrast to strong evidence in Europe and Asia), as found here, would imply that dark skin colour is the ancestral state of human populations. Nevertheless, it has to be noted that by applying the EHH test only recent signatures of positive selection, e.g. up to ∼10,000 years (Sabeti et al. 2002), are traceable, and it could be expected that natural selection involving human pigmentation goes back further in time than this.

Our findings and conclusions contrast with those reported recently by Izagirre et al. (2006). Based on three different evolutionary analyses they found signatures of positive selection in Europeans and Africans, but not in East Asians (and for different genes than those highlighted in this and other studies). Following our approach, some a priori interesting candidates for explaining differences in skin pigmentation (i.e. ASIP or MC1R; Bonilla et al. 2005; Garcia-Borron et al. 2005; Harding et al. 2000) had to be discarded when applying the sliding window cut off due to limited resource data, and the number of genes we studied is comparatively small. Therefore, we cannot rule out the presence of positive selection in Africans in other skin pigmentation genes not considered in our study. However, this cannot explain why Izagirre et al. did not find signatures of positive selection in East Asians. In our opinion, their results should be treated with caution, since they disagree not only with our results but also with those of other studies (Soejima et al. 2006; Voight et al. 2006). The results and conclusions from Izagirre et al. (2006) are strongly biased by several factors. First, the applied SNP ascertainment was biased towards SNPs with a minor allele frequency (MAF) of >0.1 (and preferentially 0.2) in at least one population. This strategy will tend to bias the neutral distribution of genetic variation towards higher Fst values, thus reducing the power for detecting outliers. In addition, this strategy necessarily excludes SNPs that are differentially fixed in different populations, such as rs16891982 in SLC45A2 (Nakayama et al. 2002) or rs1426654 in SLC24A5 (Lamason et al. 2005). Second, they applied the Fst pairwise approach, which on the one hand creates a multiple testing problem, and on the other creates a bias towards detecting differences between Asians/Europeans on one side and Africans on the other side. Third, they applied a phylogenetic approach for gene ascertainment, which from our point of view makes limited sense when investigating a phenotype such as human pigmentation that shows variation between human populations.

Skin Color and the Dispersal History of Modern Humans

Clearly, analysing three populations from Africa, East Asia, and Europe is not informative when trying to decipher whether dark skin pigmented populations out of Africa (i.e. in Oceania) have retained the ancestral state or have acquired new variants and reverted to the ancestral pigmentation phenotype. Additional analyses were performed with a selected subset of five SNPs from four ascertained pigmentation genes (OCA2, TYRP1, DCT, and SLC45A2) in the CEPH-HGDP panel. It is noteworthy that only two out of five SNPs (from SLC45A2 and OCA2) lead to amino acid changes, and thus serve as putative functional variants for the pigmentation phenotype. The other three SNPs analysed may be associated with the phenotype by linkage disequilibrium with the true, underlying, and still unknown, genetic variant that causes the phenotypic variation. Nevertheless, these five SNPs were able to explain a large fraction of the phenotypic skin pigmentation variation observed between human populations (70–80% depending on the skin pigmentation dataset when assuming an additive model). Since this analysis was performed using population genetic and phenotypic frequencies (although from the same or similar populations), but not individual genotype-phenotype measures, it might be expected that this correlation becomes smaller when taking into account individual data.

Our results showed that African populations tended to carry the ancestral alleles of the studied SNPs (Table 2), and that human populations with dark skin colour tended to cluster together in the MDS and STRUCTURE analyses. The clustering of the Bougainville Islander population reflects the fact that this relationship is not (or at least not solely) caused by geography but also by the underlying pigmentation genes. Bougainville Islanders from Papua New Guinea are known to be among the most highly pigmented people in the world (Norton et al. 2005; Diamond, 2005). In our results they appeared to be closer to sub-Saharan African populations, with whom they share the dark skin colour phenotype, than to the second Papua New Guinean sample in the dataset, with whom they share their recent population history, as observed in several datasets based on neutral genetic variation from autosomal, Y-chromosomal and mtDNA analyses (Rosenberg et al. 2002; Rosenberg et al. 2005; Scheinfeldt et al. 2006; Merriwether et al. 1999). This can be partially explained by the high frequency of the derived allele of rs1448484 in Papua New Guineans, which is almost absent from Bougainvillians (Fst = 0.65, p < 0.00005). Further examples are the Native American populations, which are genetically related to Siberian populations as shown by neutral Y chromosome and mtDNA analyses (Schurr & Sherry, 2004; Lell et al. 2002), whereas with neutral autosomal markers they appear more related to East Asians (Rosenberg et al. 2005). However, in our pigmentation gene based analyses all Native American populations appeared distant from Pakistanis (as well as other Eurasian groups). Native Americans share the derived allele of rs3782974 from DCT (Fct = 0.35, p < 0.001) at a high frequency, which is rare in Pakistan, as well as sharing the ancestral allele of rs1800414 from OCA2, which is rare in East Asians (Fct = 0.32, p < 0.00005). Further Mantel test analyses performed with these five SNPs showed that the skin pigmentation differences between pairs of populations can be better explained when we assume that dark skin pigmentation is the ancestral state, and that there was independent evolution of skin pigmentation in the European and Asian continents. Therefore, all these results support the hypothesis of a unique origin for dark skin pigmentation, rather than the appearance of new variants on top of a lightened skin pigmented context.

Conclusions

With this study we provide evidence that the light skin pigmentation phenotype in humans has (at least partially) appeared more than once, in Europeans and East Asians, whereas dark skin pigmentation seems of unique origin and reflects the ancestral state in humans. We note that our analyses were based on SNPs and the application of SNP haplotype based tests. It has been suggested that the SNP ascertainment bias of public databases (Nielsen & Signorovitch, 2003) can influence the results obtained when searching for evidence of positive selection (Soldevila et al. 2005). As we have seen here in the case of SLC45A2, the power of the EHH test to detect positive selection is limited when the selected variant is almost fixed in the populations studied. Future studies using DNA sequence data and the classical tests for detecting evidence of positive selection from sequence data should be performed on the most promising candidate genes, in order to confirm the results we have obtained here.

Acknowledgements

We thank Chris Tyler-Smith for useful comments on an earlier version of the manuscript. We are grateful for the financial support provided by the Netherlands Forensic Institute and the Erasmus MC - University Medical Center Rotterdam.

References
