Association of Polymorphic Sites in the OCA2 Gene with Eye Colour Using the Tree Scanning Method


Corresponding author: Wojciech Branicki, Fax: +4812 422 38 50. E-mail:


A number of genes are considered to affect normal variation in human pigmentation. Recent studies have indicated that OCA2 is the crucial gene involved in the high variation of iris colour present among populations of European descent. In this study, eleven polymorphisms of the OCA2 gene were examined in search of their association with different pigment traits. The evolutionary tree scanning method indicated that the strongest phenotypic eye colour variation is associated with the branch defined by the nonsynonymous change rs1800407, which corresponds to the amino acid change Arg419Gln located in exon 13. Single SNP analysis indicated that allele 419Gln is associated with green/hazel iris colour (p < 0.001). According to tree scanning analysis, the proportion of eye colour variation explained by this nucleotide position is only about 4%. Thus, additional variation present in the OCA2 gene and perhaps some other pigment related genes must be taken into account in order to explain the high phenotypic variation in iris colour.


Pigmentation is one of the most striking human physical traits. Although it is assumed that pigment variation in man is controlled by more than a hundred genes, until now the role of only one locus has been explained fully (Bennett & Lamoreux, 2003). It has been shown that the melanocortin 1 receptor gene (MC1R, MIM# 155555) is responsible for most cases of red hair colour in humans. Population studies have disclosed a few nonsynonymous polymorphisms which strongly affect receptor performance and, in consequence, cause a significant excess of red or yellow pheomelanin relative to black or brown eumelanin pigment. This state is expressed in a characteristic phenotype of red hair colour and lighter skin, often associated with freckling (e.g. Valverde et al. 1995; Box et al. 1997; Rana et al. 1999). The impact of the MC1R gene on pigmentation is very distinct, especially in populations of Northern European origin. Over the past several years, a number of important genes involved in pigmentation in man have been identified, often on the basis of DNA sequence homology with widely studied mouse pigmentary genes. They have been linked to different human Mendelian pigment disorders, of which the best known example is oculocutaneous albinism (Rees, 2003). It is assumed that genes involved in such extreme phenotypes might also be responsible for normal variation in human pigmentation. Their role in determining the high variation in skin, hair and eye colour among humans is currently the subject of extensive studies. Work in this area has led to significant progress in the understanding of the genetic basis for eye colour determination (Sturm & Frudakis, 2004). Eye colour was formerly considered a simple Mendelian recessive trait (Duffy et al. 2007). Although it is currently accepted that multiple genes contribute to the variety of iris hues, the most recent findings suggest that a single gene plays the predominant role.
A whole genome scan indicated that more than 70% of eye colour variation is due to the quantitative trait locus located on chromosome 15q (Zhu et al. 2004). This conclusion corresponds with some previous data indicating that variation within the OCA2 gene (MIM# 203200) located on chromosome 15q11.2-12 may be involved in differences in eye colour among humans (Rebbeck et al. 2002; Frudakis et al. 2003). We further addressed this issue by conducting a population association study in which we examined the relationship of eleven SNPs in the OCA2 gene with different pigment traits.

Material and Methods

Population Samples

The study was approved by the Bioethical Commission of the Jagiellonian University in Poland, decision no. KBET/17/B/2005. All participants of the study were informed that the samples were collected for the purpose of scientific research and were acquainted with the general idea of the project. Two buccal swabs were taken from each of the 390 analysed unrelated individuals. Sample collection was preceded by a phenotype examination performed by one dermatologist, who filled out a questionnaire recording phenotypic features for each subject. The questionnaire included basic information such as gender and age, as well as data concerning eye colour classified into four categories (blue or grey, green, hazel, brown or black), hair colour classified into five categories (red or blond-red, blond, dark blond or brown, auburn and black) and skin phenotype, which was assessed according to the Fitzpatrick scale (I refers to the lightest and VI to the darkest skin colour) (Fitzpatrick, 1988).

DNA Extraction and Quantitation

Collected samples were subjected to DNA extraction using either the standard organic procedure described in Branicki et al. (2007) or the silica based method according to the procedure recommended by the manufacturer, with minor modifications (A&A Biotechnology, Gdansk, Poland). Briefly, a cotton swab was cut, placed into a 2 mL tube and treated with 700 μL of lysis buffer and 20 μL of proteinase K (20 mg/mL) for 40 minutes in a water bath maintained at 37°C. Next, the lysis mixture was loaded onto a silica column and centrifuged for 1 min at 11 000 rpm. The column was then transferred to a new 2 mL tube, filled with 500 μL of washing buffer and centrifuged for 1 min at 11 000 rpm. The washing step was repeated with an increased (2 min) centrifugation time. DNA was eluted into a new tube in 150 μL of elution buffer with 2 min centrifugation at 14 000 rpm. Total DNA concentration was measured with the PicoGreen® dsDNA Quantitation Kit (Invitrogen, Karlsruhe, Germany) and a Fluoroskan Ascent FL (Labsystems, Helsinki, Finland).


PCR amplification of 10 DNA fragments encompassing the 11 analysed SNP positions was carried out in two pentaplex reactions, 1 and 2, using a Qiagen Multiplex PCR Kit (Qiagen, Hilden, Germany). The PCR reaction mixture consisted of 5 μL of Qiagen Master Mix, usually 1–10 ng of template DNA, primer mix and water up to 10 μL. Details concerning primer sequences and their concentrations are given in Supplementary Table S1. PCR reactions were carried out on a GeneAmp 9700 thermocycler (Applied Biosystems, Foster City, CA), using the following temperature profile: 15 min at 95°C; 32 cycles of 1 min at 95°C, 1 min at 58°C and 1.5 min at 72°C; and a final 10 min at 72°C. Five μL of the PCR products were used to evaluate the PCR reaction by agarose gel electrophoresis and the remaining 5 μL were subjected to enzymatic purification using the ExoSAP-IT kit (Amersham Pharmacia, Freiburg, Germany). The products of the two PCR sets, 1 and 2, were then used in three separate multiplex SNaPshot reactions: 1, 2.1 and 2.2. Details concerning extension primers are presented in Supplementary Table S2. Each minisequencing reaction was composed of 2 μL of SNaPshot mix, 1 μL of extension primer mix, 1 μL of the appropriate purified PCR product and water up to 10 μL. Products of extension reactions were purified with SAP enzyme (Fermentas, Vilnius, Lithuania) and analysed on an ABI 3100 Avant genetic analyser.

Population and Phylogenetic Analyses

The genetic data obtained for each SNP were tested for agreement with Hardy-Weinberg expectations using the exact test as implemented in TFPGA software ver. 1.3 (Miller, 1997). The Linkage Disequilibrium Analyser program ver. 1.1 (Ding et al. 2003) was used to evaluate the degree of linkage disequilibrium between the analysed polymorphic sites. For the 10 SNPs which did not depart from Hardy-Weinberg expectations, haplotypes were inferred using the statistical approach implemented in PHASE ver. 2.1.1 (Stephens et al. 2001). PHASE reconstructs haplotypes using a Bayesian method, applying a priori expectations about the occurrence of haplotypes in natural populations, which are derived from population genetics and coalescent theory (Stephens et al. 2001). The reconstructed haplotypes were then used to generate evolutionary trees using the parsimony approach implemented in the DNAPARS program from the PHYLIP package ver. 3.66 (Felsenstein, 2005). The obtained phylogenetic trees were displayed with TreeView ver. 1.6.6 (Page, 1996).
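As an illustration of the Hardy-Weinberg check described above, the following is a minimal sketch of an exact test for a single biallelic locus. It is not the TFPGA implementation, whose internals are not described here; the function name `hwe_exact_p` and the genotype counts in the usage lines are hypothetical. The sketch enumerates every possible heterozygote count for the observed allele counts and sums the probabilities of all configurations that are no more likely than the observed one.

```python
from math import exp, lgamma, log


def _log_fact(n):
    """Natural log of n! via the log-gamma function."""
    return lgamma(n + 1)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg p-value for genotype counts AA, AB, BB (illustrative)."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n_bb + n_ab           # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + _log_fact(n)
                - _log_fact(hom_a) - _log_fact(het) - _log_fact(hom_b)
                + _log_fact(n_a) + _log_fact(n_b) - _log_fact(2 * n))

    # Heterozygote counts must share the parity of the allele count.
    probs = {h: exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Sum over all configurations at most as probable as the observed one.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, not taken from the study.
p_balanced = hwe_exact_p(25, 50, 25)   # genotype counts at HW proportions
p_no_hets = hwe_exact_p(50, 0, 50)     # complete absence of heterozygotes
```

A locus with a small p-value under such a test, as reported for rs749846 in this study, would be a candidate for exclusion from haplotype reconstruction.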

Genotype-phenotype Associations

The genotyped nonsynonymous positions which were found not to be in linkage disequilibrium were analysed for their association with pigment traits using the χ² test. The haplotypes inferred with PHASE were tested for associations with different pigmentary traits using stratified analysis with the Mantel-Haenszel procedure (Mantel & Haenszel, 1959). Statistical analyses were performed using either the SPSS 12.0 or the SAS 9.1 statistical package. The generated evolutionary trees were then subjected to the tree scanning method (Templeton et al. 2005) as implemented in the program TreeScan ver. 1.0 (Posada et al. 2005). This method allows genotypic analysis of phenotypic associations based on haplotype data while making use of evolutionary information. Tree scanning partitions the haplotype tree into two mutually exclusive clades (classes), which are then treated in the subsequent search as two alleles. Using the ANOVA procedure, TreeScan examines all such possible biallelic partitions of the phylogenetic tree of haplotypes when searching for significant genotype-phenotype associations.
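The partition step described above can be sketched in a few lines. The code below is an illustrative approximation and not TreeScan's own code: it assumes that cutting one branch yields a set of haplotype numbers forming one clade, that each individual is scored by how many of its two haplotypes fall in that clade, and that the phenotype has been coded numerically. The names `clade_genotype` and `partition_anova` and the toy data are hypothetical.

```python
def clade_genotype(hap_pair, clade):
    """Number of an individual's two haplotypes (0, 1 or 2) lying in `clade`."""
    return sum(h in clade for h in hap_pair)


def partition_anova(genotypes, phenotypes):
    """One-way ANOVA of phenotype scores across genotype classes.

    Returns (F statistic, proportion of trait variation explained).
    """
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2:
        raise ValueError("partition must produce at least two genotype classes")
    grand_mean = sum(phenotypes) / n
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    ss_within = ss_total - ss_between
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, ss_between / ss_total


# Toy data (hypothetical): six individuals given as haplotype-number pairs,
# a clade containing haplotype 6 only, and made-up numeric phenotype scores.
individuals = [(5, 5), (5, 5), (5, 6), (5, 6), (6, 6), (6, 6)]
scores = [1, 2, 2, 3, 3, 4]
classes = [clade_genotype(pair, {6}) for pair in individuals]
f_stat, explained = partition_anova(classes, scores)
```

Repeating this calculation for every branch of a haplotype tree gives one F statistic and one explained-variation value per candidate partition.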


Population Data

Table 1 shows the characteristics of the study population. The population sample consisted of individuals of European descent living in Poland. The most frequently observed phenotypic characteristics were blue eyes (55.9% of individuals), dark blond/brown hair (47.2%) and skin type III (36.2%). The sample included an increased number of red-headed individuals compared to the regular pigment phenotype distribution in our population. The set of 11 SNP positions initially subjected to analysis included two amino acid changing variants, rs1800407 and rs1800401, responsible for the substitutions Arg419Gln and Arg305Trp respectively. They were the first variants suggested to be related to eye colour in humans (Rebbeck et al. 2002). The remaining SNPs were either synonymous or intronic polymorphisms (Table 2). The distribution of alleles for the analysed biallelic variable sites is shown in Table 2. All but one of the polymorphisms were in Hardy-Weinberg (HW) equilibrium. As such a disequilibrium may indicate technical problems with the test, the polymorphism rs749846, which departed from HW expectations, was excluded from haplotype reconstruction. LD analysis revealed extensive linkage disequilibrium between most of the tested SNP pairs. LD was not supported in three cases: between rs1800407 and each of the SNPs rs17566952, rs1800401 and AY392134 (data not shown). It is worth noting that, as in Rebbeck et al. (2002), the two analysed nonsynonymous changes (rs1800401 and rs1800407) were found to be in linkage equilibrium.
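For readers unfamiliar with how pairwise linkage disequilibrium is quantified, the following minimal sketch computes the standard D, |D'| and r² statistics from the four haplotype counts of two biallelic loci. The measure actually reported by the Linkage Disequilibrium Analyser program is not specified here, so this is an assumption-laden illustration; the function name `ld_stats` and the counts are hypothetical.

```python
def ld_stats(n11, n12, n21, n22):
    """Return (D, |D'|, r^2) for two biallelic loci.

    n11 counts haplotypes carrying allele 1 at both loci, n12 allele 1 at
    the first locus and allele 2 at the second, and so on.
    """
    total = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / total          # allele 1 frequency at the first locus
    q1 = (n11 + n21) / total          # allele 1 frequency at the second locus
    d = n11 / total - p1 * q1         # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype counts, not taken from the study.
complete_ld = ld_stats(50, 0, 0, 50)      # alleles always travel together
equilibrium = ld_stats(25, 25, 25, 25)    # alleles combine at random
```

Values of |D'| and r² near zero, as in the second call, correspond to the linkage equilibrium reported between rs1800401 and rs1800407.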

Table 1. Characteristics of the study population

Characteristics                      Total n = 390
Hair colour    Red/blond-red         84
               Dark blond/brown      184
Skin type      I/II                  173
Eye colour     Blue or grey          218
               Brown or black        44

Table 2. Data for analysed polymorphisms. SNP order refers to position in haplotypes

GenBank ID     Contig position (NT010280)    Gene location    Variants and frequency of major allele    HW (P-value)
rs17566952     599874                        Intron 18        C = 0.95, G                               1.0000
rs11638265     605562                        Intron 16        A = 0.68, G                               1.0000
AY392134       605495                        Intron 16        C = 0.94, T                               0.4124
rs1800411      614910                        Exon 15          T = 0.69, C                               0.5556
rs1900758      633086                        Intron 13        A = 0.67, G                               0.4951
rs10852218     634782                        Intron 11        C = 0.83, T                               0.4744
rs1037208      634346                        Intron 12        A = 0.83, C                               0.7243
rs1800401      663042                        Exon 9           C = 0.94, T                               0.1393
rs1800404      638762                        Exon 10          A = 0.78, G                               1.0000
rs1800407      633307                        Exon 13          G = 0.93, A                               0.7044
rs749846*      671979                        Intron 5         C = 0.87, A                               0.0011

* not included in haplotype.

Genotype-phenotype Association

The nonsynonymous nucleotide position rs1800407 (exon 13) was found to be associated with both eye colour (p = 0.001) and skin type (p = 0.015). In our data, this SNP was associated with eye colour in the blue versus nonblue test, but the most significant association was found with green/hazel iris colour (p < 0.001). These results were also statistically significant after adjustment for multiple tests. The second studied nonsynonymous polymorphism (rs1800401) was found not to be associated with eye colour in the blue versus nonblue test. An additional test revealed significant association for this SNP in the blue/green versus hazel/brown test (p = 0.035). This result is, however, not significant after Bonferroni's multiple-comparison correction.
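The effect of the multiple-test adjustment mentioned above can be shown with a short worked example. This is a sketch only: the number of tests actually corrected for is not stated in the text, so the value m = 2 used below (taken as the length of the input list) is an assumption chosen for illustration, and the function name `bonferroni` is hypothetical. The two p-values are the ones quoted above for rs1800407 and rs1800401.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment: multiply each p-value by the number of tests.

    Returns a list of (raw p, adjusted p, significant at alpha) tuples.
    """
    m = len(p_values)   # number of tests; here an illustrative assumption
    return [(p, min(1.0, p * m), p * m <= alpha) for p in p_values]


# p = 0.001 (rs1800407, eye colour) and p = 0.035 (rs1800401,
# blue/green versus hazel/brown), assuming m = 2 tests.
adjusted = bonferroni([0.001, 0.035])
```

Under this assumption the first association remains significant after adjustment, while the second rises above the 0.05 threshold, which matches the pattern described in the text.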

In order to eliminate the problems of statistical dependence among the analysed SNPs, which, as we found, showed extensive linkage disequilibrium, the data were subjected to haplotype reconstruction. Using the Bayesian procedure implemented in the PHASE program, twenty-nine different haplotypes were inferred. Table 3 presents the relevant haplotypes associated with a phenotypic effect according to stratified analysis or tree scanning. The stratified analysis revealed three haplotypes (4, 6, 27) to be significantly associated with nonblue eye colour and another one (21) with blue irises. The most significant association was noted for haplotype 6 (p = 0.0015), which was observed 46 times. As haplotypes are themselves correlated due to a common evolutionary history, it is assumed that consideration of these evolutionary data may improve the effectiveness of association studies (Posada et al. 2005; Templeton, 1995). Thus, the inferred haplotypic data were subjected to the method developed by Templeton et al., which uses the genealogical relationships of haplotypes reflecting their evolutionary history (Templeton et al. 2005). Since the procedure requires analysis of a phylogenetic tree of haplotypes, the reconstructed haplotypes were first used to infer evolutionary trees using the parsimony approach implemented in the DNAPARS computer program (Felsenstein, 2005). As shown in Table 2, two very close positions located in intron 16 were analysed in this study. However, using both was advantageous for haplotype inference and further phylogenetic analyses: omitting AY392134 resulted in 28 instead of 29 reconstructed haplotypes, which, when subjected to phylogenetic tree reconstruction, gave a very high level of ambiguity, as reflected by 1206 most parsimonious phylogenetic trees.
An inference performed on the basis of 10 polymorphisms also revealed phylogenetic ambiguity, yielding 32 most parsimonious evolutionary trees of various topologies. All phylogenetic trees were then analysed with the TreeScan program (Posada et al. 2005). The scanning was performed for three different pigment characteristics: hair colour, skin phenotype and eye colour. Table 4 shows the results of the first round of the tree scanning test obtained for eye colour, the only trait associated with the studied polymorphisms according to this method. The different values presented in this table for each of the analysed factors reflect the various results noted for different trees. The initial scan, based upon 1000 permutations, revealed associations with eye colour (p ≤ 0.05) sixty-four times, linked with four different branches: sixteen times each for branches 7–8, 8–6, 7–5 and 5–6 (see Fig. 1). In the second round of scanning, all detections from the first round of analysis are taken into account; the second round is thus simply conditioned on the branches detected during the first round. The second round revealed associations 28 times, always with branch 5–6 (15 times conditioned on 7–5, 8 times on 8–6 and 5 times on 7–8; see Table 5 and Fig. 2). The results obtained clearly indicate that the strongest phenotypic effect is associated with branch 5–6. Careful inspection of haplotypes 5 and 6 reveals that branch 5–6 is defined by rs1800407, i.e. the amino acid change Arg419Gln located in exon 13 (Table 3). Interestingly, branch 7–8, associated with eye colour in the initial round of tree scanning, is also defined by this same polymorphism. It is worth noting that either branch 5–6 or 7–8, or both of them, were detected for all 32 scanned most parsimonious evolutionary trees.
It also seems possible that in the case of detected branches 7–5 and 8–6, both defined by rs1800404 (synonymous change in exon 10) the true phenotypic association may be actually due to branch 5–6, but tree scanning assigned the association to an adjacent branch due to some stochastic fluctuations. The consensus network constructed using the SplitsTree4 computer program (Huson & Bryant, 2006) on the basis of all 32 most parsimonious trees shows ambiguity associated with detected branches 5–6, 7–8, 8–6 and 7–5 (Figs 1 and 2). The second step analysis conditioned on branch 7–5 indicated branch 5–6 as associated with eye colour in all cases but one. Similarly, in the case of initial branch detection 8–6, the second round indicated branch 5–6 in 50% of cases. Figure 3 presents the haplotype tree for which most significant associations were found. Neither the stratified analysis nor the tree scanning revealed any trace of association with hair colour or skin type for the analysed population sample.
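The permutation-based significance assessment used in the scans can be illustrated with a simplified sketch. The code below applies a single-step "maximum statistic" correction across candidate partitions; this is a stand-in chosen for brevity and is an assumption, not a description of the exact correction procedure implemented in TreeScan. All names (`f_statistic`, `max_stat_permutation`) and the toy data are hypothetical.

```python
import random


def f_statistic(labels, values):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    groups = {}
    for g, y in zip(labels, values):
        groups.setdefault(g, []).append(y)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_b = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
    ss_w = sum((y - grand) ** 2 for y in values) - ss_b
    if k < 2 or ss_w <= 0:
        return float("inf") if ss_b > 0 else 0.0
    return (ss_b / (k - 1)) / (ss_w / (n - k))


def max_stat_permutation(partitions, values, n_perm=999, seed=0):
    """Corrected permutation p-value for each partition.

    Phenotypes are shuffled among individuals; each observed F is compared
    with the largest F found over all partitions in the shuffled data.
    """
    rng = random.Random(seed)
    observed = [f_statistic(p, values) for p in partitions]
    exceed = [0] * len(partitions)
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_f = max(f_statistic(p, shuffled) for p in partitions)
        for i, f_obs in enumerate(observed):
            if max_f >= f_obs:
                exceed[i] += 1
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Toy data (hypothetical): 20 individuals, two candidate partitions.
scores = [1.0 + 0.01 * i for i in range(10)] + [2.0 + 0.01 * i for i in range(10)]
informative = [0] * 10 + [1] * 10      # separates low from high scores
uninformative = [0, 1] * 10            # unrelated to the scores
p_values = max_stat_permutation([informative, uninformative], scores)
```

Comparing each observed statistic against the permutation distribution of the maximum keeps the error rate under control when many branches of the same tree are tested, at the cost of some power for branches with weaker effects.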

Table 3. Inferred OCA2 haplotypes involved in eye colour variation. P-values are given for haplotypes significantly associated with eye colour

Haplotype No.    State of SNPs    No. of observations    Trait, P-value
4                GGCCGCACAG       15                     Non-blue, 0.0142
6                GGCCGTCCGA       46                     Non-blue, 0.0015
21               GACTGGTCAG       13                     Blue, 0.0314
27               CGTCGGTTGG       32                     Non-blue, 0.0399

* not associated according to the stratified analysis.

Table 4. Results of the first round of tree scanning performed on 32 most parsimonious haplotype trees obtained for eye colour

Branch    F-statistic       pvk (1)           pcorr (2)
7–8       9.6478, 7.4972    0.0425, 0.0322    0.0410–0.0030
8–6       9.6346, 7.3976    0.0424, 0.0318    0.0410–0.0030
7–5       7.3976, 7.4972    0.0318, 0.0322    0.0330–0.0210
5–6       9.6346, 9.6478    0.0424, 0.0425    0.0090–<0.0001

(1) pvk: the proportion of trait variation explained by the partition.
(2) pcorr: the corrected permutational P-values.

Figure 1.

The results of the first round of tree scanning on the OCA2 haplotype tree for eye colour. The tree drawing rules are as in Templeton et al. (2005). Double-headed arrows indicate single mutational changes. Dashed vertical lines are used to indicate multiple branches coming off a single node (note that they do not indicate any mutational change). Dotted lines show possible alternative branches.

Table 5. Results of the second round of tree scanning performed on the 32 most parsimonious haplotype trees obtained for eye colour

First-round cut branch    Second-round branch    F-statistic    pvk    pcorr

Figure 2.

The results of the second round of tree scanning on the OCA2 haplotype tree for eye colour. The tree drawing rules are the same as in Figure 1.

Figure 3.

The phylogenetic tree for which most significant associations were found. Initial analysis: branch 7–5, p = 0.0210, branch 5–6, p < 0.0001; second round conditioned on branch 7–5: branch 5–6, p = 0.0190.


Pigmentation may be involved in modulating the individual risk of developing different skin cancers, and therefore its determination is not only a theoretical genetics issue but may also have important medical implications (Duffy et al. 2004; Rees, 2004). Moreover, as a very distinguishing human trait, pigmentation is also of interest to forensic science (Jobling & Gill, 2004; Branicki et al. 2005). The MC1R gene is the first example of a genetic marker whose prognostic value has already been explored by forensics for red hair colour prediction (Grimes et al. 2001; Branicki et al. 2007). Another more thoroughly studied pigment gene is OCA2, which is a major gene involved in oculocutaneous albinism in humans (Oetting et al. 1996). Initially, research into the link between OCA2 and normal pigment variation revealed an association between the gene and skin pigmentation (Akey et al. 2001). A more thorough study on the evolution of pigmentation variation indicated a minor role for the OCA2 gene in light skin evolution in populations of European descent (Norton et al. 2007). However, recent studies found OCA2 to be the most important gene involved in the variation of iris colour in Europeans (Rebbeck et al. 2002; Frudakis et al. 2003; Zhu et al. 2004; Duffy et al. 2007). The OCA2 gene consists of 24 exons spanning a region of 345 kb (Lee et al. 1995). The gene encodes an 838 amino acid protein which contains 12 transmembrane domains and is an integral melanosomal membrane protein. The OCA2 protein is thought to be a transporter for tyrosine, a crucial substrate in melanin production (Lee et al. 1995). The oxidation of tyrosine is the first step in both eumelanin and pheomelanin biogenesis. Physically, eye colour is determined by the distribution and content of melanin within the melanocytes of the outermost layer of the iris (Nizankowska, 2007).
In general, blue iris colour is caused by a very low melanin content, whilst brown colour is associated with a high melanin concentration. In reality, there is a continuous range of eye shades among people, which is shaped not only by different amounts of melanin, but also by its packaging and the ratio of the two forms of the pigment. Up to now, only two nonsynonymous OCA2 changes have been linked with normal pigmentation variation in man (Rebbeck et al. 2002).

Using conventional statistical methods as well as a recently developed evolutionary approach, we present another piece of evidence that the OCA2 gene is associated with variation in iris colour. Single SNP analysis revealed that one of the two studied nonsynonymous changes, rs1800407, is strongly associated with iris colour. The less frequent allele A corresponds to variant 419Gln and was found to be most significantly associated with green/hazel eye colour. This is concordant with the data presented by Rebbeck's team (Rebbeck et al. 2002). In the extensive data presented by Duffy et al. (2007), this variant was found to be most significantly associated with green iris colour. There is still controversy concerning the role of the second exonic variant, 305Trp (rs1800401), in the determination of eye colour. Some population studies revealed an association of this allele with darker eye colours (Rebbeck et al. 2002; Frudakis et al. 2003; Jannot et al. 2005), but others disagree with that finding (Duffy et al. 2007). Our data are inconclusive in this matter. In one of the two performed χ² tests, allele T (305Trp) was found to be associated with hazel/brown eye colour, which would support the hypothesis of its association with darker iris colours. However, after Bonferroni adjustment, this finding did not reach the level of statistical significance.

Analysis of haplotypes rather than single SNPs is often considered more informative in association studies. This is not only due to linkage disequilibrium, which frequently affects analysed SNPs (Zaykin et al. 2002), but also because a combination of SNP variants may affect phenotypes more strongly than single changes within a gene (Templeton et al. 2005). The approach implemented in the PHASE program used in this study (Stephens et al. 2001) is widely recognised as the most accurate method of statistical haplotype reconstruction (e.g. Balding, 2006). Statistical tests performed on the twenty-nine inferred haplotypes revealed three of them to be linked with nonblue eye colour, of which haplotype 6 was most significantly associated with darker eyes (Table 3). Haplotype 6 was also implicated in eye colour variation by the alternative approach, which scans the phylogenetic trees of haplotypes. The three remaining haplotypes (4, 21 and 27), which stratified analysis indicated as associated with eye colour, were not detected by the TreeScan program. This discordance can be explained by the different nature of the applied statistical tests. The tree scanning method makes use of the information contained within the structure of the phylogenetic tree and, in contrast to stratified analysis, utilises information about internal linkage disequilibrium. Indeed, Templeton and co-workers suggest that the use of genealogical relationships of haplotypes increases the amount of biological information and may thereby lead to more accurate conclusions concerning phenotype/genotype associations (Templeton et al. 2005). It is also worth noting that the statistical significance of the associations found with stratified analysis was rather low, especially in the case of haplotypes 21 and 27 (Table 3).
The TreeScan program indicated branch 5–6 as associated with the strongest phenotypic effect, which allowed us to identify rs1800407 as the most important of the studied polymorphisms for eye colour variation. This is in strong agreement with the single SNP test results. However, it is worth noting that application of the tree scanning approach gave additional information about the proportion of trait variation (pvk) explained by partition 5–6, which was assessed by the program at about 4% (Tables 4 and 5). This low value suggests that additional exonic variation of the OCA2 gene or polymorphic sites located within its regulatory elements must play an important role in eye colour determination. However, when compared with the MC1R gene, which is the key pigment gene, exonic variation within OCA2 appears to be rather low. The OCA2 gene has several nonsynonymous changes within its 24 exons (Lee et al. 1995; Duffy et al. 2007), many of them associated with albinism (Oetting et al. 2005). The MC1R gene has been found to have over thirty-five variable amino acid positions within its single exon (Rees et al. 2003). Moreover, unlike MC1R, which has a few very common variants, only two polymorphisms in OCA2 (rs1800401 and rs1800407), both studied in this project, should be considered common (Duffy et al. 2007). It seems, therefore, that these two pigment genes act on phenotypic features through different mechanisms and that, in the case of OCA2, more effort should perhaps be devoted to the variation present in regulatory elements. Duffy et al. found that three SNPs located in intron 1 of OCA2, when grouped into a haplotype, were most informative in blue versus nonblue eye colour prediction. They suggest that the three intron-1 SNPs may be so informative due to tight linkage with proximal regulatory elements (Duffy et al. 2007).
There is a growing body of data indicating a significant role for regulatory regions in shaping phenotypic features in humans. For example, the regulation of mRNA stability was suggested to be the most probable mechanism of action for ASIP (MIM# 600201), one of many pigment related genes (Voisey et al. 2006). Variant 8818G, at a position located in the 3′ UTR of the ASIP gene, is, according to some studies, associated with darker pigmentation (Kanetsky et al. 2002; Bonilla et al. 2005). Variation in the promoter region of the MC1R gene was also implicated in the regulation of gene performance (Makova et al. 2001). Since the OCA2 gene is considered the most important, but not the only, gene contributing to phenotypic variation in eye colour, examination of other genes, for example MATP (MIM# 606202) or TYRP1 (MIM# 115501) (Graf et al. 2005; Frudakis et al. 2003), will be helpful in drawing a complete picture of the genetics of eye colour in man.


We would like to thank all volunteers who participated in the study and provided biological samples. We also wish to thank Wieslaw Babik for helpful comments on the manuscript. The project was supported by the Ministry of Education and Science in Poland, grant no. 0 T00C 018 29.