Phenotype versus Genotype Methods for Copy Number Variant Analysis of Glutathione S-Transferases M1


Corresponding author: MARIA FUCIARELLI, Department of Biology, University of Rome “Tor Vergata,” Via della Ricerca Scientifica 1, 00133 Rome, Italy. Tel: +390672594310; Fax: +39062023500; E-mail:


Several variants have been identified in genes encoding Glutathione S-transferase (GST) enzymes; some are associated with a significant alteration of protein function. One of the most extensively studied is a copy number variant (CNV) in the GSTM1 gene. In this study, we compared phenotype (positive, null) and genotype (1/1, 1/0, 0/0) methods to assess the dissimilarities between the two approaches and to evaluate possible methodology-related bias. We analyzed a sample of 1947 individuals belonging to 18 human populations with different ethnic origins. We also evaluated whether the presence of missense substitutions in the GSTM1 gene might influence the association of the CNV with phenotype distribution.

By comparing GSTM1 CNV frequencies obtained with phenotype and genotype methods among human populations, we observed that the differences between the two methods increase in highly heterogeneous populations. Furthermore, we identified two missense variants (rs199816990 and rs202002774) that may distort the outcome of genetic association studies on Asian populations.

These results indicate that the phenotype analysis may strongly alter the genetic association. Therefore, genotype discrimination analysis should be used to analyze GSTM1 CNV. To understand the role of GSTM1 in human health, the analysis of CNV should be combined with the investigation of single nucleotide polymorphisms with functional effect.


Glutathione S-transferases (GSTs) are multifunctional proteins that play an important role in cellular metabolism. They are the principal phase II enzymes of detoxification and metabolize a wide range of exogenous and endogenous compounds (Hayes et al., 2005). Their main mode of action is to conjugate glutathione (GSH) with electrophilic compounds, forming more soluble and nontoxic derivatives that are ready to be excreted or compartmentalized by phase III enzymes (Frova, 2006; Omiecinski et al., 2011). This detoxification ability contributes to cellular protection from environmental and oxidative stresses. In mammals, and particularly in humans, GSTs modulate important cell signaling pathways, participate in leukotriene and prostaglandin biosynthesis, and are implicated in cellular resistance to drugs (Frova, 2006; Dourado et al., 2008). Furthermore, several studies have highlighted that GST gene polymorphisms are associated with complex diseases, as also reported for other antioxidant genes (Cilenšek et al., 2012; Letonja et al., 2012). Human GSTs are divided into three families: cytosolic and mitochondrial (both also known as soluble GSTs), and Membrane-Associated Proteins in Eicosanoid and Glutathione metabolism (MAPEG) (Hayes et al., 2005; Polimanti et al., 2011a). Based on sequence homology and immunological cross-reactivity, human cytosolic GSTs are divided into seven classes: Alpha, Mu, Omega, Pi, Sigma, Theta and Zeta. In humans, these enzymes show inter-ethnic and inter-individual differences in the efficiency of detoxification. This variability is mainly due to genetic and environmental factors and may account for the ethnic diversity observed in susceptibility to some xenobiotic compounds (Thier et al., 2003; Polimanti et al., 2011b).
Several variants have been identified in GST genes, and some of these are associated with a significant alteration of protein function. One of the most extensively studied variants occurs in the GST Mu class isozyme (GSTM1-1). GSTM1 (1p13.3) consists of eight exons and is flanked by two almost identical 4.2 kb regions (McLellan et al., 1997). GSTM1 shows copy number variation (CNV), ranging from 0 to 2 gene copies. The GSTM1*0 allele is caused by a homologous recombination event that involves the repeats flanking the gene and spans a region of approximately 18 kb (Fryer et al., 1993). GSTM1 CNV has a dosage effect: the number of gene copies influences the concentration of the GSTM1-1 enzyme, and therefore the ability to detoxify compounds efficiently and to provide protection against oxidative stress (Pemble et al., 1994; McLellan et al., 1997; Fuciarelli et al., 2009; Ginsberg et al., 2009). In addition to GSTM1 CNV, numerous missense substitutions have been identified and reported in databases of human genetic variation, such as the 1000 Genomes database (2010). However, few studies have analyzed the role of these variants in human health (Polimanti et al., 2011a).

Several studies have been conducted both to describe the GSTM1 phenotype distribution in human populations (Garte et al., 2001; Gaspar et al., 2002; Buchard et al., 2007; Piacentini et al., 2011; Polimanti et al., 2013) and to evaluate the role of GSTM1 as a biomarker in complex diseases (Polimanti et al., 2011c, 2012; Piacentini et al., 2012a, 2012b). To date, the analysis of GSTM1 CNV is generally performed using a phenotype-based method that discriminates only between a positive phenotype (carriers of at least one gene copy: GSTM1*1/0 plus GSTM1*1/1) and a null phenotype (absence of both gene copies: GSTM1*0/0). This method identifies only the individuals with the null genotype, which is associated with a complete loss of catalytic activity. With the increased sensitivity of real-time PCR assays, it is now possible to obtain dosage data for the GSTM1 gene. This technique can determine the exact number of gene copies and therefore identify all three genotypes. An accurate distinction, within the positive phenotype, between homozygous (high enzymatic activity) and heterozygous (medium enzymatic activity) individuals could help to correlate phenotypes with GSTM1 genotypes.
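The information lost by the phenotype-based method can be illustrated with a minimal Python sketch. The function names and the integer copy-number encoding are assumptions made for illustration only and are not part of any assay software.

```python
def genotype_from_copies(copies: int) -> str:
    """Map a GSTM1 copy number (0, 1 or 2) to its genotype label."""
    labels = {0: "0/0", 1: "1/0", 2: "1/1"}
    if copies not in labels:
        raise ValueError("GSTM1 copy number is expected to be 0, 1 or 2")
    return labels[copies]


def phenotype_from_copies(copies: int) -> str:
    """Collapse the copy number to the two classes of the phenotype method."""
    if copies not in (0, 1, 2):
        raise ValueError("GSTM1 copy number is expected to be 0, 1 or 2")
    # Carriers of one or two copies are indistinguishable at this level.
    return "null" if copies == 0 else "positive"


# Hypothetical copy numbers for five individuals.
samples = [0, 1, 2, 1, 0]
genotypes = [genotype_from_copies(c) for c in samples]
phenotypes = [phenotype_from_copies(c) for c in samples]
```

In this sketch the 1/0 and 1/1 individuals receive the same "positive" label, which is the distinction that gene dosage data can recover.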

In this study, we compared phenotype versus genotype methods to analyze whether a more discriminating approach could distinguish different subpopulations, and to evaluate the bias related to the phenotype-based methodology. Specifically, we analyzed the differences in GSTM1 CNV among human populations, considering their ethno-geographic origins. Moreover, because single nucleotide polymorphisms (SNPs) can have a large impact on gene function in individuals with the GSTM1 positive phenotype, we examined whether GSTM1 missense substitutions show significant inter-ethnic differences.

Materials and Methods

A total of 786 unrelated adult individuals of both sexes were typed from different human populations: Amhara (n = 103) and Oromo (n = 98) from Ethiopia; Bamileke (n = 30) from Cameroon; Cayapas (n = 182), Colorados (n = 83) and African Ecuadorians (n = 185) from Ecuador; and Italians from Italy (n = 105).

A total of 5–10 ml of peripheral blood was collected from each subject by venipuncture and stored in a heparinized vacutainer. Each donor was asked to supply name, birthplace, language and ethnicity for three generations, to allow us to determine the extent of recent admixture. Further information about these human groups and DNA purification procedures is available in previous studies (De Stefano et al., 2002; Polimanti et al., 2010; De Angelis et al., 2012). We matched our data with those reported for different ethnic groups in the HapMap project (International HapMap3 Consortium et al., 2010). Our seven populations and the eleven HapMap samples were classified into five groups according to their geographic origins: Amhara, Oromo, Bamileke, African Ecuadorians, ASW (African ancestry in the southwestern USA), LWK (Luhya in Webuye, Kenya), MKK (Maasai in Kinyawa, Kenya) and YRI (Yoruba in Ibadan, Nigeria) in the African group; Cayapas, Colorados, and MEX (Mexican ancestry in Los Angeles, California) in the American group; CHB (Han Chinese in Beijing, China), CHD (Chinese in metropolitan Denver, Colorado, USA), and JPT (Japanese in Tokyo, Japan) in the Asian group; GIH (Gujarati Indians in Houston, Texas, USA) in the central Asian group; Italians, CEU (individuals from the Centre d'Etude du Polymorphisme Humain collected in Utah, USA, with ancestry from northern and western Europe) and TSI (samples collected in Tuscany, Italy) in the European group.

Genotyping of GSTM1 CNV was performed with the TaqMan Copy Number Assay (Applied Biosystems Inc., Foster City, CA). The predesigned Copy Number Assay ID is Hs00273142 (Applied Biosystems). The total reaction volume per well was 20 μL, including 5 ng genomic DNA, 1 μL TaqMan Copy Number Assay, and 10 μL TaqMan Universal PCR Master Mix (Applied Biosystems), according to the manufacturer's manual. PCR was performed at 95°C for 10 min, followed by 40 cycles at 95°C for 15 s and 60°C for 1 min. Two blank controls in each 96-well plate were used for assay quality control. The results were analyzed using CopyCaller software version 1.0 (Applied Biosystems Inc., Foster City, CA).
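Real-time PCR copy number data of this kind are commonly analyzed with a comparative Ct (ΔΔCt) calculation. The Python sketch below illustrates that general calculation and is not the CopyCaller implementation. The function name, the use of a reference assay, the two-copy calibrator sample, and the treatment of an undetermined target Ct as zero copies are all assumptions made for illustration.

```python
from typing import Optional


def estimate_copy_number(ct_target: Optional[float],
                         ct_reference: float,
                         calibrator_delta_ct: float,
                         calibrator_copies: int = 2) -> int:
    """Estimate the GSTM1 copy number with a comparative Ct calculation.

    ct_target: Ct of the GSTM1 assay, or None if no amplification occurred
               (assumed here to indicate a homozygous deletion).
    ct_reference: Ct of a reference assay run in the same well (assumption).
    calibrator_delta_ct: Ct(target) - Ct(reference) of a calibrator sample
               whose copy number is known (assumption).
    """
    if ct_target is None:
        return 0
    delta_ct = ct_target - ct_reference
    delta_delta_ct = delta_ct - calibrator_delta_ct
    # One additional cycle corresponds to half the starting template.
    raw_copies = calibrator_copies * 2 ** (-delta_delta_ct)
    return round(raw_copies)


# Hypothetical Ct values, with a two-copy calibrator whose delta Ct is 1.0.
two_copies = estimate_copy_number(26.0, 25.0, calibrator_delta_ct=1.0)
one_copy = estimate_copy_number(27.0, 25.0, calibrator_delta_ct=1.0)
no_copies = estimate_copy_number(None, 25.0, calibrator_delta_ct=1.0)
```

A production analysis would also use replicate wells and a confidence measure for each call; both are omitted here for brevity.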

Allele frequencies were computed using the genotype-counting method. Hardy–Weinberg equilibrium and genotype/phenotype differences were evaluated using the chi-square (χ2) test. P values < 0.05 were considered significant. To analyze the differences between genotype and phenotype methods in pairwise comparisons of worldwide populations, four classes of P values were identified: P ≥ 0.10 (not significant); 0.10 > P ≥ 0.05 (not significant with a trend); 0.05 > P ≥ 0.01 (significant); P < 0.01 (highly significant). To identify the most divergent geographic origin group for the GSTM1 null phenotype and the GSTM1*0 allele, the method proposed by Hofer et al. (2009) was used. For each allele i, we computed the average allele frequency $p_{ij}$ within each geographic origin group j, as well as its difference from the average frequency computed over all other populations: $DF = p_{ij} - p_{-ij}$, where $p_{-ij}$ is the average frequency of allele i in all populations not belonging to geographic region j. The functional prediction analysis of GSTM1 missense substitutions was performed using the PolyPhen and SIFT software (Ramensky et al., 2002; Kumar et al., 2009).
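The statistical steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the Hardy–Weinberg test uses one degree of freedom without continuity correction, the frequency difference averages populations without weighting by sample size, and all numbers in the example are invented rather than taken from the study.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def allele_frequencies(n11: int, n10: int, n00: int) -> Tuple[float, float]:
    """Genotype counting: return the frequencies of GSTM1*1 and GSTM1*0."""
    total_alleles = 2 * (n11 + n10 + n00)
    p1 = (2 * n11 + n10) / total_alleles
    return p1, 1.0 - p1


def hwe_chi_square(n11: int, n10: int, n00: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium (1 degree of freedom)."""
    n = n11 + n10 + n00
    p1, p0 = allele_frequencies(n11, n10, n00)
    expected = (n * p1 * p1, 2 * n * p1 * p0, n * p0 * p0)
    observed = (n11, n10, n00)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def frequency_difference(freqs: Dict[str, List[float]], group: str) -> float:
    """DF = mean frequency inside the group minus mean frequency outside it."""
    inside = freqs[group]
    outside = [f for g, values in freqs.items() if g != group for f in values]
    return mean(inside) - mean(outside)


# Hypothetical genotype counts (1/1, 1/0, 0/0) in exact equilibrium.
p1, p0 = allele_frequencies(25, 50, 25)
chi2, p_value = hwe_chi_square(25, 50, 25)

# Hypothetical GSTM1*0 frequencies per population, keyed by group.
example = {"A": [0.4, 0.6], "B": [0.7], "C": [0.9]}
df_a = frequency_difference(example, "A")
```

A negative DF indicates that the allele is less frequent in the group of interest than in the rest of the populations.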


Table 1 reports the allele, genotype and phenotype frequencies of GSTM1 CNV in worldwide populations. Genotype distribution was in Hardy–Weinberg equilibrium for all populations. In our samples, the GSTM1*0 allele frequency varied widely with geographic origin, from 51.7% (Africans) to 70.5% (Europeans); these findings are consistent with those of the HapMap populations, which ranged from 42.7% (Africans) to 76.1% (Europeans). The GSTM1 null phenotype distribution also varied widely, from 30.0% (Africans) to 51.4% (Europeans), as confirmed by the HapMap data, which ranged from 20.1% (Africans) to 59.0% (Europeans).

Table 1. Allele, Phenotype and Genotype Frequencies of GSTM1 CNVs in Worldwide Populations
Population | N | Genotype 1/1 (n) | Genotype 1/0 (n) | Genotype 0/0 (n) | GSTM1 null phenotype (%) | GSTM1*0 allele (%)

First, to compare genotype and phenotype methods, we analyzed the differences among ethno-geographic groups (Figure 1). Both methodologies gave a similar picture. The greatest differences were found between Africans and non-Africans, with African populations showing lower frequencies of the GSTM1 null phenotype and the GSTM1*0 allele. Differences were also observed between Europeans and non-Europeans and between Asians and non-Asians: Europeans and Asians showed higher frequencies of the GSTM1 null phenotype and of the GSTM1*0 allele than the populations not belonging to these groups. The smallest differences were found between American and non-American populations. Subsequently, we performed the pairwise analysis between worldwide populations (Figure 2). Within the African and Amerindian groups, genotype and phenotype analyses gave different outcomes, whereas within the European and Asian clusters the two techniques did not differ appreciably. More specifically, the differences in the African group between genotype and phenotype methods were observed for: Amhara versus Oromo (Pphenotype = 0.095; Pgenotype = 0.021); Amhara versus Bamileke (Pphenotype = 0.048; Pgenotype = 0.014); African-Ecuadorians versus Oromo (Pphenotype = 0.044; Pgenotype = 0.074); and ASW versus Oromo (Pphenotype = 0.015; Pgenotype < 0.001). In the Amerindian groups, phenotype and genotype analyses of GSTM1 CNV highlighted different outcomes for: Cayapa versus MEX (Pphenotype = 0.027; Pgenotype = 0.086); and Cayapa versus Colorado (Pphenotype = 0.006; Pgenotype = 0.020).
Considering the pairwise comparison among populations with different ethno-geographic origins, we observed considerable differences between phenotype and genotype analyses for: Amhara versus CEU (Pphenotype = 0.174; Pgenotype = 0.015); Amhara versus TSI (Pphenotype = 0.519; Pgenotype = 0.012); Amhara versus CHB (Pphenotype = 0.756; Pgenotype = 0.002); Amhara versus CHD (Pphenotype = 0.413; Pgenotype = 0.036); Bamileke versus CHB (Pphenotype = 0.085; Pgenotype < 0.001); Bamileke versus Italians (Pphenotype = 0.380; Pgenotype = 0.033); and African-Ecuadorians versus CHB (Pphenotype = 0.633; Pgenotype = 0.034).
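A pairwise comparison of this kind can be sketched in Python by testing the same two populations once on a 2 × 2 phenotype table and once on a 2 × 3 genotype table. The sketch is illustrative only: the function names are hypothetical, no continuity correction is applied (whether the original analysis used one is not stated), every column is assumed to contain at least one observation, and the genotype counts are invented to show how the two methods can disagree.

```python
import math
from typing import List, Sequence, Tuple


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p(stat: float, df: int) -> float:
    """Upper-tail P value; closed forms exist for 1 and 2 degrees of freedom."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


def significance_class(p: float) -> str:
    """Assign one of the four P value classes defined in the Methods."""
    if p >= 0.10:
        return "not significant"
    if p >= 0.05:
        return "not significant with a trend"
    if p >= 0.01:
        return "significant"
    return "highly significant"


def compare_methods(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Return (P phenotype, P genotype) for counts given as (1/1, 1/0, 0/0)."""
    genotype_table = [list(a), list(b)]
    phenotype_table = [[a[0] + a[1], a[2]], [b[0] + b[1], b[2]]]
    p_phenotype = chi_square_p(chi_square(phenotype_table), df=1)
    p_genotype = chi_square_p(chi_square(genotype_table), df=2)
    return p_phenotype, p_genotype


# Hypothetical populations with identical null phenotype frequencies (30%)
# but different proportions of 1/1 and 1/0 genotypes.
p_phen, p_gen = compare_methods((30, 40, 30), (10, 60, 30))
```

In this invented example the phenotype comparison detects no difference at all, while the genotype comparison falls in the highly significant class, because the two populations differ only in how the positive phenotype is split between 1/1 and 1/0.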

Figure 1.

Distribution of GSTM1 frequency differences (DF) comparing a given geographic origin group versus the rest of the world.

Figure 2.

Pairwise comparisons (χ2 analysis) among worldwide populations. P values obtained by phenotype comparisons are shown above the diagonal; P values obtained by genotype comparisons are shown below the diagonal. Different colours represent the significance level: black (highly significant, P < 0.01); dark gray (significant, 0.01 ≤ P < 0.05); light gray (not significant with a trend, 0.05 ≤ P < 0.10) and white (not significant, P ≥ 0.10). Af, Africa; Am, America; As, Asia; Eu, Europe; R, Amhara; O, Oromo; I, Bamileke; F, African-Ecuadorian; A, ASW; L, LWK; K, MKK; Y, YRI; N, Italians; C, CEU; T, TSI; G, GIH; H, CHB; D, CHD; J, JPT; M, MEX; P, Cayapa; Q, Colorado.

After the analysis of the phenotype estimation bias, we investigated whether missense substitutions with a large functional impact may alter genetic association studies on GSTM1 CNV. Table 2 shows the nonsynonymous coding variants of the GSTM1 gene available from the 1000 Genomes Project data. Most of the substitutions with large deleterious effects were not detected or, if present, occurred at very low frequencies (<0.05%). Only two nonsynonymous coding variants (rs199816990 and rs202002774), predicted as deleterious by PolyPhen, showed higher allele frequencies among Asians (16% and 17%, respectively) than in other worldwide populations.

Table 2. Nonsynonymous Coding Variants in GSTM1 Gene in 1000 Genomes Project (NA, Not Available)
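Variant | Substitution | SIFT prediction | PolyPhen prediction | Allele frequency (one column per population group)
rs11546855 | D42G | Deleterious | Probably damaging | NA | NA | NA | NA
rs150797170 | E49K | Deleterious | Possibly damaging | 0 | 0 | 0 | 0
rs142484086 | R145W | Deleterious | Probably damaging | 0 | NA | NA | 0.02
rs72549313 | R187C | Deleterious | Possibly damaging | NA | NA | NA | NA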


Glutathione S-transferases play a fundamental role in the cellular detoxification of a wide range of xenobiotic compounds and oxidative stress metabolites (Hayes et al., 2005). An extensive literature addresses the association between GST gene variants and the risk of various diseases (Piacentini et al., 2012c, 2013). In particular, CNV of the GSTM1 gene has been studied in depth in relation to human health (Bolt & Thier, 2006). Most studies on GSTM1 CNV are based on discrimination between positive (carriers of at least one gene copy) and null (homozygous for gene deletion) phenotypes.

The aim of our study was to investigate the dissimilarities between phenotype and genotype analyses of GSTM1 in order to evaluate whether the phenotype analysis generates a bias in the genetic association analysis of GSTM1. Furthermore, we analyzed the inter-ethnic difference of missense substitutions with a large functional impact, in order to verify whether these variants may alter GSTM1 analysis.

Our analyses of GSTM1 CNV frequency in worldwide populations gave outcomes in agreement with those reported for the HapMap populations with the same ethnic origins and with those reported in the literature (Garte et al., 2001; Gaspar et al., 2002; Buchard et al., 2007; Piacentini et al., 2011). Differences among ethnic groups were observed mainly between African and non-African populations. This outcome may be due to human demographic history and, especially, to the African origin of humankind (Fu et al., 2013). At the level of ethno-geographic groups, the phenotype and genotype analyses did not give significantly different outcomes. This finding supports the hypothesis that phenotype discrimination does not generate bias when the analysis is focused on highly differentiated groups, such as clusters based on ethnic origin. A different picture emerges when pairwise comparisons between worldwide populations are considered. In particular, phenotype and genotype analyses provided different outcomes in the African and American groups. This result is probably due to the heterogeneity within these geographic groups, caused by their particular demographic history (Travis et al., 2007). Within the African group, we observed more significant P values for genotype comparisons than for phenotype ones, except for the comparison between African-Ecuadorians and Oromo. Conversely, within American populations, P values obtained by phenotype discrimination were more significant than those obtained by genotype analysis. The difference between genotype and phenotype analyses increases strongly when populations with different origins are compared and, especially, when populations with African ancestry are compared with Asian populations. In these pairwise comparisons, nonsignificant P values (>0.1) in the phenotype discrimination become significant (<0.05) or highly significant (<0.01) in the genotype comparison. This outcome suggests that the differences between GSTM1 phenotype and genotype methods increase strongly when heterogeneous populations are compared.

Regarding GSTM1 missense substitutions, we identified two variants (rs199816990 and rs202002774) potentially associated with large functional effects that showed higher frequencies in Asians than in non-Asian populations. This condition may generate a bias in the genetic association studies on Asian populations. Indeed, an individual may be a carrier of a gene copy in which one of these SNPs is present and, consequently, may show the same enzymatic activity as a gene deletion carrier.
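The potential size of this bias can be explored with a simple Python calculation. It is a sketch under strong assumptions that go beyond the data presented here: Hardy–Weinberg proportions, a missense variant that completely abolishes enzyme activity, a missense frequency expressed as a fraction of the gene copies that are present, and hypothetical input values. The function name is also hypothetical.

```python
from typing import Tuple


def null_fractions(deletion_freq: float,
                   missense_freq: float) -> Tuple[float, float]:
    """Return (deletion-based null fraction, functionally null fraction).

    deletion_freq: frequency of the GSTM1*0 allele.
    missense_freq: fraction of the present gene copies that carry a
                   loss-of-function missense variant (assumption).
    """
    if not (0.0 <= deletion_freq <= 1.0 and 0.0 <= missense_freq <= 1.0):
        raise ValueError("frequencies must lie between 0 and 1")
    # A chromosome is non-functional if the gene is deleted, or if the
    # gene is present but carries the deleterious substitution.
    nonfunctional = deletion_freq + (1.0 - deletion_freq) * missense_freq
    return deletion_freq ** 2, nonfunctional ** 2


# Hypothetical frequencies chosen only to illustrate the calculation.
cnv_null, functional_null = null_fractions(0.5, 0.2)
```

With these invented inputs, a CNV-only analysis would classify 25% of individuals as null, whereas about 36% would lack functional enzyme, so some individuals counted as positive would behave like deletion carriers.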

In conclusion, our study demonstrates that the phenotype analysis of GSTM1 CNV may strongly distort the detection of genetic association, especially among populations characterized by a high heterogeneity. Therefore, the genotype discrimination analysis should be used to analyze GSTM1 CNV. Furthermore, to understand completely the role of GSTM1 in human health, the analysis of CNV should be combined with the investigation of SNPs with large functional effect.


The subjects of the investigation were adequately informed about the aims of the study and gave their approval, which is also gratefully acknowledged. Human studies have been performed in accordance with the ethical standards as laid down by law. The authors declare that they have no conflict of interest. This study was supported by Ricerca Scientifica di Ateneo (RSA) Grant 2009 from University of Rome “Tor Vergata” and by PRIN 2009–2011 (prot. n. 200975T9EW) from MIUR.