Keywords:

  • genome-wide association study;
  • pigmentation;
  • skin cancer;
  • single-nucleotide polymorphisms;
  • pathway analysis

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. GWAS study design
  5. Utilizing GWAS from other traits
  6. Post-GWAS research
  7. References

Genome-wide association studies (GWAS) have become a widely used approach for genetic association studies of various human traits. A few GWAS have been conducted with the goal of identifying novel loci for pigmentation traits, melanoma, and non-melanoma skin cancer. Nevertheless, the phenotype variation explained by the genetic markers identified so far is limited. In this review, we discuss the GWAS study design and its application in pigmentation and skin cancer research. Furthermore, we summarize recent developments in post-GWAS activities such as meta-analysis, pathway analysis, and risk prediction.


Introduction

Genome-wide association studies (GWAS) have been widely conducted on various human traits and diseases during the past decade and have rapidly become a standard approach for genetic association studies. As reviewed by Gerstenblith et al. (2010) in this journal, a number of GWAS have been conducted on pigmentation traits as well as melanoma and non-melanoma skin cancer, and novel genetic susceptibility loci have been identified. However, only a few genetic markers have been identified so far, and they explain only a small fraction of the phenotypic variation in the general population. In addition, most of the single-nucleotide polymorphisms (SNPs) discovered in GWAS are not located in protein-coding regions. The identified associations could arise either because these SNPs exert an effect through splicing or expression, or because they are in linkage disequilibrium (LD) with the underlying true causal SNPs. In this review, we discuss the challenges and opportunities presented by GWAS and offer an overview of the emerging post-GWAS approaches that can be used to further explore the GWAS data.

Prior to the advent of GWAS, three primary approaches were widely used for genetic studies: linkage analysis, cytogenetics, and candidate gene studies (Fountain et al., 1990). These approaches helped identify several susceptibility genes/loci for various diseases including skin cancer, such as CDKN2A and CDK4 as high-penetrance genes for melanoma and MC1R as a low-penetrance gene for melanoma, basal cell carcinoma (BCC), and squamous cell carcinoma (SCC) (Gerstenblith et al., 2010). However, the high-penetrance genes account for less than half of the inheritance in melanoma-prone families, and the fact that most melanomas occur as sporadic cases suggests the existence of additional susceptibility genes (Begg et al., 2005). Candidate gene studies detected variants in low-penetrance genes known to be involved in the pathogenesis of melanoma, such as those responsible for pigmentation, DNA repair, cell growth, oxidative stress, inflammatory response, and telomere maintenance (Gu et al., 2009; Gudbjartsson et al., 2008; Han et al., 2004a,b, 2006a,b; Nan et al., 2009a,b, 2011a; Raimondi et al., 2008), while many more low-penetrance susceptibility loci remain undiscovered.

With the development of high-throughput genomics and analytical tools, GWAS, in which several hundreds of thousands to more than a million SNPs can be detected, emerged as a powerful approach to identify the remaining genetic susceptibility loci. Genetic loci identified to be associated with pigmentation, nevi, and skin cancer as of 2010 were reviewed (Gerstenblith et al., 2010). Since then, several new GWAS have been published on cutaneous nevus count, and on BCC, SCC, and melanoma risk in Caucasian populations (Amos et al., 2011; Barrett et al., 2011; Macgregor et al., 2011; Nan et al., 2011b,c). A summary of novel SNPs identified to be associated with nevus count, BCC, SCC, and melanoma risk is presented in Table 1.

Table 1.   Single-nucleotide polymorphisms associated with cutaneous nevi, basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma
| Chr | rs# | Reference | Ref. allele | Gene neighborhood | Cutaneous nevi | P-value | BCC OR (95% CI) | P-value | SCC OR (95% CI) | P-value | Melanoma OR (95% CI) | P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1q42 | rs3768080 | Nan et al. (2011c) | A | NID1 | −0.07 (0.01)^a | 6.5 × 10^−8 | | | | | 0.86 (0.75, 0.98)^b | 0.02 |
| 6p25 | rs12210050 | Nan et al. (2011b) | T | EXOC2 | | | 1.24 (1.17, 1.31) | 9.9 × 10^−10 | 1.35 (1.16, 1.57) | 7.6 × 10^−5 | | |
| 13q32 | rs7335046 | Nan et al. (2011b) | G | UBAC2 | | | 1.26 (1.18, 1.34) | 2.9 × 10^−8 | 1.21 (1.02, 1.44) | 0.03 | | |
| 15q13 | rs12913832 | Amos et al. (2011) | A | HERC2 | | | | | | | 0.69 (0.61, 0.79) | 4.3 × 10^−8 |
| 11q22 | rs1801516 | Barrett et al. (2011) | A | ATM | | | | | | | 0.84 (0.79, 0.89) | 3.4 × 10^−9 |
| 21q22 | rs45430 | Barrett et al. (2011) | G | MX2 | | | | | | | 0.88 (0.85, 0.92) | 2.9 × 10^−9 |
| 2q33 | rs13016963 | Barrett et al. (2011) | A | CASP8 | | | | | | | 1.14 (1.09, 1.19) | 8.6 × 10^−10 |
| 1q21 | rs7412746 | Macgregor et al. (2011) | T | Intergenic | | | | | | | 0.89 (0.85, 0.95) | 9 × 10^−11 |
| 1q21 | rs7412746 | Amos et al. (2011) | T | Intergenic | | | | | | | 0.88 (0.84, 0.91) | 6.2 × 10^−10 |

a: The regression parameter beta (standard error) based on the linear regression.
b: Result from the dominant model.

GWAS study design

Substantial technical breakthroughs in genotyping have allowed investigators to affordably genotype a million SNPs per sample using arrays. In addition, with the completion of the International HapMap Project and the 1000 Genomes Project, we have a better understanding of the LD pattern of the human genome. Although there are about 30 million SNPs across the whole human genome, many of them are in LD and can stand in for each other (Chin, 2003). For that reason, a smaller set of SNPs that captures most of the genetic variation in a region can be used in association studies, reducing the number of SNPs that must be genotyped to detect LD-based association. Commercially available SNP chips (Illumina, San Diego, CA, USA or Affymetrix, Santa Clara, CA, USA) have been widely used in GWAS.

Because so many SNPs are tested, any analysis will inevitably generate a large number of false-positive results. One common practical strategy is the multistage design, in which cases and controls are divided into a discovery set and a replication set. Usually the most significant SNPs identified in the discovery set are selected for replication in the replication set. Compared with the one-stage design, in which all SNPs are genotyped for all individuals, a well-designed multistage study substantially reduces genotyping cost while maintaining reasonable power. Descriptions of multistage designs with illustrating figures have been presented previously (Hunter et al., 2008). Skol et al. (2006) compared two ways of analyzing a two-stage design. The replication strategy analyzes the replication stage separately and applies a Bonferroni correction only for the markers selected for follow-up. The joint analysis strategy combines the discovery and replication sets and applies a Bonferroni correction for the number of genome-wide SNPs. They showed that joint analysis almost always increases the power to detect genetic associations. Skol et al. (2006) also recommended using joint analysis for all two-stage GWAS, especially when the discovery set consists of more than 30% of the entire study population and the proportion of SNPs selected for follow-up is large (more than 1%). Furthermore, as shown by Pahl et al. (2009), based on the current genotyping cost structure, using no more than four stages in the study design will be sufficient for most practical purposes.
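
The difference between the two correction schemes can be made concrete with a short calculation. The sketch below is only illustrative: the overall significance level, the number of genome-wide SNPs, and the follow-up fraction are assumed values chosen for the example, not figures from Skol et al. (2006) or any other cited study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Illustrative assumptions (not taken from the cited studies).
ALPHA = 0.05
N_GENOME_WIDE_SNPS = 500_000
FOLLOW_UP_FRACTION = 0.01

n_follow_up = round(N_GENOME_WIDE_SNPS * FOLLOW_UP_FRACTION)

# Replication strategy: stage-2 data only, corrected for the follow-up SNPs.
replication_threshold = bonferroni_threshold(ALPHA, n_follow_up)

# Joint analysis: both stages combined, corrected for all genome-wide SNPs.
joint_threshold = bonferroni_threshold(ALPHA, N_GENOME_WIDE_SNPS)

print(f"SNPs carried to stage 2: {n_follow_up}")
print(f"replication threshold:   {replication_threshold:.1e}")
print(f"joint threshold:         {joint_threshold:.1e}")
```

The joint threshold is stricter, but the joint test statistic uses every genotyped sample rather than the replication samples alone, which is the trade-off that Skol et al. (2006) found to favor joint analysis.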

Utilizing GWAS from other traits

Genome-wide association studies examining melanoma-related phenotypes have identified loci for hair color, eye color, skin color, sun sensitivity, freckling, and nevi (reviewed in Gerstenblith et al., 2010). These phenotypes are intermediates in melanoma pathogenesis pathways, and it is plausible that genetic loci that contribute to these traits may also confer melanoma risk. For example, in a recently published GWAS (Nan et al., 2011c), the authors’ primary aim was to identify SNPs conferring high nevus count, which is a well-known risk factor for melanoma (Gandini et al., 2005). The top two hits, which were from nidogen 1 (NID1) on chromosome 1, were then analyzed and confirmed for their association with melanoma susceptibility (Nan et al., 2011c). A similar approach has been successfully applied to identify other nevus-melanoma loci such as PLA2G6 (Falchi et al., 2009). In addition, susceptibility loci conferring the risk of one type of cancer may also be associated with the development of cancer at other sites. For instance, following the identification of genetic variants at the TERT-CLPTM1L locus associated with BCC, the same region was associated with the risk of several other cancer types including melanoma (Rafnar et al., 2009). Our group also reported that the SNPs near EXOC2 and UBAC2 identified in a GWAS of BCC risk were associated with the risk of SCC (Nan et al., 2011b).

Post-GWAS research

GWAS meta-analysis

The effect size of a single SNP is usually modest for complex traits and diseases, so large sample sizes are needed to establish statistically significant associations. Meta-analysis is a widely accepted method to increase sample size and statistical power. Usually meta-analysis is used to summarize the results from multiple published studies to obtain more conclusive results. For instance, the results on the association between polymorphisms in the vitamin D receptor (VDR) gene and melanoma risk were conflicting. The first Leeds study (UK, 1028 population-ascertained cases and 402 controls) reported no significant association between any VDR SNP and melanoma risk, while the second Leeds study (UK, 299 cases and 560 controls) found that the FokI T allele was associated with increased melanoma risk (odds ratio, OR: 1.42; 95% confidence interval, CI: 1.06–1.91; P = 0.02). In a meta-analysis that combined these results with published data from other smaller datasets (total 3769 cases and 3636 controls), the FokI T allele was associated with increased melanoma risk (OR 1.19, 95% CI 1.05–1.35), and the BsmI A allele was associated with a reduced risk (OR 0.81, 95% CI 0.72–0.92) (Randerson-Moor et al., 2009).
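
As a rough illustration of how published odds ratios can be pooled, the sketch below applies a generic fixed-effect, inverse-variance meta-analysis on the log-OR scale. It is not the procedure of Randerson-Moor et al. (2009), whose model is not described here. The first input is the second Leeds study as quoted above; the second input is a hypothetical study invented for the example, and the function names are our own.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported OR and 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return log_or, se


def fixed_effect_pool(estimates):
    """Inverse-variance weighted pooling of (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


studies = [
    log_or_and_se(1.42, 1.06, 1.91),  # second Leeds study, as quoted in the text
    log_or_and_se(1.05, 0.85, 1.30),  # hypothetical study, for illustration only
]

pooled_log_or, pooled_se = fixed_effect_pool(studies)
pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - Z_95 * pooled_se)
ci_high = math.exp(pooled_log_or + Z_95 * pooled_se)

print(f"pooled OR {pooled_or:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

The pooled standard error is smaller than that of either input study, which is the gain in precision that motivates meta-analysis. A random-effects model would be a reasonable alternative when the studies are heterogeneous.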

Meta-analyses are most helpful in identifying low-penetrance loci. In a meta-analysis exploring the effect of DNA-repair gene polymorphisms, XPD/ERCC2 was first suggested as a low-penetrance melanoma susceptibility gene with an OR of 1.12 (P = 0.01) (Mocellin et al., 2009). More recently, Chatzinasiou et al. (2011) conducted a systematic meta-analysis on cutaneous melanoma, including data from melanoma GWAS. In addition to the known melanoma risk genes, they identified a locus at 9p21.3 (CDKN2A/MTAP) that was associated with melanoma risk at genome-wide statistical significance. However, because of the absence of publicly available melanoma GWAS datasets, they relied only on results presented in the GWAS publications. Meta-analysis of original full GWAS data has become critical to increase the sample size and statistical power in the discovery stage and so identify additional novel loci. This effort requires coordination and collaboration across multiple groups.

Pathway analysis

The traditional GWAS focuses only on marginal effects of individual markers and therefore often lacks power to detect the relatively small effects conferred by most genetic variants. It is well known that genes do not work in isolation. Instead, complex molecular networks and cellular pathways are often involved in disease development. Therefore, pathway-based approaches, which evaluate the cumulative contribution of the genes within biological pathways, have been developed. Such approaches may help collect the modest signals contained in the GWAS data and identify biological pathways involved in the etiology of disease. There are currently two main types of pathway-based approaches: a competitive test comparing the statistics for genes in a given pathway with statistics for other genes, such as GenGen and GSEA_SNP (Holden et al., 2008; Wang et al., 2007); and a test comparing results of a given pathway with the null association, such as GRASS (Ding et al., 2010) and the PLINK set-tests (Purcell et al., 2007). Both strategies have been successfully used for various complex diseases (Macgregor et al., 2011). Recently, we conducted a pathway analysis in a GWAS on BCC. In addition to the gene sets containing BCC-related genes that were previously identified, we found four other biological pathways (the heparan sulfate biosynthesis pathway, the mCalpain pathway, the Rho cell motility signaling pathway, and the nitric oxide pathway) associated with BCC risk, which may provide new insight into the etiology of BCC upon further functional studies (Zhang et al., 2011b).
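
The idea behind a competitive test can be sketched with a simple gene-sampling permutation: the mean gene-level statistic in the pathway is compared with the means of randomly drawn gene sets of the same size. This is a deliberately simplified stand-in and not the algorithm implemented in GenGen, GSEA_SNP, GRASS, or PLINK. The gene names, statistics, and planted signal below are all hypothetical.

```python
import random


def competitive_pathway_p(gene_stats, pathway_genes, n_perm=2000, seed=0):
    """Empirical P-value that random gene sets of equal size score at least
    as high (by mean statistic) as the pathway."""
    rng = random.Random(seed)
    genes = list(gene_stats)
    in_pathway = [g for g in genes if g in pathway_genes]
    if not in_pathway:
        raise ValueError("no pathway gene has a statistic")
    k = len(in_pathway)
    observed = sum(gene_stats[g] for g in in_pathway) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(gene_stats[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene-level association statistics (larger means stronger signal).
data_rng = random.Random(1)
gene_stats = {f"GENE{i}": abs(data_rng.gauss(0, 1)) for i in range(200)}

# Hypothetical pathway of 10 genes with a planted signal.
pathway = {f"GENE{i}" for i in range(10)}
for gene in pathway:
    gene_stats[gene] += 2.0

p_value = competitive_pathway_p(gene_stats, pathway)
print(f"competitive P-value: {p_value:.4f}")
```

A test against the null association would instead permute case-control labels and recompute the pathway statistic, which requires individual-level genotype data and is not shown here.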

However, the traditional pathway analysis assigns SNPs to nearby genes based on their physical locations, which has some limitations. A SNP located in one gene but regulating the expression of another may not be assigned to the gene through which it acts. For instance, the SNP rs12913832 in the intron of HERC2 was reported to influence pigmentary traits by regulating the expression of the downstream gene OCA2, which is a well-known pigmentation gene (Sturm et al., 2008). Instead of assigning SNPs to genes by physical location, defining the expression quantitative trait loci (eQTLs) and assigning them to the genes that they regulate may provide better functional annotation of SNPs. eQTLs have been reported to be enriched for disease associations in several GWAS (Ding et al., 2011; Zhong et al., 2010b), and the integration of eQTLs into the pathway analysis for the GWAS of type 2 diabetes has successfully identified several novel disease-related pathways (Zhong et al., 2010a). GWAS have primarily focused on marginal effects for individual markers, and functional pathways were investigated only after robust statistical associations were identified. More recently, we integrated eQTL information into a BCC GWAS and conducted a pathway analysis on those data. We first used the eQTL information from lymphoblastoid cell lines (LCL) to functionally annotate the SNPs and filtered out the SNPs not associated with gene expression, and then performed the pathway analysis. As a result, we identified the JAK-STAT signaling pathway as associated with BCC risk (Zhang et al., 2011a).
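
The contrast between position-based and eQTL-based annotation can be shown with two small lookup tables. This is a minimal sketch and not the pipeline of Zhang et al. (2011a): the tables are hypothetical, the rs12913832 entry simply encodes the HERC2/OCA2 relationship described above, and `rs_example_1` and `GENE_A` are placeholder names.

```python
# Position-based annotation: each SNP maps to the gene it physically sits in.
position_gene = {
    "rs12913832": "HERC2",
    "rs_example_1": "GENE_A",  # placeholder SNP with no known eQTL effect
}

# eQTL-based annotation: each SNP maps to the gene(s) whose expression it
# is associated with. Hypothetical table for illustration.
eqtl_targets = {
    "rs12913832": ["OCA2"],
}


def annotate_by_eqtl(snps, eqtl_table):
    """Keep only SNPs with an eQTL target and map each to its target genes."""
    return {snp: eqtl_table[snp] for snp in snps if snp in eqtl_table}


annotated = annotate_by_eqtl(position_gene, eqtl_targets)

for snp, genes in annotated.items():
    print(f"{snp}: by position {position_gene[snp]}, by eQTL {', '.join(genes)}")
```

SNPs without an eQTL entry are dropped before the pathway test, mirroring the filtering step described in the text. The choice of tissue for the eQTL table determines which SNPs survive this filter.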

Even though the patterns of gene expression may vary among different tissue types, eQTLs are substantially shared across tissues (Cookson et al., 2009; Ding et al., 2010). The eQTLs identified in the LCL have helped interpret findings from GWAS in which LCL is not the directly relevant tissue, including human height, body mass index, waist-hip ratio, osteoporosis-related traits, childhood asthma, and Crohn’s disease (Czarnecki et al., 1991; Dixon et al., 2007; Heid et al., 2010; Hsu et al., 2010; Lango Allen et al., 2010; Libioulle et al., 2007). In a recent study, it was estimated that ∼70% of cis-eQTLs in LCLs are shared with skin (Ding et al., 2010). In our study, we also marginally validated the findings on the JAK-STAT signaling pathway using the skin eQTLs (Zhang et al., 2011a).

Risk prediction model using existing replicated top SNPs

The discovery of novel genetic susceptibility loci by GWAS has raised expectations for predicting disease risk by analyzing multiple common alleles. In melanoma risk prediction, Fears et al. (2006) used at most seven variables and estimated attributable risks of 86% for men and 89% for women. To explore the possibility of adding a genetic component to the traditional prediction model, Whiteman and Green (2005) attempted to develop a risk-prediction tool that takes into account risk factors such as current age, place of residence, number of melanocytic nevi, and skin type, as well as the status of the melanocortin-1-receptor (MC1R) gene. Although much more work is needed to build such a genetic prediction model, these attempts provided a first impression of how a practical predictive tool might be constructed and how it might look (Whiteman and Green, 2005).

One field in which prediction models are actively evaluated is breast cancer risk prediction. In receiver operating characteristic curve analysis, the area under the curve (AUC) is used as a measure of discrimination between cases and non-cases. Perfect classification of cases versus non-cases provides an AUC of 100%, while random classification provides an AUC of 50%. To examine the value of adding genetic variants to the traditional breast cancer model, Wacholder and colleagues evaluated 10 SNPs with established breast cancer associations as an alternative and as a supplement to the Gail model, the most widely used invasive breast cancer prediction model. In a model consisting of age, study, and entry year and four traditional risk factors (the number of first-degree relatives with a diagnosis of breast cancer, age at menarche, age at first live birth, and number of previous breast biopsies), the AUC was 58%, while adding 10 genetic factors to the same model increased the AUC to 61.8% (Wacholder et al., 2010). Van Zitteren et al. (2011) built a genetic prediction model of breast cancer using SNPs from a review of meta-analyses and GWAS. Among the 96 SNPs, 41 were nominally significant at the 0.05 level. The AUC estimated by simulation was 0.67 for the 41 SNPs. Although these improvements seem modest compared with traditional models, it should be realized that many of the variables in the model (such as family history, the number of the woman’s first-degree relatives diagnosed with breast cancer, and race/ethnicity) already carry components of genetic susceptibility. Adding more SNPs can therefore capture only the genetic contribution that is not already explained by the traditional characteristics. Furthermore, the higher the AUC, the more difficult it becomes to improve.
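
The AUC has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it by comparing every case with every non-case. The risk scores are made up for the example and the function name is our own.

```python
def auc(case_scores, control_scores):
    """Fraction of case/non-case pairs in which the case scores higher
    (ties count as one half)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up predicted risks, for illustration only.
perfect = auc([0.9, 0.8], [0.2, 0.1])        # every case outranks every non-case
uninformative = auc([0.5, 0.5], [0.5, 0.5])  # scores carry no information
partial = auc([0.9, 0.4], [0.5, 0.3])        # three of four pairs ranked correctly

print(perfect, uninformative, partial)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based computation gives the same value more efficiently on large cohorts.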

Sequencing

More recently, various novel sequencing technologies have been developed with the goal of reducing costs by several orders of magnitude (Shendure et al., 2004). It is expected that these ultra-low-cost sequencing technologies will revolutionize genetic studies in the near future. The feasibility of deep resequencing, exome sequencing, and whole-genome sequencing at a low price will allow the discovery, validation, and assessment of genetic markers in populations, leading to comprehensive discovery of genetic alterations (Davey et al., 2011). The application of comprehensive sequencing approaches can help fine-map the known regions, refine the signals, and identify new loci, which will substantially move the field of personalized medicine forward (Burgess, 2011). In addition, the ability to sequence tumor DNA samples is a great advantage in cancer research. Improvements in methods of computational, biological, and clinical analysis are the main challenges in making sense of the huge amount of genetic data (Meyerson et al., 2010).

References
