A functional variant on 20q13.33 related to glioma risk alters enhancer activity and modulates expression of multiple genes

Abstract Genome‐wide association studies (GWAS) have identified single‐nucleotide polymorphisms (SNPs) associated with glioma risk on 20q13.33, but the biological mechanisms underlying this association are unknown. We tested the hypothesis that a functional SNP on 20q13.33 impacted the activity of an enhancer, leading to an altered expression of nearby genes. To identify candidate functional SNPs, we identified all SNPs in linkage disequilibrium with the risk‐associated SNP rs2297440 that mapped to putative enhancers. Putative enhancers containing candidate functional SNPs were tested for allele‐specific effects in luciferase enhancer activity assays against glioblastoma multiforme (GBM) cell lines. An enhancer containing SNP rs3761124 exhibited allele‐specific effects on activity. Deletion of this enhancer by CRISPR‐Cas9 editing in GBM cell lines correlated with an altered expression of multiple genes, including STMN3, RTEL1, RTEL1‐TNFRSF6B, GMEB2, and SRMS. Expression quantitative trait loci (eQTL) analyses using nondiseased brain samples, isocitrate dehydrogenase 1 (IDH1) wild‐type glioma, and neurodevelopmental tissues showed STMN3 to be a consistent significant eQTL with rs3761124. RTEL1 and GMEB2 were also significant eQTLs in the context of early CNS development and/or in IDH1 wild‐type glioma. We provide evidence that rs3761124 is a functional variant on 20q13.33 related to glioma/GBM risk that modulates the expression of STMN3 and potentially other genes across diverse cellular contexts.

2018). These 25 loci, in total, are estimated to account for approximately 30% of heritable risk (Melin et al., 2017). The discovery of the biological mechanism underlying these risk variants has the potential to reveal novel insights into glioma development. However, characterization of the biological basis of risk has proven to be challenging, because few index GWAS single-nucleotide polymorphisms (SNPs) are themselves functional. The emerging picture is that most functional/causal SNPs associated with risk map to enhancers or promoters and lead to an altered gene expression (Biancolella et al., 2014;Fortiniini et al., 2014).
The top GWAS SNP at 20q13.33, rs2297440, maps to Intron 14 of the regulator of telomere elongation helicase 1 (RTEL1). It is unlikely that this SNP is functional/causal, as it does do not map to any functional elements, including enhancers, in astrocytes or cortical tissues (data not shown). We, therefore, tested the hypothesis that the functional/causal SNP(s) in this region were in linkage disequilibrium (LD) with this top GWAS SNP. To identify candidate functional variants within the 20q13.33 GWAS locus, we systemically screened SNPs in LD with rs2297440 that intersected with regulatory elements/enhancers as cataloged in publicly available annotations. Each candidate enhancer/SNP region identified was tested in luciferase reporter assays for SNP-dependent allele-specific effects on enhancer activity. Using this approach, we identified a functional SNP rs3761124 that mapped to an enhancer region on 20q13.33. This SNP had allele-specific effects on enhancer activity in cell-based luciferase reporter assays. Several candidate target genes of this enhancer were identified after CRISPR-Cas9 deletion including stathmin 3 (STMN3). We further demonstrated that this variant correlated with the expression of several cis genes using eQTL analysis, including STMN3 as the most consistent target gene across different cellular contexts.

| Region analysis
Several GWAS have shown an association between SNPs mapping to chromosome 20q13.33 and glioma risk (Kinnersley et al., 2015;Melin et al., 2017;Rajaraman et al., 2012;Shete et al., 2009). We identified an LD block of approximately 116 kb on chromosome 20q13.33, which included 120 SNPs (Figure 1) in LD with the lead SNP in the most recent GWAS meta-analysis (rs2297440; Melin et al., 2017). LD was determined by r 2 > = .6 in the CEU population. To identify candidate functional SNPs, we used University of California Santa Cruz Genome Browser (http://genome.ucsc.edu/; Kent et al., 2002) to overlay the SNPs in LD with rs2297440 with physiologically relevant histone ChIP-Seq peaks for H3K4me1 (from Encyclopedia of DNA Elements [EN-CODE] normal human astrocyte cell line GSM733710) and H3K27ac (from ENCODE normal human astrocyte cell line GSM733763 and ENCODE GBM cell line GSM1121878; Figure 1). All cloned candidate enhancer regions contained at least one SNP in LD (r 2 > = .6) with the lead SNP (rs2297440) and coincided with peaks of chromatin marks of enhancers in at least two relevant ChIP-seq datasets. Additional histone ChIP-Seq peaks for enhancer elements derived from normal human astrocytes, human glioblastoma cancer stem cells, and human glioblastoma cell lines were also used in these analyses (GSM1121881, GSM894065, GSM1515744, GSM2500170, GSM1121859, and GSM1121869; data are not shown). This analysis resulted in the identification of four candidate enhancer elements that each contained at least one SNP in LD with rs2297440 ( Figure 1).  were assayed for luciferase activity using the Dual-Luciferase Reporter Assay System (Promega), according to the manufacturer's instructions, and measured using a Synergy H1 Microplate Reader (BioTek). To quantify enhancer activity in cells transfected with candidate enhancer elements, luminescence resulting from the transcription and translation of firefly luciferase in the presence of luciferin was measured, and background luminescence in the absence of reagents was subtracted. To control for cell concentration, normalization with luminescence from constitutively active NanoLuc luciferase was performed (i.e., candidate enhancer/constitutive control). To control for assay artifact, normalization with luminescence from cells transfected with clones empirically found to have no enhancer activity was performed (i.e., candidate enhancer/negative control). Measurements for each enhancer were obtained in three wells (i.e., technical replicates) for four clones (i.e., biological replicates) for each of the two alleles observed in human populations (i.e., experimental conditions) on three separate days (i.e., experimental validations) and in two independent cell lines (i.e., experimental validations). For statistical testing, measurements from the two cell lines were considered separately, and regression analysis was performed with generalized estimating equations to account for repeated measurements of clones (Goldhoff et al., 2008;Zeger & Liang, 1986).

| CRISPR-Cas9 genome editing
Upstream and downstream CRISPR gRNAs (guide RNAs) were designed flanking the candidate enhancer regions, using https://www.  The region of chromosome 20 associated with glioblastoma risk. A detailed view of the region defined by LD block with lead SNP rs2297440, r 2 > .6 in CEU population, using the UCSC Genome Browser showing putative enhancer elements containing SNPs in LD with rs2297440. SNPs in LD are observed below genes in the region. Histone ChIP-Seq tracks for H3K27ac from normal human astrocytes (NHA), MGG8 glioblastoma stem cells, and H3K4me1 from NHA aligned below SNPs indicate potential enhancer elements. Region 1 denotes a region with no enhancer activity in luciferase assays. Region 2 denotes the allele-specific enhancer region, which includes rs3761124 (marked with an asterisk). Regions 3 and 4 denote regions that exhibited enhancer activity but were unaffected by haplotype. It should be noted that the size of the regions tested for enhancer activity is not to scale. LD, linkage disequilibrium; SNP, single-nucleotide polymorphism; UCSC, University of California Santa Cruz Kit (Qiagen) and enhancer deletion was confirmed with PCR amplifications (forward primer: 5′ GCCTGACCAACATGATGAAA, reverse primer: 5′ TGGCCAGTGAACCTCACTTC).

| Quantitative Real-Time PCR
RNA was isolated using Trizol reagent (Thermo Fisher) and cDNA was synthesized from 2 μg of total RNA using the High-Capacity Reverse Approval was obtained from the National Institute of Mental Health to use the control brain dataset of CMC release 1. These data were generated in postmortem brain tissues, previously verified to be free of any neurological diseases. The eQTL analysis included the genotyping and RNA-seq data of a total of 216 unique individuals of European ancestry. RNA-Seq FASTQ files of CMC were downloaded from https://www.synapse.org/ (syn2759792) and mapped to hg19 using STAR (Dobin et al., 2013), with Gencode v19 as the reference annotation. FeatureCounts from the Subread package (http:// subread.sourceforge.net/) was used to generate gene-level counts from the aligned reads. All genes included in the analysis had more than five reads across all samples, and less than 10 samples per gene had zero reads. The filtered counts of the target genes were normalized using the variance-stabilizing transformation in DESeq2 (Love et al., 2014). Genotyping data of rs3761124 were extracted from Illumina Infinium HumanOmniExpressExome array data (plink file), and the alleles were matched to the forward strand of GRCh37 reference genome using the Bcftools fixref plugin (http://www.htslib. org/doc/#publications). Linear regression was used to evaluate the association between rs3761124 and specific target genes discovered by quantitative RT-PCR, with age, gender, RIN scores, the first three principal components of genotype, and RNA-seq expression residuals as covariates.
To github.io/sra-tools/), followed by processing of FASTQ files using the same pipeline (as CMC dataset) to obtain, filter, and normalize gene counts. Data for the functional SNP rs3761124 were extracted from the processed genotype data in PLINK format (Purcell et al., 2007), and the alleles were matched with the forward strand of GRCh37 reference genome using the Bcftools fixref plugin. eQTL mapping was performed in the same way as normal brain tissues using linear regression, except the covariate age that was substituted as gestational age.
To evaluate eQTL after glioma is established, the combined cohort of GBM and LGG from TCGA was used (Ceccarelli et al., 2016

| Colocalization of eQTL and GWAS signals
To provide additional supporting data that the functional SNP

| Visualization of Hi-C chromosome conformation interactions
The 3D Genome Browser (http://3dgenome.org) was used to visualize Hi-C data in 20q13.33 in GBM cell line G583 (Johnston et al., 2019;Wang et al., 2018). The bait region contained rs3761124, and the browser extracted Hi-C data centered on the bait region and presented interaction events as peak signals in nearby or distal genomic regions; hence, virtual 4C data were constructed from Hi-C data, with resolution (bin size) at 5,000 bp.

| Candidate enhancer region characterization
All four putative enhancers shown in Figure 1 were cloned separately into luciferase enhancer activity vectors (

| CRISPR−Cas9 enhancer disruption
To provide evidence that the candidate functional SNP rs3761124 direction for which a TaqMan assay was available was tested by qPCR. No detectable levels of the following genes were observed in the GBM cell lines used in our study: KCNQ2 (potassium voltage-gated channel subfamily Q member 2), FNDC11, or ABHD16B.

| Colocalization of eQTL and GICC GWAS meta-analysis summary statistics
Using the 1000 Genome European population as a reference panel, COLOC results showed that the PP3 is 0 and PP4 is 0.82 for F I G U R E 2 Allele-specific enhancer activity of enhancer region 2. All enhancer regions seen in Figure 1 were cloned into a luciferase enhancer assay construct and tested for enhancer activity. Here, we show data for enhancer region 2 that includes SNP rs13761124 alleles T and C (four clones for each allele, three independent experiments for each cell line: (+) represents Experiment 1, (×) represents Experiment 2, and (*) represents Experiment 3. The construct with the T allele demonstrated statistically significantly higher activity than the C allele, as shown in box plots from LN-229 (

| Visualization of Hi-C chromosome conformation interactions
The virtual 4C constructed from Hi-C data generated from the GBM cell line, G583, is illustrated in Figure 5. It shows interactions, demonstrated as peaks, between the region containing rs3761124 and promoters of STMN3 and RTEL1.

| DISCUSSION
We provide evidence that rs3761124 is a functional SNP on 20q13.33 mapping to a risk enhancer using cell-based enhancer activity assays. We show that rs3761124 had allele-specific effects on enhancer activity in the forward direction only. We previously noted unidirectional, rather than bidirectional, allele-specific effects on enhancer activity in colorectal cancer functional studies (Biancolella et al., 2014;Fortiniini et al., 2014). We further show that this SNP affects glioma risk potentially through the altered expression of The identification of multiple target genes suggests that this putative risk enhancer interacts with multiple promoters, some of which may also depend on additional molecular stimuli. It is not unprecedented that there may be multiple gene targets of a risk enhancer. A previous report identified rs73001406 as a candidate functional variant for glioma on 11q23.3, with PHLDB1 and DDX6 as potential target genes (Baskin et al., 2015). We previously described a risk enhancer for colorectal cancer on 11q23.1 that correlated with the expression of three target genes (Biancolella et al., 2014). Other studies reported multiple gene targets of risk enhancers in cancers, such as prostate cancer (Huang et al., 2014) and breast cancer (Betts et al., 2017;Dunning et al., 2016;Ghoussaini et al., 2016).
STMN3 (also known as SCLIP or SCG10-like protein) is one of the members of the stathmin family of proteins that plays an important role in the regulation of microtubule stability (Charbaut et al., 2001).
In addition to our own finding, the importance of STMN3 in glioma is supported by other studies. For example, a recent transcriptomewide association study (TWAS), which used the Genotype-Tissue Expression Project (GTEx) data to build a gene expression model, identified STMN3 as a highly significant gene associated with the risk of adult glioma (4.54 × 10 -27 ; Atkins et al., 2019). Among 55 adult tissues analyzed through GTEx, the STMN3 expression is the highest in the 13 CNS tissues (GTEX portal access 18 June, 2020); furthermore, messenger RNA (mRNA) and protein of this gene are overexpressed in human glioma of all grades as compared with normal brain tissues (Zhang et al., 2015). The overexpression of STMN3 increased growth and mobility of glioblastoma cells, whereas STMN3 knockdown impaired cell growth, proliferation, invasion, and migration (Zhang et al., 2015). Another study reported that high-resolution chromosome conformation capture (Hi-C) data generated in H1 embryonic stem cell and neuronal progenitor cell lines revealed a physical interaction between the STMN3 promoter and the top GWAS SNP rs2297440 (Dixon et al., 2015;Labreche et al., 2018), which is in high LD (r 2 = .92) with the functional SNP rs3761124.
Therefore, among the several target genes identified in this study, STMN3 appears to be the most robust and consistently validated.
Our data also implicate RTEL1 and GMEB2 in glioma, but their role may be more context-specific. CRISPR deletion of the risk enhancer containing rs3761124 in the U-87 MG and LN-229 GBM cell lines (which are both IDH1 wild type) correlated with altered T A B L E 1 eQTL results of rs3761124: during early brain development (UCLA), in nondiseased adult brain (CMC), and IDH1 wild-type glioma (TCGA) expression of both genes. Consistent with this, rs3761124 correlated with altered expression of RTEL1 in IDH1 wild-type glioma and during early brain development but not in the CMC brain tissues.
This lack of supportive evidence for this gene in the adult normal brain is also seen in a recent TWAS study, which showed RTEL1 was not a significant target gene using the GTEx adult nondisease brain tissues (Atkins et al., 2019). As RTEL1 is a DNA helicase that maintains genomic stability directly by suppressing homologous recombination, it may be quiescent during normal adult brain, but becomes increasingly active during conditions of active cellular growth, such as gliomagenesis and/or during early neurological development. Therefore, our candidate functional variant rs3761124 may impact genomic stability through changes in activity of its risk enhancer, modulating the expression of RTEL1.
Another target gene found in selective context is GMEB2, which is a transacting factor that binds to glucocorticoid modulatory elements (GME) present in the Tyrosine Aminotransferase promoter, increasing sensitivity to glucocorticoid (Oshima et al., 1995). GMEB2 has been associated with prostate cancer, but its role in gliomagenesis is unknown. Nevertheless, dexamethasone, a common corticosteroid with a high glucocorticoid activity, has been found to significantly increase invasion, cell proliferation, and angiogenesis in vitro or in vivo in GBM stem cell lines, including GBM stem cells that are IDH1 wild-type (Luedi et al., 2018). Therefore, it is possible that GMEB2 may be a target gene in the brain only during conditions of cellular proliferation, such as at the time of early neurological development and during gliomagenesis.
F I G U R E 5 Visualization of chromosome interactions with rs3761124 in 20q13.33. Virtual 4C was constructed from Hi-C data of the GBM cell line, G583. The anchoring point was the location of rs3761124 and also the bait region. Interactions of the bait with genomic regions were highlighted as peaks in the virtual 4C presentation. The horizontal bar shows the location of the promoter of STMN3 and corresponds to its interaction peak (contained within the vertical bar). Another interaction peak is observed near the promoter of RTEL1, which is transcribed in the opposite direction of STMN3,to the right of the anchoring point. GBM, glioblastoma multiforme ALI ET AL.

| 85
Two additional genes were identified as potential target genes of the putative risk enhancer on 20q13.33 after CRISPR−Cas9 deletion including RTEL1-TNFRSF6B and SRMS. Neither of these genes was identified in any of our eQTL analyses. RTEL1-TNFRSF6B is a noncoding, readthrough transcript that is subject to NMD (Chang et al., 2007), but it is currently unclear how a change in mRNA decay will affect glioma development. SRMS belongs to a family of nonreceptor tyrosine kinases that have been involved in a number of cancers, including fibrosarcoma (Lin et al., 2013), eosinophilic variant of chromophobe renal cell carcinoma (Pagano et al., 2018), and breast cancer (Fan et al., 2015), but its function in gliomagenesis remains unknown. Additional studies will be required to determine if these genes are indeed relevant to glioma risk.
There are some limitations to our study. We cannot discount the possibility that there are additional functional SNPs on 20q13.33.
Although we used all available publicly available datasets to identify candidate enhancers, these may not have captured all relevant enhancers across this region. In addition, we restricted our analysis to SNPs with an r 2 > .6 (in CEU population) to the index SNP, and we may have missed functional SNP(s) that would be captured at a lower r 2 . Our in vitro assessment of enhancer activity was conducted in only two GBM cell lines and we do not know if the candidate enhancers that showed no activity in these cells would have been active in other cell lines. Similarly, our CRISPR deletion experiments were conducted in GBM cell lines and additional/different target genes of this enhancer may be seen in more relevant "normal" cells of the brain. Finally, our eQTL analyses were restricted to data available, which may not capture all relevant cellular contexts to capture associations with the functional SNP. Despite this, we believe we provide strong supportive evidence for the identification of at least one functional SNP relevant to gliomagenesis on 20q13.33.
In summary, we report identification and characterization of a functional SNP, rs3761124, that affects the activity of an enhancer on 20q13.33 that leads to modulated expression of multiple genes implicated in glioma risk. Our results are concordant with reports by others that a multiple-gene, rather than a single-gene, association with GBM is present at 20q13.33 (Atkins et al., 2019). As the effect of these genes on glioma growth and development has not been evaluated in depth, further studies to evaluate their molecular mechanisms may lead to novel therapeutic strategies in the future.

ACKNOWLEDGMENTS
The work is supported by the National Institutes of Health/National Cancer Institute grant CA207972 (Rose K. Lai and Graham Casey) and T32 5T32CA163177 (Christopher H. Dampier).