Genetic variants in hypoxia‐inducible factor pathway are associated with colorectal cancer risk and immune infiltration

Abstract Hypoxia‐inducible factor (HIF) pathway genes influence tumorigenesis and immune status. However, the associations between genetic variants in hypoxia‐related genes and colorectal cancer risk and the immune status of hypoxia‐associated genes in colorectal cancer have not been systematically characterized. The associations between genetic variants and colorectal cancer risk were evaluated in Chinese, Japanese and European populations using logistic regression analysis. The relationships between target genes and tumour immune infiltration were predicted by Tumour Immune Estimation Resource (TIMER). We found that rs34533650 in EPAS1 was associated with colorectal cancer risk (OR = 1.43, 95% CI = 1.20–1.70, P (FDR) = 8.35 × 10−4), and this finding was validated in two independent populations (Japanese: OR = 1.07, 95% CI = 1.01–1.15, p = 3.38 × 10−2; European: OR = 1.11, 95% CI = 1.03–1.19, p = 6.04 × 10−3). EPAS1‐associated genes were enriched in immune‐related pathways. In addition, we found that EPAS1 copy number variation (CNV) was associated with the degree of infiltration of immune cells and observed correlations between EPAS1 expression and immune cell infiltration levels in colorectal cancer. These results highlight that genetic variants of hypoxia‐related genes play roles in colorectal cancer risk and provide new insight that EPAS1 might be a promising predictor of colorectal cancer susceptibility and immune status.


| INTRODUC TI ON
Colorectal cancer presents a major health burden worldwide, 1 attracts attention due to its high incidence and mortality rates, 2 and presents an early-onset trend. 3The number of colorectal cancer patients in China is rapidly increasing, 4 mainly due to the interaction of genes and the environment. 5Through genome-wide association studies (GWASs), multiple independent variants in colorectal cancer have been identified, 6 and most of the variants are believed to act by regulating gene expression. 7Genetics and immunity play an important role in colorectal cancer. 8poxia activates several adaptive responses, which are mainly mediated by HIF1α and HIF2α. 9Previous studies suggested that hypoxia-induced changes affect tumour immune microenvironment. 10poxia-inducible factor (HIF) pathway genes influence the expression of gene in both tumour cells and immune cells and affect tumour progression and treatment response. 11The HIF pathway, including HIF-1α, EPAS1, HIF-3α and ARNT, is associated with disease progression and adverse clinical outcomes in colorectal cancer. 12EGLN1, EGLN2 and EGLN3 regulate the activity of HIF-1α and EPAS1, which are widely expressed in cells. 13The loss of Von Hippel-Lindau (VHL) impacts the stabilisation of HIF-1α and EPAS1 and activates the signals that mediate adaptive responses. 14EPAS1 encodes a subunit of HIFs 15 and shows multifaceted effects, including regulation of angiogenesis, haemoglobin concentration and erythrocytosis. 16EPAS1 affect immune infiltration in paraganglioma and pheochromocytoma. 9e immune microenvironment can be used to predict the treatment outcomes and curative effects, 17 and the treatment response and prognosis of colorectal cancer are associated with the immune microenvironment. 18In addition, the outcome and progression of many types of diseases are affected by single nucleotide polymorphisms (SNPs), 19 and studies have demonstrated that SNPs in HIF-1α are associated with susceptibility to multiple cancers. 20However, few studies have systematically elucidated the correlations between genetic variants in HIF pathway genes and the susceptibility of colorectal cancer.
In this study, we aimed to assess the relationships between SNPs in HIF-associated genes and colorectal cancer risk in a discovery set and two validation sets and elucidate possible immunoregulatory mechanisms.Individuals without expected relatedness were retained for analysis. 21,22Written informed consent was given by all recruited patients, and the projects were approved by the relevant research ethics committees at each institute.The characteristics of the three groups are shown in Table S1.
The differentially expressed genes were identified using The Cancer Genome Atlas (TCGA) database and met the following criteria: (a) call rate > 95%; (b) fold change >1.5 and (c) p < 0.05.To further identify the genes associated with colorectal cancer, we performed principal component analysis (PCA) and orthogonal partial least square-discriminate analysis (OPLS-DA) on HIF-related genes using data from the TCGA database.
Genomic DNA was isolated from peripheral blood samples.For genotyping, Illumina Human Omni ZhongHua Bead Chips were used.
Eighteen SNPs were included after removing the LD redundancy and functional annotation in the genotyping stage.

| Prediction of target genes at single-cell level
The single-cell data was used for analysis to explain the functional states of cancer cells.At the single-cell level, gene expression was predicted using colorectal tissues in The Human Protein Atlas (HPA, https:// www.prote inatl as.org/ ).The role of target genes in 14 crucial functional states was predicted by the CancerSEA database (http:// biocc.hrbmu.edu.cn/ Cance rSEA/ ).

| TME immunoregulation analysis
Immunoregulation analysis was performed using the Tumour Immune Estimation Resource (TIMER, https:// cistr ome.shiny apps.io/ timer/ ). 30e correlations between target gene expression and the abundance of TME-infiltrating immune cell types were evaluated.The correlations between immune markers in immune cell types and target gene were also assessed.Adjustments were made for tumour purity, which has been proven to influence the analysis of immune infiltration.

| Statistical analysis
To assess the differences between colorectal tumour tissues and normal tissues, the PCA model was used.The variable importance in the projection (VIP) value was calculated to indicate the contribution of genes to the classification using the OPLS-DA model, and the VIP value >1.0 was considered significant.The quality of the OPLS-DA model was based on the parameter R 2 , which is known as the goodness of fit of the model (cut-off value ≥0.50).Unconditional logistic regression models were used to assess the associations between genetic variants and colorectal cancer risk, with adjustment for age, sex, PC1 and PC2 (where appropriate).For correction, the false discovery rate (FDR) method was used.The associations between different haplotypes and colorectal cancer risk were assessed by haplotype analysis.The heterogeneity of different studies was evaluated using I 2 , and I 2 provides an estimate of the amount of variance.
Multiplicative and additive effects were used to assess the interaction effects between gene-environment interactions.The effects of genetic variants on the expression of genes were performed by the expression quantitative trait loci (eQTL) analysis.The differences in candidate gene mRNAs (log2 transformed) were compared using the TCGA dataset, GEO database and in-house data, and the gene expression differences were estimated using Student's t-test.Cox regression analyses were used to evaluate the associations between EPAS1 expression and the overall survival (OS) of patients with colorectal cancer.To investigate the potential mechanisms of target genes, gene set enrichment analysis (GSEA) was used.All analyses were performed with R 4.0.2,PLINK 1.90 and SIMCA software (version 13.0).p < 0.05 was considered statistically significant.

| Associations of candidate SNPs with colorectal cancer risk
Figure 1A shows the process of this study.Seven differentially expressed genes were identified among these 16 key genes using TCGA database, as shown in Figure 1B.PCA and OPLS-DA were used to identify the genes associated with colorectal cancer.Notably, the adjacent mucosa was separated from tumour samples (Figure 1C).
Clustering between these two groups was shown through an OPLS-DA score plot (Figure 1D), reflecting many differentially expressed genes between them.Finally, five genes (VIP value >1.0) contributed to the difference between the two groups and were differentially expressed between them (Figure 1E).
After a systematic process of screening SNPs, a total of 18 SNPs in five cancer-related genes were obtained to explore the associations with colorectal cancer risk.The functions of these SNPs are shown in Tables S2.There were five SNPs associated with susceptibility to colorectal cancer in an additive model.After FDR correction, only two SNPs increased the susceptibility to colorectal cancer (rs34533650: P (FDR) = 8.35 × 10 −4 , rs6753127: P (FDR) = 5.91 × 10 −3 , Table S3).
Notably, rs34533650 and rs6753127 were located in the EPAS1 gene in low disequilibrium (r 2 ≤ 0.6).These two SNPs were combined to calculate the number of risk alleles.As shown in Table S4 S5).
We subsequently evaluated the two SNPs in two independent validations.As shown in Table 1, rs34533650 increased colorectal cancer risk in the Japanese and European populations (OR = 1.07, 95% CI = 1.01-1.15,p = 3.38 × 10 −2 ; OR = 1.11, 95% CI = 1.03-1.19,p = 6.04 × 10 −3 , respectively), and the meta-analysis showed the same trend (OR = 1.16, 95% CI = 1.03-1.30,p = 1.17 × 10 −2 ).The MAF values of the East Asian and the European population in the 1000 Genomes Project were consistent with the MAF of the Chinese population and European population in this study Table S6).To reveal the associations between the expression levels of EPAS1 and these two SNPs, the eQTL analysis was conducted.However, the mutant genotype of rs34533650 tended to decrease EPAS1 expression using in-house database (p = 2.01 × 10 −6 , Figure S1).Similarly, the most significant correlation existed between SNP rs34533650 and susceptibility to colorectal cancer through the gene-based analysis in Table S7.
In different subgroups, stratified analysis was performed to evaluate associations between genetic variants and the risk of colorectal cancer (Table S8 and S9).The mutant genotype associated with colorectal cancer risk in the subgroups of age, sex, smoking status, alcohol consumption status, tumour site, tumour grade and Dukes stage.We also performed interaction analysis on EPAS1 rs34533650 and basic characteristics, and no significant multiplicative or additive interactions were observed.There were no statistically significant differences for the distributions of allele frequencies were observed in the tumour site, tumour grade and Dukes stage (Table S9).

| Potential regulatory function of rs34533650
The annotation of rs34533650 is shown in Figure S2A using data derived from HCT116 cells, and the potential functions of the target SNP were predicted by in silico analysis.The SNP rs34533650 was located in the intron, which was considered to have enhancer states in 14 cell types (Table S10).There was no evidence to suggest that rs34533650 affects EPAS1 splicing through the CancerSplicingQTL database (Table S2).EPAS1 rs34533650 was bound to BARX1, POU4F2 and POU4F3 according to RegulomeDB (Figure S2B).After phenome-wide association study (PheWAS) analysis for rs34533650, we found that the target SNP was associated with multiple diseases, as shown in Table S11.

| Expression levels of EPAS1
According to the Oncomine database, EPAS1 was low expressed in multiple tumour tissues in Figure 2A.The level of EPAS1 expression was low in tumour tissues using the TCGA database, TCGA paired database, GSE113513, GSE32323, GSE21510, GSE8671 and in inhouse database (Figure 2B-H).Overall, the mRNA levels of EPAS1 were significantly decreased in tumour tissues [standard mean deviation (SMD) = 0.90, 95% CI = 0.71-1.09), Figure 2I].We predicted the role of EPAS1 in crucial functional states in colon cancer at the single-cell level and observed positive correlations between EPAS1 and angiogenesis, quiescence, differentiation, epithelial-mesenchymal transition (EMT), metastasis and hypoxia, as shown in Figure S3.
We also assessed the relationships between EPAS1 expression and clinicopathological or demographic characteristics.EPAS1 expression was lower in tumour samples (in GSE44076, Figure S4A).
No significant difference was observed between subgroups of age (in GSE44076, Figure S4B).Compared with grade I or II, EPAS1 was expressed at low levels in grade III (in the GSE17536 database, Figure S4C).EPAS1 expression was significantly decreased in tissues with a high degree of lymph node involvement (in GSE21815, Figure S4D) and varied between different levels of malignant tissue in the GSE21510 database (Figure S4E), and we found similar phenomenon in the TCGA database (Figure S4F).These results indicated that EPAS1 may be involved in tumour progression.The patients with low EPAS1 expression had slightly shorter OS times than those with high levels of EPAS1 expression, although the p value was larger than 0.05 (in GSE12945, Figure S5).EPAS1 expression was enriched in enteroendocrine cells according to single-cell sequencing data from colorectal tissues in the HPA database (Figure 2J).

| Correlation between EPAS1 expression and infiltration of immune cells
Hypoxia was associated with the activity of immune cells in the tumour microenvironment.Based on the expression of EPAS1, multiple immune-related gene sets and cancer-related pathway were enriched in colorectal cancer through the GSEA analysis including intestinal immune network pathway, B cell receptor pathway, T cell receptor pathway, primary immunodeficiency pathway, natural killer cell pathway, cell adhesion molecule pathway and pathways in cancer pathway in Figure 3A.Interestingly, we found that EPAS1 copy number variation (CNV) impacted the infiltration levels of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in colon adenocarcinoma (COAD) data (Figure 3B).At the same time, we analysed the correlations between EPAS1 expression and the levels of immune infiltrating cells in the colorectal tumour.In COAD data, TA B L E 1 Association analyses between rs34533650 in EPAS1 and colorectal cancer risk in discovery and validation stages.the expression of EPAS1 was correlated with the expression levels of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells (Figure 3C).As shown in Figure 3D, EPAS1 expression was correlated with the levels of CD8+ T cell, macrophages, neutrophils and dendritic cells in rectal adenocarcinoma (READ).

| Correlation analysis between EPAS1 expression and various immune markers
The correlations between EPAS1 expression and immune signatures were validated using the TIMER database.Immune cells were identified according to the gene markers in Table 3.The values were adjusted based on tumour purity, which affected the dissection of immune infiltration.EPAS1 expression was associated with the vast majority of immune markers for different types of immune cells in colorectal cancer.The correlations between EPAS1 expression and multiple markers of T cells were examined.The mRNA levels of EPAS1 were significantly correlated with 27 of 32 T cell markers in the COAD dataset.In the READ dataset, 19 of 32 T cell markers were correlated with mRNA expression levels of EPAS1.
Furthermore, the relationship between EPAS1 expression and the levels of hub T cell checkpoints was investigated.In colorectal cancer, EPAS1 expression was significantly correlated with the expression of well-known T cell checkpoints including PD-1, PD-L1 and CTLA-4 (Figure 4).These findings suggest that EPAS1 expression is significantly

| DISCUSS ION
In the present study, we investigated the associations between genetic variants in HIF pathway genes and colorectal cancer risk.EPAS1 rs34533650 significantly increased colorectal cancer risk in the Chinese population, and this result was validated in the Japanese and European populations.EPAS1-related genes were enriched in immune-related pathways, and we found that the mutant of EPAS1 was associated with the degree of infiltration of immune cells.Moreover, significant correlations were observed between EPAS1 expression and immune marker genes in immune infiltrating cells.These results suggest that EPAS1 might be promising predictor of colorectal cancer susceptibility and immune regulation.
The HIF pathway plays a crucial role in tumorigenesis and progression 31 and causes a series of responses by mediating hypoxia.
Hypoxia is linked to diseases such as cancer, diabetes, and inflammation. 32Multiple cancers are associated with hypoxia. 33The relationships between genetic variants in HIF-1α and diseases have been estimated in a few reported studies.Genetic variants of HIF-1α were connected with colon or rectal cancer risk, 34 suggesting that genetic polymorphisms in HIF pathway genes may be connected with colorectal cancer.However, few studies analysed the associations between hypoxia-related genes and colorectal cancer risk, and we identified rs34533650 as a novel colorectal cancer risk factor in the present study.In the Chinese population, compared with the most common haplotype AC, the haplotype GC in the EPAS1 gene significantly increased colorectal cancer risk.Moreover, SNP rs34533650 increased colorectal cancer risk in two independent replication cohorts, including the Japanese population and European population.
However, heterogeneity existed among the three populations.Many GWAS loci are duplicated in different populations but show heterogeneity in effect sizes.The frequencies of the alleles differed, suggesting that the loci are genetically heterogeneous in multiethnic populations, which is probably due to the different population histories and environmental factors of individuals. 35The populations of subjects were recruited from different ethnicities, which might result in heterogeneity of different populations.To exclude the effect of heterogeneity, we used the random-effects model for meta-analysis.
Colorectal cancer risk can be influenced by multiple factors, and stratified analysis was conducted in different subgroups.Mutant genotypes increased colorectal cancer risk in all subgroups of age.
Colorectal cancer has a higher incidence rate in males than in females, 36 and males with mutant genotypes significantly increased colorectal cancer risk than females.Between rs34533650 and colorectal cancer susceptibility, a more significant association was observed in smokers than non-smokers.The increased risk was more TA B L E 3 Correlation analysis between EPAS1 and marker gene in immune cell using TIMER.BARX1 38 and POU4F3 39 are both associated with the methylation of cancer.PheWAS are high-throughput studies, which was used to systematically examine the impact of one or many genetic variants across on a variety of human phenotypes. 40After PheWAS analysis, EPAS1 rs34533650 was associated with multiple diseases, which suggested that rs34533650 is a potential novel biomarker.
EPAS1 is a heterodimeric transcription factor complex. 41 The predominant regulators of the hypoxic response consist of HIF-1α and EPAS1, both in cells and organisms. 42EPAS1 is downregulated at the mRNA level in cancerous tissues, 43

TA B L E 3 (Continued)
Hypoxia is related to the immune status of tumour cells.EPAS1 plays a valuable role in predicting the outcome of PD-L1 treatment. 9P rs34533650 influences EPAS1 expression in an allele-specific manner, and EPAS1 expression was associated with the infiltration levels of macrophages, 44 suggesting that genetic variants may influence immune infiltration through the expression of EPAS1 gene associated with colorectal cancer risk.Through GSEA, intestinal immune-associated pathway and cancer-related pathway were enriched based on EPAS1 expression in colorectal cancer.Notably, in COAD data, the degree of infiltration of various immune cells was associated with EPAS1 CNV suggesting that genetic variants influence immune status.In single-cell data, EPAS1 expression enriched in enteroendocrine cells plays a vital role in intestinal immunity. 45 the colorectal tumour, EPAS1 expression associated with immune infiltrating cells in both COAD and READ data.We also explored the correlations between EPAS1 expression and immune signatures.
Immune cells can be identified by gene markers.EPAS1 expression was associated with most immune markers in various immune cells.
T cells were shown to be associated with immune status, 46 and the correlations between EPAS1 expression and multiple T cells were In the discovery study, 1150 colorectal cancer cases and 1342 controls were recruited in the First Affiliated Hospital of Nanjing Medical University and the Affiliated Nanjing First Hospital.Subjects were excluded due to unexpected duplications or genetic relatedness.All of subjects were frequency-matched based on age and sex and signed written informed consent.Cases were diagnosed with colorectal cancer via pathological examination, and patients with previous cancer were excluded.Unqualified subjects were excluded based on physical examination.Questionnaires to collect information on demographic factors were administered through face-toface interviews.This research was approved by Nanjing Medical University Institutional Review Board.In addition, summary estimates from the Japanese population [(6692 cases from the BioBank Japan Project (BBJ) and 27178 controls from JPHC (Japan Public Health Center), J-MICC (Japan Multi-Institutional Collaborative Cohort) and ToMMo (Tohoku Medical Megabank Organization)] and genotyping data from the European population [4461 cases and 4140 controls from Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) deposited in the database of Genotypes and Phenotypes (dbGaP) phs001315.v1.p1] were used to verify the results of the Chinese population.

F
I G U R E 1 The procedures of selecting colorectal cancer functional genetic variants in the HIF pathway.(A) The flow chart for selecting potential functional genetic variants.(B) The fold change in the expression of key HIF pathway genes.(C) Principal component analysis (PCA).(D) Orthogonal partial least squarediscriminant analysis (OPLS-DA).Pp represents the predictive component, and Po represents the orthogonal component.(E) Scatter plot from the OPLS-DA model using the variable influence on projection (VIP) score.All data were obtained from the TCGA database.FC, fold change; CHB, Han Chinese in Beijing; MAF, minor allele frequency; HWE, Hardy-Weinberg equilibrium; LD, linkage disequilibrium; SNP, single nucleotide polymorphism; FDR, false discovery rate.GUO et al.
Abbreviations: CI, confidence interval; OR, odds ratio.a p-value of the additive model with adjustment for age, sex, PC1 and PC2 (where appropriate) in the logistic regression model.

F I G U R E 3
Immune infiltration analysis based on EPAS1.(A) GSEA based on EPAS1 expression using the TCGA database.(B) EPAS1 CNV affects the infiltrating levels of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in COAD.(C-D) Analysis of the associations between EPAS1 expression and the levels of infiltrating immune cells in COAD and READ tissues based on TIMER.

5 |F I G U R E 4
evaluated.EPAS1 expression level was associated with the vast majority of T cell markers in colorectal cancer data.The EPAS1 expression was positively correlated with the expression of immune checkpoints for T cells in colorectal cancer.These results suggested that EPAS1 expression was associated with immune infiltration and indicated that EPAS1 plays an important role in immune escape in the microenvironment of colorectal cancer.However, there are some limitations to this study.Lifestyle factors, such as diet and other lifestyle factors, were not collected during the design of this study, so they were not analysed accordingly in the stratified analysis.Molecular biology experiments are required to investigate the specific regulatory mechanisms.CON CLUS IONIn summary, our research showed that the genetic variant rs34533650 A > G in EPAS1 was associated with colorectal cancer risk.EPAS1 affects immune-related pathways, and the CNV of EPAS1 is related to the infiltration of immune cells.EPAS1 The relationship between EPAS1 expression and the expression of T cell checkpoints including PD1, PD-L1 and CTLA-4 using the TIMER database.Scatterplots of the correlations between EPAS1 expression and PD-1, PD-L1 and CTLA-4 in COAD (A) and READ (B) using the GEPIA database.

Population Cases/controls rs34533650 genotypes (AA/AG/GG) MAF P P a OR (95% CI) a P b Cases Controls Case/control
Note: Some controls are missing in the discovery stages.
Abbreviations: CI, confidence interval; NA, not available; OR, odds ratio.a p-value of the additive model with adjustment for age, sex, PC1 and PC2 (where appropriate) in the logistic regression model.b p-value for heterogeneity.c A meta-analysis was performed for combined group.
Association analyses between rs34533650 in EPAS1 and colorectal cancer risk in Chinese populations.
TA B L E 2