Mitochondrial DNA haplogroup M7 confers a reduced risk of colorectal cancer in a Han population from northern China

Abstract Mitochondria are central eukaryotic organelles in cellular metabolism and ATP production. Mitochondrial DNA (mtDNA) alterations have been implicated in the development of colorectal cancer (CRC). However, there are few reports on the association between mtDNA haplogroups or single nucleotide polymorphisms (SNPs) and the risk of CRC. The mtDNA of 286 Northern Han Chinese CRC patients was sequenced by next‐generation sequencing technology. MtDNA data from 811 Han Chinese population controls were collected from two public data sets. Logistic regression analysis was then used to determine the effect of mtDNA haplogroups or SNPs on the risk of CRC. We found that individuals with haplogroup M7 exhibited a reduced risk of CRC when compared to individuals with other haplogroups (odds ratio [OR] = 0.532, 95% confidence interval [CI] = 0.285–0.937, p = 0.036) or with haplogroup B (OR = 0.477, 95% CI = 0.238–0.916, p = 0.030). Furthermore, haplogroup M7 was still associated with the risk of CRC when the validation and combined control cohorts were used. In addition, several haplogroup M7-specific SNPs, including 199T>C, 4071C>T and 6455C>T, were significantly associated with the risk of CRC. Our results indicate the risk potential of mtDNA haplogroup M7 and SNPs in CRC in Northern China.


| INTRODUCTION
Colorectal cancer (CRC) is one of the most common cancers in both men and women, accounting for nearly 10% of the global cancer incidence. 1 Approximately 274,800 new CRC cases and 132,100 CRC-related deaths have been estimated to occur in China each year, accounting for nearly one-tenth of the global CRC burden. 2 Assessing the risk of CRC and then screening is the most powerful public health tool to reduce mortality or incidence. 3,4 The major risk factors that may influence the development of CRC include age, male sex, obesity, a high-fat diet and a medical history of conditions such as inflammatory bowel disease (IBD) and diabetes mellitus. 5 Several prediction models have been developed to quantify CRC risk based on clinical or laboratory data. However, all these models still have obvious limitations, such as the restricted age range of subjects, selection bias and incomplete assessment of risk factors. [5][6][7] Meanwhile, the diagnosis of CRC mainly depends on imaging techniques, which are less efficient for the early detection of CRC. Accurate early detection or risk prediction contributes to improved CRC survival. Unfortunately, to date, there are no favourable biomarkers for the risk prediction of CRC.
Intensive efforts have been made to understand the genetic risk factors of CRC. To date, over 40 nuclear genome variants associated with CRC risk have been identified, including the SNPs rs10911251, rs1321311 and rs1035209. [8][9][10] However, these susceptibility loci account for only about 8%-16% of CRC cases, suggesting that additional genetic risk factors of CRC remain to be explored. 8 Mitochondria are central eukaryotic organelles in cellular metabolism and ATP production. Notably, multiple metabolic deregulations have been linked to the pathogenesis of CRC, including those of amino acid, glucose and lipid metabolism. 11 Somatic mtDNA mutations and copy number alterations have also been frequently observed in CRC samples. 12 However, the potential involvement of germline mtDNA variations in CRC development is less well understood.
Germline mtDNA variations are often characterized by assigning haplogroups, which are defined by a certain set of mtDNA variants and reflect specific ancestral populations and geographic origins. mtDNA haplogroups have been reported to be associated with the risk of various cancers. [13][14][15][16] Grzybowski et al. found that haplogroup R and its diagnostic mutations at positions 12705 and 16223 occurred at higher frequencies in Polish CRC patients than in healthy individuals. 17 Certain mtDNA SNPs have also been associated with an increased risk of CRC in Iranian or European American populations. 18

| Capture-based mtDNA next-generation sequencing
To obtain the full spectrum of germline mtDNA variations in CRC patients, we performed capture-based mtDNA next-generation sequencing as previously described. 24 Briefly, genomic DNA (1 μg for each sample) was randomly sheared with a sonicator (Scientz98) to obtain fragments mainly distributed at 300-500 bp. The sonicated DNA fragments were end-repaired, ligated with sequencing adapters and amplified to generate the whole genome sequencing (WGS) library. Then, the WGS libraries of 20 samples were mixed with homemade biotinylated mtDNA capture probes for hybridization.
After PCR amplification and purification, the mtDNA capture quality was determined using agarose gel electrophoresis and real-time fluorescence quantification. Finally, the captured mtDNA libraries were sequenced on an Illumina X Ten platform using paired-end runs with 2 × 150 cycles (PE 150).

| mtDNA haplogroup and mtSNPs
The FASTQ preprocessor fastp (version 0.20.0) 25 was used to perform quality control and adapter trimming on the raw sequencing data. BWA (version 0.7.17-r1188) was used to map the trimmed reads. To minimize contamination by nuclear mitochondrial DNA segments (NUMTs), trimmed reads were mapped to both the Revised Cambridge Reference Sequence (rCRS) of mtDNA and the human reference genome (hg19). Next, Picard tools (version 1.81) were used to mark and remove duplicate reads. To reduce the false-positive rate at positions near indels, local realignment was performed using IndelRealigner in GATK (version 3.2-2). Then, mtDNA sequences were extracted into FASTA format with a Perl script written in our laboratory. The obtained FASTA sequences were analysed using MitoTool (www.mitotool.org) 26 to determine mtDNA haplogroups and SNPs. Macro-haplogroups and micro-haplogroups were annotated based on PhyloTree (www.phylotree.org). 27 A variant observed in both tumour and paired non-tumour tissues was defined as an mtDNA SNP. SNPs with minor allele frequency (MAF) <5% in cases and controls were excluded from further analysis.
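The definition of an mtDNA SNP given above (a variant seen in both the tumour and the paired non-tumour tissue of the same patient) amounts to a per-patient set intersection. The following is a minimal Python sketch of that idea, assuming each tissue's variants have already been called against the rCRS and are written in the `199T>C` notation used in this paper. The function names and the example variant lists are hypothetical illustrations, not code or data from this study.

```python
def variant_position(variant):
    """Return the leading numeric position of a variant such as '4071C>T'."""
    digits = ""
    for ch in variant:
        if ch.isdigit():
            digits += ch
        else:
            break
    return int(digits)


def germline_snps(tumour_variants, normal_variants):
    """Variants present in BOTH tissues of one patient, sorted by position.

    Variants seen in only one tissue are dropped, because they may be
    somatic changes or calling artefacts rather than germline SNPs.
    """
    shared = set(tumour_variants) & set(normal_variants)
    return sorted(shared, key=variant_position)


# Hypothetical example: the last tumour-only entry is invented for illustration.
tumour = ["6455C>T", "199T>C", "4071C>T", "310T>C"]
normal = ["199T>C", "4071C>T", "6455C>T"]
print(germline_snps(tumour, normal))
```

Sorting by position is only a convenience for reading the output; the intersection itself is the step that corresponds to the definition in the text.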

| Control cohorts
Two independent Han Chinese cohorts were used to evaluate the risk of CRC. First, we collected information on mtDNA haplogroups and SNPs from a published data set, which included 562 normal individuals in Shaanxi Province of Northern China. 28 To further validate the results, we also collected another control cohort from the 1000 Genomes Project, 29 including 249 Chinese individuals (Southern Han Chinese, Beijing Han Chinese from Northern China and Denver Han Chinese from Colorado).

| Statistical analysis
In a case-control study, haplogroups with frequency>1% in both the controls and CRC patients were analysed to evaluate the effect of common mtDNA haplogroup on CRC. To estimate the relative risk,

| Patients characteristics
The clinical characteristics of the 286 CRC patients are summarized in had serum CEA <5 ng/ml. The prevalence of major haplogroups is also listed in these different clinicopathological categories (Table S1).

| Association between mtDNA haplogroups and colorectal cancer risk in northern Han population
To investigate the association between genetic mtDNA variation and CRC risk, mtDNA haplogroups were annotated in 286 CRC cases and 562 Northern Han Chinese healthy controls (cohort 1). As shown in When compared with other haplogroups, haplogroup M7 had a much lower percentage in CRC patients (5.24%, n = 15) than in control cohort 1 (9.43%, n = 53), which corresponded to a significantly
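The counts quoted above (15 of 286 cases and 53 of 562 controls carrying haplogroup M7) are enough to reproduce the point estimate given in the abstract. The following is a minimal Python sketch, assuming a crude, unadjusted 2 × 2 comparison of M7 carriers against all other haplogroups. The function name is hypothetical, and the Wald interval it computes is a textbook approximation rather than the method used in this study's logistic regression, so its bounds need not match the reported 95% CI exactly.

```python
import math


def crude_odds_ratio(case_exposed, case_total, control_exposed, control_total, z=1.96):
    """Crude odds ratio with a Wald (log-scale) confidence interval.

    'Exposed' here means carrying the haplogroup of interest; everyone
    else in the group is treated as the reference category.
    """
    a = case_exposed                    # cases with the haplogroup
    b = case_total - case_exposed       # cases without it
    c = control_exposed                 # controls with the haplogroup
    d = control_total - control_exposed # controls without it
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_m7, ci_low, ci_high = crude_odds_ratio(15, 286, 53, 562)
print(round(or_m7, 3))  # 0.532
```

The crude estimate agrees with the OR of 0.532 stated in the abstract, which suggests that the haplogroup comparison against cohort 1 is close to an unadjusted one; this is an inference from the arithmetic, not a statement made in the text.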

| Association between haplogroup M7 and CRC risk with validation cohort
To further validate the association of haplogroup M7 with the risk of CRC, we used another Han Chinese data set from the 1000 Genomes Project, which consisted of 249 individuals (Southern Han, 55; Northern Han, 121; and Denver Han, 73). As shown in Table 4

| The association between mtDNA SNPs and CRC risk
To clarify the association between mtDNA SNPs and CRC risk, we screened for common SNPs with allele frequencies higher than 5% in our cohorts. Using this criterion, 53 SNPs were identified in control cohort 1 (Table S2), 72 SNPs were identified in control cohort 2 (Table S3) and 58 SNPs were identified in the combined control cohort (Table S4).
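The frequency screen described above can be written in a few lines. The following is a minimal Python sketch, assuming mtDNA is treated as haploid (one allele per individual, heteroplasmy ignored), so that a variant's allele frequency is the number of carriers divided by the cohort size. The function name, the variant labels, the carrier counts and the cohort size of 100 are all hypothetical and are not taken from Tables S2-S4. Whether a frequency of exactly 5% is kept is also an assumption here, since the text says both "MAF <5% excluded" and "higher than 5%".

```python
def common_snps(carrier_counts, cohort_size, threshold=0.05):
    """Return {variant: minor allele frequency} for variants that pass the screen.

    carrier_counts maps each variant to the number of individuals carrying it.
    The minor allele frequency is the smaller of the variant frequency and
    the reference frequency, so very common variants are screened as well.
    """
    kept = {}
    for variant, carriers in carrier_counts.items():
        freq = carriers / cohort_size
        maf = min(freq, 1 - freq)
        if maf >= threshold:  # assumption: the 5% boundary itself is kept
            kept[variant] = maf
    return kept


# Hypothetical counts in a hypothetical cohort of 100 individuals.
example_counts = {"variant_A": 12, "variant_B": 3, "variant_C": 40, "variant_D": 98}
passing = common_snps(example_counts, cohort_size=100)
print(sorted(passing))
```

In this invented example, `variant_B` is dropped because it is rare and `variant_D` is dropped because its reference allele is rare; only variants that pass such a screen would go forward to the association test.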
As shown in Figure 1, three haplogroup M7-specific SNPs, in-

TABLE 2 Association between mtDNA haplogroups and CRC risk with other haplogroups as reference group

4071C>T and 6455C>T did not significantly indicate a decreased risk of CRC in control cohort 1 (p = 0.062 and p = 0.073, Figure 1A), which may partially be due to sampling bias. However, these three haplogroup M7-specific SNPs were significantly associated with the risk of CRC in control cohort 2 (Figure 1B). Another haplogroup M7-specific SNP, 9824T>C, also showed a significant association with CRC only in control cohort 2 (p = 0.019), because 9824T>C was not detected in control cohort 1.
Moreover, four SNPs assigned to non-M7 haplogroups, namely 14783T>C, 16129G>A, 16311T>C and 16362T>C, also showed significant associations with CRC risk in control cohort 1 (p = 0.002, p = 0.015, p < 0.001 and p = 0.029, respectively), but these associations may be artefacts, given the complete lack of support in control cohort 2. Thus, the association of haplogroup M7-specific SNPs with CRC provides further evidence that haplogroup M7 is associated with reduced CRC risk in the Han Chinese population.

| DISCUSSION
In this study, we comprehensively analysed the association between the mtDNA haplogroups, SNPs and CRC risk in the Northern Han

This study has some limitations. One limitation is the small sample size. As mitochondrial haplogroup M7 accounted for only 5.2% of our CRC patient cohort and for 9.4% and 11.2% of the two control cohorts, our findings need to be validated in a large, independent case-control study. Control cohort 2 consisted of Han samples from both southern and northern China, which may also bias the results because haplogroup M7 is more frequent in southern China. Furthermore, the association between mtDNA haplogroup M7 and CRC risk should be further validated by functional studies, which are greatly hindered by our inability to accurately manipulate the mitochondrial genome. In addition, since the majority of mitochondrial respiratory chain proteins are encoded in the nuclear genome, the association between haplogroup M7 and reduced CRC risk may also be affected by the genetic background of the host's nuclear genome.
In conclusion, our study analysed the relationship between mitochondrial genetic variations and the risk of CRC. It revealed for the first time that mtDNA haplogroup M7 is associated with a reduced risk of CRC in the Northern Han Chinese population. Our study points to the importance of the mitochondrial genetic background in the development of colorectal cancer.

ACKNOWLEDGEMENT
We thank the patients for their participation in the study.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article.