Somatic mutation profiles as molecular classifiers of ulcerative colitis‐associated colorectal cancer

Ulcerative colitis increases colorectal cancer risk by mechanisms that remain incompletely understood. We approached this question by determining the genetic and epigenetic profiles of colitis‐associated colorectal carcinomas (CA‐CRC). The findings were compared to Lynch syndrome (LS), a different form of cancer predisposition that shares the importance of immunological factors in tumorigenesis. CA‐CRCs (n = 27) were investigated for microsatellite instability, CpG island methylator phenotype and somatic mutations of 999 cancer‐relevant genes (“Pan‐cancer” panel). A subpanel of “Pan‐cancer” design (578 genes) was used for LS colorectal tumors (n = 28). Mutational loads and signatures stratified CA‐CRCs into three subgroups: hypermutated microsatellite‐unstable (Group 1, n = 1), hypermutated microsatellite‐stable (Group 2, n = 9) and nonhypermutated microsatellite‐stable (Group 3, n = 17). The Group 1 tumor was the only one with MLH1 promoter hypermethylation and exhibited the mismatch repair deficiency‐associated Signatures 21 and 15. Signatures 30 and 32 characterized Group 2, whereas no prominent single signature existed in Group 3. TP53, the most common mutational target in CA‐CRC (16/27, 59%), was similarly affected in Groups 2 and 3, but DNA repair genes and Wnt signaling genes were mutated significantly more often in Group 2. In LS tumors, the degree of hypermutability exceeded that of the hypermutated CA‐CRC Groups 1 and 2, and somatic mutational profiles and signatures were different. In conclusion, Groups 1 (4%) and 3 (63%) comply with published studies, whereas Group 2 (33%) is novel. The existence of molecularly distinct subgroups within CA‐CRC may guide clinical management, such as therapy options.

carcinoma (CRC). 1 UC-associated CRC (CA-CRC) develops in a multifactorial manner involving a complex imbalance of regulation and coordination of the human immune system, gut microbial composition and epithelial regeneration during the persistent inflammatory period. 1,2 Inflammation together with a lack of mucosal healing predisposes to CA-CRC via the inflammation-dysplasia-carcinoma sequence. 2 During inflammation-associated tumorigenesis, active inflammatory cells produce reactive oxygen species and reactive nitrogen intermediates, which induce mutations leading to genetic instability. 3 Cytokine production further enhances intracellular reactive oxygen species and reactive nitrogen intermediates in a malignant cell; moreover, it promotes epigenetic modifications that can accelerate tumor initiation, for example by silencing DNA repair genes. 3 Up to 35% of CRC risk can be attributed to genetic factors, and some 5% of CRC cases represent hereditary single-gene disorders. 4 Germline defects in the DNA mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2, 5 or more rarely deletions in the 3′ end of the EPCAM gene leading to hypermethylation of the MSH2 gene promoter, 6 cause Lynch syndrome (LS), the most prevalent form of hereditary CRC. Heterozygous germline defects lead to reduced levels of functional MMR proteins, which increases the risk for early-onset cancers, primarily CRC and endometrial cancer. 7 CRC in LS often, but not always, develops via the adenoma-carcinoma sequence. 8 The emergence of de novo somatic mutations contributes to high levels of neoantigens, which are thought to result in immune activation and, later, immune evasion. 9 Thus, in analogy to CA-CRC, immunological alterations and inflammation accompany LS-associated colorectal tumorigenesis from the outset.
Recent genomic analyses have revealed details of mutational processes and their timing in various cancers. Characterizing such events can help to understand the molecular mechanisms of cancer and to identify plausible biomarkers for diagnostic and therapeutic use. In our study, we aim to determine the somatic mutational profiles and signatures for CA-CRC and compare the findings to LS-associated colorectal tumors, thereby covering two forms of early-onset colorectal cancer with different etiologies, but with a strong immunological component as a common denominator.

| Patients and samples
The study material consisted of formalin-fixed, paraffin-embedded tissue specimens from UC patients developing CRC (CA-CRC, n = 27) and LS patients (verified carriers of pathogenic or likely pathogenic germline variants of MMR genes) developing colorectal tumors (adenomas with high-grade dysplasia, n = 10, and CRCs, n = 18). Besides tumor material, the patients' normal colon tissue or blood specimens were available. All the LS patients were represented in the nationwide Lynch Syndrome Registry of Finland. DNA was extracted using nonenzymatic protocols: a modified phenol-chloroform extraction protocol 10 for formalin-fixed, paraffin-embedded samples and the protocol described in Lahiri and Nurnberger 11 for blood samples.

| Microsatellite instability analysis
Microsatellite instability (MSI) was assayed using the mononucleotide repeat markers BAT25 and BAT26, which are sensitive and specific markers of high-degree MSI (MSI-H). 12
16 For both data sets, raw data were processed using an in-house pipeline called variant calling pipeline version 3.7, 17 and data were aligned to the human genome GRCh38. First, adapters and any low-quality nucleotides at the beginning or the end of the reads were trimmed, and any pair containing a read shorter than 36 bp was removed. The reads were then aligned to the GRCh38 reference genome with BWA (version 0.6.2). Nonunique read pairs and nonunique single reads were removed, and GATK (version 3.7) BaseRecalibrator was used to clean the alignment. Any potential PCR duplicates were removed using Picard (version 2.9.0) MarkDuplicates, and GATK IndelRealigner was used for indel sites.
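The read-level filtering described above can be illustrated with a minimal Python sketch. It is not the in-house pipeline: adapter removal is omitted, and the function names and the Phred quality cutoff of 20 are assumptions made for illustration only. The only parameter taken from the text is the 36 bp minimum read length.

```python
MIN_LEN = 36   # from the text: pairs containing a read shorter than 36 bp are removed
MIN_QUAL = 20  # assumed Phred cutoff; the text does not state the value used


def trim_ends(seq, quals, min_qual=MIN_QUAL):
    """Trim low-quality bases from the beginning and the end of one read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_qual:
        start += 1
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_pairs(pairs, min_len=MIN_LEN, min_qual=MIN_QUAL):
    """Keep a read pair only if both trimmed mates are at least min_len long."""
    kept = []
    for (seq1, quals1), (seq2, quals2) in pairs:
        mate1 = trim_ends(seq1, quals1, min_qual)
        mate2 = trim_ends(seq2, quals2, min_qual)
        if len(mate1[0]) >= min_len and len(mate2[0]) >= min_len:
            kept.append((mate1, mate2))
    return kept
```

Discarding the whole pair when either mate falls below the length limit keeps the paired-end data consistent for the subsequent alignment step.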

| Statistical analysis
Statistical analyses were conducted using SPSS software, version 25.0 (IBM SPSS Inc., Chicago, IL). Fisher's exact test was used for pairwise comparisons of categorical variables. Normality of continuous data was tested using the Shapiro-Wilk test. As data were largely not normally distributed and sample sizes were small, continuous variables were analyzed using the nonparametric Mann-Whitney U test. Correlations were assessed with Spearman's or Pearson's correlation test. Exact two-sided P values were calculated. P values <.05 were considered statistically significant.
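As an illustration of the exact two-sided test used for categorical comparisons, the following Python sketch computes Fisher's exact P value for a 2×2 table from the hypergeometric distribution. The analyses in this study were run in SPSS; the function name and the example counts below are hypothetical and serve only to show the calculation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts (not study data): a gene mutated in 7 of 9 tumors
# in one subgroup and in 2 of 17 tumors in another subgroup.
p = fisher_exact_two_sided(7, 2, 2, 15)
significant = p < 0.05
```

The small tolerance in the comparison guards against floating-point ties between tables that are equally probable.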

| Study design
This investigation was undertaken to define the molecular pathogenesis of colorectal tumorigenesis in UC, an idiopathic chronic inflammatory bowel disease with accelerated tumor development by a "landscaper" mechanism. 20 The results were compared to LS, where impaired "caretaker" function due to inherited (and acquired) MMR gene defects is known to induce immunological/inflammatory alterations and result in rapid tumor development. Basic characteristics of the study series are described in Table 1.

| Mutation signatures of CA-CRC tumors
Mutation signature analysis was performed on somatic mutation data (VarScan2 P < .01), and signatures were compared to known 60 single

| Molecular comparison of CA-CRC vs LS tumors
A study protocol analogous to CA-CRCs was applied to LS tumors, and Table 2 provides a comparative summary of the essential findings.
All LS adenomas had high-grade dysplasia and did not significantly differ from LS-CRCs with respect to MSI, CIMP or somatic mutational load (Table 2). The same applied to high-frequency mutations studies have arrived at considerably lower frequencies (5%-17%). 40,41 Our CIMP scoring method corresponds to that described in Berg et al 42 Table 4).
Contrary to sporadic CRCs, TP53 mutations are proposed to occur as initiating events and APC mutations as late events in CA-CRC-related carcinogenesis. 26,47 The high VAFs of the TP53 mutations we observed and their presence in most CA-CRC tumors (Figure 1) are consistent with this notion. It has been proposed that TP53 mutations in CA-CRC occur at different hotspots compared to sporadic CRC, although in both tumor types, the DNA-binding domain is the most common target of mutations. 23,26 As demonstrated in Figure 2, the locations of TP53 mutations in our CA-CRCs were largely concordant with those reported in recent literature. 23-27 Yaeger et al 27 additionally found MYC to be more frequently mutated in CA-CRC compared to sporadic cases, but we found only one case with a MYC mutation. Among CA-CRCs, APC mutations occurred exclusively in the hypermutated MSS subgroup (Supplementary Tables 4 and 5B). The relatively low overall prevalence of high-frequency mutations in APC (Figure 1A; Supplementary Table 4) and the fact that all samples exhibiting APC mutations harbored more than one APC mutation (Supplementary Table 4 Table 2). Signature 1 was prominent in LS tumors; this signature has been proposed to be associated with an endogenous mutational process of deamination of 5-methylcytosine to thymine and is characterized by C>T transitions at methylated NpCpG sites.
Formalin fixation and older age may increase this process. 29
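The sequence context that defines Signature 1 can be made concrete with a short Python sketch that flags single-nucleotide variants as C>T transitions at CpG sites. It assumes that each variant is given with its three-base reference context (5′ to 3′, mutated base in the middle); the function names are hypothetical, methylation status is not assessed, and this is not the signature analysis method used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_cpg_transition(ref_context, alt):
    """True if the variant is a C>T change at an NpCpG site on either strand."""
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alternate allele")
    # Express purine-reference variants on the pyrimidine strand,
    # so that G>A in a CpG is counted as C>T on the opposite strand
    if ref_context[1] in "AG":
        ref_context, alt = revcomp(ref_context), revcomp(alt)
    return ref_context[1] == "C" and alt == "T" and ref_context[2] == "G"


def cpg_transition_fraction(variants):
    """Fraction of (context, alt) variants that are C>T changes at CpG sites."""
    if not variants:
        return 0.0
    hits = sum(is_cpg_transition(context, alt) for context, alt in variants)
    return hits / len(variants)
```

A high fraction returned by `cpg_transition_fraction` would be in line with a Signature 1-like pattern, although a formal assignment requires comparison against the full reference signature profiles.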