Genome‐wide methylation profiling of diagnostic tumor specimens identified DNA methylation markers associated with metastasis among men with untreated localized prostate cancer

Abstract Background We used a genome‐wide discovery approach to identify methylation markers associated with metastasis in men with localized prostate cancer (PCa), as better identification of those at high risk of metastasis can inform treatment decision‐making. Methods We identified men with localized PCa at Kaiser Permanente California (January 1, 1997–December 31, 2006) who did not receive curative treatment and followed them for 10 years to determine metastasis status. Cases were chart review‐confirmed metastasis, and controls were matched using density sampling. We extracted DNA from the cancerous areas in the archived diagnostic tissue blocks. We used Illumina's Infinium MethylationEPIC BeadChip for methylation interrogation. We used conditional logistic regression and Bonferroni's correction to identify methylation markers associated with metastasis. In a separate validation cohort (2007), we evaluated the added predictive utility of the methylation score beyond clinical risk score. Results Among 215 cases and 404 controls, 31 CpG sites were significantly associated with metastasis status. Adding the methylation score to the clinical risk score did not meaningfully improve the c‐statistic (0.80–0.81) in the validation cohort, though the score itself was statistically significant (p < 0.01). In the validation cohort, both clinical risk score alone and methylation marker score alone are well calibrated for predicted 10‐year metastasis risks. Adding the methylation score to the clinical risk score only marginally improved predictive risk calibration. Conclusion Our findings do not support the use of these markers to improve clinical risk prediction. The methylation markers identified may inform novel hypothesis in the roles of these genetic regions in metastasis development.


| INTRODUCTION
In 2022, 268,490 men are estimated to be diagnosed with prostate cancer (PCa) in the United States. 1 Of these, about 80% are localized. 2A significant proportion of localized PCa is indolent and will not affect a patient's health even in the absence of curative treatment. 3On the other hand, PCa treatment often leads to persistent side effects that compromise patient's quality of life. 4he American Urology Association (AUA) treatment guidelines recommended active surveillance for very low-risk PCa, and as a management option for low-risk and favorable intermediate-risk PCa. 5 Distinguishing the subset of localized PCa cases whose metastasis may be prevented by treatments is critical to inform treatment decision-making.
[8][9][10][11][12][13] These prediction algorithms have been found to misclassify a proportion of patients in validation studies. 6,14,15outine clinical and pathological features are unlikely to fully capture the complex molecular mechanisms underlying tumor behavior.These existing prediction tools also offer little insight for potential molecular mechanisms for aggressive PCa.Epigenetic regulation is now recognized as a major mechanism underlying tumor heterogeneity and disease progression. 16,17DNA methylation is a stable and readily measurable epigenetic modification. 18The contribution of aberrant DNA methylation to metastatic process has become evident. 16,17,19We thus hypothesize that certain methylation markers will be predictive of metastasis in men with localized PCa and can enhance the current risk prediction algorithm for PCa aggressiveness.
A major challenge of studying molecular tumor markers for PCa aggressiveness is the need to focus on specimens obtained at the time of diagnosis rather than at surgery because surgery may alter the disease course (e.g., by preventing or delaying metastasis in aggressive PCa).It is thus impossible to classify the pretreatment risk of metastasis in treated patients.In a recent systematic review, 20,21 most studies that examined the associations between DNA methylation markers and PCa outcome focused on treated patients.The majority of the studies also employed a candidate gene approach, 22 which may hinder the development of novel risk prediction scores based on the most informative methylation markers.To shed light on the utility of DNA methylation markers on risk prediction for PCa aggressiveness, we conducted a nested case-control study to identify novel methylation markers measurable at PCa diagnosis that is predictive of PCa metastasis among untreated men using a discovery approach and examined the added predictive value of these methylation markers beyond clinical risk factors.

| Study setting, design, and populations
This study was conducted at Kaiser Permanente Southern California (KPSC) and Northern California (KPNC), integrated health care delivery systems that provide comprehensive health services for over 9-million racial and socioeconomically diverse members who are broadly representative of residents in California. 23,24We first identified the source cohort based on the following criteria: (1) newly diagnosed with PCa between January 1, 1997, and December 31, 2006 at KPSC or KPNC; and (2) diagnosed at localized stage (TNM stages I and II or localized stage of the SEER summary stage).We then excluded men who: (1) had unknown Gleason score; (2) received prostatectomy, radiation therapy, hormone therapy, or chemotherapy within 6 months of diagnosis; risk score did not meaningfully improve the c-statistic (0.80-0.81) in the validation cohort, though the score itself was statistically significant (p < 0.01).In the validation cohort, both clinical risk score alone and methylation marker score alone are well calibrated for predicted 10-year metastasis risks.Adding the methylation score to the clinical risk score only marginally improved predictive risk calibration.

Conclusion:
Our findings do not support the use of these markers to improve clinical risk prediction.The methylation markers identified may inform novel hypothesis in the roles of these genetic regions in metastasis development.

K E Y W O R D S
epigenetics, metastasis, methylation, prostate cancer, risk model and (3) had metastatic disease, died, or were lost to follow-up within 6 months of diagnosis.We followed all eligible men from 6 months after PCa diagnosis until metastasis, initiation of prostatectomy, radiotherapy, chemotherapy or immunotherapy (since treatment will alter the disease course), death (non-PCa-related), health plan disenrollment, or 10 years after initial PCa diagnosis (~88% of the men who remained alive at 10 years after PCa diagnosis retained KP membership), whichever came first.
We included all men with identified metastasis during study follow-up as cases.We selected controls from the source cohort using density sampling, individually matched to cases at a 3:1 ratio initially by age (within 5 years), race (black vs. nonblack), year of PCa diagnosis (within 5 years), and KP region (i.e., KPSC or KPNC).We later sampled additional controls to allow proper adjustment for Gleason score in the analysis.
We also included a separate validation cohort for the evaluation of the added predictive utility of DNA methylation markers discovered from the case-control set.We included men diagnosed with localized PCa in 2007 at KPSC who met the other eligibility criteria described above and followed them as above.Because metastasis risk prediction was most helpful for those with Gleason score 7 or lower (i.e., treatment is recommended for those with Gleason score 8 or higher), we further limit the validation cohort to those with Gleason score ≤7.
Both the KPSC's and KPNC's Institutional Review Boards approved this study and waived the requirement for informed consent.We conducted this study in accordance with the U.S. Common Rule guidelines.

| Data collection
We identified men diagnosed with PCa from KPSC's and KPNC's Surveillance, Epidemiology, and End Resultaffiliated cancer registries.The KP cancer registries collect data on clinical stage, Gleason score (Gleason grade was not collected for all study years, therefore it was not used), age at diagnosis, race/ethnicity as well as initial course of cancer treatment.We identified potential metastases using an algorithm based on ICD-9 and ICD-10 diagnosis codes for metastatic malignancy, natural language processing of radiology reports, serum PSA levels >20 ng/mL, CPT procedure codes for treatment initiation and utilizations of chemotherapy, immunotherapy, or hormonal therapy (leuprolide), encounters with an oncologist after 12 months following diagnosis, and death due to prostate cancer.We manually chart reviewed probable cases identified by the algorithm to confirm prostate cancer metastasis and date of metastasis diagnosis.We re-reviewed a random sample of 10% of the charts to ensure the quality of the chart reviewed.An experienced urologist (GWC) reviewed all questionable cases and made a final determination on the presence of metastasis.
We identified PCa treatment and PSA levels from KP's electronic health records and date and cause of death from KP's records and death certificate linkage with the State of California and Social Security Administration.

| Pathology review
We retrieved archived diagnostic biopsy H&E slides and formalin-fixed, paraffin-embedded (FFPE) blocks for study subjects from KP's centralized storage center.Previous studies have shown strong correlations comparing methylation values between paired FFPE and fresh-frozen tumor tissues using Illumina's genomewide methylation technology 25,26 and supported the use of FFPE sample for our purpose.The study pathologists (JH, FS, and NS) centrally reviewed all pathology reports and diagnostic slides to identify appropriate biopsy cores for the methylation assay and circled the cancerous areas on each FFPE block for macrodissection.We choose not to perform microdissection for the following reasons: (1) Gene expression studies suggest an important role of stromal cells in the metastasis signature 27,28 ; (2) microdissection is limiting for routine clinical use due to costs; and (3) PCa tumor glands are usually clustered in biopsy cores, making macrodissection easy to perform and contamination less of a concern.When necessary, we used multiple biopsy cores to achieve sufficient DNA sample for a subject for the methylation assays.

| Laboratory procedures
We performed DNA extraction using Qiagen's QIAamp DNA FFPE Tissue Kit.Methylation assay was performed at the University of Southern California Core Facility using Illumina's Infinium MethylationEPIC BeadChip, following manufacturer protocol.The technician first bisulfite converted the purified DNA (Zymo's EZ DNA methylation kit) and treated them with a FFPE DNA Restoration Kit before the assay.Methylation status of the interrogated CpG site is presented as the β value, which is a continuous variable ranging from 0 (unmethylated) to 1 (fully methylated).We included each case-control trio in the same batch and same BeadChip and randomly assigned well positions within a case-control trio.The USC core lab performed background correction and dye-bias normalization using the "noob" function in the minfi R package. 29

| Statistical analysis
We calculated the distributions of the demographic and clinical characteristics of the final analytical case-control set and the validation cohort and compared the cases and controls using the Wilcoxon rank-sum test or chi-square test.We estimated the 10-year cumulative incidence of metastasis in the source cohort and the validation cohort using the Kaplan-Meier method.
For the analysis of the methylation data, we omitted the data points with probe failure rate >5% (33,697 probes, or 3.9% of all probes), leaving 832,140 probes in the analyses.We did not further filter methylation signals that may be due to SNPs or genomic deletions or copy number variations given our purpose was mainly prediction.Using the methylation data from the case-control set, we performed conditional logistic regressions to test the association between methylation status (i.e., the β value) of each of the 832,140 CpG sites (after excluding those with >5% probe failure) in the MethylationEPIC BeadChip and metastasis status (i.e., the case status) after adjusting for age at diagnosis, race, Gleason score, and LUMP score.A separate model was constructed for each CpG site.We adjusted for the LUMP score (leukocytes unmethylation for purity, which averaged 44 non-methylated immune-specific CpG sites to infer tumor purity), to account for potential tissue impurities in the DNA samples. 30We defined genomewide significance as p value less than 1 × 10 −7 based the considerations of (1) Bonferroni correction of α-level 0.1 (0.1/865,837 = 1.17 × 10 −7 , with α = 0.10 to avoid over restriction in the search of predictors); and (2) balance between established p value cutoffs for GWAS studies (5 × 10 −8 and 5 × 10 −7 ) 31 We generated the Manhattan plot of minus log p values by position in base pairs for each chromosome to evaluate the relative positions of the significant CpG sites after Bonferroni correction (Figure S1).
We initially created two sets of methylation scores based on the CpG sites identified: The first score was the weighted sum of β values for each selected site, weighted by the regression coefficient estimates from the individual logistic regression; the second score used only the sign of the regression coefficient (positive or negative) and is the "sum" of the β values for the sites (with the β values for CpG sites with negatives associations subtracted instead of added).We compared the area under the ROC curve (AUC) for the risk prediction model including each of the methylation scores in the case-control dataset.The two scores achieved the same AUC, thus the score with coefficient signs as weights was chosen due to its simplicity.We also created a single clinical risk score (composed of age at diagnosis, race/ethnicity, stage at diagnosis, Gleason score, PSA at diagnosis, PSA doubling time, and number of positive biopsy cores) rather than adjusting for multiple clinical variables in the model, which allowed us to condense clinical risk factors into a single degree of freedom for modeling.This clinical risk score was created in the case/control set using logistic regression.Some of the clinical predictors (i.e., PSA doubling time and number of biopsy cores positive) had considerable missingness (up to 20%).We therefore used multiple imputation with the chained equation technique for all the analyses. 32e evaluated the AUC with the clinical risk score alone, the methylation score alone, and both the clinical risk score and the methylation score in both the casecontrol set (using conditional logistic regression) and in the validation cohort (using Cox regression) to assess the predicted risk discrimination of each of these models.In subset analyses, we repeated the analyses for those with Gleason score of 7 vs. ≤6.In addition, we assessed predicted risk calibration in the validation cohort by comparing model-predicted risk groups (<5%, 5%-14.9%,and 15%+) of the 10-year risk of metastasis and observed risk (estimated by 10-year cumulative incidence) for both the model with clinical risk score alone and the model with both the clinical and methylation risk scores.There was not enough sample size to perform stratified analysis on predicted risk calibration by Gleason score.We performed the methylation data processing and EWAS analysis using R, version 4.3.1.We conducted all other analyses using the SAS Enterprise Guide statistical software, version 7.1.

| Study population
A total of 39,802 men diagnosed with localized PCa between 1997 and 2006 were identified at KPSC and KPNC.After applying the exclusion criteria, the source cohort for the case-control selection consisted of 10,039 men (Figure 1).Of these men, a total of 357 metastases that developed within 10 years after diagnosis were chart review confirmed, with a total of 1322 controls selected.Among the identified cases and controls, a total of 215 cases and 404 controls contributed successful methylation results (Figure 1).
For these 215 cases and 404 controls, the mean age at PCa diagnosis was 72.7 and 72.9 years, respectively.About 42% and 26% of the cases had Gleason score 7 and 2-6, respectively, while 27% and 51% of the controls had Gleason score 7 and ≤6, respectively.Cases also had higher level of PSA at diagnosis, shorter PSA doubling time, and number of biopsy cores positive for cancer (Table 1).
A total of 2229 men diagnosed with localized PCa in 2007 were identified at KPSC; 630 met all eligibility criteria for the validation cohort, and 382 contributed successful methylation results (Figure 2, including 10 cases who developed metastasis (3 with Gleason score ≤6 and 7 with Gleason score 7)).The mean age at diagnosis in the final validation cohort was 67.0 years.Thirty-one percent and 69% of the validation cohort had Gleason score 7 and ≤6, respectively (Table 1, right).

| Cumulative incidence of metastasis
In the source cohort, the cumulative incidence of metastasis was 7.8% over 10 years.The 10-year cumulative incidence varies by the index calendar year, ranging from 3.8% (1997) to 10.8% (1998) without a clear trend over time.In the validation cohort, the 10-year cumulative incidence was 8.0%.

| Selection of methylation markers
Analysis of the MethylationEPIC BeadChip data among the cases and controls led to the identification of 31 CpG sites whose methylation status was significantly associated with metastasis status after Bonferroni correction.These 31 sites, their associations with PCa metastasis (measured as odds ratio) and p values are listed in Table 2. T A B L E 1 Demographic and clinical characteristics of the study populations.

| Model discrimination and added model discrimination by methylation markers
In the case-control set, the methylation markers alone achieved similar AUC as the clinical risk score alone (0.69 vs. 0.68, respectively, Table 3).Adding the methylation score to the clinical risk score improved the model discrimination from 0.68 to 0.73 (p = 0.01).In the analyses stratified by Gleason score, similar findings were found for those diagnosed with Gleason score ≤6, 7, and ≥8 (Table 3).However, in the validation cohort, the methylation score alone achieved lower AUC compared to that from clinical risk score (0.73 vs. 0.80).There was also no improvement  3).

| Added model calibration by methylation markers
In the validation cohort, both clinical risk score alone and methylation marker score alone are well calibrated for the following groups of model-predicted 10-year metastasis risks: <5%, 5%-14.9%,and ≥15%.Observed cumulative incidence of metastasis was 1.6%, 9.6%, and 28.3% for these groups defined by clinical risk score alone, and 0%, 7.5%, and 32.8% for groups defined by methylation score alone.
When comparing model-predicted risks from the model composed of clinical risk score only to that with the addition of the methylation score, both models assigned 65% of men into the same predicted risk groups, while 29% of men were moved to a higher risk category and 6% moved to a lower risk category.As shown in Figure 3, adding the methylation score only improved predicted calibration for 2% of the men (whose estimated risk increased from 5%-15% to >15% after adding methylation risk score  [in Figure 3, moderate ≥ high] had an estimated 10-year rate of 20.3% [high]) but worsen the predicted risk calibration for 33% of the men in the validation cohort (i.e., in Figure 3, low ≥ moderate [actual observed risk group: low] and high ≥ moderate [actual observed risk group: high]).

| DISCUSSION
In this study, we identified 31 methylation markers after Bonferroni adjustment from genome-wide methylation assay to be significantly associated with risk of metastasis among men with localized PCa that are untreated.Adding a risk score based on these 31 methylation markers to the clinical risk score in the risk prediction model moderately improved the predicted risk discrimination as suggested by the AUC in the case-control set (0.73 vs. 0.68), but not in the validation cohort (0.81 vs. 0.80).Further, adding the methylation score to clinical risk score did not enhance predicted risk calibration.Our findings do not support clinical use of these markers to improve overall risk prediction.However, our data suggest men with Gleason score ≤6 may be the exception.In this patient subset, methylation risk score alone achieved similar AUC (0.87) as the combined use of the two risk scores (0.91) and had a meaningful improvement in AUC in magnitude (from 0.80 to 0.91, although this change was not statistically significant).These findings suggest that the 31 methylation markers may help to identify patients with low grade Gleason score that are at higher risk of metastasis beyond clinical characteristics.Given the small event size of the validation cohort in this study, our findings will need to be confirmed in larger studies.
Although adding the methylation score to the clinical risk score did not improve predicted risk calibration in the validation cohort, it should be noted that the validation cohort only has 10 metastatic cases with successful methylation results.The potential of these methylation markers in improving predicted risk calibration should thus be evaluated in other larger cohorts of untreated men.However, it should be noted that when the addition of the methylation score changed the predicted risk category, they did further distinguish the metastasis risk within that predicted risk group.For example, for those predicted to have ≥15% 10year risk of metastasis by clinical risk score only, the addition of the methylation score further stratified this group into two groups (those that were downgraded vs. unchanged by the methylation score).Those who were downgraded did indeed had lower 10-year cumulative incidence (16.5%) compared to those whose predicted risk group was not altered by methylation score (36.2%).This was also the case for those within the <5% predicted risk group by the clinical risk score.These findings support value of these methylation markers in offering additional prognostic information beyond clinical risk factors, despite the calibration results in this study with the chosen predicted risk categories.
To our knowledge, this study is among the first to perform genome-wide methylation profiling using diagnostic biopsy PCa tissue for the identification of prognostic methylation markers for risk of PCa metastasis in untreated men.As noted by a recent systematic review, 22 existing literatures in the search of prognostic methylation markers have primarily focused on patients who received radical prostatectomy; a majority also used biochemical recurrence as the outcomes of interest rather than metastasis or lethal PCa.Many of these studies reported positive findings on prognostic methylation markers, 22 including several recent studies with genome-wide profiling that reported moderately improved the risk prediction [33][34][35][36][37] for recurrent/lethal PCa.However, most of these studies did not evaluate predicted risk calibration.
The 31 methylation markers identified in this study may help inform the molecular mechanism of disease progression, and potentially inform the development of new therapeutic targets.Of the 31 methylation markers identified, six are within the gene body/coding regions.Some of these genes, namely TRPS1, 38,39 FAM66A, 40 and PRDM16, 41 have been implicated in prostate carcinogenesis, although research on these genes and PCa remain sparse.The roles of the other three genes (i.e., ADD3-AS1, PHACTR3, and ALDH1L1-AS2) in PCa development and progression have not been reported.
In addition, two markers are located in CpG islands, while six markers are located in the CpG shore and shelf regions.Sixteen of the 31 markers did not have a clear relation with CpG island or the gene body.This finding is similar to a prior study that reported that many of the differential methylated positions within cancer stem cells do not reside in CpG islands, but in shelf and open sea (sites outside of the CpG island regions 42 ).Of note, methylation status was associated with a strong protective effect on metastasis for 28 out of the 31 markers.These findings may lead to novel hypotheses regarding the role of these genetic regions in PCa progression.
There are several limitations that should be considered when interpreting our results.First, there was a large attrition of study subjects for obtaining the methylation data.This was in part due to the retrospective nature of the study and in part due to the nature of small tumor size in the biopsy cores for those with very limited diseases.There is therefore potential selection bias that those with more "extensive" disease were more likely to be included in the final analytical datasets.Second, our algorithm may not have captured all metastases developed, and some men with metastasis may lack documentation in chart notes or were undiagnosed.Third, our cohorts of patients were diagnosed between 1997 and 2007, when the standard of care was a systematic sextant biopsy.Thus, it is possible that our cohort may be under-staged due to low sampling.Fourth, the validation cohort was limited in sample size.The markers identified should be further examined in a larger validation cohort.Fifth, our methodology focused on CpG site-level analyses.Thus, our findings do not rule out that possibility that a different approach (e.g., clustering supervised or unsupervised) may lead to the identification of methylation markers that may better improve the predicted risk discrimination and calibration.
Methylation markers provided limited increased ability to predict metastasis better than established clinical risk factors, with some potential benefit in patient risk stratification for those with Gleason score ≤6.These findings should be validated in larger cohorts.The methylation markers identified, however, may inform novel hypothesis in the roles of these genetic regions in metastasis development.

F I G U R E 2
Study population flow chart-validation set. in AUC with the addition of the methylation score (AUC = 0.80 vs. 0.81).When stratified by Gleason score, there was a meaningful improvement (although not statistically significant) in AUC for those with Gleason score ≤6 (AUC = 0.91 [clinical risk score + methylation score] vs. 0.80 [clinical risk score alone], p = 0.60; Table for one unit increase in the methylation score or the clinical score; it's magnitude should not be directly compared between clinical risk score and the methylation score, since the scores have different ranges.Rather, the p value and AUC are more informative for the purpose of our study.b Clinical risk score ranged from 4.2 to 12.0 in the entire case-control set and from 4.3 to 10.1 in the validation cohort.The ORs here represent one unit increase in the clinical risk score.c Methylation score ranged from −22.2 to −6.1 in the entire case-control set and −21.3 to −5.4 in the validation cohort.The ORs here represent one unit increase in the methylation score.d Individuals with Gleason ≥8 were not included in the validation cohort.*p Value < 0.05 derived from a likelihood ratio test comparing the predictive model to the clinical risk score only model, which is equivalent to the comparison of AUCs from these two models.

F I G U R E 3
Ten-year cumulative incidence for the model-predicted risk categories determined by the predicted risk group from clinical risk score only and the predicted risk group from clinical risk score + methylation score.High: 15%+ 10-year risk of metastasis; Moderate: 5-14.9% 10-year risk of metastasis; Low: <5% risk of metastasis.Predicted Risk Category was first labeled by the predicted risk group from the clinical risk score only (high, moderate, or low), followed by the predicted risk group when methylation score was added to the model (i.e., clinical risk score + methylation score).For example, the risk group[High, High] means that both the clinical risk score and the [clinical risk score+ methylation score] predicted high risk (15+%) for these individuals; while [High, Moderate] means that the clinical risk score only model predicted high risk (15+%) for these individuals, but the [clinical risk score+ methylation score] model predicted moderate risk (5-14.9%)for these individuals.| 18847 CHAO et al.
The 31 CpG sites identified by Bonferroni adjustment and their odds ratio for metastasis and p values.Odds ratios from conditional logistic regression and area under the ROC curve of the risk prediction models in the case-control dataset and in the validation cohort, overall and by Gleason score.
T A B L E 2