High mutation burden of circulating cell‐free DNA in early‐stage breast cancer patients is associated with a poor relapse‐free survival

Abstract
Background: High tumor mutation burden has been shown to be associated with a poor clinical outcome. As the tumor-derived fraction of circulating cell-free DNA (cfDNA) has been shown to reflect the genetic spectrum of the tumor, we examined whether the mutation burden of cfDNA could be used to predict the clinical outcomes of early-stage breast cancer (BC) patients.
Methods: We selected a set of 79 Finnish early-stage BC cases that had a good prognosis based on traditional prognostic parameters, some of which nevertheless developed relapsed disease during follow-up. cfDNA was isolated from the serum collected at the time of diagnosis, sequenced, and compared to matched primary tumors, clinical parameters, and survival data.
Results: High cfDNA mutation burden was associated with poor relapse-free survival (RFS) (P = .016, HR = 2.23, 95% CI 1.16-4.27) when patients were divided into high and low mutation burden groups according to the median number of somatic variants. A high discordance was observed between the matched tumor and cfDNA samples, highlighting the challenges related to the liquid biopsy of early-stage cancer cases. Despite the low number of detected tumor-specific variants, the presence of tumor-specific somatic variants in the cfDNA was associated with poor RFS (P = .009, HR = 2.31, 95% CI 1.23-4.31).
Conclusions: Our results confirm previously observed challenges regarding the accuracy of liquid biopsy-based genotyping of early-stage cancers and support the parallel sequencing of tumor and cfDNA. They also demonstrate that the presence of tumor-specific somatic variants and a high mutation burden in the cfDNA are both associated with poor RFS, indicating the prognostic potential of liquid biopsy in the context of early-stage cancers.


| INTRODUCTION
Breast cancer (BC) is a heterogeneous disease with a high degree of phenotypic variation within individual primary tumors. 1 This diversity, often referred to as intratumoral heterogeneity (ITH), arises largely from somatic driver variants that provide a selective advantage for the variant-carrying cancer cell in its microenvironment. 2 Fitness-promoting variants may lead to positive selection and clonal expansion of cancer cell lineages, as described by previous studies of cancer genomics. [3][4][5] As a result, individual BC tumors tend to be composed of multiple subpopulations that may respond differently to treatments and thus pose a major challenge for targeted cancer therapies, where treatment strategies are selected based on specific biomarkers. 6 It has been widely hypothesized that extensive ITH could reflect a cancer's potential for evolutionary adaptation and thus be associated with a poor clinical outcome. Indeed, recent pan-cancer studies utilizing The Cancer Genome Atlas Network (TCGA) data 7,8 have reported that high ITH is associated with poor patient survival in various cancer types, highlighting the potential prognostic importance of ITH.
The current approaches for ITH testing rely strongly on tumor samples obtained from needle biopsies and surgical excisions. Tumor biopsies are known to have their limitations, and the analysis of circulating cell-free DNA (cfDNA) is often proposed as a minimally invasive and easily repeatable alternative to tumor biopsies. 9 cfDNA is released into circulation from apoptotic and necrotic tumor cells, and a small fraction of it has been demonstrated to carry tumor-specific genetic alterations that can be detected already in early-stage disease. 10,11 We hypothesize that the mutational spectrum of cfDNA reflects the mutational spectrum of the tumor and predicts the clinical outcome of early-stage BC patients in a manner similar to primary tumors. To test our hypothesis, we sequenced a set of primary tumors and peripheral cfDNA samples of early-stage BC patients with stage T1N0M0 or T2N0M0 disease and compared the results to clinical data. Our results indicate that the cfDNA mutation burden and the presence of tumor-specific somatic variants in cfDNA are both associated with a poor relapse-free survival (RFS), thus providing further evidence that liquid biopsy could be used to identify early-stage patients with a higher risk of relapse and to assess prognosis more precisely.

| Patients, sample material, and clinical data
This study included a set of 79 Eastern Finnish breast cancer patients with stage T1N0M0 or T2N0M0 disease who had no nodal or distant metastases at the time of primary diagnosis. Clinical data, tissue, blood, and serum samples were obtained from the Kuopio Breast Cancer Project (KBCP), a prospective population-based case-control study conducted in 1990-1995 in Eastern Finland. 12

| Somatic variant calling
Paired-end reads were trimmed with cutadapt 15 and mapped to the hg19 reference genome with BWA-MEM. 16 Mapped reads with a Phred quality score < 20 were filtered out, and the remaining reads were sorted and indexed with SAMtools. 17 Local realignment was performed with the GATK IndelRealigner 18 tool to minimize the number of mismatching bases across all reads. FASTQ and BAM file qualities were assessed with FastQC and Picard CollectHsMetrics. Variant calling was performed using VarScan2 19 ; at least 10 variant-supporting reads and a variant allele frequency of 0.01 were required to retain a variant. Variants reported in the Finnish population in ExAC 20 or detected in sequenced blood samples were filtered out to obtain somatic variant calls. Called somatic variants were annotated with ANNOVAR 21 and public databases. Pathogenicity of somatic variants was evaluated with existing ClinVar, 22 COSMIC 23 and International Cancer Genome Consortium (ICGC) 24 records. Pathogenicity of somatic variants without existing database records was evaluated with MetaSVM 25 scoring, where a score of 0-0.825 was interpreted as a likely pathogenic variant and a score greater than 0.825 was interpreted as a pathogenic variant. The computational analyses were run on the servers provided by the Bioinformatics Center, University of Eastern Finland, Finland.
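The retention thresholds and the MetaSVM interpretation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Call` record, the variant keys, and the function names are hypothetical, and treating MetaSVM scores below 0 as benign is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass

MIN_ALT_READS = 10           # minimum variant-supporting reads (from the text)
MIN_VAF = 0.01               # minimum variant allele frequency (from the text)
METASVM_PATHOGENIC = 0.825   # boundary between likely pathogenic and pathogenic (from the text)


@dataclass(frozen=True)
class Call:
    """Hypothetical minimal record for one variant call."""
    key: str         # any unique variant identifier
    alt_reads: int   # reads supporting the variant allele
    depth: int       # total reads at the position

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def somatic_calls(calls, population_keys, blood_keys):
    """Keep calls that pass both thresholds and are absent from the
    population database and from the sequenced blood samples."""
    excluded = set(population_keys) | set(blood_keys)
    return [
        c for c in calls
        if c.alt_reads >= MIN_ALT_READS
        and c.vaf >= MIN_VAF
        and c.key not in excluded
    ]


def metasvm_label(score: float) -> str:
    """Map a MetaSVM score to the labels used in the text.
    Assumption: scores below 0 are labeled benign."""
    if score > METASVM_PATHOGENIC:
        return "pathogenic"
    if score >= 0:
        return "likely pathogenic"
    return "benign"
```

In this sketch a call must satisfy both the read-count and the allele-frequency threshold, so a deeply covered position with 20 supporting reads out of 4000 (VAF 0.005) is still rejected.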

| Statistical analysis
The overall mutation burden of samples was estimated by calculating the number of somatic variants per number of sequenced base pairs. Somatic variants detected in both the matched tumor and cfDNA were referred to as tumor-specific somatic variants.
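Both definitions reduce to simple arithmetic and a set intersection. The following Python sketch shows one way to express them; the function names and the representation of a variant as a `(gene, change)` tuple are illustrative assumptions, and the optional `shared_genes` argument reflects the restriction to genes sequenced in both panels that is applied in the comparisons reported later.

```python
def mutation_burden(n_somatic_variants: int, sequenced_bp: int) -> float:
    """Number of somatic variants per sequenced base pair."""
    if sequenced_bp <= 0:
        raise ValueError("sequenced_bp must be positive")
    return n_somatic_variants / sequenced_bp


def tumor_specific(tumor_variants, cfdna_variants, shared_genes=None):
    """Variants detected in both the matched tumor and the cfDNA sample.

    Each variant is a hypothetical (gene, change) tuple. If shared_genes
    is given, only variants in those genes are considered.
    """
    shared = set(tumor_variants) & set(cfdna_variants)
    if shared_genes is not None:
        shared = {v for v in shared if v[0] in shared_genes}
    return shared
```

A patient would then count as having tumor-specific somatic variants in cfDNA whenever `tumor_specific(...)` returns a non-empty set.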

| Sequencing performance
The proportions of targeted bases covered with more than 100 reads after read processing were 94.6 ± 0.4%, 93.3 ± 0.3%, and 80.7 ± 0.4% for cfDNA, blood, and tumor, respectively. The achieved mean sequencing coverages after read processing were 3557 ± 69x for cfDNA samples, 887 ± 38x for tumor samples, and 1669 ± 179x for blood samples (File S3). The average variant allele frequency (VAF) was 9.3 ± 0.8% in tumor and 1.7 ± 0.1% in cfDNA when all sequenced genes and detected somatic variants were considered. The most frequently mutated genes were TP53 (10.1% of all variants), ARID1A (7.4%), AKT1 (6.8%), GATA3 (6.4%), and MAP3K1 (6.4%) in tumor (Figure 1A, Figure S4) and TP53 (8.6%), PIK3CA (5.2%), GATA3 (4.0%), ARID1A (3.7%), FGFR1 (3.7%), and MAP3K1 (3.7%) in cfDNA (Figure S5). In total, 8.9% of all somatic variants detected in the tumor and 10.3% of all somatic variants detected in the cfDNA were annotated as likely pathogenic or pathogenic based on their existing records in public databases (Figure 1A,B). The proportions of known benign somatic variants were 4.7% in tumor and 5.4% in cfDNA. The remaining variants (85.7% in tumor and 85.0% in cfDNA samples) did not have any existing records in public databases and were considered variants of uncertain significance (VUS), whose pathogenicity was predicted computationally. According to the prediction, 23.4% of the VUS detected in the tumor and 30.3% of the VUS detected in the cfDNA were annotated as likely pathogenic or pathogenic, while the rest of the variants were annotated as benign.

| High mutation burden of tumor is associated with the poor RFS and BCSS
When patients were divided into two groups, high and low, according to the median number of detected somatic variants, high mutation burden of tumor (≥5 variants) was associated with a poor RFS (P = .020, HR = 2.47, 95% CI 1.10-6.83, Cox regression, Figure 2A) and breast cancer-specific survival (BCSS) (P = .009, HR = 4.35, 95% CI 1.44-13.16, Figure 2B). No association between the high mutation burden and overall survival (OS) was observed (P = .381). Closer analysis showed that the associations of the two highest quartiles of tumor mutation burden with survival were inconsistent with the hypothesis, as the intermediate tumor mutation burden (5-7 variants) was more strongly associated with a poor RFS (P = .001, HR = 4.35, 95% CI 1.84-10.26, Cox regression, Figure 2C) and BCSS (P = .002, HR = 6.19, 95% CI 1.93-19.92, Figure 2D) than the highest quartile of tumor mutation burden (>7 variants) (RFS P = .476, BCSS P = .136). The average age at the time of diagnosis was significantly higher (P = .030, unpaired samples t-test) in the intermediate tumor mutation burden group than in the high mutation burden group. Although age overall was not a significant covariate for RFS (P = .244, Cox regression) or BCSS (P = .143, Cox regression), the age group of patients ≥ 70 years old, which was overrepresented in the intermediate mutation burden group, was associated with poor RFS (P = .050, Cox regression) and BCSS (P = .018, Cox regression) in a statistically significant manner. No statistically significant correlation between the tumor mutation burden and BC subtypes was observed. ROC curve analysis supported the predictive ability of tumor mutation burden in predicting relapse (AUC = 0.682, P = .007), while the predictive ability for BCSS and OS remained statistically nonsignificant (Table S19).
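The grouping used in this analysis (a median split, followed by a separation of the upper half at the third quartile) can be sketched in Python with the standard library alone. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the handling of values that fall exactly on a cut point follows the tumor groups reported above (≥ median is "high", > third quartile is "highest"), and the quartile interpolation method is an arbitrary choice.

```python
from statistics import median, quantiles


def median_split(counts):
    """Label each patient 'high' if the variant count is at or above the
    cohort median, otherwise 'low'."""
    cut = median(counts)
    return ["high" if c >= cut else "low" for c in counts]


def three_level_groups(counts):
    """Split the 'high' half further at the third quartile.

    Assumption: counts equal to the third quartile stay in the
    intermediate group, mirroring the 5-7 versus >7 grouping in the text.
    """
    cut = median(counts)
    q3 = quantiles(counts, n=4, method="inclusive")[2]
    labels = []
    for c in counts:
        if c < cut:
            labels.append("low")
        elif c <= q3:
            labels.append("intermediate")
        else:
            labels.append("highest")
    return labels
```

The resulting labels would then serve as the grouping covariate in a survival model such as the Cox regression used in the study, which is not reproduced here.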

| High mutation burden of cfDNA is associated with a poor RFS
When patients were divided into two groups, high and low, according to the median number of detected somatic variants, high mutation burden of cfDNA (≥5 variants) was associated with a poor RFS (P = .016, HR = 2.23, 95% CI 1.16-4.27, Cox regression, Figure 3A). No association with poor BCSS (P = .106) or OS (P = .473) was observed. The median split turned out to be the most effective method to classify patients, as practically no difference was observed between the two highest quartiles of cfDNA mutation burden: the highest quartile of cfDNA mutation burden (≥7 variants, P = .011, HR = 2.64, 95% CI 1.25-5.56, Cox regression) and the intermediate cfDNA mutation burden (4-6 variants, P = .041, HR = 2.27, 95% CI 1.04-4.99) were both associated with a poor RFS in a similar manner (Figure 3B). No statistically significant correlation between the cfDNA mutation burden and BC subtypes was observed. ROC curve analysis supported the predictive ability of cfDNA mutation burden in predicting relapse (AUC = 0.675, P = .008, Table S19), while the predictive ability for BCSS and OS remained statistically nonsignificant.
FIGURE 1 Distribution of somatic variants per gene. TP53 was the most commonly mutated gene in both the tumor and cfDNA samples, while the mutation frequency of other genes varied slightly between samples (A). Only genes sequenced in both samples are shown in the figure. Only about 15% of somatic variants detected in the tumor (B) or cfDNA (C) had an existing clinical record in public databases, while variants without public records were annotated as variants of uncertain significance (VUS). According to our prediction, about 60% of all VUSs were predicted to be benign, while the rest of the VUSs had pathogenic potential.

| Presence of tumor-specific somatic variants in the cfDNA is associated with a poor RFS
Tumor-specific somatic variants were detected in 28 cases (45.9%). Among these cases, the average concordance between the matched tumor and cfDNA samples was 31.1 ± 0.0% when genes sequenced in both gene panels were considered. Tumor-specific variants were most frequently detected in TP53 (20.0% of all variants), MAP3K1 (10.0%), AKT1 (8.6%), PIK3CA (7.1%), and GATA3 (7.1%) (Figure 4A,B). A strong correlation (r = 0.738, P < .001) was observed between the tumor and cfDNA VAFs of tumor-specific somatic variants (Figure 4C). In general, somatic variants that were well represented in the tumor with a high VAF also occurred more often in the cfDNA. Presence of tumor-specific variants in cfDNA was associated with a poor RFS (P = .009, HR = 2.31, 95% CI 1.23-4.31, Cox regression, Figure 4D). No association with BCSS (P = .201) or OS (P = .690) was observed. The ROC curve analysis did not support the diagnostic ability of the tumor-specific variants in the prediction of RFS (AUC 0.521, P = .748, Table S19), thus suggesting that the presence of tumor-specific somatic variants (binary variable) had more prognostic value than the number of tumor-specific somatic variants (continuous variable).
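For readers less familiar with ROC analysis, the AUC values reported in these sections can be interpreted as the probability that a randomly chosen relapsed patient has a higher score (for example, a higher variant count) than a randomly chosen patient without relapse, with ties counted as one half. The Python sketch below computes the AUC directly from that definition; the function name and the input layout are illustrative assumptions, and the statistical software used in the study is not specified here.

```python
def roc_auc(scores, relapsed):
    """Area under the ROC curve from pairwise comparisons.

    scores   -- one numeric score per patient (e.g. number of variants)
    relapsed -- one truthy/falsy outcome flag per patient
    """
    pos = [s for s, r in zip(scores, relapsed) if r]
    neg = [s for s, r in zip(scores, relapsed) if not r]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # relapsed patient ranked higher
            elif p == n:
                wins += 0.5      # ties count as one half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to no discrimination, which is why a value such as 0.521 is read as lacking predictive ability.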

| DISCUSSION
Our results indicate that the presence of tumor-specific somatic variants, tumor mutation burden, and cfDNA mutation burden are all associated with poor RFS. Similar associations between tumor-specific somatic variants and poor RFS have been recently reported [26][27][28] in the context of early-stage BC patients. In contrast to these studies, our patients did not receive neoadjuvant treatment prior to sampling, so our samples reflect the untreated status of the cancer at the time of diagnosis.
FIGURE 2 Association of tumor mutation burden with RFS and BCSS. When patients were divided into two groups, high and low, according to the median tumor mutation burden, high tumor mutation burden (>5 variants) was associated with a poor RFS (A) and BCSS (B). Further analysis showed that the associations of the two highest quartiles with survival were inconsistent with the hypothesis, as the intermediate tumor mutation burden (5-7 variants) was more strongly associated with poor survival than the highest quartile of tumor mutation burden (>7 variants) (C, D). All multivariate analyses were stratified with age at the time of diagnosis, grade, stage, ER status, PR status, HER2 status, and radiotherapy.
Our results were consistent with the hypothesis and the literature except for the observed association between the two highest quartiles of tumor mutation burden and RFS. A closer analysis showed that the average age at the time of diagnosis was significantly higher in the intermediate tumor mutation burden group, and in particular the age group of patients ≥ 70 years old, which was overrepresented in that group, was associated with poor survival. The uneven grouping of the oldest patients is probably explained by random sampling and the small cohort size. Furthermore, we cannot exclude the effect of underlying factors that were not considered in our survival analyses.
Even though this result is unexpected, our results still support the conclusion that tumor mutation burden in general is associated with poor RFS and BCSS.
The observed discordance between the mutation profiles of the tumor tissues and the matching cfDNA samples was remarkably high. It has been suggested that the discordance between cfDNA and the matched tumor tends to be higher in early-stage cancers, 29,30 which might explain why significantly better concordance has been obtained in advanced cancers, on which the majority of liquid biopsy studies have focused. This has raised justified concerns about the accuracy of liquid biopsy-based genotyping of early-stage cfDNA samples in a clinical setting. 31 The reasons for the discordance are open for discussion and may reflect either biological variation or technical variation in methodology. 32 In this study, somatic variant calling used a pooled reference sample and public databases instead of matched blood samples, and thus the possibility of false somatic variant calls cannot be fully excluded despite careful quality control for possible gDNA contamination and false variant calls. The observed discordance relies on the assumption that the heterogeneity of the disease is perfectly reflected by the tumor biopsy, which is known to be questionable in some cases. 33 As liquid biopsy is considered to reflect the systemic status of the patient, we cannot exclude the possibility that some of these discordant variants originate from either benign or metastasized tumors, especially when potentially pathogenic somatic variants were detected in the serum. The observed association between the cfDNA mutation burden and RFS, together with the observed discordance, underlines both the potential and the challenges of liquid biopsy in early-stage cancers and supports the parallel sequencing of tumor and liquid biopsies until the background of discordant variants is better understood.
FIGURE 3 Association of cfDNA mutation burden and RFS. When patients were divided into two groups, high and low, by the median cfDNA mutation burden, high mutation burden of cfDNA (>5 variants) was associated with RFS (A) but not with BCSS or OS. Further analysis showed no significant difference between the two highest quartiles of cfDNA mutation burden in terms of their association with RFS (B). All multivariate analyses were stratified with age at the time of diagnosis, grade, stage, ER status, PR status, HER2 status, and radiotherapy.
It must be noted that our study has technical limitations. In addition to the variant calling without matched reference samples, the gene panels used are relatively small in the context of mutation burden assessment. Although small gene panels have been used to assess mutation burden, mutation burden should ideally be evaluated from whole-exome or whole-genome sequencing data 34 instead of small gene panels enriched with common oncogenes. Another issue is the age of the cohort material, which is both a strength and a constraint of this study. A follow-up time of almost thirty years offers a long and unique prospective view of the survival of Finnish BC patients who had a good prognosis based on traditional prognostic parameters and allows us to detect relapses that would otherwise have been missed. At the same time, treatment strategies and techniques for BC have developed substantially. For example, neoadjuvant chemotherapy and aromatase inhibitors as an adjuvant therapy were not used when the sample material was collected in the 1990s. Most liquid biopsy studies have avoided the use of serum samples due to lysis of hematopoietic cells, which may contaminate cfDNA with genomic DNA fragments. 35 However, plasma samples were not collected in the KBCP, which forced us to use serum samples in our study.
Finally, it must be noted that our cohort differs from standard BC cohort material as it was specifically collected to contain patients both with and without relapsed disease.

| CONCLUSIONS
To the best of our knowledge, this is the first study to report that the poor RFS of early-stage BC patients who have not received neoadjuvant chemotherapy can be estimated from a cfDNA sample taken at the time of diagnosis. We confirm the previously raised concerns about the accuracy of liquid biopsy-based genotyping of early-stage cancers but provide evidence that the estimate of cfDNA mutation burden and the presence of tumor-specific somatic variants in the cfDNA may act as independent prognostic factors and help us to identify patients with a higher risk of relapse. Further studies related to the liquid biopsy of early-stage BC are well warranted.
FIGURE 4 Comparison of tumor biopsy and liquid biopsy results. Somatic variants detected in the matched primary tumors and cfDNA are shown as a matrix where each column represents a single patient and each row represents a single gene (A). Bar plots describe the frequency of somatic variants per gene and per patient. The Venn diagram shows the somatic variant counts of tumor and cfDNA samples, illustrating the discordance observed between the tumor biopsy and liquid biopsy results (B). Only genes sequenced in both samples were taken into account in these comparisons. The observed discordance was mainly explained by somatic variants that were present at a low VAF in either tumor or cfDNA. Indeed, a strong correlation was observed between the tumor and cfDNA VAFs of tumor-specific somatic variants (C), suggesting that somatic variants that were present in the tumor with a high VAF were also more likely to occur in cfDNA. Presence of tumor-specific somatic variants was associated with a poor RFS (D) but not with BCSS or OS. Multivariate analysis was stratified with age at the time of diagnosis, grade, stage, ER status, PR status, HER2 status, and radiotherapy.