Detailed DNA methylation characterisation of phyllodes tumours identifies a signature of malignancy and distinguishes phyllodes from metaplastic breast carcinoma

Phyllodes tumours (PTs) are rare fibroepithelial lesions of the breast that are classified as benign, borderline, or malignant. As little is known about the molecular underpinnings of PTs, current diagnosis relies on histological examination. However, accurate classification is often difficult, particularly for distinguishing borderline from malignant PTs. Furthermore, PTs can be misdiagnosed as other tumour types with shared histological features, such as fibroadenoma and metaplastic breast cancers. As DNA methylation is a recognised hallmark of many cancers, we hypothesised that DNA methylation could provide novel biomarkers for diagnosis and tumour stratification in PTs, whilst also allowing insight into the molecular aetiology of this otherwise understudied tumour. We generated whole‐genome methylation data using the Illumina EPIC microarray in a novel PT cohort (n = 33) and curated methylation microarray data from published datasets including PTs and other potentially histopathologically similar tumours (total n = 817 samples). Analyses revealed that PTs have a unique methylome compared to normal breast tissue and to potentially histopathologically similar tumours (metaplastic breast cancer, fibroadenoma and sarcomas), with PT‐specific methylation changes enriched in gene sets involved in KRAS signalling and epithelial‐mesenchymal transition. Next, we identified 53 differentially methylated regions (DMRs) (false discovery rate < 0.05) that specifically delineated malignant from non‐malignant PTs. The top DMR in both discovery and validation cohorts was hypermethylation at the HSD17B8 CpG island promoter. Matched PT single‐cell expression data showed that HSD17B8 had minimal expression in fibroblast (putative tumour) cells. Finally, we created a methylation classifier to distinguish PTs from metaplastic breast cancer samples, where we revealed a likely misdiagnosis for two TCGA metaplastic breast cancer samples. 
In conclusion, DNA methylation alterations are associated with PT histopathology and hold the potential to improve our understanding of PT molecular aetiology, diagnostics, and risk stratification. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
Phyllodes tumours (PTs) are rare, fibroepithelial tumours of the breast. PTs account for ~1% of breast tumours, with malignant PTs accounting for ~10% of PT diagnoses [1]. Fibroepithelial lesions of the breast are a heterogeneous group of tumours that include cellular fibroadenomas (FAs) and PTs. FAs and PTs have a similar clinical presentation and are composed of both stromal cells and epithelial cells, of which stromal cells are the neoplastic component. However, their disease course varies, with PTs characterised by more rapid growth and a higher risk of recurrence after surgery. Furthermore, if surgical control is unsuccessful, malignant PTs generally respond poorly to chemotherapy and radiotherapy and may metastasise [2].
The World Health Organisation (WHO) Classification of Breast Tumours, 5th Edition [1], outlines the current criteria used to classify PTs and FAs. In brief, FAs have well-defined margins and show variable but usually mild cellularity and low mitotic activity; PTs show a spectrum from benign to borderline to malignant with increasing cellularity, atypia, and mitotic activity. Malignant PTs have sarcomatous stroma and may also show stromal overgrowth and heterologous differentiation with malignant bone (osteosarcoma) and cartilage (chondrosarcoma) [1]. Notably, there is significant overlap between diagnostic categories, which can pose a clinical challenge.
In recognition of this challenge, the Singapore General Hospital (SGH) Group developed a nomogram which assigns a score based on cellular atypia (mild, moderate, severe), mitotic count per 10 high-powered fields, presence or absence of stromal overgrowth, and margin status, the most heavily weighted variable [3]. This SGH score is used as a predictor of recurrence-free survival and has been independently validated in several cohorts [4-6], leading to its inclusion in the WHO classification [1]. While the nomogram provides more data regarding likely outcome for a particular patient, it does not correctly identify all patients with PT who recur, nor does it distinguish PT from histopathologically similar tumours [4-6]. There is therefore a need to develop molecular biomarkers that can provide an accurate diagnosis to inform optimal patient management and minimise both under- and overtreatment.
Several studies have identified PT molecular biomarkers, but with limited translational success [7], and few studies exist to address the clinical need to distinguish PT from histopathologically similar tumours [8]. PTs are difficult to study owing to their relative rarity; in one of the largest genomic studies to date, Tan and colleagues [9] undertook exome sequencing (n = 22 PTs) and performed targeted resequencing of a larger cohort of 100 PTs. They described recurrent loss of function mutations in SETD2 and KMT2D, which are histone methyltransferase enzymes known to play a significant role in epigenetic modification. That these mutations are detected in PTs and only rarely in FAs suggests a potential role for epigenetics in PT tumorigenesis [10]. Disruption to epigenetic mechanisms, such as DNA methylation, is a recognised hallmark of cancer [11] and therefore may be important in the molecular aetiology of PTs. Epigenetic studies of PTs have been limited to date; early studies investigated DNA methylation at a small number of candidate genes such as TWIST1 and RASSF1 [12,13], and a more recent integrative genomic and epigenomic study by Hench et al [14] highlighted the diagnostic potential of combining DNA methylation and copy number profiling for predicting clinical outcomes.
We hypothesised that DNA methylation could provide novel biomarkers for diagnosis and more accurate tumour stratification in PTs, as well as a better understanding of the biological processes underlying the development and progression of PTs. In this study, we undertook whole-genome methylation profiling using DNA methylation arrays on a unique cohort (n = 33) of fibroepithelial and breast tumours encompassing FAs, metaplastic breast cancer, and benign, borderline, and malignant PTs. We used publicly available data to perform validation [14] and comparisons with histopathologically similar tumours. Our characterisation of the PT methylome reveals that it is unique compared to that of other similar or co-localised cancers including breast cancer and sarcomas. Further, we demonstrate the diagnostic potential of DNA methylation by identifying DNA methylation differences between malignant and non-malignant PTs and create a classifier with the potential to discriminate PTs from metaplastic breast cancer.

Materials and methods
Additional details for all methods are included in Supplementary materials and methods.

Clinical samples
The Phyllodes (Australian) cohort comprised n = 33 patients with cellular fibroepithelial lesions and metaplastic breast cancer who had a tumour resected between 2007 and 2017. The Phyllodes (Australian) cohort patients had few events (recurrence or death), consistent with the rarity of recurrence in PT tumours. An expert breast pathologist (S.O.T.) classified cases according to WHO categories [15] as benign (n = 2), borderline (n = 9), and malignant (n = 18) PT, FA (n = 2), and metaplastic breast cancer (n = 2, supplementary material, Table S1). This PT classification was based on the SGH nomogram score [3] minus margin status, with thresholds for benign (SGH = 3), borderline (SGH = 14), or malignant (SGH ≥ 24). Margin status was excluded as it does not necessarily reflect the underlying biological nature of a PT and instead may reflect patient and surgeon preferences.

DNA methylation
Pathological review of the Phyllodes (Australian) cohort tissue identified regions for coring. DNA was extracted from the cores using the Qiagen QIAamp DNA FFPE Tissue Kit or MN Nucleospin DNA kit for formalin-fixed paraffin-embedded tissue (FFPET), following the manufacturer's instructions (Qiagen, Hilden, Germany). The Infinium HD FFPE quality control and DNA restoration kits (Illumina, San Diego, CA, USA) were used to evaluate and repair degraded DNA samples where feasible, as previously described [16]. DNA (250-500 ng) was treated with sodium bisulphite using an EZ-96 DNA methylation kit (Zymo Research, Irvine, CA, USA). DNA methylation was then quantified using the Illumina Infinium Human Methylation EPIC BeadChip (EPIC arrays) according to the manufacturer's standard protocol (Illumina).
The total dataset comprised n = 817 samples (with a cohort and analysis breakdown shown in supplementary material, Table S1). For each analysis, the methylation data underwent quality control, normalisation, and harmonisation according to the genomic content of the array, depending on which cohorts were involved.

Cellular deconvolution of methylation data
To estimate tumour purity and cellular composition, we used the R package EpiDISH [20] (version 2.14.1) using the centEpiFibIC.m reference dataset, and t-tests were used to compare cellular proportions between tumour types.

Sarcoma classifier
To compare the methylation profile of all available PT and FA samples to sarcoma subtypes, we used a web-based classifier created by Koelsche and colleagues [18].

Genome-wide DNA methylation analysis
For initial data visualisation we extracted the 500 most variable probes across the dataset being studied and applied the 'Rtsne' function. For each cohort comparison we used the limma package (version 3.54.1) to identify differentially methylated probes (DMPs) with an adjusted p value cut-off of false discovery rate (FDR) < 0.05 [21]. The package DMRcate (version 2.12.0) was used to identify differentially methylated regions (DMRs), with a p value cut-off of FDR < 0.05 and an absolute Δβ of ≥10% [22].
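The filtering logic described above can be illustrated with a short sketch. This is a minimal Python illustration using only the standard library, not the R pipeline used in the study (Rtsne, limma, DMRcate): it ranks probes by variance to select the most variable ones, and applies a Benjamini-Hochberg adjustment together with an absolute Δβ cut-off. All function names and the toy values are hypothetical; in practice the raw p values would come from a moderated test such as limma's, and the Δβ threshold was applied at the region level rather than per probe.

```python
from statistics import mean, pvariance


def top_variable_probes(betas, k):
    """Return the k probe IDs with the largest variance across samples.

    betas: dict mapping probe ID -> list of beta values (one per sample).
    """
    ranked = sorted(betas, key=lambda p: pvariance(betas[p]), reverse=True)
    return ranked[:k]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_probes(probe_ids, pvals, group_a, group_b,
                       fdr=0.05, min_delta=0.10):
    """Keep probes with adjusted p < fdr and |mean(a) - mean(b)| >= min_delta."""
    kept = []
    for pid, q in zip(probe_ids, benjamini_hochberg(pvals)):
        delta = mean(group_a[pid]) - mean(group_b[pid])
        if q < fdr and abs(delta) >= min_delta:
            kept.append(pid)
    return kept


# Toy data (hypothetical probe IDs and beta values).
toy = {"cg1": [0.1, 0.9, 0.5], "cg2": [0.5, 0.5, 0.5], "cg3": [0.4, 0.6, 0.5]}
most_variable = top_variable_probes(toy, 2)
```

With real array data the first function would be called with `k=500` before dimensionality reduction; the toy call simply shows the ranking behaviour.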

B Meyer, C Stirzaker et al
To develop a machine learning algorithm to distinguish PTs from breast cancer, we followed the steps outlined in supplementary material, Figure S1, on samples described in supplementary material, Table S1D. We first employed a DMP analysis through the limma package to determine individual CpG sites that were significantly different between PTs and non-metaplastic breast cancer. The training cohort consisted of PTs from the Phyllodes (Australian) cohort (n = 29) and TCGA datasets (n = 2), combined with randomly selected TCGA breast cancer samples (n = 150, supplementary material, Table S1D). We used the resulting 10 probes to create a random forest classifier using the caret package (version 6.0-93) [23]. This model was then tested on three separate cohorts: (i) the Phyllodes (

Patient cohort and publicly available data
We performed a genome-wide DNA methylation analysis of primary breast PT samples. The Phyllodes (Australian) cohort (Figure 1) comprises female patients who had a PT (n = 29) reviewed by a specialist breast pathologist (S.O.T.) and classified according to the WHO criteria and modified SGH score as benign (Figure 1A, n = 2, SGH = 3), borderline (Figure 1B, n = 9, SGH = 14), or malignant (Figure 1C, n = 18, SGH ≥ 24). These modified SGH score cut-offs were selected to represent the minimum diagnostic criteria for each category. Additional samples were included from patients with FA lesions (n = 2) and metaplastic breast cancer (n = 2). For each patient, DNA was extracted from a FFPE block of the lesion. DNA methylation profiling of the samples was performed using EPIC arrays. To enable comparison with histopathologically similar tumours (Figure 1D-F), we included publicly available methylation datasets in our analysis (supplementary material, Table S1) [14,17-19].

DNA methylation profiling delineates phyllodes tumours from histopathologically similar tumours
First, we explored the ability of DNA methylation to distinguish PTs from all other tissue types. Methylation data from n = 403 samples were combined to create the 'Comprehensive Cohort' (supplementary material, Table S1A). Visualisation of the top 500 most variable probes in a t-SNE plot shows that samples cluster according to tumour type, with minimal clustering by cohort, indicating minimal batch effects (Figure 2A). PTs cluster together, in a distinct group from the breast carcinoma samples, despite both being primary tumours of the breast. Phyllodes also form a separate group from the sarcoma samples despite both being tumours of mesenchymal origin. FA samples cluster towards the PTs, as expected given their histopathological similarity, but notably form their own subcluster suggesting distinct FA and PT methylomes.
To further characterise the difference between each type of tumour, we employed the methylation-based cellular deconvolution method EpiDISH to estimate epithelial, fibroblast, and immune cell fractions (supplementary material, Table S2). As anticipated given the known stromal neoplastic proliferation in PTs, we observed a greater proportion of fibroblast cells in PTs (mean = 0.70) compared to normal breast tissue samples (mean = 0.28, t-test p < 0.001; Figure 2Bi) and compared to breast carcinoma (which is of known epithelial origin) (mean = 0.23, t-test p < 0.001; Figure 2Bii) across all cohorts (supplementary material, Figure S2). Interestingly, we found that PTs and FA had a low immune cell proportion (mean = 0.13), which is significantly lower than other tumour types (supplementary material, Table S2).
Histopathologically malignant PTs can show sarcomatous differentiation including heterologous elements such as chondro- and osteosarcoma [1]. To ascertain whether any PT samples exhibited the methylation profile of a particular sarcoma, we applied the methylation-based sarcoma classifier created by Koelsche and colleagues [18] to all PT and FA samples (n = 86). None of the n = 19 FA samples were classified as sarcoma. Of the total 67 PT samples, eight samples passed the classifier threshold for a sarcoma (threshold ≥0.9, Figure 2C). Interestingly, these eight samples [n = 5, Phyllodes (Australian) cohort, n = 3 Phyllodes (Hench et al) cohort] are across all PT grades and were classified within just four of the 65 sarcoma subtypes: five samples were identified as dermatofibrosarcoma protuberans (DFSP), which was also the most highly ranked subtype across all PT samples (supplementary material, Figure S3), one PT was classified as undifferentiated sarcoma (USARC), one PT as malignant peripheral nerve sheath tumour (MPNST-like), and one PT as desmoid-type fibromatosis (DTFM). Intriguingly, no overlap with chondrosarcoma or osteosarcoma was seen in the malignant PTs from the Phyllodes (Australian) cohort reported as having malignant bone or cartilage heterologous elements by histopathology. A t-SNE plot comparing PTs and specific sarcoma subtypes pathologically related to PTs (informed by pathologist advice and the results of classifier analysis) showed that the majority of PT samples maintained a distinct clustering from sarcomas (supplementary material, Figure S4A). However, the PT sample that had the highest score (0.99) in the sarcoma classifier as a DTFM was a malignant PT sample (GSM5418525) from the Phyllodes (Hench et al) cohort and clearly clustered with the DTFM subtype (supplementary material, Figure S4A). Sample 4,488 was noted to have heterologous elements of rhabdoid differentiation according to pathology and was considered to be MPNST-like by the sarcoma classifier and, interestingly, was also found to cluster close to both these sarcoma subtypes in the t-SNE (supplementary material, Figure S4B). While greater sample numbers are needed, there is potential for this already existing classifier to be applied to PT samples to detect misdiagnosed sarcomas.

DNA methylation profiling distinguishes phyllodes tumours from normal breast tissue
To discover methylation changes that define PT pathology and gain insight into PT biology, we compared the methylation profile of PTs against normal breast tissue.
For this we analysed two methylation datasets: (i) PTs from the Phyllodes (Australian) cohort (n = 29) and BRCA (TCGA) normal breast tissue samples (n = 30) and (ii) PTs from the Phyllodes (Hench et al) cohort (n = 38) and an independent set of BRCA (TCGA) normal breast tissue samples (n = 30) (supplementary material, Table S1B). A t-SNE plot of the top 500 most variable probes revealed distinct clusters by tissue type, irrespective of PT grade (supplementary material, Figure S5). We next identified differentially methylated regions (DMRs) between PT and normal tissue: 11,366 DMRs from the Phyllodes (Australian) cohort. Interestingly, these regions showed a strong correlation with methylation in the Phyllodes (Hench et al) cohort (Pearson's r = 0.82, p < 2.2e-16) (supplementary material, Figure S6A). In an independent analysis of the Phyllodes (Hench et al) cohort we identified 9,992 DMRs, of which 6,882 DMRs overlapped between cohorts (supplementary material, Figure S6B, and Table S3). Gene ontology analysis of the common DMRs revealed 223 significant gene signatures (supplementary material, Table S4). Of note, dysregulation of the KRAS signalling pathway was one of the top signatures in an analysis of both hyper- and hypomethylated DMRs. Enriched hypomethylated pathways also include epithelial-mesenchymal transition (EMT) and extracellular structure organisation (supplementary material, Figure S7).

Identification of differential DNA methylation between malignant and non-malignant phyllodes tumours
We performed a genome-wide methylation analysis to identify novel genomic regions associated with PT malignancy by comparing malignant (SGH ≥ 24) and non-malignant samples (benign and borderline, SGH < 24). Initial visualisation of the top 500 most variable probes of PTs in the combined PT cohorts (n = 62; 5 samples with unknown SGH score removed, supplementary material, Table S1C) and analysis of global methylation showed no obvious difference in methylation between malignant and non-malignant samples (supplementary material, Figures S8A and S9A) or PT grade (supplementary material, Figures S8B and S9B). Next, we applied a DMR analysis to identify methylation differences between malignant (n = 18) and non-malignant samples (n = 11) in the Phyllodes (Australian) cohort, which identified 355 significant DMRs (FDR ≤ 0.05, absolute Δβ ≥ 10%) (Figure 3A and supplementary material, Table S5). We next compared the differential methylation of the 355 malignant DMRs between the Phyllodes (Australian) cohort and the Phyllodes (Hench et al) cohort, finding a strong positive correlation (Figure 3B, Pearson's r = 0.75, p < 0.0001). Gene set enrichment analysis (GSEA) of genes proximal to hypomethylated DMRs showed an enrichment for oestrogen response and p53 from the hallmark signature and downregulated EMT signalling in breast cancer (Figure 3C). Significant pathways from the GSEA analysis can be found in supplementary material, Table S6 and Figure S10. Furthermore, we sought to validate CNV findings from Hench and colleagues' study [14] in the Phyllodes (Australian) cohort using the conumee package for calling CNVs from methylation array data. We observed several CNV amplifications in MDM4 and EGFR, with amplifications and deletions in RB1 and minimal CDKN2A/B deletions (supplementary material, Figure S11). Independent genome-wide DMR analysis of malignant versus non-malignant PT samples in the Phyllodes (Hench et al) cohort revealed 532 DMRs (supplementary material, Table S7). Of these, 53/532 DMRs intersected with the 355 DMRs from the Phyllodes (Australian) cohort, and all agreed on the direction of effect (Figure 4A and supplementary material, Table S8 and Figure S12). These 53 validated DMRs were distributed throughout the genome and were all hypermethylated in malignant samples (Figure 4B). The most highly ranked DMR genes in both discovery and validation cohorts were HSD17B8, NADK, NELFA, and GFM1/LXN (Figures 3B and 4C and supplementary material, Figures S13 and S14). Among malignant samples we observed heterogeneity in the methylation levels of our top DMRs (supplementary material, Figure S15) but found no evidence of genomic copy number confounding methylation at these regions (supplementary material, Figure S16).
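The cross-cohort validation step, in which DMRs from one cohort are intersected with those from the other and checked for a concordant direction of effect, can be sketched as follows. This is an illustrative Python outline under simple assumptions (regions count as intersecting when they share at least one base on the same chromosome); the `DMR` tuple, the function names, and the coordinates are hypothetical and do not reproduce the study's actual procedure or data.

```python
from collections import namedtuple

# Hypothetical minimal representation of a DMR: genomic interval plus
# the mean methylation difference (malignant minus non-malignant).
DMR = namedtuple("DMR", ["chrom", "start", "end", "delta_beta"])


def overlaps(a, b):
    """True when two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def validated_dmrs(discovery, validation):
    """Discovery DMRs overlapping a validation DMR with the same sign of delta beta."""
    hits = []
    for d in discovery:
        for v in validation:
            if overlaps(d, v) and (d.delta_beta > 0) == (v.delta_beta > 0):
                hits.append(d)
                break  # count each discovery DMR once
    return hits


# Toy example with made-up coordinates.
discovery = [
    DMR("chr6", 100, 200, 0.13),   # overlaps, same direction
    DMR("chr1", 50, 80, -0.12),    # overlaps, opposite direction
    DMR("chr2", 10, 20, 0.20),     # no overlapping validation region
]
validation = [
    DMR("chr6", 150, 260, 0.15),
    DMR("chr1", 60, 90, 0.11),
]
shared = validated_dmrs(discovery, validation)
```

Only the first toy region survives both the overlap and the direction check, mirroring the requirement that validated regions agree on the direction of effect.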
Of note, the most significant DMR in both cohorts is an expansive region covering the promoter and extending into the gene body of HSD17B8 [Figure 5A, Phyllodes (Australian) cohort: p = 4.48 × 10⁻³², Δβ = 13.4%, # CpG sites = 33; Phyllodes (Hench et al) cohort: p = 1.66 × 10⁻⁹⁵, Δβ = 14.7%, # CpG sites = 36]. Within the hypermethylated malignant PT group the absolute level of methylation was heterogeneous, which led us to investigate whether methylation in this region was associated with any other patient-specific molecular or clinical variables. We observed no association between EpiDISH-predicted cell type proportion and HSD17B8 methylation (supplementary material, Figure S17A-C). However, we found a significant positive correlation with SGH score, suggesting that HSD17B8 methylation is likely on a continuum, increasing with the degree of atypia (as measured by SGH score, supplementary material, Figure S17D), as well as a near-significant negative association with age (r = −0.36, p = 0.054, supplementary material, Figure S17E). Decreased expression of HSD17B8 was previously associated with poor survival outcomes in breast cancer, including in the BRCA (TCGA) cohort [24]. We therefore used the full BRCA (TCGA) dataset to test for an association between HSD17B8 expression and methylation [19]. We observed a significant association between higher HSD17B8 methylation and decreased gene expression (Pearson's r = −0.48, p < 0.001, supplementary material, Figure S18), suggesting a potential regulatory role associated with the methylation change. Interestingly, we also observed that, as in the Phyllodes (Australian) cohort, HSD17B8 methylation was highly heterogeneous between BRCA (TCGA) samples, with only a small proportion (9.6%) with methylation levels above 40% (supplementary material, Figure S18).
Finally, we generated single-cell expression data for three malignant PTs (mean = 3,163 cells per sample), of which two overlapped with our Phyllodes (Australian) cohort methylation dataset (sample nos. 4,413 and 4,436). For initial characterisation of the single-cell data, we clustered cells by predicted cell type. As expected for PTs as a fibroepithelial tumour type, most cells were predicted to be of stromal origin, but we also observed several smaller clusters of immune and epithelial cell types (Figure 5B) at similar levels to those observed from our previous cellular deconvolution analysis (supplementary material, Figure S19). We interrogated the data of all three PTs for expression of HSD17B8 and found minimal expression within the predominant fibroblast population, consistent with the high gene promoter methylation levels observed in malignant samples (Figure 5C). HSD17B8 expression appears to occur specifically within the small proportion of epithelial cells (Figure 5C); however, a larger single-cell cohort including non-malignant PTs is required to determine the cell specificity of HSD17B8 expression and changes with malignancy.

DNA methylation as a biomarker to prevent misdiagnosis of phyllodes tumours as metaplastic breast cancer
Malignant PTs can share histopathological features with types of metaplastic breast cancer (Figure 1) such as spindle cell metaplastic carcinoma and metaplastic carcinoma with heterologous mesenchymal differentiation [15]. As a result, PTs can be misdiagnosed as metaplastic breast cancer, which has a significant therapeutic consequence for patients [25]. In our initial visualisation of DNA methylation data in the Comprehensive Cohort, we observed that metaplastic breast cancers largely clustered with non-metaplastic breast cancers, away from the PTs (Figure 2A). Thus, we hypothesised that the methylation differences between PTs and non-metaplastic breast cancer could be exploited to distinguish PT from metaplastic breast cancer.
To develop a classifier, we curated cohorts of PT, breast cancer, and normal samples (as outlined in supplementary material, Table S1D). Initial visualisation of all samples (Figure 6A) confirmed that PTs largely clustered away from breast cancer/tissue samples. We then used a training cohort of PT and breast cancer samples to identify CpG sites of differential methylation (DMPs). These CpG sites were used to develop a random forest classifier, which we assessed using three test datasets: (i) an independent PT/breast cancer validation cohort and alternative tissue types in (ii) normal breast tissue and (iii) metaplastic breast cancer, as outlined in supplementary material, Figure S1 and Table S1D and described below.

Development of a random forest model to distinguish PT from breast cancer
First, we identified CpG sites of differential methylation between PT (n = 31) and non-metaplastic breast cancer (n = 150). We identified 321,344 DMPs and then performed manual feature selection, selecting probes based on delta beta (Δβ ≥ 50%), independent predictive ability (AUC ≥ 0.95), and low correlation to one another (pair-wise absolute correlation < 0.75), culminating in a final selection of 10 probes (supplementary material, Table S9 and Figure S1).
Each probe was then independently validated through the Boruta feature selection package (supplementary material, Figure S20) and used to fit the random forest model.
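The three feature-selection criteria named above (Δβ, single-probe AUC, and pairwise correlation) can be combined in a simple greedy filter. The sketch below is one possible reading in standard-library Python, not the procedure actually used, which was manual and is detailed in the supplementary material. The function names, the ordering of candidates by absolute Δβ, the use of Pearson correlation across all training samples, and the toy beta values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def auc(pos, neg):
    """Probability that a value from pos exceeds one from neg (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_probes(pt, bc, min_delta=0.50, min_auc=0.95, max_corr=0.75):
    """Greedy probe selection.

    pt, bc: dict mapping probe ID -> beta values in PT / breast cancer samples.
    """
    candidates = []
    for probe in pt:
        delta = mean(pt[probe]) - mean(bc[probe])
        a = auc(pt[probe], bc[probe])
        a = max(a, 1 - a)  # treat hyper- and hypomethylation alike
        if abs(delta) >= min_delta and a >= min_auc:
            candidates.append((abs(delta), probe))
    selected = []
    # Visit candidates from largest to smallest effect size and keep a
    # probe only if it is weakly correlated with every probe kept so far.
    for _, probe in sorted(candidates, reverse=True):
        profile = pt[probe] + bc[probe]
        if all(abs(pearson(profile, pt[s] + bc[s])) < max_corr for s in selected):
            selected.append(probe)
    return selected


# Toy beta values (hypothetical probes; four samples per group).
pt = {
    "cgA": [0.95, 0.60, 0.95, 0.60],
    "cgB": [0.60, 0.97, 0.60, 0.97],
    "cgC": [0.93, 0.60, 0.93, 0.60],  # near-duplicate of cgA
    "cgD": [0.60, 0.60, 0.60, 0.60],  # effect size too small
}
bc = {
    "cgA": [0.05, 0.40, 0.05, 0.40],
    "cgB": [0.40, 0.03, 0.40, 0.03],
    "cgC": [0.07, 0.40, 0.07, 0.40],
    "cgD": [0.40, 0.40, 0.40, 0.40],
}
chosen = select_probes(pt, bc)
```

In the toy data, `cgC` is dropped because it tracks `cgA` almost perfectly and `cgD` because its Δβ is below the threshold, leaving two weakly correlated probes.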

(i) Testing of random forest classifier on PT versus non-metaplastic breast cancer
The random forest model was first tested on the training dataset (PT versus non-metaplastic breast cancer); as expected, we achieved 100% accuracy in discriminating PTs from non-metaplastic breast cancer (supplementary material, Figure S21A). Using the Phyllodes (Hench et al) cohort as our test dataset (n = 63) (supplementary material, Figure S1Di), we achieved an accuracy of 90.5% (Table 1, precision = 80.65%, recall = 100%, F1 = 89.3%), where only 6/63 samples were incorrectly classified (Figure 6B).
[…] (supplementary material, Figure S1Diii, n = 15), we conducted a blinded, expert clinical re-assessment of all histopathology reports and images for metaplastic breast cancer samples from TCGA. The conclusion from pathological review was that a diagnosis of a malignant PT could not be excluded in four samples from the TCGA cohort (TCGA-AC-A2QJ-01, TCGA-AC-A2QH-01, TCGA-AC-A7VC-01, TCGA-A2-A4S1-01). Intriguingly, we observed two of these four samples clustering with PTs (TCGA-AC-A2QH-01, TCGA-AC-A7VC-01) in our initial visualisation of variable DNA methylation of all samples (Figure 6A). Once we applied the PT/breast cancer classifier to the n = 15 metaplastic breast cancer samples, we found that 13 samples were predicted as breast cancer, and two as PTs (Figure 6C). Those two samples were TCGA-AC-A2QH-01 and TCGA-AC-A7VC-01, which, interestingly, were two of the TCGA samples previously identified as potential PTs by pathologist review and t-SNE clustering (Figure 6A), suggesting a possible misdiagnosis.
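The test-set metrics reported for the Phyllodes (Hench et al) cohort in subsection (i) can be cross-checked with a little arithmetic. The confusion-matrix counts below are not stated in the text; they are inferred here on the assumption that PT is the positive class. Given n = 63, six misclassified samples, and 100% recall (so all six errors are false positives), a precision of 80.65% implies 25 true positives, since 25/31 ≈ 0.8065. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Counts inferred from the reported metrics (an assumption, see above):
# 25 PTs correctly called, 6 breast cancers called as PT, no PTs missed,
# 32 breast cancers correctly called; 25 + 6 + 0 + 32 = 63 samples.
acc, prec, rec, f1 = classification_metrics(tp=25, fp=6, fn=0, tn=32)
print(f"accuracy={acc:.1%} precision={prec:.2%} recall={rec:.0%} F1={f1:.1%}")
# prints: accuracy=90.5% precision=80.65% recall=100% F1=89.3%
```

Under these assumed counts, all four values match those quoted from Table 1, which suggests the reported figures are internally consistent.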

Discussion
Currently, diagnosis and grading of PTs primarily rely on histological examination of the tumour, which determines treatment. However, it can be challenging to accurately classify PTs, particularly between borderline and malignant PT tumours, and to distinguish from those tumours that have shared histological features. Furthermore, the molecular underpinnings of the spectra of PTs are not well characterised. While cancer-related genetic alterations occur in most PTs, the mutational profile between tumours is highly variable [26]. Mutations in TERT (59%) and MED12 (53%) are the most common but do not occur in all malignant cases [26]. Therefore, diagnosis or grading of PT via common mutations would not sufficiently capture all cases. In this study, we sought to interrogate the role of DNA methylation in PTs and its diagnostic potential by performing whole-genome methylation profiling in a novel cohort of PT patients.
Incorporating recently curated publicly available PT methylation data into our study, we showed that PTs have an entirely unique methylome in comparison to other related tumour types, including breast cancer and sarcoma. Cellular deconvolution of the methylation data and PT single-cell expression data showed that both PTs and FAs exhibit the cellular composition expected of a soft tissue tumour. In comparisons of PTs to both normal breast tissue and breast cancer, we showed that differential methylation in genes pertaining to EMT, PRC2 targets, and KRAS signalling pathways were enriched. Rare mutations in the KRAS gene have been found in patients with primary and metastasised phyllodes tumours [27], but our methylation data suggested a larger role for KRAS signalling in PT aetiology than previously thought.
Using the methylation-derived sarcoma classifier developed by Koelsche et al [18], eight PT samples were classified as specific sarcoma subtypes, most commonly DFSP, suggesting a unique biology of these PTs. Further work is required to determine whether a sarcoma-like methylation profile is associated with phyllodes patient survival or opens avenues of alternative treatments. For example, methylation could be used to identify patients that may be at risk of transformation into sarcoma [1] or potential candidates for post-operative radiotherapy [28,29], or chemotherapy if the patient is otherwise inoperable [30].
We identified 53 novel DMRs between malignant and non-malignant PTs. The top DMR in both discovery and validation cohorts was promoter hypermethylation of the HSD17B8 gene (encoding Hydroxysteroid 17-Beta Dehydrogenase 8). Interestingly, HSD17B8 was identified as the only gene with prognostic ability across an impressive collection of breast cancer cohorts, with decreased expression conferring a poor prognosis [24]. Combined with our methylation data, this may mean that reduced HSD17B8 expression is a marker of poor prognosis across cancers localised to the breast, regardless of the tumour's cell type of origin. HSD17B8 has a known function in steroid metabolism, particularly as an oxidative enzyme associated with oestradiol breakdown [31]. Of the other top malignancy DMR genes, NADK (encoding NAD kinase) has been targeted as a cancer therapeutic target [32], and promoter hypermethylation and downregulation at LXN (encoding latexin) were previously associated with increased tumour volume in haematopoietic cancers [33]. While we have found evidence to suggest DNA methylation is associated with a malignant phenotype in PTs, future studies will need to be undertaken on cohorts with long-term survival data to validate these results. Pareja and colleagues propose that there may be two types of PT tumours evolving through distinct paths where PTs with MED12 mutations are likened to a more benign tumour that undergoes progressive malignant transformation, compared to the de novo malignant phenotype driven by mutation of more stereotypical drivers of cancer (NF1, EGFR, TP53) [34]. It will be important to address the issue of whether the methylation variability we observed in HSD17B8 might be associated with genotype.

A major clinical challenge for PT is the potential misdiagnosis between malignant PTs and metaplastic breast cancer. This study showed the potential of a DNA methylation classifier to identify differences between PTs and breast cancer; indeed, we found that two metaplastic breast cancer samples from the TCGA BRCA cohort [19] were likely misdiagnosed PTs. DNA methylation-based molecular profiling has shown great utility in many cancers, including sarcoma [18], brain cancer [35], prostate cancer [36], and breast cancer [37], and could be a useful tool to help improve the diagnosis of metaplastic breast cancer and, in future studies, fibroadenoma. However, the low number of FA samples was a limitation of the current study as we were not able to perform methylation analyses to distinguish PT from FA, a common clinical challenge.
If validated, the methylation signature could be rapidly translated using a targeted technique such as methylation bisulphite PCR sequencing [38] or the highly sensitive droplet digital PCR method, which is emerging as a powerful diagnostic technique in clinical laboratories [39].The sensitivity of these methods means that they can be applied to measure gene methylation signatures in FFPE biopsy samples and could therefore help distinguish PT from FA, sarcoma, and metaplastic breast cancer in a clinical setting.
Overall, our study demonstrates the utility of DNA methylation as a molecular tool to improve diagnostic accuracy among histopathologically similar tumours and stratify patients by risk, with the potential to improve long-term outcomes for patients. The next step for clinical translation would be a multicentre study for PT biomarker validation, which would offer advantages including increased sample size for improved statistical power and assessment of a range of cellularity and atypia within the PT spectrum (including FA), as well as access to long-term clinical follow-up data.

Table S1. Number of samples and probes used in each subanalysis, broken down by cohort and tissue type
Table S2. Statistical analysis comparing cell proportions as predicted by EpiDISH between tumour types
Table S3. Validated differentially methylated regions: PT versus normal breast tissue
Table S4. GSEA of validated hyper- and hypomethylated differentially methylated regions: PT versus normal breast tissue
Table S5. Differentially methylated regions from malignant versus non-malignant PT comparison (discovery)
Table S6. GSEA of hyper- and hypomethylated differentially methylated regions from malignant versus non-malignant PT comparison (discovery)

Figure 1. Histopathology of different grades of PT and potentially misdiagnosed, similar tumours. Representative haematoxylin and eosin staining (magnification ×400) from patients with (A) benign, (B) borderline, and (C) malignant PTs. The leaf-like structure defining PTs can be seen in (A), with progressively increasing cellularity, atypia, and infiltration observed as the grade of tumour increases. Histopathologically similar tumours used as comparisons in this study include (D) fibroadenoma (FA), (E) undifferentiated sarcoma, and (F) metaplastic breast cancer. FAs and metaplastic breast cancer can share histopathological features with benign and malignant PTs, respectively.

Figure 2. DNA methylation distinguishes PTs from other co-localised or pathologically similar tumours. (A) t-SNE of the top 500 most variable probes shows clustering by sample type. PTs (purple) cluster separately from normal breast (light blue), breast cancer (blue), sarcoma (green), and FA tissue (red). Minimal batch effects are observed between different cohorts with the same tumour type (e.g. TCGA BRCA and TNBC BRCA). (B) EpiDISH, a cellular deconvolution method using methylation, shows different cellular compositions between tissue types. PTs show the highest fibroblast and lowest epithelial proportions. Differences are shown between PTs and (i) breast cancer and (ii) normal breast tissue. (C) The sarcoma classifier generated by Koelsche and colleagues [18] classifies eight PT samples as sarcomas (≥0.9 classifier score), with the majority classified within the dermatofibrosarcoma protuberans (DFSP) subtype. Other subtypes with PT samples above the 0.9 threshold include malignant peripheral nerve sheath-like sarcoma (MPNST-like), undifferentiated sarcoma (USARC), and desmoid-type fibromatosis (DTFM).
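
As an illustration of the probe-selection step behind panel (A), the following minimal Python sketch ranks probes by the variance of their beta values across samples and keeps the top k. The function name `top_variable_probes`, the toy probe IDs, and the use of population variance as the variability measure are assumptions made for this example; the caption does not specify how variability was computed.

```python
from statistics import pvariance


def top_variable_probes(beta_by_probe, k=500):
    """Return the k probe IDs with the highest variance across samples.

    beta_by_probe maps a probe ID to a list of beta values (one per sample).
    """
    variances = {probe: pvariance(values) for probe, values in beta_by_probe.items()}
    # Sort by descending variance; ties are broken by probe ID for reproducibility.
    ranked = sorted(variances, key=lambda probe: (-variances[probe], probe))
    return ranked[:k]


# Toy beta values (hypothetical probe IDs, four samples each).
toy = {
    "cg_a": [0.10, 0.12, 0.11, 0.10],  # nearly constant across samples
    "cg_b": [0.05, 0.95, 0.10, 0.90],  # highly variable
    "cg_c": [0.40, 0.60, 0.45, 0.55],  # moderately variable
}
selected = top_variable_probes(toy, k=2)
print(selected)
```

In a real analysis the selected probes would then be passed to a dimensionality-reduction method such as t-SNE; that step is omitted here to keep the sketch dependency-free.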

Figure 5. HSD17B8: evidence of promoter DNA hypermethylation in malignant PTs and single-cell expression profiling showing low expression in fibroblast populations. (A) Heatmap and relative location of the HSD17B8 gene and DMR showing variable hypermethylation in malignant PTs compared to non-malignant PTs. (B) UMAP of single-cell expression data derived from three malignant PTs (resolution = 0.4). Major clusters are defined by cell type-specific expression signatures, with the major cluster predicted as fibroblasts (tumour) by 'SingleR'. (C) Depiction of HSD17B8 expression density overlaid on the single-cell cell-type prediction UMAP, showing minimal expression within fibroblast cells and higher expression levels in epithelial cells.

Figure 6. Comparison of DNA methylation between PTs, breast cancers, and normal breast tissue. (A) t-SNE of the 500 most variable probes among all samples and tissue types. This shows phyllodes-specific (purple, n = 69), breast cancer (blue, n = 150), metaplastic breast cancer (orange, n = 15), and normal breast tissue (light blue, n = 50) clusters. Two metaplastic breast cancer samples (TCGA-AC-A2QH and TCGA-AC-A7VC) cluster with PTs. (B) Fourfold plots of the test results from the PT versus breast cancer classifier models. Green quadrants denote a correct prediction of tumour type compared to reference, while red quadrants indicate incorrect predictions. Testing of the model on PT and breast cancer samples from the Phyllodes (Hench et al) cohort returns 57/63 correct predictions. (C) Testing of the classifier on metaplastic breast cancer samples from the TCGA and Phyllodes (Australian) cohorts returns 13/15 correct predictions.
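
The prediction counts reported in panels (B) and (C) correspond to accuracies of 57/63 (about 90.5%) and 13/15 (about 86.7%). The Python sketch below recomputes these proportions and attaches a Wilson score interval to each. The interval is not reported in the caption and is added here only to illustrate the uncertainty attached to small test sets; the helper name `wilson_interval` and the 95% level (z = 1.96) are assumptions of this example.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Return (accuracy, lower, upper) using the Wilson score interval."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half


# Counts taken from the caption; the labels are for display only.
results = {
    "PT vs breast cancer (B)": wilson_interval(57, 63),
    "metaplastic breast cancer (C)": wilson_interval(13, 15),
}
for label, (acc, lower, upper) in results.items():
    print(f"{label}: accuracy={acc:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The smaller metaplastic test set yields a visibly wider interval, which is one reason a larger validation cohort would be informative.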

Figure S2. Cellular deconvolution using EpiDISH for each tissue type split by cohort
Figure S3. Sarcoma classifier score for each sarcoma subtype per PT sample
Figure S4. t-SNE of 500 most variable probes across PT and sarcoma samples
Figure S5. t-SNE of 500 most variable probes across PTs by grade and normal breast tissue
Figure S6. Correlation of methylation difference between PT versus normal breast tissue at significant DMRs in Phyllodes (Australian) and Phyllodes (Hench et al [14]) cohorts
Figure S7. GSEA of hypomethylated and hypermethylated validated DMRs between PTs and normal breast tissue

Figure S8. t-SNE of 500 most variable probes in PTs of different grades
Figure S9. Global, CpG island, and promoter methylation levels of PT samples grouped by (A) malignancy classification and (B) grade of PT
Figure S10. GSEA of hypermethylated DMRs derived from discovery analysis of malignant versus non-malignant PTs
Figure S11. Copy number variation segment means for Phyllodes (Hench et al [14]) candidate genes in Phyllodes (Australian) cohort
Figure S12. Heatmap of 53 validated DMRs across all PT samples
Figure S13. Plot of DNA methylation at top malignant DMRs by PT grade
Figure S14. Plot of DNA methylation at top malignant DMRs by tumour malignancy classification
Figure S15. Heatmap and line plots of top 10 validated DMRs between malignant and non-malignant PT samples
Figure S16. Copy number variation between malignant and non-malignant PT samples
Figure S17. Correlation of HSD17B8 methylation with SGH score, age, and EpiDISH-derived cell fractions
Figure S18. Correlation of HSD17B8 methylation from TCGA breast cancer and normal breast tissue samples with RNA expression
Figure S19. Bar chart of EpiDISH cell fractions estimated from methylation data compared to cell fractions from single-cell expression data
Figure S20. Validation of PT versus breast cancer feature selection via Boruta feature selection package
Figure S21. Fourfold plot of training and normal breast cancer test outcomes of random forest classifier

Ethics approval was obtained through the Sydney Local Health District (Royal Prince Alfred Hospital Zone) Human Research Ethics Committee: Protocol No. X15-0388 and 2019/ETH06994, 'Retrospective breast tumour bank and database'.

Table 1. Statistical results of training and test cohorts of phyllodes tumour versus breast cancer random forest model.

Table S7. Differentially methylated regions from malignant versus non-malignant PT comparison (validation)

Table S8. Validated differentially methylated regions: malignant versus non-malignant PT comparison

Table S9. Differential methylation statistics of probes selected for random forest classifier