Diagnostic concordance of phyllodes tumour of the breast

Phyllodes tumours (PT) are rare and distinct breast tumours, which span a morphological continuum. Classification into benign, borderline and malignant categories reflects their biology and clinical behaviour and is essential to guide management. This study aims to assess the diagnostic agreement of PT using the UK National Health Service Breast Screening Programme (NHSBSP) breast pathology external quality assurance (EQA) scheme data.


Introduction
Phyllodes tumours (PT) are uncommon biphasic (epithelial and stromal) breast lesions comprising approximately 1% of all breast tumours. These tumours share the term 'phyllodes', which is used to describe their unique architecture; however, they have variable morphology, biology and clinical behaviour. PT represent a broad spectrum of lesions from indolent benign to aggressive malignant tumours. Adding to the challenge, they overlap with entities such as fibroadenoma and hamartoma at the benign end of the spectrum and with metaplastic carcinoma and sarcomas at the other end of the spectrum. 1,2 Recognition of associations between various categories of PT and risk of recurrence and/or metastasis is essential to guide further management. Although distinguishing between classical benign and malignant PT is straightforward, tumours with overlapping features make the distinction between some forms challenging. In addition, classification of PT depends on a set of differently weighted and subjective criteria, which results in variability in PT classification in routine practice, potentially impacting on management. PT are classified into benign, borderline and malignant categories based on a constellation of histological variables including the degree of stromal cellularity, stromal cellular atypia, mitotic count, stromal overgrowth and the nature of tumour borders. As each microscopic parameter has two to three tiers of stratification, there are significant challenges in obtaining an accurate, objective and reproducible categorisation. 3 This study aimed to assess the interobserver agreement of PT diagnosis and classification utilising a large cohort scored by a large number of pathologists. Cases were histologically reviewed to understand the reasons for any disagreement, and to provide insights for improving the diagnostic concordance for these tumours.

Materials and methods
This study is based on data obtained from the National Health Service Breast Screening Programme (NHSBSP) external quality assurance (EQA) scheme. A description and details of standard operating procedures have been published. 4,5 In brief, sets of 12 cases plus three educational cases are circulated, twice a year, to pathologists in the United Kingdom involved in providing a clinical breast pathology service. Each case comprises one representative haematoxylin and eosin (H&E)-stained slide of a surgical excision specimen. The cases are submitted by participants for use in the scheme. A standard reporting form is used for each case, which includes the diagnostic classification of the lesion. The scheme includes > 700 UK- and Republic of Ireland-based participants, and each participating pathologist independently examines the slide for each case and completes a tick-box proforma. Participants included breast specialists and non-specialists (general pathologists with an interest in breast pathology).
In this study, a total of 26 PT cases were retrieved. These cases had been circulated over 17 years, between 2003 and 2019, and were assessed by an average of 607 pathologists (range = 454-675). Slides from these lesions were reviewed under a multihead microscope by four pathologists specialising in breast pathology (E.R., R.M., A.A., M.T.) to agree on the final classification of each lesion based on current diagnostic criteria agreed by the World Health Organisation (WHO). 6,7 Morphological features were systematically recorded as follows: (1) tumour border (well-defined/focally infiltrative/infiltrative), (2) architecture (leaf-like/clefts/absent characteristic architecture), (3) subepithelial stromal condensation (yes/no), (4) stromal cellularity (mild/moderate/marked), (5) highest degree of cellularity (in cases with heterogeneous appearance), (6) extent of hypercellularity (diffuse/focal), (7) degree of stromal atypia (absent/mild/moderate/marked), (8) mitotic activity [number of mitoses per 10 high-power fields (HPF)], (9) presence of atypical mitoses (yes/no), (10) stromal overgrowth (yes/no), (11) malignant heterologous elements (present/absent), (12) benign stromal metaplasia (e.g. chondroid, osseous and myoid metaplasia) (yes/no), (13) necrosis (present/absent; tissue infarction was not recorded as necrosis) (Figure 1) and (14) epithelial atypia (yes/no). In this study, stromal overgrowth was defined as stromal proliferation without epithelial elements observed in at least one low-power field (×4 microscope objective). 6 Predominance of stroma with increased stromal areas without epithelium, but less than that defined as overgrowth, was classified as stromal expansion. We have also introduced a few other less-reported or less-studied features.
Multinodularity was recorded when either multiple or small satellite nodules were seen in the periphery of the lesion; these were either poorly or well-defined areas of stromal or biphasic proliferation (Figure 2). Clefting was defined as the presence of elongated, branching ducts with a staghorn appearance and was recorded in cases where the well-developed leaf-like architecture was absent 7 (Figure 3).
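The recording scheme described above can be pictured as one structured record per case. The following is a minimal sketch only: the class names, field names and the example values are hypothetical and are not taken from the study's actual proforma; only the feature list and the permitted tiers follow the text.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Border(Enum):
    WELL_DEFINED = "well-defined"
    FOCALLY_INFILTRATIVE = "focally infiltrative"
    INFILTRATIVE = "infiltrative"


class Architecture(Enum):
    LEAF_LIKE = "leaf-like"
    CLEFTS = "clefts"
    ABSENT = "absent characteristic architecture"


class Degree(IntEnum):
    # Ordered tiers; ABSENT applies to stromal atypia only,
    # stromal cellularity uses MILD/MODERATE/MARKED.
    ABSENT = 0
    MILD = 1
    MODERATE = 2
    MARKED = 3


@dataclass
class PTFeatureRecord:
    """One reviewed slide; field names are illustrative assumptions."""
    border: Border
    architecture: Architecture
    subepithelial_condensation: bool
    stromal_cellularity: Degree
    highest_cellularity: Optional[Degree]  # only for heterogeneous cases
    hypercellularity_diffuse: bool         # True = diffuse, False = focal
    stromal_atypia: Degree
    mitoses_per_10_hpf: int
    atypical_mitoses: bool
    stromal_overgrowth: bool
    malignant_heterologous_elements: bool
    benign_stromal_metaplasia: bool
    necrosis: bool                         # infarction is not counted as necrosis
    epithelial_atypia: bool
    multinodularity: bool = False          # additional feature
    stromal_expansion: bool = False        # stroma-rich, short of overgrowth


# Hypothetical values for illustration, not a case from the series.
example = PTFeatureRecord(
    border=Border.WELL_DEFINED,
    architecture=Architecture.CLEFTS,
    subepithelial_condensation=False,
    stromal_cellularity=Degree.MILD,
    highest_cellularity=Degree.MODERATE,
    hypercellularity_diffuse=False,
    stromal_atypia=Degree.MILD,
    mitoses_per_10_hpf=3,
    atypical_mitoses=False,
    stromal_overgrowth=False,
    malignant_heterologous_elements=False,
    benign_stromal_metaplasia=False,
    necrosis=False,
    epithelial_atypia=False,
    stromal_expansion=True,
)
```

Using ordered tiers for cellularity and atypia makes comparisons such as `example.stromal_atypia < Degree.MARKED` possible, which is convenient when contrasting discordant and concordant cases.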
Case classification into one of the four EQA diagnostic categories (benign/atypia/in situ/malignant) and details of other diagnoses proffered by participating pathologists (free-text answers) were available for each case. As such, benign PTs were sometimes diagnosed as fibroadenoma or benign fibroepithelial proliferation and grouped under benign lesions. Similarly, malignant PTs were sometimes labelled as soft tissue sarcomas or metaplastic carcinomas and grouped under malignant lesions. Borderline PTs were sometimes added under the benign/atypia/in situ categories. The agreement rate was first calculated based on this initial EQA grouping as either benign lesions, borderline PT or malignant lesions.
A case-by-case analysis was performed to accurately record all replies, regardless of how they were initially assigned to one of the four EQA diagnostic categories. Actual concordance rates were then calculated as the percentage of respondents who agreed with the final diagnosis. Cases were again reviewed based on their true diagnostic agreement rates, and reasons for concordance/discordance were discussed and detailed. The final 'ground truth' diagnosis for each case was based on the majority diagnosis of members of the EQA scheme central coordinating group of pathologists, approximately 20 in number, representing each English National Health Service (NHS) health region and the devolved UK nations plus two representatives from the Republic of Ireland. This final 'consensus diagnosis' was confirmed at review discussion by the authors (pathologists) of this study. Borderline PTs that showed a split of the diagnosis between benign and malignant PTs were considered in this study as borderline PT.
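The two agreement measures described above (agreement on the broad EQA grouping and agreement on the exact final diagnosis) can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names, the `GROUPING` table and the response counts are hypothetical and are not the scheme's data or software.

```python
# Hypothetical mapping from a proffered diagnosis to the broad grouping
# used for the first agreement calculation.
GROUPING = {
    "benign PT": "benign lesion",
    "fibroadenoma": "benign lesion",
    "hamartoma": "benign lesion",
    "borderline PT": "borderline PT",
    "malignant PT": "malignant lesion",
    "sarcoma": "malignant lesion",
    "metaplastic carcinoma": "malignant lesion",
}


def concordance_rate(responses, final_diagnosis):
    """Percentage of responses identical to the final diagnosis."""
    if not responses:
        raise ValueError("at least one response is required")
    agreeing = sum(1 for r in responses if r == final_diagnosis)
    return 100.0 * agreeing / len(responses)


def grouped_concordance_rate(responses, final_diagnosis):
    """Percentage of responses falling in the same broad grouping."""
    grouped = [GROUPING[r] for r in responses]
    return concordance_rate(grouped, GROUPING[final_diagnosis])


# Hypothetical replies for a single benign PT case (100 respondents).
responses = (
    ["benign PT"] * 60
    + ["fibroadenoma"] * 30
    + ["borderline PT"] * 8
    + ["malignant PT"] * 2
)

exact = concordance_rate(responses, "benign PT")            # 60.0
grouped = grouped_concordance_rate(responses, "benign PT")  # 90.0
```

In this invented example the grouped rate exceeds the exact rate because replies of fibroadenoma count as agreement at the level of 'benign lesion' but not at the level of 'benign PT'.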
The morphological features recorded for discordant cases were analysed against those of cases with good concordance to reveal areas of confusion and possible pitfalls; this helped to distinguish easily recognised morphological features from more subtle changes that had a lower impact on the final diagnosis.

Results
Following the review of the reported diagnoses and of the representative slides, cases were classified into benign (n = 14), borderline (n = 6) and malignant (n = 6) PTs. The diagnostic agreement rate of these cases varied significantly when different diagnostic categories were considered (Table 1). A higher agreement rate was obtained when the cases were classified as 'benign lesions' versus borderline PT versus 'malignant lesions' (86%) compared to when they were classified into three PT grades (benign, borderline or malignant PTs) (63%). The diagnostic agreement of all cases diagnosed as PT irrespective of grade was 79% (range = 41-99%). The highest concordance rate was that of benign PT, which was categorised as a benign lesion in 91% of cases. When the term 'benign PT' was used as a final diagnosis, the concordance rate when compared to the ground truth diagnosis decreased to 86%. Similar concordance rates were observed with malignant PT (90%), whereas the lowest concordance rates were found in borderline PT (42%) (Table 1).
The benign category included benign PT as well as other benign diagnoses made by participating pathologists in each case. In benign PT, other benign diagnoses comprised 25% (range = 3-44%) of the cases. These included other benign fibroepithelial lesions: fibroadenoma (which was the most common diagnosis), followed by hamartoma, fibroadenomatoid hyperplasia and benign fibroepithelial lesion, unclassified, in addition to occasional cases reported as pseudoangiomatous stromal hyperplasia (PASH) and fibromatosis. The term 'benign fibroepithelial lesions' was not frequently used (reported 45 times in the whole series). Malignant PTs were largely assigned to the malignant category; other diagnoses specifically named by participants for these lesions included sarcoma and metaplastic carcinoma. Benign PTs were not classified as malignant, apart from one case, which was diagnosed as malignant PT by 13% of the participants. The other diagnoses reported by participants included other benign entities followed by borderline PT.
In the benign PT group, we found that cases with features overlapping between fibroadenoma and benign PT were unlikely to be diagnosed as borderline or malignant PT. In contrast, benign PTs with more borderline appearances were less likely to be diagnosed as fibroadenoma. These features included hypercellularity, leaf-like architecture and prominent clefting, even in the absence of stromal atypia or in the presence of low stromal mitotic activity.
One benign PT showed fat infiltration, which on histological review appeared to represent entrapped fat by multiple coalescent fibroadenomatoid foci in one area of the lesion. Other areas of the same tumour showed changes of benign PT with stromal expansion and clefting. This case was classified as fibroadenoma, hamartoma or fibroadenomatoid changes by 44% of the participants, rather than recognised as benign PT.
Another case showed myoid differentiation resulting in increased cellularity (mild to moderate stromal cellularity) that distorted the predominantly pericanalicular growth pattern, resulting in lack of leaflike architecture, focally ill-defined margins and no stromal atypia or mitosis. This case was diagnosed as benign PT by 48% of the participants, while the remaining classified it as: fibroadenoma (16%), hamartoma (6%), fibromatosis (8%) and benign fibroepithelial lesion, unclassified (1%); other benign entities, including spindle cell tumour, adenomyoepithelioma and myofibroblastoma, were reported by 13% of pathologists.
On review of a case of benign PT that was diagnosed as fibroadenoma in 36% and benign PT in 64%, there was significant stromal hypocellularity in areas of stromal expansion, and no obvious atypia or mitotic activity was seen. However, the presence of clefting and stromal expansion were sufficient for the diagnosis of benign PT.
Interestingly, one benign PT showed stromal changes in keeping with cellular PASH resulting in an appearance of a diffuse mild to moderate stromal cellularity with worrisome spindle cell proliferation. Clefting was focal and the margin was focally infiltrative. This case yielded the lowest concordance of the benign PT (19%) and multiple diagnoses were given, including other benign fibroepithelial lesion in 3%, benign lesion without further description in 33%, fibromatosis in 5%, borderline PT in 9% and even malignant in 34% (including malignant PT in 13% and angiosarcoma in 8%). Even though atypia was mild to moderate, mitotic counts were low (three per 10 HPF) and no atypical mitoses or malignant heterologous elements were noticed.
Lastly, one case showed prominent classical type PASH-like changes and PASH was reported as the sole diagnostic entity by 10% of the participant in that case, despite prominent epithelial clefting.
BORDERLINE PT
Borderline PT showed the lowest concordance rates. In borderline PTs, the second most preferred diagnosis was benign PT in most cases. Borderline PTs showing low concordance rates below 50% were classified as such by participants following histological review. The principal cause of low concordance rate was a split of diagnoses proffered between benign PT and malignant PT [...] (Figure 4); the latter component, if present, is now considered insufficient for the diagnosis of malignancy in PT on its own. 7 One malignant PT with a well-differentiated liposarcomatous component was circulated prior to this recent recommendation, and this case also had additional features that favoured the diagnosis of malignant PT. One recent case with a well-differentiated liposarcomatous component, but no other malignant features, had a split of proffered diagnoses between malignant and benign (Table 2).
Only one of the six malignant PT had diagnostic agreement under 80%; we believe this may be a consequence of uncertainty over the nature of multinucleated giant cells present in the stroma (Figure 5). In this case, the next most frequent diagnosis was in the benign category (13%) rather than borderline PT (6%).

GOOD CONCORDANCE PT
There were six cases that showed diagnostic agreement greater than 80%, which included five malignant (98, 98, 91, 81 and 80%) and one benign (81%) PTs. The five malignant cases showed marked stromal atypia. Three of these cases exhibited malignant heterologous elements (chondrosarcoma, osteosarcoma and/or pleomorphic/high-grade liposarcoma), while the other two showed high mitotic counts (18 and 19 per 10 HPF, respectively) with multiple atypical figures. All cases showing atypical mitoses were in the high concordance groups with one exception, where cellular areas were poorly represented on the slide.
The benign PT with 81% agreement between participants had variable stromal cellularity with areas of moderate to marked increased cellularity, focal clefting, but no stromal overgrowth, three mitoses per 10 HPF and mild to moderate stromal atypia. The margins were well defined. The consensus meeting favoured benign over borderline PT (diagnosed in 15%) rather than fibroadenoma (diagnosed in 3%) or malignant PT (1%).

MORPHOLOGICAL FEATURES ASSOCIATED WITH DIAGNOSTIC DISCORDANCE
Diagnostic discordance (concordance < 60%) was observed in 87% (seven of eight) of borderline PT and 58% (seven of 12) of benign PT. The lowest concordance case was a borderline PT, which showed intermediate morphological features or non-uniform changes in terms of cellularity, atypia and architecture (Figure 6). All these borderline PT cases showed features overlapping with other categories. One of the borderline cases showed infarction with consequent markedly hypocellular areas mimicking benign PT. Additionally, on low magnification, three of these five borderline PT showed a hypocellular appearance, with oedema or PASH-like changes in areas of stromal expansion. The benign PT with low concordance had focal or no clefting, non-uniform stromal cellularity and lacked stromal overgrowth or well-formed leaf-like structures. Multinodularity was seen in seven cases, four of which were in the discordant groups. All cases with multinodularity from our series also had at least focally infiltrative margins and were classified as either borderline or malignant PT.
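The discordance threshold used above can be expressed as a simple count of cases whose concordance falls below 60%. The sketch below is illustrative only: the function name and the per-case rates are hypothetical, chosen so that seven of eight cases fall below the threshold as in the proportion quoted for borderline PT.

```python
def discordant_fraction(rates, threshold=60.0):
    """Return (number of discordant cases, percentage of all cases).

    A case is counted as discordant when its concordance rate (%) is
    strictly below the threshold.
    """
    if not rates:
        raise ValueError("at least one case is required")
    n_discordant = sum(1 for r in rates if r < threshold)
    return n_discordant, 100.0 * n_discordant / len(rates)


# Hypothetical per-case concordance rates (%), not the study's values.
borderline_rates = [19.0, 35.0, 41.0, 42.0, 48.0, 52.0, 58.0, 66.0]

n_discordant, pct_discordant = discordant_fraction(borderline_rates)
# n_discordant is 7 and pct_discordant is 87.5
```

Seven of eight corresponds to 87.5%, which the text reports as 87%.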
Towards the benign end of the spectrum of fibroepithelial lesions, there were 10 cases of benign PT for which the other main diagnosis was a benign entity, mainly fibroadenoma. Almost all these cases showed epithelial clefting but no leaf-like structures (seven of 10), well-defined margins (eight of 10), focal increase in stromal cellularity (five of 10) and stromal expansion but no overgrowth (six of 10).
Estimating the degree of stromal atypia in PT is a subjective task. Most cases (83%) with a concordance of between 40 and 80% had absent or mild stromal atypia and a main differential diagnosis with fibroadenoma or other benign entities. Marked stromal atypia was usually seen in the unequivocal malignant PT cases.
None of the cases showed necrosis or malignant heterologous elements. HPF, high-power field. *Also showed large areas of infarction that impart a hypocellular appearance. **Shows areas in keeping with well-differentiated liposarcoma/atypical lipomatous tumour.

Discussion
PT of the breast remains a controversial entity, not only regarding its clinical management but also its diagnostic histological features and categorisation. 6 This study showed variation in the level of diagnostic agreement of PT and identified a range of other confounding lesions frequently reported by pathologists. Additional histological review of the slides, as a component of this study, has enabled identification of features which may explain the high and low agreement rates observed and which could be used to signal cases that may require a second opinion or further diagnostic work-up. As this study was based on the UK EQA scheme participants' diagnoses, the criteria for grading PTs are expected to be standardised among participating pathologists and based on the published UK NHSBSP and RCPath guidelines and minimum data sets. 8-10 The overall agreement rate in this study appears to be higher than expected, as it is a common perception among pathologists that PT grading is associated with low concordance. However, this study showed that discordance rates varied significantly between cases. The overall rate of diagnostic agreement was 86% when the data were analysed as benign versus borderline PT versus a malignant lesion. It dropped to 79% when the proffered diagnosis was restricted to PT (irrespective of grade) and to 63% when the diagnosis was based on PT grade. The highest agreement rate was observed in malignant PT and the lowest agreement was observed in borderline PT. For malignant PT the agreement was very good; however, extreme outliers were identified. In some cases, a range of other diagnoses were given, including benign PT, borderline PT, sarcoma, carcinoma, DCIS, mucinous carcinoma and papilloma. There was one malignant PT which, in spite of well-represented malignant features, was surprisingly classified as benign by 13% of participants.
This case serves to highlight that malignant areas in PT can be focal, with otherwise benign or borderline characteristics. It may also reflect a degree of subjectivity in weighing the various PT diagnostic features by pathologists. Pathologists should establish the diagnosis based on the most aggressive areas.
Another observation relates to classification of epithelial proliferation in PT, which may mimic carcinoma. Although it is understood that PT may be misdiagnosed as metaplastic carcinoma, 2% and 3% of participants classified two malignant PT as carcinoma, of NST and mucinous types. Occasionally, diagnoses of adenomyoepithelioma and pleomorphic adenoma were proffered. Unless the degree of epithelial atypia present is high grade, the diagnosis of DCIS should be questioned. Florid and architecturally complex forms of benign epithelial hyperplasia are well recognised to occur in all fibroepithelial lesion types. In doubtful cases immunohistochemistry can be helpful. The stroma in adenomyoepitheliomas is typically not very cellular, and its biphasic nature is epithelial/myoepithelial rather than epithelial/stromal.
Finally, occasional diagnoses of sarcoma including angiosarcoma, liposarcoma and chondrosarcoma were proffered. When the characteristic clefting architecture of PT is present the diagnosis of primary breast sarcoma is essentially excluded. Sarcomatous areas within malignant PT are usually biologically different from similar morphological types of sarcoma in soft tissue 11,12 and their clinical implications are different. The only benign PT that was overcalled as malignant by 24% of participants was associated with prominent and focally cellular PASH-like areas and focal stromal overgrowth. Benign PT shows overlapping features with cellular fibroadenoma 1,2 and the distinction is especially problematic on core biopsies. 6,13,14 This study showed that this distinction can also be problematic on excision specimens. Fibroadenomas may show a mild to moderate degree of cellularity, but these cases should not exhibit stromal expansion, overgrowth, leaf-like architecture or an infiltrative margin. Classical PT features of epithelial clefting and increased stromal cellularity should raise the suspicion of PT, even when these are focal. However, we accept that for some cases the distinction is extremely difficult to make reliably and that opinions between pathologists will differ, and indeed the outcome and management of such lesions may be similar. Use of the term 'benign fibroepithelial neoplasm' when there is histological ambiguity, with explanation of the diagnostic difficulty, is useful and can avoid overtreatment. 7,15 Diagnostic agreement of borderline PT is suboptimal, but has the potential to increase through guideline improvement focused upon the expanding evidence base, changes in clinical management and greater harmonisation of treatment protocols for benign fibroepithelial lesions and borderline PT. In routine practice, pathologists should examine multiple slides and process more tissue, which may lead to upgrading some cases to the malignant category. 
In the majority of such borderline cases, seeking the opinion of other pathologists and external experts is advised. All these measures are likely to improve the diagnostic agreement of such uncommon lesions in the clinical setting. In one case, there was a diagnostic split between benign PT and malignant PT. This case had a well-differentiated lipomatous tumour component which was interpreted by some as malignant and ignored by others who made a benign diagnosis. This case also showed diffuse moderate stromal cellularity and moderate atypia, which were sufficient for a borderline PT diagnosis. Current guidelines indicate that lesions showing intermediate features should be diagnosed as borderline PT rather than trying to make a benign or malignant diagnosis. These lesions are likely to be associated with higher local recurrence rates, but have much less metastatic potential than malignant PT. 6,7,16 Additionally, it is now recommended that the presence of well-differentiated lipomatous tumour should not be regarded as a malignant heterologous element that, per se, can define a PT as malignant. 7 However, pleomorphic or high-grade liposarcomatous components, chondrosarcomatous and/or osteosarcomatous elements are sufficient on their own to designate a PT as malignant.
In this study we found that certain features are associated with better concordance of PT diagnosis and classification and that pathologists may prioritise certain features over others to reach the overall diagnosis. Lower-priority features were stromal expansion, clefting and multinodularity. Malignant heterologous elements, stromal overgrowth and leaf-like architecture are features associated with higher concordance rates. Leaf-like processes, key criteria for the diagnosis of PT, may be found in intracanalicular fibroadenomas, but in such cases they are few in number and often poorly formed. 17,18 We therefore recommend that histological features used to distinguish benign, borderline and malignant PT should be considered holistically, as emphasis on a single feature may result in misclassification. 7,19,20 The lack of weighted scoring rules for these features, added to their interpretive subjectivity, has led to a higher rate of discordance, especially in challenging, non-straightforward cases. Strict histological criteria for diagnosing malignant PT, with their risk of recurrence and potential, albeit low, for metastatic spread, should be adhered to in order to avoid under- or overtreatment. Thorough sampling, consultation with colleagues and seeking expert opinion in cases of PT with overlapping features are advised to improve the diagnostic agreement.
The strengths of our study are the relatively large cohort of cases available and the large number of breast pathologists proffering a diagnosis for each case. One weakness of the study is that not all pathologists examined the same slide. Although sections are quality-checked to ensure that they are all representative of the lesion before being circulated, this can still have an impact upon the amount of tumour tissue present in each slide and inevitably on mitotic counts and the extent of cellular areas. Also, PTs tend to be morphologically heterogeneous, such that a diagnosis based on a single slide may not achieve the same accuracy as one based on examination of multiple sections of the tumour. This may have affected the diagnostic concordance rates.
In conclusion, the concordance of PT diagnosis, as an entity, is high, but its classification into benign, borderline and malignant has variable agreement levels, with borderline tumours having the lowest concordance rate. More research to refine the diagnostic criteria for categorisation of PT is warranted to improve concordance between pathologists. PT with overlapping features should trigger consensus opinion or expert referral. When all else fails, it may be prudent to express a degree of uncertainty when reporting lesions with morphological overlap between the PT entities.