Assessment of tumour proliferation by use of the mitotic activity index, and Ki67 and phosphohistone H3 expression, in early‐stage luminal breast cancer

Aims: Phosphohistone H3 (PhH3) has been proposed as a novel proliferation marker in breast cancer. This study compares the interobserver agreement for assessment of the mitotic activity index (MAI), Ki67 expression, and PhH3 in a cohort of oestrogen receptor (ER)‐positive breast cancer patients.

Methods and results: Tumour samples of 159 luminal breast cancer patients were collected. MAI and PhH3 scores were assessed by three breast cancer pathologists. Ki67 scores were assessed separately by two of the three pathologists. PhH3‐positive cells were counted in an area of 2 mm², with a threshold of ≥13 positive cells being used to discriminate between low‐proliferative and high‐proliferative tumours. Ki67 expression was assessed with the global scoring method. Ki67 percentages of <20% were considered to be low. The intraclass correlation coefficient (ICC) and Cohen's κ statistics were used to evaluate interobserver agreement. The impact on histological grading of replacing the MAI with PhH3 was assessed. Counting PhH3‐positive cells was highly reproducible among all three observers (ICC of 0.86). The κ scores for the categorical PhH3 count (κ = 0.78, κ = 0.68, and κ = 0.80) reflected substantial agreement among all observers, whereas agreement for the MAI (κ = 0.38, κ = 0.52, and κ = 0.26) and Ki67 (κ = 0.55) was fair to moderate. When PhH3 was used to determine the histological grade, agreement in grading increased (PhH3, κ = 0.52, κ = 0.48, and κ = 0.52; MAI, κ = 0.43, κ = 0.35, and κ = 0.32), and the proportion of grade III tumours increased (14%, 18%, and 27%).

Conclusion: PhH3 seems to outperform Ki67 and the MAI as a reproducible means to measure tumour proliferation in luminal‐type breast cancer. Variation in the assessment of histological grade might be reduced by using PhH3, but would result in an increase in the proportion of high‐grade cancers.


Introduction
Histological tumour grade is one of the most robust prognostic factors in breast cancer. [1][2][3][4][5] The modified Bloom and Richardson (BR) Nottingham grading system, which has been globally incorporated in breast cancer guidelines, 6 reflects three features, i.e. nuclear polymorphism, tubular formation, and mitotic count, the last of which reflects tumour proliferation. By the assignment of a score to each of these features, tumours are divided into three categories. Category 1 contains the well-differentiated tumours with an inherently good prognosis, and category 3 contains the poorly differentiated tumours. 1,5 Assessment of histological grade is applied worldwide, and adds important prognostic information to other clinicopathological features in order to guide systemic treatment decisions. Patients with grade III tumours are often candidates for treatment with adjuvant chemotherapy, whereas those with grade I tumours are candidates for less toxic hormonal therapy. 6 A substantial proportion (30-60%) of patients are diagnosed with grade II tumours, and in these patients the indication for adjuvant systemic treatment is less clear. Especially in this category of patients, high interobserver grading variability and institutional inconsistencies have been reported. [7][8][9]
Over time, determination of the roles of individual genes in breast cancer dissemination has increased our knowledge. Although studies have revealed an important role for tumour proliferation-related genes, [10][11][12] the functional end result remains cell division. The latter is detectable for the examining pathologist as mitotic figures showing a typical appearance of chromosome sets. Assessment of mitotic figures, expressed as the mitotic activity index (MAI), is the oldest method of evaluating tumour proliferation and an important component of histological grade. The MAI has been shown to be an important independent prognostic factor, 13,14 but its reproducibility remains limited. [15][16][17]
Tumour proliferation can also be determined immunohistochemically by staining for the proliferation-related antigen Ki67. Several studies have demonstrated prognostic significance of assessing Ki67 in invasive breast cancer, 18,19 but variation in the methodology of this assay has limited its adoption in clinical practice. [20][21][22][23][24] Phosphohistone H3 (PhH3) has been proposed as a novel proliferation marker. This protein is involved in chromatin condensation and decondensation, and is present in the active phases of the cell cycle (G2 to M transition). Unlike Ki67 assessment, PhH3 assessment is performed according to a standardised protocol, similar to that used for traditional mitosis counting. The contrast-rich PhH3 staining enhances the recognition of mitotic figures, and the scoring resembles assessment of the MAI. PhH3 has been shown to have prognostic value in lymph node-negative breast cancer patients, 25 but studies regarding the reproducibility of PhH3 assessment in breast cancer are scarce. In the present study, we aimed to compare the interobserver agreement for assessment of the MAI, Ki67 and PhH3 in a cohort of oestrogen receptor (ER)-positive breast cancer patients. Furthermore, the impact of replacing the MAI with PhH3 to determine histological grade was assessed.
Materials and methods

Patients
As part of a prospective observational multicentre study regarding the influence of the 70-gene signature on adjuvant chemotherapy decision-making in patients treated for ER+ early-stage (i.e. absence of distant metastasis) invasive ductal breast cancer, tumour samples were obtained between 1 January 2013 and 31 December 2015. The study was approved by the medical ethics committee of the University Medical Centre Utrecht and by the institutional review boards of participating centres. Patients enrolled in this study were asked for their consent to use their tumour samples for future research. The current side-study was conducted according to the principles of Human Tissue and Medical Research: Code of conduct for responsible use (2011). For the present study, tissue samples of 159 patients were randomly retrieved from seven of the 31 participating centres. Clinicopathological data were obtained from the study database: patient age, tumour size, grade (based on nuclear polymorphism, tubular formation, and mitotic count), histological subtype, lymph node involvement, and ER, progesterone receptor (PR) and HER2 status.

Pathological examination
Pathological ER, PR and HER2 assessments had been routinely performed on all tumour samples (n = 159). Immunohistochemistry and fluorescence in-situ hybridisation (FISH) were performed according to local standards at each institution. According to the Dutch guideline, 26 positive ER or PR identification was defined as the presence of nuclear staining in ≥10% of breast cancer cells. Immunohistochemical expression of HER2 was scored as follows: 0 as <10% of tumour cells staining positively; 1+ as >10% of tumour cells staining positively, but no circumferential staining being present; 2+ as >10% of tumour cells showing weak or moderate circumferential staining; and 3+ as >10% of tumour cells showing strong circumferential staining. Scores of 0 and 1+ were considered to indicate a negative result, 2+ an equivocal result, and 3+ a positive result. HER2 2+ scores were re-evaluated with FISH.
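The receptor definitions above can be summarised as a small decision rule. The following Python sketch is illustrative only and is not the software used in the study: the function names are hypothetical, and the handling of HER2 2+ cases (final status taken from the FISH result) is an assumption, since the text states only that such cases were re-evaluated with FISH.

```python
from typing import Optional


def hormone_receptor_positive(percent_nuclei_stained: float) -> bool:
    """ER or PR is called positive when >=10% of tumour cell nuclei stain."""
    return percent_nuclei_stained >= 10.0


def her2_status(ihc_score: str, fish_amplified: Optional[bool] = None) -> str:
    """Map an immunohistochemical HER2 score ('0', '1+', '2+', '3+') to a status.

    Assumption (not stated in the text): for 2+ cases the FISH result,
    when available, decides the final status.
    """
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        if fish_amplified is None:
            return "equivocal"  # FISH re-evaluation still pending
        return "positive" if fish_amplified else "negative"
    raise ValueError(f"unknown HER2 IHC score: {ihc_score!r}")
```

Keeping the equivocal state explicit makes it visible which cases still require FISH before a final HER2 status can be recorded.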
Tissue samples were assessed for the MAI by three dedicated breast cancer pathologists, employed in different institutions, who were blinded to the clinicopathological data, according to the protocol guidelines of Van Diest and Baak. 27 One pathologist (observer 2) assessed the MAI in 106 of 159 included patients, whereas the other two pathologists (observers 1 and 3) assessed the MAI in all 159 patients. MAI was categorised on the basis of the total number of mitotic figures in an area of 2 mm², as follows: 0-7 = 1, 8-12 = 2, and ≥13 = 3. Whole tumour tissue sections of the 159 patients were immunohistochemically stained for PhH3 (clone BC37, 1:250; Biocare, Pacheco, CA, USA). The PhH3-based mitotic count was scored by the same observers. As for traditional mitosis counting, the area of highest proliferation, preferably at the periphery of the tumour, was identified to assess the PhH3 mitotic count. PhH3-positive objects, usually with mitosis morphology, were counted in an area of 2 mm², whereas intact nuclei with fine granular PhH3 staining were not counted, as these cells were regarded as not being in the G2/M phase (Figure 1). 28 The previously reported PhH3 threshold of 13 positive cells was used to discriminate between patients with a high or a low number of PhH3-positive cells, as this cut-off value was associated with 20-year recurrence-free survival rates for patients with distant metastases of 58% and 96%, respectively. 25 In a non-selected subset of 105 patients, tumour tissue was additionally stained for Ki67 in one laboratory (Mib-1 antibody, ready-to-use; Dako, Glostrup, Denmark). Ki67 expression was assessed in 105 patients by observers 2 and 3, using the global scoring method. A cut-off value of 20% of nuclei positively stained for Ki67 was used to discriminate between high-proliferative and low-proliferative tumours, as previously established. 29
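For clarity, the categorisation rules described above can be written out as follows. This is a minimal Python sketch with hypothetical function names, not the code used for the analyses; the thresholds are those given in the text (mitotic figures per 2 mm²: 0-7, 8-12 and ≥13; PhH3 ≥13 positive cells; Ki67 ≥20% of nuclei).

```python
def mitotic_score(count_per_2mm2: int) -> int:
    """Score 1, 2 or 3 from the number of mitotic figures counted in 2 mm^2."""
    if count_per_2mm2 < 0:
        raise ValueError("a mitotic count cannot be negative")
    if count_per_2mm2 <= 7:
        return 1
    if count_per_2mm2 <= 12:
        return 2
    return 3


def phh3_high(positive_cells_per_2mm2: int) -> bool:
    """High proliferation when >=13 PhH3-positive cells are counted in 2 mm^2."""
    return positive_cells_per_2mm2 >= 13


def ki67_high(percent_positive_nuclei: float) -> bool:
    """High proliferation when >=20% of nuclei stain for Ki67."""
    return percent_positive_nuclei >= 20.0
```

Note that the PhH3 cut-off of 13 coincides with the lower bound of mitotic score 3, so a tumour classified as PhH3-high would also receive the highest mitotic score if the PhH3 count were scored on the same scale.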

Statistical analyses
Data were analysed with R, Version 3.2.2. The intraclass correlation coefficient (ICC), determined with the two-way random effects model for multiple raters [ICC with 95% confidence interval (CI)], was used to assess inter-rater agreement for numerical variables (PhH3, Ki67 and MAI score on a continuous scale), and Cohen's κ was used to assess inter-rater reliability for categorical variables (PhH3, Ki67 and the MAI categorised on the basis of the aforementioned thresholds). Furthermore, we created an alternative histological grade by replacing the MAI-based mitotic count with the PhH3-based mitotic count as follows: 1 point for a PhH3 mitotic number of ≤7 per 2 mm²; 2 points for a PhH3 mitotic number of 8-12 per 2 mm²; and 3 points for a PhH3 mitotic number of ≥13 per 2 mm². This PhH3-based histological grade was compared with the traditional MAI-based grade by use of the chi-square test. Two reasonable scales for the interpretation of the ICC and Cohen's κ are shown in Table S1. 30
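The two agreement statistics can be illustrated with a short sketch. The analyses in this study were performed in R; the Python code below is therefore only an illustration under stated assumptions, not the analysis code. It assumes unweighted Cohen's κ and the single-measure, absolute-agreement form of the two-way random effects ICC (often written ICC(2,1)); the text does not specify either choice, and the function names are hypothetical. Confidence intervals are not computed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_chance) / (1.0 - p_chance)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per case, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 cases and 2 raters, with no missing scores")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-case mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up categories for six tumours, for illustration only (not study data).
observer_a = [1, 1, 2, 3, 1, 2]
observer_b = [1, 2, 2, 3, 1, 3]
print(round(cohens_kappa(observer_a, observer_b), 2))  # prints 0.5
```

With three observers, κ is computed for each pair of observers, which is why three κ values are reported per variable, whereas a single ICC summarises all raters at once.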

Results

Patients
In total, 159 early breast cancer patients with a median age of 57 years were included in this study. All patients had ER+ disease, 88% of patients had PR+ disease, and 98% of patients were HER2−. The majority of the patients had no axillary lymph node involvement (87%) (Table 1).
On the basis of the original pathology assessment, 16% of patients had low-grade (I) cancers and 67% of patients had intermediate-grade (II) tumours. For traditional mitosis counting, the median total number of mitotic figures was 2 [interquartile range (IQR) of 3], 3 (IQR of 6) and 5 (IQR of 8) for observers 1, 2 and 3, respectively, resulting in an MAI score of 1 in 84%, 70% and 60% of patients.

Assessment of histological grade based on the MAI versus grade based on PhH3
When PhH3 was used in the modified BR Nottingham grading score instead of the MAI, interobserver agreement in determining histological grade improved (MAI, κ = 0.43, κ = 0.35, and κ = 0.32; PhH3, κ = 0.52, κ = 0.48, and κ = 0.52). At the same time, when the grading score was re-evaluated on the basis of PhH3 assessment, it shifted from grade I to grade II in 8% (observer 1), 12% (observer 2) and 4% (observer 3) of the patients, and from grade II to III in 27% (observer 1), 18% (observer 2) and 14% (observer 3) of the patients (P < 0.001) (Table 3). Among all three observers, there were a few patients who were downgraded from grade II to grade I (n = 1, n = 2 and n = 3 for observers 1, 2 and 3, respectively) or downgraded from grade III to grade II (n = 1, n = 2 and n = 6 for observers 1, 2, and 3, respectively) (Table 3). The majority of the patients who were upgraded from grade II to grade III had a PhH3 score of ≥13 (86%, 95% and 69% for observers 1, 2 and 3, respectively), whereas a substantial proportion in whom the histological grade was shifted from grade I to II had a PhH3 score of <13 (31%, 43% and 83% for observers 1, 2 and 3, respectively).
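The mechanism behind these grade shifts can be made explicit with a small worked example. The Python sketch below is illustrative only: it assumes the standard Nottingham summation (total score 3-5 = grade I, 6-7 = grade II, 8-9 = grade III), which is not spelled out in this article, and the function names and the example tumour are hypothetical rather than taken from the study data.

```python
def mitotic_score(count_per_2mm2: int) -> int:
    """Score 1 (0-7), 2 (8-12) or 3 (>=13) from a count per 2 mm^2."""
    if count_per_2mm2 <= 7:
        return 1
    if count_per_2mm2 <= 12:
        return 2
    return 3


def histological_grade(tubule_score: int, nuclear_score: int, mitosis_score: int) -> int:
    """Grade 1, 2 or 3 from the three component scores (each 1-3).

    Assumption: standard Nottingham totals (3-5 -> I, 6-7 -> II, 8-9 -> III).
    """
    for score in (tubule_score, nuclear_score, mitosis_score):
        if score not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2 or 3")
    total = tubule_score + nuclear_score + mitosis_score
    if total <= 5:
        return 1
    if total <= 7:
        return 2
    return 3


# Hypothetical tumour: tubule score 3, nuclear score 2,
# 5 mitotic figures on H&E but 14 PhH3-positive cells in 2 mm^2.
grade_mai = histological_grade(3, 2, mitotic_score(5))    # total 6 -> grade II
grade_phh3 = histological_grade(3, 2, mitotic_score(14))  # total 8 -> grade III
```

Because only the mitotic component changes, a higher PhH3-based count can raise the total by at most two points, which is sufficient to move a tumour across one grade boundary.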

Discussion
In this study, the reproducibility of three different proliferation-related variables that contribute to the assessment of tumour grade was compared in patients with luminal-type breast cancer. Our results demonstrate that PhH3-based mitotic counting provides a more reproducible means for observing tumour proliferation in ER+ early breast cancers than MAI or Ki67 assessment. Incorporating PhH3 as an alternative to the traditional MAI in the BR Nottingham grading system would decrease the variation in histological grading, but would increase the proportion of cancers that would be considered to be high-grade tumours. Assessment of mitotic activity is routinely performed as part of determining histological tumour grade, and has been established as an independent prognostic factor. [31][32][33][34] The reproducibility of the MAI is limited. [15][16][17] This may in part be attributable to a lack of strict protocols, and to difficulties in selecting the mitotically most active area, 16,17 but it may also result from the coexistence of cells that mimic mitosis, such as apoptotic and necrotic cells, especially in cases of poor fixation. 35 Optimal assessment of mitotic activity requires the experience of trained pathologists and dedication, as this may take ~10 min. 36 PhH3 showed better interobserver agreement in the present study than did the MAI, supported by higher ICC and Cohen's κ scores. PhH3 is a proliferation marker that is specific for mitosis, as it is expressed from the late G2 phase to M transition, and rapidly degrades on entry into the G1 phase. 37 Therefore, PhH3 labelling has been reported to closely correlate with mitotic figure detection on standard haematoxylin and eosin (H&E)-stained sections. 38,39 As compared with the MAI, PhH3 is relatively easy to assess, as its bright staining offers easy visualisation of mitotic figures by morphology, resulting in a high accuracy of detection.
The results of our study showed that PhH3 revealed higher numbers of mitotic cells than did H&E staining, which is in line with previous literature. 28,40 This difference in sensitivity may be explained by the fact that prophase figures are not well recognised with regular H&E stains, but can be easily identified in PhH3-stained specimens. 28 Because of the sharp contrast with non-stained elements, PhH3 allows rapid detection of the mitotically most active area. 40 A previous study demonstrated that PhH3 staining was particularly useful in detecting mitotic cells in high-grade cancers with dense cellularity and with numerous apoptotic and necrotic cells. 28 In addition, PhH3 assessment may serve as a better means to assess proliferative activity in core needle biopsies, as PhH3 labelling was found to be more accurate at identifying mitotic figures than routine H&E staining. 41 In the light of these advantages, it is conceivable that PhH3 staining results in a higher accuracy of mitotic figure detection, even in specimens with poor fixation, or specimens that contain dense, distorted tumour infiltrate or crush artefacts. Then again, others have shown that antigenicity for PhH3 can be lost if tissue is not immediately fixed after sampling. 42 Hence, fixation delay should be kept as short as possible.
In addition to the conventional factors, immunohistochemical assessment of the proportion of cells staining for the nuclear antigen Ki67 is used for determination of tumour proliferation. Many studies have demonstrated the prognostic value of Ki67. 43 However, the clinical utility of this marker has been disputed because of poor reproducibility, which is also reflected by the results of the present study. Flaws in Ki67 assessment are attributed to a lack of scoring consensus among experts and an undefined cut-off point for clinical decision-making. In an effort to harmonise the analytical methodology of Ki67, the International Ki67 Breast Cancer Working Group proposed a set of guidelines for the analysis and reporting of Ki67. 44 However, even after standardisation, the assessment of Ki67 among some of the world's most experienced laboratories turned out to be poor. 45 Although interlaboratory variability in staining methods contributed to differences in Ki67 scoring, the working group also observed substantial discrepancies in Ki67 interpretation when the staining was performed centrally. These results are in line with those of another study reporting high interobserver variability in Ki67 assessment among 15 pathologists. 46 The Ki67 working group stated that 'unless an individual pathology laboratory has demonstrated that its staining and scoring methodology, including cut-off determination, meet the highest level of evidence for clinical utility, clinicians should use Ki67 results with caution'. 45,47 As PhH3 assessment is also based on immunohistochemistry, one may wonder to what extent PhH3 assessment suffers from similar limitations. In contrast to the variability in Ki67 scoring methods, PhH3 assessment is performed according to a standardised protocol similar to that used for traditional mitosis counting. Furthermore, PhH3-positive cells can be unambiguously identified, even at low-power magnification and by inexperienced observers. 40 Finally, there is less debate regarding cut-off values for PhH3 assessment.
In the present study, the use of PhH3 instead of the MAI to determine the modified BR histological grade resulted in the histological grade being upgraded in 14-27% of cases. This increase in the proportion of patients with high-grade tumours is in line with other studies. 25,28,38,48 PhH3 was shown to have independent prognostic value, which exceeded the prognostic value of the MAI [hazard ratio (HR) of 9.6 versus HR of 3.6]. 49 These findings support the concept of replacing the MAI with PhH3 in order to improve the prognostic value of histological grading through better identification of mitotic figures. At the same time, PhH3-based mitotic indices should be evaluated in larger studies before their use in clinical practice can be recommended.
To our knowledge, this study has provided a unique comparison between the reproducibility of traditional proliferation markers and that of the novel proliferation marker PhH3. Interobserver agreement was reliable, as the pathology examination was performed by three dedicated breast cancer pathologists, working in different institutions. It is important to note that we performed this study in a selection of ER+ cancers, and this should be taken into consideration when the results are interpreted. However, optimisation of the assessment of tumour proliferation is especially needed in this subset of patients, as the patient group was a selected group in whom genomic profiling was undertaken to decide on adjuvant chemotherapy. It is important to note that the prognostic value of the different proliferation markers was not addressed in the present study, as follow-up data were not available, and the follow-up period would have been too short. In due course, outcome data will become available, and these will enable us to also further evaluate PhH3 assessment in terms of prognostication. We also aim to explore deep-learning algorithms to automatically identify PhH3-positive objects, as has successfully been performed before for mitoses in H&E-stained and PhH3-stained sections. 50,51 In conclusion, our results demonstrate that PhH3 is a more reproducible proliferation marker in breast cancer than are the MAI and Ki67. The association between PhH3 and outcome, and the potential increase in the proportion of high-grade cancers when PhH3 is used, need to be further addressed.

Author contributions
All persons listed as authors were actively involved in one or more key aspects of the reported study. J. E. C. van Steenhoven: conception and design, analysis and interpretation of data, drafting of the article, and final approval. A. Kuijer: conception and design, acquisition of data, analysis and interpretation of data, critical revision, and final approval. R. Kornegoor: interpretation of data, critical revision, and final approval. A. M. van Leeuwen: acquisition of data, critical revision, and final approval. J. van Gorp: acquisition of data, interpretation of data, critical revision, and final approval. T. van Dalen: conception and design, interpretation of data, drafting of the article, and final approval. P. J. van Diest: acquisition of data, interpretation of data, critical revision, and final approval.

Supporting Information
Additional Supporting Information may be found in the online version of this article:
Table S1. Interpretation of intraclass correlation coefficient (ICC) and Cohen's κ score.
Table S2. Concordance of PhH3 scored by three different breast cancer pathologists.
Table S3. Concordance of MAI classes scored by three different breast cancer pathologists.
Table S4. Concordance of Ki67 scored by two different breast cancer pathologists.