The standardized assessment of Ki67 labeling index (LI) is of clinical importance to identify patients with primary breast cancer who could benefit from chemotherapy. In this study, we evaluated the interobserver concordance of Ki67 LI assessment. Six surgical pathologists participated and all the slides were prepared from archival breast cancer tissues fixed in 10% buffered formalin for 24 h and stained with MIB-1. Three independent studies were conducted. In the first study, 30 stained slides were assessed using two different methods: the scoring system, with a positive rate scored from 1 (0–9%) to 10 (90–100%) by visual estimate; and the counting method, with approximately 1000 cells counted in hot spots. In the second study, 20 tumors with Ki67 LI 5–25% were assessed, and in the third study, 15 printed photographs of stained slides were assessed to avoid variations by selecting different fields. In study 1, the counting system (intraclass correlation coefficient [ICC], 0.66 [95% confidence interval 0.52–0.78]) demonstrated a better correlation than the scoring system (ICC, 0.57 [0.42–0.72]). In study 2, the assessment for Ki67 LI of 5–25% demonstrated a correlation (ICC, 0.68 [0.50–0.81]) similar to that of study 1 (unrestricted range of Ki67 LI). In study 3, the assessment of Ki67 LI by counting yielded a good concordance (ICC, 0.94 [0.88–0.97]). In conclusion, there was better concordance with the counting system, and concordance was high when the assessed field was predetermined, indicating that the selection of the evaluation area is critical for obtaining reproducible Ki67 LI in breast cancer.
The introduction of adjuvant therapy into the treatment strategy for breast cancer patients has contributed to a significant reduction in breast cancer mortality. Chemotherapy, endocrine therapy and molecular targeting therapy, alone or in combination, have been applied as adjuvant therapy based on clinical and pathological parameters, including tumor size, lymph node involvement, hormone receptor (HR) expression, human epidermal growth factor receptor 2 (HER2) status and histological grade. However, it has not been clarified which patients with early stage breast cancer should receive chemotherapy, in particular among those with HR-positive disease. Chemotherapy may result in serious adverse effects, such as secondary malignancy and cardiac toxicity; therefore, it is crucial to develop biomarkers for selecting those who could benefit from systemic chemotherapy.
Several published studies suggest the prognostic value of the cell proliferation marker Ki67 labeling index (LI) in breast cancer. Ki67 LI has also been considered a promising biomarker to select patients who could benefit from chemotherapy. In preoperative neoadjuvant treatment, Ki67 LI is reported to be associated with pathological response in a number of studies,[4-6] although not all studies support the predictive value of Ki67 in their multivariate analyses.[7, 8] In the adjuvant setting, Ki67 LI is reported to predict the therapeutic benefit of adding taxanes to anthracycline-based regimens in patients with HR-positive disease,[9, 10] whereas it is also reported not to predict the relative efficacy of adjuvant chemotherapy consisting of cyclophosphamide, methotrexate and fluorouracil (CMF) compared to endocrine therapy alone. Recently, Ki67 LI was reported as a biomarker to distinguish luminal B from luminal A subtypes and it has been widely used in pathological evaluation of breast cancer.[12, 13] Ki67 LI has, therefore, attracted enormous interest from clinical oncologists, but it remains to be confirmed whether Ki67 LI is useful for identifying patients who could benefit from chemotherapy. Standardization of the assessment of Ki67 LI is therefore considered essential to critically evaluate the clinical value of Ki67 LI and to apply it in the clinic.
We have previously reported on the standardization of biomarker assessment, in particular HER2 assessment.[14, 15] In this study, we evaluate the interobserver concordance of the assessment of Ki67 LI in the archival materials by six surgical pathologists and discuss the potential causative factors resulting in the discordance of assessment among the pathologists.
Materials and Methods
All the slides were prepared from archival 10% formalin-fixed, paraffin-embedded tissue specimens (years 2009–2010) of primary breast cancer at Kyoto University Hospital, Kyoto, Japan. Pathological assessment was performed by six surgical pathologists (A to F) who specialize in breast pathology from six different Japanese institutions: Kyoto University Hospital, National Cancer Center Hospital, Saitama Cancer Center, Nihon University School of Medicine, The Cancer Institute Hospital of the Japanese Foundation for Cancer Research and Tohoku University School of Medicine.
For the present paper, three independent studies were undertaken to estimate the interobserver concordance.
Six consecutive slides were prepared from each of five formalin-fixed paraffin-embedded (FFPE) blocks from surgical specimens of five different breast cancer cases. One slide from each case was immunostained with the MIB-1 antibody (DAKO, Glostrup, Denmark) at each institution according to its routine method. A total of 30 stained slides were collected and shuffled at the data center, located at Kyoto University. The 30 slides were sent to each institute and assessed for Ki67 LI by each pathologist using two different modes of assessment. The first was the scoring system, in which the rate of positive cells in hot spots, namely the fields where Ki67 staining in cancer nuclei is densest, was scored from 1 (0–9%) to 10 (90–100%) by visual estimate without counting cells. The second was the counting system, in which approximately 1000 cells in total were counted in the hot spots and the positive rate was calculated. Assessment was performed under the microscope at three institutes and on captured images at the other three, depending on each institute's routine assessment method.
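The two modes of assessment can be expressed as a short Python sketch. The function names `labeling_index` and `visual_score` are hypothetical, and the intermediate score bins (2 for 10–19%, and so on) are assumed from the two endpoints stated above. In the study the score was a direct visual estimate, not a value derived from a count, so the second function only illustrates how the bins are defined.

```python
def labeling_index(positive_cells: int, total_cells: int) -> float:
    """Counting system: percentage of Ki67-positive nuclei among counted tumor cells."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie between 0 and total_cells")
    return 100.0 * positive_cells / total_cells


def visual_score(percent_positive: float) -> int:
    """Scoring system bins: 1 for 0-9%, 10 for 90-100%.

    Assumption: the bins in between are 10 percentage points wide.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must lie between 0 and 100")
    # Integer division by 10 selects the bin; 100% is folded into the top bin.
    return min(int(percent_positive // 10) + 1, 10)


if __name__ == "__main__":
    # Hypothetical hot spot: 150 positive nuclei among 1000 counted cells.
    li = labeling_index(150, 1000)
    print(li, visual_score(li))
```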
To assess the variability of Ki67 LI around 15%, which is clinically relevant to distinguish between luminal A and B subtypes of breast cancer,[12, 13] 20 tumors with Ki67 LI ranging from 5% to 25% (15 ± 10%) determined by a pathologist independent of this study, stained in a single institution (Kyoto University Hospital), were subsequently assessed by the participating pathologists using the counting system.
To avoid variation caused by assessing different microscopic fields and to further evaluate variation in the threshold of immunointensity interpreted as positive by different pathologists, 15 photographs of Ki67-stained slides were taken by a pathologist independent of the assessment and printed. The photographs were assessed for Ki67 LI by each participating breast pathologist using the counting system. Some examples of the photographs are shown in Figure 1.
To assess the agreement regarding Ki67 LI, the intraclass correlation coefficient (ICC) was estimated with a 95% confidence interval (CI). There are no universally accepted standard criteria for the ICC; hence, based on the similarity to the kappa coefficient, the following criteria using the lower limit of the 95% CI were used here to aid interpretation:[16, 17] a lower limit of 0.41–0.60 as “moderate correlation”; 0.61–0.80 as “substantial correlation”; and >0.80 as “almost perfect correlation.”
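As an illustration only, the sketch below computes a single-rater, two-way random-effects, absolute-agreement ICC (the ICC(2,1) of Shrout and Fleiss) from a slides-by-pathologists table and applies the interpretation bands above to a lower confidence limit. The ICC model used in this study is not specified in the text, so the choice of ICC(2,1), the function names and the label returned for values below 0.41 are assumptions. The confidence interval itself is not computed here.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per slide and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)          # number of slides (targets)
    k = len(ratings[0])       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-slide mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denom


def interpret_lower_limit(lower: float) -> str:
    """Map the lower 95% CI limit of the ICC to the labels used in the text."""
    if lower > 0.80:
        return "almost perfect"
    if lower >= 0.61:
        return "substantial"
    if lower >= 0.41:
        return "moderate"
    return "below moderate"  # assumption: the text gives no label for this range


if __name__ == "__main__":
    # Hypothetical Ki67 LI values (%) for 4 slides read by 3 raters.
    table = [[12, 15, 10], [30, 28, 35], [55, 60, 52], [8, 5, 9]]
    print(round(icc_2_1(table), 3), interpret_lower_limit(0.52))
```

Classifying by the lower confidence limit rather than the point estimate is the more conservative reading, because a label is assigned only when the whole interval supports it.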
The Bland–Altman plot was used to assess the agreement between the two assessment systems because all the pathologists assessed Ki67 LI using both systems. All statistical analyses were performed using SAS software version 9.2 (SAS Institute, Cary, NC, USA).
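The quantities behind a Bland–Altman plot can be sketched in a few lines of Python. This is not the SAS analysis performed in the study: the function name `bland_altman` is hypothetical, the input values are invented for illustration, and the limits of agreement use the conventional mean difference ± 1.96 standard deviations.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return plot points, bias and 95% limits of agreement for paired readings."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long sequences with >= 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    # Each plotted point is (mean of the pair, difference of the pair).
    points = [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return points, bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical paired readings (%) of the same slides by one pathologist.
    counting = [12.0, 35.0, 58.0, 8.0]
    scoring = [20.0, 40.0, 60.0, 10.0]  # visual scores expressed in percent
    _, bias, lower, upper = bland_altman(counting, scoring)
    print(bias, lower, upper)
```

A bias far from zero would indicate that one system reads systematically higher than the other, while wide limits of agreement would indicate large slide-to-slide disagreement between the two systems.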
The same 30 slides were used to analyze the concordance of the assessment for Ki67 LI among the different pathologists involved in this study, applying the counting and the scoring systems. The counting system demonstrated a better correlation of Ki67 LI among the six pathologists than the scoring system (ICC, 0.66 [95% CI 0.52–0.78] for the counting system, 0.57 [95% CI 0.42–0.72] for the scoring system) (Fig. 2a,b). To examine an intraclass correlation between the two assessment systems, scores (1–10) from the scoring system were multiplied by 10 and regarded as equivalent to the percentage using the counting system. The two assessment systems demonstrated a moderate correlation (ICC, 0.68 [95% CI 0.60–0.75]) (Fig. 2c).
The assessment of Ki67 LI between 5% and 25% in 20 slides using the counting system demonstrated a moderate correlation among the six pathologists (ICC, 0.68 [95% CI 0.50–0.81]) (Fig. 3). This result is similar to the result from Study 1, which used specimens with an unrestricted range of Ki67 LI.
Copies of 15 printed photographs of Ki67-stained breast cancer tissues (Fig. 1) were sent to each pathologist at the same time. The assessment of Ki67 LI using the counting system in the same photographs yielded an almost perfect concordance among the six pathologists (ICC, 0.94 [95% CI 0.88–0.97]), while the scoring system showed a substantial concordance (ICC, 0.82 [95% CI 0.66–0.91]) (Fig. 4).
Ki67 LI is reported in a number of studies to demonstrate prognostic value for breast cancer patients.[3, 19] However, it has not been accepted as a routine biomarker, mainly because there is no standardization of the assay system. The LI is also considered possibly predictive for the effects of chemotherapy, although different studies yield contrasting results.[4-11] Standardization of the Ki67 assessment system is, therefore, crucial for the evaluation of the clinical utility of the marker and its clinical application. In the present study, we evaluated the interobserver concordance of the Ki67 LI assessment. We demonstrated that the concordance was higher with the counting system than with the scoring system (visual estimate) and that the concordance was high when the same field was assessed in printed photos. These results indicate that counting cells is useful for reproducible assessment and that identification of the fields for the assessment is pivotal for the standardized assessment of Ki67 LI in breast cancer. Standardization of the assessment area selection is, therefore, crucial for evaluating the clinical usefulness of Ki67 LI in breast cancer tissues.
The number of cells to count when obtaining the Ki67 LI has not been established. In the majority of studies, 1000–2000 tumor cells were counted,[2, 7, 21-27] and, therefore, 1000 cells were counted in the present study. The International Ki67 in Breast Cancer Working Group recommended counting at least 1000 cells but also accepted counting 500 cells as the absolute minimum. However, to the best of our knowledge, studies evaluating the association between the number of counted cells and reproducibility have not been published, and further study is required to determine the optimum number of cells to count to obtain the Ki67 LI.
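For orientation only, the sampling error attributable purely to the number of counted cells can be approximated with the binomial standard error, sqrt(p(1 − p)/n). The sketch below is not an analysis from this study: it assumes every counted cell is an independent draw from a field with a fixed true LI, which ignores field selection and staining interpretation, and the function name is hypothetical.

```python
from math import sqrt


def binomial_se_percent(li_percent: float, n_cells: int) -> float:
    """Approximate standard error (percentage points) of an LI from n counted cells.

    Assumption: cells are independent Bernoulli draws with a fixed positive rate.
    """
    if not 0 <= li_percent <= 100:
        raise ValueError("li_percent must lie between 0 and 100")
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    p = li_percent / 100.0
    return 100.0 * sqrt(p * (1.0 - p) / n_cells)


if __name__ == "__main__":
    # Compare the two cell counts discussed above at a hypothetical true LI of 15%.
    for n in (500, 1000):
        print(n, round(binomial_se_percent(15.0, n), 2))
```

Under these assumptions, halving the count from 1000 to 500 cells increases the standard error by a factor of sqrt(2), so the count alone contributes on the order of one to two percentage points at an LI of 15%.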
The field to be assessed has also been controversial in obtaining the Ki67 LI. We assessed hot spots where Ki67 immunoreactivity in cancer nuclei was relatively dense, whereas an approach that assesses the whole section and records the overall average score was recommended by the Working Group. The Working Group also recommended that hot spots be included in the overall score even when the average score is chosen, and, therefore, the selection of hot spots is considered indispensable for the Ki67 LI assessment of breast cancer patients in a clinical setting. In addition, the results of the present study demonstrated that identification of the assessment fields is pivotal for the standardized assessment of Ki67 LI and it is possible that this could be expanded to the overall average score because the selection of the assessment area is also considered critical for the overall average score. This should be evaluated in further studies based on the results of the present study.
There are several possible factors that could lead to variability in the selection of hot spots. One possible factor is the presence of lymphocytes or other stromal cells, which could interfere with the estimation of the density of the immuno-positive carcinoma cells and result in the selection of inappropriate fields as hot spots. A second factor is the difference in carcinoma cell density from site to site in the same specimen, which could result in inappropriate estimation of the rate of positive cells in a particular field. A third factor is cytoplasmic or membrane immunoreactivity of Ki67, which should by no means be counted as positive but could influence the estimation of the density of positive cells. A fourth factor is the relative immunointensity, which could affect the assessment of immunopositivity. Finally, the magnification used would affect the selection of the fields.
In the present study, we also attempted to assess the variability in relative immunointensity regarded as positive by using printed photos (Fig. 1). Considering the small variations of LI among the pathologists involved in this particular study, the variation of the threshold of immunointensity interpreted as positive is considered small in these printed photos. This should be further assessed with stained slides because printed photos may provide better contrast and clearer distinction between positive and negative.
With regard to the validation of an assay system, a number of issues other than the areas to be selected need to be considered, as specified by the Working Group, such as preanalytical and analytical validity, interpretation, scoring and data analysis. Tissue microarrays (TMA) are increasingly used in various studies, especially for biomarker assessment in large clinical trials. The present study used the whole blocks from surgical pathology specimens, possibly providing larger areas for assessment than TMA. However, routine assessment in clinical laboratories is performed using blocks from surgical specimens or core needle biopsy samples and, thus, it is critical to establish standardized methods to select the assessment fields in these sections.
In conclusion, the counting system yielded better concordance among the pathologists than the scoring system (visual estimate). The results of the present study suggest that appropriate identification of the fields to be assessed could be pivotal for obtaining accurate Ki67 LI of breast cancer tissue. Further study to standardize the selection of the hot spots among pathologists is necessary for the critical evaluation of the clinical value of Ki67 LI in breast cancer tissues.