Grading of oral squamous cell carcinomas – Intra and interrater agreeability: Simpler is better?

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. Journal of Oral Pathology & Medicine published by John Wiley & Sons Ltd

1 Department of Medical Biology, Faculty of Health Sciences, UiT – The Arctic University of Norway, Tromsø, Norway
2 Department of Clinical Pathology, University Hospital of North Norway, Tromsø, Norway
3 Department of Oral Biology, Faculty of Dentistry, University of Oslo, Oslo, Norway
4 Department of Pathology, Rikshospitalet, Oslo University Hospital, Oslo, Norway
5 Department of Pathology, Haukeland University Hospital, Bergen, Norway
6 The Gade Laboratory of Pathology and Center for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Faculty of Medicine, University of Bergen, Bergen, Norway
7 Department of Otorhinolaryngology, University Hospital of North Norway, Tromsø, Norway
8 Department of Clinical Dentistry, Faculty of Health Sciences, University of Tromsø, The Arctic University of Norway, Tromsø, Norway


| INTRODUCTION
Oral cavity cancer originates almost exclusively from squamous cells (SC), and the histopathological evaluation of these tumors is the basis for their classification and further treatment.
The prediction of outcome and the selection of treatment for patients with oral squamous cell carcinomas (OSCC) are today based on the clinical tumor, nodes, and metastasis (TNM) staging. Tumor thickness, as measured during microscopic evaluation, was recently implemented in the T (size) variable. 1 Further, according to the WHO classification of head and neck (HN) tumors, the tumor differentiation should also be reported in order to predict prognosis. 2 The histological grading does not take into account the tumor-host interactions that modulate tumor progression and aggressiveness, although several factors such as inflammation are likely to influence prognosis. 3,4 During the last decades, several histopathological grading systems for SC carcinomas in the HN region have been suggested and tested. The first grading systems considered only the morphological characteristics of the tumor, but later the tumor-host relationship was also taken into consideration. 5,6 For evaluation of tumor differentiation, nuclear polymorphism and keratinization have been important variables. 7,8 The characteristics of tumor invasion into the surrounding tissue have been included when evaluating the tumor-host relationship, as have immune response (plasma-lymphocytic infiltration), vascular invasion, and perineural infiltration. 4,8 In particular, tumor budding (invading clusters of four or fewer tumor cells at the invasive front) has been proposed to be a simple and reliable prognostic marker for OSCC. 9 Reproducibility in the scoring of histopathological parameters is essential if they are to be used as prognostic markers.

| Observers and calibration
The observers were six experienced pathologists/oral pathologists (TMS, EN, HL, ACJ, LUH, and SES) and two oral pathologists in training (DEC and EHO) from three university hospitals in Norway.
Prior to the scoring, all the participants had taken part in two calibration workshops to agree on how to interpret the parameters.
One of the observers performed only one round of scoring, and one observer scored thickness and depth only once. All interrater agreement values were calculated from the first round of scoring, which allowed all eight observers to be included.

| Ethics
The study was approved by the Northern Norwegian Regional

| Statistics
Statistical analyses were performed using IBM SPSS Statistics 24. We calculated both percent agreement and Cohen's kappa (κ). The variability (spread of scoring) was low, and Cohen's kappa was therefore of little or no value; thus, all results are given as percent agreement.
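The effect of low spread on kappa can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the study; the function names are likewise only illustrative. When nearly all cases fall into one category, the agreement expected by chance is already very high, so kappa can be close to zero (or negative) even though the two sets of scores agree in most cases.

```python
from collections import Counter


def percent_agreement(a, b):
    """Percentage of cases in which the two ratings are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two sets of ratings."""
    n = len(a)
    p_o = percent_agreement(a, b) / 100.0  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category only
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical scores for 20 cases with very little spread (assumption, not study data).
rater_1 = ["absent"] * 18 + ["present", "absent"]
rater_2 = ["absent"] * 18 + ["absent", "present"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"percent agreement: {agreement:.1f}%")
print(f"Cohen's kappa:     {kappa:.3f}")
```

In this constructed example the raters agree in 18 of 20 cases, yet kappa is slightly negative, which is one way to see why percent agreement may be the more informative summary when the spread of scoring is low.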

| Intrarater and interrater agreement
The first nine parameters in Table 1 had three to five different scoring options, and they were all categorized into new groups with fewer options (Table 2). The first seven were dichotomized, while the worst pattern of infiltration had two and three different scoring options (Table 3).
Prior to categorization, perineural infiltration showed the highest intrarater agreement, whereas differentiation was most agreed upon after categorization.
Some variables had predefined categories that were not changed (Table 4). These had a mean intrarater agreement of 85.4% (range 79.2%-93.3%), and vascular infiltration and infiltration into deeper tissues showed the highest intrarater agreement.
In order to evaluate interrater agreement, two observers from different hospitals were paired randomly. The average interrater agreement was lower than the intrarater agreement for all variables.

Grading of epithelial dysplasia is poorly reproducible between observers, and to improve reproducibility, some advocate a binary system with only low and high grade instead of mild, moderate, and severe dysplasia. 11 This has been evaluated in several studies of oral epithelial dysplasia, but to a lesser extent in OSCC, where differentiation is still graded as well, moderate, or poor. In general, high-grade tumors (moderately and poorly differentiated) are related to a higher degree of recurrence and shorter survival time than low-grade tumors. 12 Most OSCC are moderately or well differentiated, and the classification has not been found to correlate well with prognosis. 7 In this study, re-categorizing differentiation into low and high improved the reproducibility considerably.
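Why pooling categories tends to raise agreement can be sketched as follows. The scores are hypothetical, and the mapping of well differentiated to low grade and of moderately and poorly differentiated to high grade is an assumption based on the general description of high-grade tumors above; it is not necessarily the exact categorization listed in Table 2.

```python
# Assumed two-tier mapping (illustrative only): well -> low; moderate and poor -> high.
TWO_TIER = {"well": "low", "moderate": "high", "poor": "high"}


def percent_agreement(a, b):
    """Percentage of cases in which the two ratings are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def collapse(scores, mapping=TWO_TIER):
    """Re-categorize three-tier differentiation scores into two tiers."""
    return [mapping[s] for s in scores]


# Hypothetical scores from one randomly paired set of observers (not study data).
observer_a = ["well", "moderate", "moderate", "poor", "well", "moderate", "poor", "moderate"]
observer_b = ["well", "poor", "moderate", "moderate", "moderate", "moderate", "poor", "poor"]

before = percent_agreement(observer_a, observer_b)
after = percent_agreement(collapse(observer_a), collapse(observer_b))
print(f"three-tier agreement: {before:.1f}%")
print(f"two-tier agreement:   {after:.1f}%")
```

Any pair of scores that agreed before pooling still agrees afterwards, so pooling can only keep or raise percent agreement. In this constructed example, the disagreements between moderate and poor disappear, while the disagreement between well and moderate remains.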

| DISCUSSION
In our study, the mean intrarater agreement was 70% for all variables before categorization, compared to an interrater agreement of 48%. This indicates that the pairs of observers agreed on the tumor grading in less than half of the cases. Such a lack of reproducibility calls into question the rationale for using a sophisticated grading system, and we therefore pooled scores into broader categories.
This increased the mean intra- and interrater agreement to 85% and 71%, respectively, using the best score for lymphocytic infiltration (categorization 1) and the best score for worst pattern of infiltration (categorization 2). The increase in agreement was less pronounced for intrarater scoring (14.9 percentage points) than for interrater scoring (23.0 percentage points).

| CONCLUSION
To be of value, a tumor prognostic marker must be both reproducible and significantly associated with disease progression or survival. In this study, we have evaluated the reproducibility of a number of proposed histopathological prognostic markers in OSCC. Our findings suggest that simpler scoring protocols will increase reproducibility. However, we have not tested whether the new categorizations influence the prognostic value of the parameters. We included most of the previously proposed histopathological parameters and many observers, but only a limited number of patient samples, in order to avoid observer fatigue. We included tumors of different stages and from various intraoral locations; thus, the cohort was not suited for survival/prognostic analyses. The prognostic value of the revised categorization of the parameters should be tested in a larger, more homogeneous cohort.

ACKNOWLEDGEMENTS
The publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway.
All authors have filled out the ICMJE form for disclosure of potential