Considerable interlaboratory variation in PD‐L1 positivity for head and neck squamous cell carcinoma in the Netherlands— A nationwide evaluation study

Patients with recurrent or metastatic head and neck squamous cell carcinoma (HNSCC) are eligible for first‐line immune checkpoint inhibition if their tumour is positive for programmed death ligand 1 (PD‐L1), as determined by the combined positive score (CPS). This nationwide study, using real‐world data, investigated the developing PD‐L1 testing landscape in the first 3 years after introduction of the test in HNSCC and examined interlaboratory variation in PD‐L1 positivity rates.


Introduction
Head and neck cancer is the seventh most common cancer worldwide. 1 Despite large variation between tumour sites, most patients with head and neck squamous cell carcinoma (HNSCC) are diagnosed with locoregionally advanced cancer. 2,3 These patients are at high risk of local recurrence (15-40%) and distant metastasis (3-52%). 2,4,5
Pembrolizumab is a monoclonal antibody against the membrane protein programmed cell death protein 1 (PD-1). It blocks the interaction between PD-1 on T cells and programmed cell death ligand 1 (PD-L1) on tumour cells, thereby stimulating the antitumour immune response. 6 […] 8,9 In the Netherlands, pembrolizumab was approved as a second-line treatment in December 2019 and as a first-line treatment in April 2020. 10,11 Approval was based on the KEYNOTE-048 study. This Phase III trial randomly allocated patients to treatment with pembrolizumab alone, pembrolizumab plus platinum and 5-fluorouracil, or the same chemotherapy plus cetuximab. 12 In a selected group of HNSCC patients whose tumours had high PD-L1 expression [combined positive score (CPS) ≥ 20], pembrolizumab alone improved overall survival by 4.2 months compared with chemotherapy plus cetuximab. 12 Unfortunately, the response rate to pembrolizumab in HNSCC is estimated at only 16-18%. 13,14
Selection for pembrolizumab in HNSCC relies on PD-L1 expression assessed by immunohistochemistry (IHC), with CPS as the recommended scoring method. 7 CPS is calculated as the number of PD-L1-positive tumour cells, lymphocytes and macrophages divided by the total number of viable tumour cells, multiplied by 100. 15 Guidelines indicate that patients with a CPS ≥ 1 are offered pembrolizumab as monotherapy or in combination with chemotherapy. 7,8,16 In clinical practice, pembrolizumab is offered as monotherapy if the CPS ≥ 20. 12 […] 18,19 The 22C3 pharmDx kit is the only FDA-approved PD-L1 IHC test in HNSCC. 7 […] 22 High interlaboratory variation in PD-L1 assessment can lead to local discrepancies in PD-L1 results, potentially denying patients beneficial treatment or exposing them to unnecessary toxicity.
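The CPS definition and the two clinical cut-offs described above can be illustrated with a minimal Python sketch. It assumes the cell counts are available as integers (in practice pathologists estimate them visually), and the cap at 100 is an assumption taken from the 22C3 pharmDx scoring convention rather than from this article. The function names are hypothetical.

```python
def combined_positive_score(pos_tumour_cells: int, pos_lymphocytes: int,
                            pos_macrophages: int, viable_tumour_cells: int) -> float:
    """CPS: PD-L1-positive tumour cells, lymphocytes and macrophages divided by
    the total number of viable tumour cells, multiplied by 100."""
    if viable_tumour_cells <= 0:
        raise ValueError("CPS is undefined without viable tumour cells")
    positives = pos_tumour_cells + pos_lymphocytes + pos_macrophages
    cps = positives * 100 / viable_tumour_cells
    # Assumption (not stated in the text): scores above 100 are reported as 100,
    # following the 22C3 pharmDx scoring convention.
    return min(cps, 100.0)


def cps_category(cps: float) -> str:
    """Dichotomise at the two clinically relevant cut-offs, CPS >= 1 and CPS >= 20."""
    if cps >= 20:
        return "CPS>=20"    # offered pembrolizumab monotherapy in clinical practice
    if cps >= 1:
        return "1<=CPS<20"  # guideline threshold for mono- or combination therapy
    return "CPS<1"


# Hypothetical counts for one slide: 30 + 10 + 5 positive cells, 200 viable tumour cells
score = combined_positive_score(30, 10, 5, 200)
print(score, cps_category(score))  # 22.5 CPS>=20
```

Note that the denominator counts only tumour cells while the numerator also includes immune cells, which is why the raw ratio can exceed 100.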
This nationwide study evaluated the first 3 years of PD-L1 assessment in HNSCC in the Netherlands using real-world pathology data, capturing the complexities and nuances of clinical practice. First, developments in PD-L1 assessment were investigated per year, as these reflect how new predictive tests are implemented and interpreted in actual patient care settings. Secondly, we examined the interlaboratory variation in PD-L1 positivity rates in routine clinical practice.

Materials and Methods
Retrospectively, all pathology reports in the Netherlands containing (synonyms of) squamous cell carcinoma (SCC), PD-L1 and a head and neck location between 20 December 2019 and 19 December 2022 were obtained from Palga, the Dutch nationwide pathology databank. The Palga foundation is the nationwide network and registry of histo- and cytopathology in the Netherlands. 23 It has the unique feature of containing all pathology reports in the Netherlands since 1991. 23 The study was approved by Palga's scientific council and privacy committee. All data were handled in accordance with the General Data Protection Regulation.
Additionally, for the selected cases, all historical pathology reports containing the term 'SCC' and all reports from the head and neck region from 2014 onwards were analysed. The following variables were extracted: age, sex, primary tumour location, tumour grade, type of specimen, source of material, antibody and staining protocol (commercial assay or laboratory-developed test, LDT), number of tests, scoring method applied, CPS (where available), pseudonymised laboratory number and whether the laboratory was part of a designated HNSCC care centre. 24 Tests on cytological material were excluded from the analyses, as this material is not suitable for CPS assessment.
To evaluate PD-L1 assessment in the first 3 years of testing in HNSCC, all valid PD-L1 tests on histological material were included. Pearson's χ² tests were performed to assess differences in categorical data between the 3 years.
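The study ran its χ² tests in SPSS; as an illustration only, the Pearson χ² test of independence for an r × c table (for example positive/negative by year) can be sketched in plain Python. The p-value uses the closed-form chi-squared survival function for integer degrees of freedom; the function names and the counts in the example are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-squared distribution with an
    integer number of degrees of freedom, via its closed-form series."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) plus half-integer gamma terms
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def pearson_chi2(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson's chi-squared test of independence (no continuity correction)
    for an r x c table of counts. Returns (statistic, df, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # assumes no empty row/column
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows = positive / negative, columns = year 1, 2, 3
stat, df, p = pearson_chi2([[80, 60, 90], [20, 40, 10]])
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.4f}")
```

A 2 × 3 table (positivity by year) has 2 degrees of freedom, for which the p-value reduces to exp(-χ²/2).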
To study interlaboratory variation in PD-L1 positivity rates, only tests with CPS as the stated scoring method were included. In case of multiple concordant tests on the same material, one test was included (the 22C3 assay, if available). Tests on the same material with discordant results were all excluded. For patients with multiple PD-L1 tests on different material, or with multiple PD-L1-tested primary HNSCCs, only the first PD-L1 test was included. A minimum of 10 tests per laboratory was set for the interlaboratory analysis.
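The selection rules above can be expressed as a small filtering pipeline. This is a minimal sketch under stated assumptions: the `PDL1Test` record and its field names are hypothetical, "concordant" is assumed to mean the same CPS category (< 1, 1-19, ≥ 20), "first" is taken as the earliest test date, and the rules are assumed to be applied in the order listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PDL1Test:
    # Hypothetical record layout; the field names are illustrative only.
    patient_id: str
    specimen_id: str
    lab_id: str
    test_date: str        # ISO date, so string order is chronological
    assay: str            # e.g. "22C3" or "SP263"
    scoring: str          # stated scoring method: "CPS", "TPS" or ""
    cps: Optional[float]


def cps_band(cps: float) -> int:
    # Assumption: tests are "concordant" if they fall in the same category
    # (CPS < 1, 1 <= CPS < 20, CPS >= 20).
    return 2 if cps >= 20 else 1 if cps >= 1 else 0


def select_for_interlab_analysis(tests: list[PDL1Test],
                                 min_per_lab: int = 10) -> list[PDL1Test]:
    # 1. Keep only tests with CPS as the stated scoring method.
    cps_tests = [t for t in tests if t.scoring == "CPS" and t.cps is not None]

    # 2. One test per specimen: concordant repeats are kept once (22C3 preferred);
    #    all tests on a specimen with discordant results are dropped.
    by_specimen: dict[str, list[PDL1Test]] = {}
    for t in cps_tests:
        by_specimen.setdefault(t.specimen_id, []).append(t)
    per_specimen = []
    for group in by_specimen.values():
        if len({cps_band(t.cps) for t in group}) > 1:
            continue
        group.sort(key=lambda t: (t.assay != "22C3", t.test_date))
        per_specimen.append(group[0])

    # 3. Keep only the earliest remaining test per patient (covers multiple
    #    specimens and multiple PD-L1-tested primary tumours).
    first: dict[str, PDL1Test] = {}
    for t in sorted(per_specimen, key=lambda t: t.test_date):
        first.setdefault(t.patient_id, t)

    # 4. Drop laboratories with fewer than `min_per_lab` included tests.
    counts: dict[str, int] = {}
    for t in first.values():
        counts[t.lab_id] = counts.get(t.lab_id, 0) + 1
    return [t for t in first.values() if counts[t.lab_id] >= min_per_lab]
```

Applying the laboratory threshold last means it counts only tests that survived the earlier rules; whether the study counted before or after deduplication is not stated.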
Test results were dichotomised at the two clinically relevant cut-offs of CPS ≥ 1 and ≥ 20. Differences between negative and positive subgroups were analysed with an independent Student's t-test for continuous variables and Pearson's χ² tests for categorical data. P < 0.05 was considered significant. Per laboratory, the percentage of PD-L1-positive tests was determined for both cut-offs. To evaluate differences between laboratories and relative to the national mean, funnel plots were created showing the mean national percentage of PD-L1 positivity with its 95% confidence limits for proportions. 25 The percentage of positive tests per laboratory was plotted against the total number of tests assessed by that laboratory. Laboratories outside the 95% confidence limits were considered to differ significantly from the mean.
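The funnel-plot limits are described only as 95% confidence limits for proportions. A minimal sketch, assuming the common normal approximation p ± z·√(p(1 − p)/n) around the national proportion p for a laboratory with n tests (exact binomial limits are an alternative), looks like this; the function names are hypothetical.

```python
import math


def funnel_limits(national_rate: float, n_tests: int,
                  z: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% control limits around the national proportion for a
    laboratory that assessed n_tests tests. Uses the normal approximation to
    the binomial (an assumption) and clips the limits to [0, 1]."""
    se = math.sqrt(national_rate * (1 - national_rate) / n_tests)
    return max(0.0, national_rate - z * se), min(1.0, national_rate + z * se)


def outside_limits(lab_rate: float, n_tests: int, national_rate: float) -> bool:
    """True if a laboratory's (case mix-adjusted) rate falls outside the funnel."""
    lower, upper = funnel_limits(national_rate, n_tests)
    return lab_rate < lower or lab_rate > upper


# The funnel itself: limits for a range of laboratory volumes, using 85.4%
# (the national CPS >= 1 proportion reported in the Results) as the centre line
for n in (10, 25, 50, 100, 200):
    lo, hi = funnel_limits(0.854, n)
    print(f"n = {n:3d}: {lo:.3f} to {hi:.3f}")
```

Because the standard error shrinks with n, a small laboratory needs a much larger deviation than a large one before it falls outside the funnel.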
PD-L1 positivity rates of individual laboratories were adjusted for case mix using multivariate logistic regression analysis. The included variables were age, sex, primary tumour location, type of specimen and source of material. Grading and IHC antibody/protocol were not included in the model, because this information was not mentioned in the pathology reports of a considerable proportion of patients (32.4 and 18.0%, respectively). Case mix-adjusted positivity rates were obtained by dividing the observed percentage of PD-L1-positive tests by the expected percentage established by the multivariate logistic regression model, and then multiplying by the national mean percentage of PD-L1 positivity. 25 IBM SPSS Statistics version 28 was used for statistical analyses.
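The adjustment described above is an observed/expected ratio scaled by the national rate. The sketch below assumes the per-test expected probabilities have already been produced by the fitted logistic regression model (the model fit itself, done in SPSS in the study, is not reproduced); `expected_probs` and the function name are hypothetical.

```python
def case_mix_adjusted_rate(outcomes: list[int], expected_probs: list[float],
                           national_rate: float) -> float:
    """Indirectly standardised positivity rate for one laboratory:
    (observed proportion positive / expected proportion positive) x national rate.

    outcomes: 1 if a test was PD-L1 positive at the chosen cut-off, else 0.
    expected_probs: hypothetical per-test probabilities of positivity predicted
        by a logistic regression model fitted on the national data.
    """
    if not outcomes or len(outcomes) != len(expected_probs):
        raise ValueError("need one predicted probability per test")
    observed = sum(outcomes) / len(outcomes)
    expected = sum(expected_probs) / len(expected_probs)
    return observed / expected * national_rate


# Hypothetical laboratory: 3 of 4 tests positive, each predicted positive with p = 0.8
print(round(case_mix_adjusted_rate([1, 1, 1, 0], [0.8] * 4, 0.854), 4))  # 0.8006
```

A laboratory whose case mix predicts more positives than it observes is scaled below the national rate, and vice versa; with observed equal to expected, the adjusted rate equals the national rate.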

Results
The flowchart for both research questions is presented in Figure 1. A total of 817 PD-L1 test results from 702 HNSCC patients among 19 laboratories were registered during the study period; 804 (98.4%) tests were performed on histological material and 13 (1.6%) on cytological material. Of the tests on histological material, 685 (85.2%) were reported positive, compared with six (46.2%) of the tests on cytological material. A CPS was reported for only four tests on cytological material.
Figure 2 depicts changes in the Dutch PD-L1 assessment landscape for histological material in the first 3 years after its introduction in HNSCC. The number of PD-L1 tests increased over the years from 172 to 366 (Figure 2A). Use of the CPS, the recommended scoring method, increased from 94.7 to 98.6% (Figure 2B). During the 3-year period, 685 (85.2%) PD-L1 tests were reported as positive; these included tests with a CPS or tumour proportion score (TPS) ≥ 1 and tests reported as positive without a stated scoring method. The percentage of positive tests differed significantly between the 3 years, ranging from 79.7 to 89.9% (P = 0.001, Figure 2C). Primary tumour location (P = 0.109, Figure 2D) and source of the tested material (P = 0.382, Figure 2E) did not differ significantly between the years; Figure 2F presents the IHC antibodies/protocols used. Use of the 22C3 antibody increased from 59.9 to 74.3%. The male/female ratio and type of histological specimen (biopsy/resection) were comparable between the 3 years (P = 0.249 and P = 0.400, respectively).
Of the 786 tests on histological material with a CPS, 675 (85.9%) were positive for CPS ≥ 1 and 346 (43.0%) for CPS ≥ 20. On 33 specimens, multiple PD-L1 tests (n = 67 tests) were performed on the same specimen. Two of these specimens (6.1%) had discordant test results. An SP263 assay was performed alongside a 22C3 assay 31 times. In 52 patients, multiple PD-L1 tests (n = 106) were performed on different histological specimens; in 32 (61.5%) of these patients, the tests yielded the same CPS category.
Ultimately, 673 patients and tests from 12 laboratories were included in the interlaboratory variation analysis. Patient characteristics are presented in Table 1. PD-L1 positivity rates differed significantly between head and neck locations for both cut-offs, with the oral cavity showing the highest positivity rates. Females had a significantly higher positivity rate for CPS ≥ 20, and biopsies (compared with resections) had a significantly higher positivity rate for CPS ≥ 1.
The mean national proportion of PD-L1 positivity for CPS ≥ 1 was 85.4%. Individual laboratory case mix-adjusted positivity rates ranged from 45.5 to 97.0%, a maximum difference of 51.5 percentage points between laboratories. Four (33%) laboratories deviated significantly from the mean (Figure 3A).
For CPS ≥ 20, the mean national proportion of PD-L1 positivity was 43.8%. Individual laboratory case mix-adjusted positivity rates ranged from 30.9 to 55.9%, a maximum difference of 25.0 percentage points between laboratories. Two (17%) laboratories differed significantly from the mean for both cut-offs (Figure 3B). All 12 laboratories were part of a designated HNSCC care centre.

Discussion
This study evaluated the first 3 years of PD-L1 assessment in HNSCC and investigated interlaboratory variation in PD-L1 positivity rates, using real-world population-based data from the Dutch nationwide pathology registry (Palga). We observed considerable variation in PD-L1 positivity rates between laboratories, even after correcting for case mix. This variation might be problematic, as it could lead to local differences in patient selection for pembrolizumab treatment.
Overall, 1.6% of all PD-L1 tests in HNSCC were performed on cytological material. Only 46.2% of these were regarded as positive, compared with 85.2% of the tests on histological material. This aligns with earlier findings, in which the negative predictive value of a negative aspirate cell block or small biopsy was only 28%. 26 Cytological PD-L1 testing in HNSCC is not advised, owing to the higher likelihood of false negatives. Contributing factors may include differences in fixation methods but, more fundamentally, scoring of tumour-associated infiltrates for CPS determination in cytological samples is unreliable. 19,27
The use of CPS increased from 94.7 to 98.6% during the study period. From December 2019 to April 2020, pembrolizumab was officially available in the Netherlands only as a second-line treatment for HNSCC patients with TPS ≥ 50. 10,11 As nivolumab was available during these months for the same indication, but without the requirement of PD-L1 testing, few PD-L1 tests were performed. 10 In practice, most pathology reports from this period already stated a CPS (10 of …).
Use of the 22C3 antibody increased. More laboratories used 22C3 LDTs, while use of the 22C3 pharmDx assay seemed to decrease. We found no significant differences in PD-L1 positivity rates between the different antibodies and protocols. However, information on antibody and platform was not reported in 18.0% of the cases. In the literature, some studies in HNSCC reported almost perfect agreement between different assays, 17,22,28 while others contradicted these results. 21 To our knowledge, the effect of different assays on PD-L1 testing in HNSCC has never been investigated in a real-life clinical setting. Unfortunately, no definitive conclusions could be drawn regarding the actual influence of the assays on interlaboratory variation. Sompuram et al. investigated the lower limit of detection for 59 PD-L1 assays among 41 different laboratories using quantitative IHC calibrators. 29 The 22C3 pharmDx was comparable with most 22C3 LDTs, but the 22C3 assays were, in general, less sensitive than SP263 assays. 29 It is possible that differences in analytical sensitivity of individual assays contributed to the interlaboratory variation found in our study.
As presented in Figure 3, the range of case mix-corrected positivity rates per laboratory was an alarming 51.5 percentage points for CPS ≥ 1 and 25.0 percentage points for CPS ≥ 20. A contributing factor might be interobserver variation among pathologists. Few studies have investigated interobserver variation for PD-L1 in HNSCC with the current CPS cut-offs. In a worldwide study, Nuti et al. reported an average interobserver agreement of 94.1% at CPS ≥ 1 and 86.5% at CPS ≥ 20. 18 The precision study on which the FDA approval of the 22C3 pharmDx was based reported an overall percentage agreement (OPA) of 95.7% for CPS ≥ 1 and 92.1% for CPS ≥ 20, scored by three pathologists at three external sites. 30 Other studies compared the continuous CPS between pathologists and found good inter-rater correlation for different assays. 22,28 Although these studies report good overall interobserver agreement, it is likely that in clinical practice these small differences in agreement between pathologists contributed to the differences in positivity rates found between laboratories.
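The overall percentage agreement (OPA) mentioned above is, in its simplest form, the share of cases that two readers place on the same side of a cut-off. The sketch below is a two-reader version only; the cited precision study pooled comparisons across several pathologists and sites, so its exact calculation may differ. Function name and scores are hypothetical.

```python
def overall_percentage_agreement(reader_a: list[float], reader_b: list[float],
                                 cutoff: float) -> float:
    """Percentage of cases that two readers place on the same side of a CPS cut-off."""
    if not reader_a or len(reader_a) != len(reader_b):
        raise ValueError("need two non-empty lists of paired scores")
    agree = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(reader_a, reader_b))
    return 100.0 * agree / len(reader_a)


# Hypothetical CPS values given by two pathologists to the same five cases
a = [0, 2, 15, 25, 40]
b = [0.5, 1, 25, 18, 60]
print(overall_percentage_agreement(a, b, 1))   # 100.0
print(overall_percentage_agreement(a, b, 20))  # 60.0
```

The example shows how the same pair of readers can agree fully at CPS ≥ 1 yet disagree on cases scored near 20, which is one way modest interobserver differences can translate into different positivity rates at the higher cut-off.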
There was a large difference in the national positivity rate between years 2 and 3 (79.7 and 89.9%, respectively). This seemed to be a national trend, as 10 of the 12 laboratories with more than 10 PD-L1 tests had a higher positivity rate in year 3. We found no clear explanation for the increase, as patient and tumour characteristics were comparable between the 2 years. We consider it possible that pathologists, when in doubt, tend to favour a positive assessment, as this will not deny a patient access to pembrolizumab. Further research might clarify whether this finding is temporary.
This study has some limitations. Although we attempted to correct the interlaboratory variation for case mix, not all variables that might influence PD-L1 results are mentioned in pathology reports. […] 32-34 It is also possible that differences in (pre-)analytical variables (e.g. in acquisition and processing of histological material) contributed to interlaboratory variation. In 18.0% of cases, information on both antibody and protocol was missing, making it impossible to include these factors in the multivariate logistic regression model for case mix correction. Nevertheless, because we were able to correct for all patient and tumour characteristics that differed significantly between the CPS categories in Table 1, and because 95% confidence limits were applied, we deem it unlikely that the outliers in the interlaboratory analysis reflect true geographic differences in the population.
Pathologists and clinicians should be aware of the existing interlaboratory variation in PD-L1 testing and its implications. Future emphasis should be on guidelines for uniform testing and on additional quantitative methods for assay standardisation, such as IHC calibrators. 29,35 Analytical standardisation in IHC can compensate for analytical drift, which is inherent in all assays and results from random changes in test environment, reagents, instruments or protocols. 35 Ideally, these standardisations are validated within clinical trials, so that the desired sensitivity of the PD-L1 assay is known and can be applied.
In addition, training of pathologists could help to increase interobserver agreement for PD-L1 scoring and subsequently decrease interlaboratory variation. 18 Training can further harmonise CPS scoring between pathologists and raise awareness of the preferred scoring method and antibody/protocol. 36,37 After completing the training, repeated exposure to PD-L1 scoring can help pathologists gain and maintain expertise, which also has a positive effect on interobserver agreement in HNSCC. 36 In the current study, it was unknown how many pathologists scored the PD-L1 tests and how much experience they had with PD-L1 scoring in HNSCC.
Digital image analysis (DIA) is a promising technique that may, in the near future, enhance and standardise the evaluation of biomarkers such as PD-L1. 38 Deep learning algorithms can be trained to recognise the various cell types relevant for the CPS and assist pathologists in providing a fast, reproducible and more objective result. In several studies outside HNSCC, PD-L1 DIA performed comparably to or better than manual scoring in predicting patients' response to therapy. 39,40 However, generalisability of these algorithms might still be problematic, given the differences in workflow between laboratories in real-world practice. 41 Moreover, the algorithms face the same challenges as pathologists in distinguishing immune cells and identifying potential artefacts. 37,39 Including clinical outcome as one of the benchmarks for algorithms can facilitate a more clinically meaningful prediction. 37,42 Furthermore, large cohorts with outcome data are essential for the development, optimisation and validation of DIA algorithms before they can be implemented in daily practice. 41
Results of the individual laboratories were shared with the laboratories to raise awareness of the interlaboratory variation and to allow each laboratory to compare its PD-L1 testing performance with the nationwide results.
In conclusion, in the 3 years after introduction, the Dutch landscape of PD-L1 assessment in HNSCC became more uniform. However, there is still considerable interlaboratory variation in PD-L1 positivity rates, especially for the CPS ≥ 1 cut-off. This indicates the need for further test standardisation and training of pathologists.

Figure 2. Development of different characteristics of PD-L1 assessment on histological material in the first 3 years after introduction of the test in HNSCC (n = 804 tests). CPS, combined positive score; HNSCC, head and neck squamous cell carcinoma; IHC, immunohistochemistry; LDT, laboratory developed test; PD-L1, programmed cell death ligand 1; ?, IHC protocol unknown.

Figure 3. Funnel plots presenting the Dutch interlaboratory variation in PD-L1 positivity rates in HNSCC for the PD-L1 CPS ≥ 1 (A) and CPS ≥ 20 (B) cut-offs. The case mix-adjusted positivity rates are displayed per laboratory (dots) and are plotted against the total number of PD-L1 tests performed by the same laboratory on the x-axis. The positivity rate per laboratory is shown in relation to the mean national proportion of PD-L1 positivity (black line) surrounded by its 95% confidence limits (dotted lines). PD-L1, programmed cell death ligand 1; CPS, combined positive score; HNSCC, head and neck squamous cell carcinoma.

Table 1. Patient and test characteristics for the total population and for the two PD-L1 cut-offs (PD-L1 CPS < 1 versus ≥ 1 and < 20 versus ≥ 20) on histological material. Bold text denotes a statistically significant difference (P < 0.05).