Optimized p53 immunohistochemistry is an accurate predictor of TP53 mutation in ovarian carcinoma

Abstract
TP53 mutations are ubiquitous in high‐grade serous ovarian carcinomas (HGSOC), and the presence of TP53 mutation discriminates between high and low‐grade serous carcinomas and is now an important biomarker for clinical trials targeting mutant p53. p53 immunohistochemistry (IHC) is widely used as a surrogate for TP53 mutation but its accuracy has not been established. The objective of this study was to test whether improved methods for p53 IHC could reliably predict TP53 mutations independently identified by next generation sequencing (NGS). Four clinical p53 IHC assays and tagged‐amplicon NGS for TP53 were performed on 171 HGSOC and 80 endometrioid carcinomas (EC). p53 expression was scored as overexpression (OE), complete absence (CA), cytoplasmic (CY) or wild type (WT). p53 IHC was evaluated as a binary classifier where any abnormal staining predicted deleterious TP53 mutation and as a ternary classifier where OE, CA or WT staining predicted gain‐of‐function (GOF or nonsynonymous), loss‐of‐function (LOF including stopgain, indel, splicing) or no detectable TP53 mutations (NDM), respectively. Deleterious TP53 mutations were detected in 169/171 (99%) HGSOC and 7/80 (8.8%) EC. The overall accuracy for the best performing IHC assay for binary and ternary prediction was 0.94 and 0.91 respectively, which improved to 0.97 (sensitivity 0.96, specificity 1.00) and 0.95 after secondary analysis of discordant cases. The sensitivity for predicting LOF mutations was lower at 0.76 because p53 IHC detected mutant p53 protein in 13 HGSOC with LOF mutations. CY staining associated with LOF was seen in 4 (2.3%) of HGSOC. Optimized p53 IHC can approach 100% specificity for the presence of TP53 mutation and its high negative predictive value is clinically useful as it can exclude the possibility of a low‐grade serous tumour. 4.1% of HGSOC cases have detectable WT staining while harboring a TP53 LOF mutation, which limits sensitivity for binary prediction of mutation to 96%.


Introduction
High-grade serous ovarian carcinoma (HGSOC) is the most aggressive histological type of ovarian carcinoma and pathogenic TP53 mutation is present in >96% of cases [1]. TP53 mutation is frequently present in fallopian tube precursor lesions, suggesting that it is an early driver event [2][3][4]. HGSOC show remarkable intratumoural genetic heterogeneity characterized by divergent copy number abnormalities and frequent passenger substitutions; however, owing to their presence in the ancestral clone, TP53 mutations are detectable in all subclones of an individual's HGSOC [5][6][7]. The ubiquitous presence of TP53 mutations in HGSOC provides an important diagnostic feature for small tissue biopsies, particularly in distinguishing low-grade serous carcinoma from HGSOC, and has been used for personalized disease monitoring using circulating tumour DNA in plasma samples [8][9][10][11].
Despite the clinical importance of TP53 mutation, rapid sequencing is not widely available and immunohistochemistry (IHC) remains the commonest method to infer TP53 mutational status. However, the accuracy of IHC as a predictor of TP53 mutation in ovarian carcinoma has not been precisely defined. Early studies showed only a modest correlation between p53 staining pattern and TP53 mutation although these results were based on limited somatic sequencing or only scored overexpression of p53 [12][13][14][15][16][17]. We have previously proposed a 3-tier scoring system to describe p53 staining in ovarian carcinoma: overexpression (OE), complete absence (CA) or wild-type (WT) [18]. OE is most commonly associated with nonsynonymous TP53 mutations, which interfere with MDM2-induced ubiquitination and degradation of p53, resulting in excessive p53 protein accumulation in the nucleus. CA is associated with nonsense mutations, which introduce a premature stop codon that triggers nonsense-mediated RNA decay, or indel and splice acceptor mutations that interfere with correct protein translation by introducing frame shifts or aberrant splicing. WT expression is characterized by a variable staining intensity in a variable number of tumour cell nuclei. Depending on the proliferation and maturation status of tumour cells, the number of variably intense staining nuclei can range from a few to even the majority. Interpretation with the 3-tier pattern increases the sensitivity of IHC and abnormal p53 staining was observed in 88-94% of HGSOC as compared to 14% of endometrioid ovarian carcinoma (EC) [18,19]. However, comparison of the 3-tier pattern to TP53 mutation in a study of 57 ovarian carcinomas showed that abnormal p53 expression predicted pathogenic mutation with a sensitivity of 94% but a specificity of only 38% [20].
The functional consequences of TP53 mutations have been divided into two classes with distinct biological effects (reviewed in [21]). Nonsynonymous mutations have been shown to induce gain of function (GOF) class effects including metabolic reprogramming, chromatin reorganization and increased motility and invasion [22,23]. Loss of function (LOF) class mutations, that include stopgain, frameshift and splicing mutations, have weaker tumourigenic effects than GOF mutations in genetically engineered mouse models [24,25]. Distinguishing between GOF and LOF mutations may be clinically important in HGSOC as LOF mutations have been associated with reduced overall survival [18,26,27]. In addition, the emergence of new clinical trials testing strategies to restore wild-type p53 conformation [28][29][30][31], or to reinstate protein translation of LOF mutations [32,33] emphasizes the need for robust and reliable IHC biomarkers for p53.
The primary aim of this study was to determine the sensitivity and specificity of IHC to predict the presence and class of TP53 mutation. We compared clinically relevant assays for p53 staining to next generation sequencing of tumour tissue as the gold standard reference. The secondary aim was to investigate misclassified cases to categorize TP53 mutations with unexpected patterns of p53 staining.

Study cohort and DNA extraction
This study was granted ethical approval REB15-0945. The study cohort was sourced from the Canadian Ovarian Experimental Unified Resource (COEUR) [19,34] and the Pathology Department of Calgary Laboratory Services (CLS) [35]. All cases underwent additional staining with histotype-specific IHC as part of detailed pathological review to assign the correct ovarian histotype [19]. Initial sequencing analysis was carried out on fresh-frozen tumour tissue from the COEUR cohort and formalin-fixed paraffin embedded (FFPE) tissue from the CLS cohort. DNA was extracted from fresh-frozen material as previously described [34]. For FFPE material, DNA was extracted from two 1 mm tissue microarray cores using the QIAamp DNA Micro kit (Qiagen) following the manufacturer's protocol except that additional incubation with lysis buffer was performed at 95 °C for 15 min before adding proteinase K.

Tagged-amplicon sequencing
The entire coding sequence of TP53 (exons 2-11) with flanking splice sites was amplified and sequenced by tagged-amplicon sequencing on the Fluidigm Access Array 48.48 platform as described previously [9]. Tagged-amplicon sequencing libraries were sequenced on the Illumina HiSeq2000 or MiSeq platforms using paired-end 100 bp reads (primer sequences available upon request). Sequencing data analysis and variant verification were performed using an in-house analysis pipeline and IGV software as described [9,36].

Sanger sequencing
The TP53 coding sequences (exons 2-11) were amplified as described [37] with the following modifications: PCR reactions were performed in 25 µl, universal primers M13 forward and M13 reverse were incorporated into primer pairs and used to sequence in both the forward and reverse directions. To sequence exon 7, an alternative forward primer was used (TP53-7F: CAGGTCTCCCCAAGGCGCAC) to avoid a poly A tract downstream of the exon 7 forward primer described in [37] (CATCCTGGCTAACGGTGAAAC). Mutational analysis was performed using Mutation Surveyor Software version 4.0.4 (SoftGenetics) using default settings.

Immunohistochemistry
Tissue microarrays were constructed as described [34] using 0.6 mm cores from FFPE archival tumour tissue. Pathological scoring was performed by subspecialized gynaecological pathologists (MK, SL). Four methods for clinical immunohistochemical analysis of p53 staining were performed as described [18,19,34,38] using 4 µm sections from tissue microarrays and tumour blocks.

Statistical analysis
The diagnostic test performance of p53 IHC was quantified by calculating sensitivity, specificity and accuracy using TP53 mutation status as the reference. Two p53 IHC classifications were evaluated: (1) binary classification where abnormal or normal staining was compared to the presence or absence of a deleterious TP53 mutation and (2) ternary classification where OE, CA or WT staining was compared to GOF, LOF mutation classes or cases with no detectable mutation (NDM), respectively (GOF for any nonsynonymous mutation, LOF for any stopgain, indel or splicing mutation and NDM for normal or synonymous mutations). Cases that were not assessable for IHC were excluded from comparison. Cases with cytoplasmic staining were excluded from ternary classification. Statistical analyses were performed using the R statistical language [39] and classification analysis was performed using the caret package [40]. The complete statistical analysis is provided in the Supplementary Analytical File as a knitr document which fully reproduces the analyses [41].
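The two classification schemes can be illustrated with a short sketch. This is not the authors' analysis, which was performed in R with the caret package and is provided in the Supplementary Analytical File; it is a minimal Python illustration of the definitions above. The function names, the consequence labels, the "NA" code for non-assessable cases and the toy cases are all hypothetical.

```python
from collections import Counter

# Hypothetical consequence labels; the mapping follows the class
# definitions given in the text (GOF, LOF, NDM).
MUTATION_CLASS = {
    "nonsynonymous": "GOF",
    "stopgain": "LOF",
    "indel": "LOF",
    "splicing": "LOF",
    "synonymous": "NDM",
    "none": "NDM",
}

# Mutation class predicted by each nuclear staining pattern (ternary scheme).
IHC_PREDICTION = {"OE": "GOF", "CA": "LOF", "WT": "NDM"}


def binary_metrics(cases):
    """Binary scheme: any abnormal staining predicts a deleterious mutation.

    cases: iterable of (ihc_pattern, consequence) pairs. Cases coded "NA"
    (not assessable, a hypothetical code) are excluded. Assumes at least
    one mutated and one unmutated assessable case.
    """
    counts = Counter()
    for ihc, consequence in cases:
        if ihc == "NA":
            continue
        predicted = ihc != "WT"  # OE, CA and CY all count as abnormal
        actual = MUTATION_CLASS[consequence] != "NDM"
        counts[(predicted, actual)] += 1
    tp, fn = counts[(True, True)], counts[(False, True)]
    tn, fp = counts[(False, False)], counts[(True, False)]
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def ternary_sensitivity(cases):
    """Per-class sensitivity; CY and non-assessable cases are excluded."""
    hits, totals = Counter(), Counter()
    for ihc, consequence in cases:
        if ihc not in IHC_PREDICTION:
            continue
        actual = MUTATION_CLASS[consequence]
        totals[actual] += 1
        hits[actual] += IHC_PREDICTION[ihc] == actual
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy cases for illustration only; these are not study data.
toy_cases = [
    ("OE", "nonsynonymous"),
    ("OE", "nonsynonymous"),
    ("CA", "stopgain"),
    ("WT", "stopgain"),
    ("WT", "none"),
    ("WT", "synonymous"),
    ("CY", "indel"),
    ("NA", "nonsynonymous"),
]

print(binary_metrics(toy_cases))
print(ternary_sensitivity(toy_cases))
```

In the toy data the single WT-staining stopgain case lowers both the binary sensitivity and the LOF sensitivity, which is the kind of discordance the secondary analysis was designed to investigate.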

Results
The study design selected HGSOC and EC cases for TP53 sequencing and p53 IHC from two ovarian carcinoma cohorts available on tissue microarrays that had been subjected to detailed pathology review and immunophenotyping to accurately determine histotype. For the primary analysis, IHC and sequencing results were independently generated and the interpretation of mutation and staining pattern was blinded. Figure 1a shows the staining patterns recognized for p53 IHC scoring.

Immunohistochemical staining for p53
Four IHC assays commonly used for clinical reporting of p53 status were selected for comparison (Table 2). Inspection of the 3-tier scores for p53 staining across 171 HGSOC and 80 EC cases on TMAs revealed systematic differences in the performance of the four assays (Figure 3a and Supplementary Analytical File). Three cases showed strong cytoplasmic staining (CY) without nuclear overexpression and were scored separately (discussed below). Inspection of the IHC results for EC cases, where a low frequency of TP53 mutation was expected, showed that scoring for WT was highest for method 1 (91%, N = 73) and was progressively lower with methods 2-4 (Figure 3a). For scoring CA, method 1 had the lowest frequency in EC (5%, N = 4). For the HGSOC

Primary analysis of p53 IHC as predictor of TP53 mutation class
The performance of p53 IHC in predicting TP53 mutation was tested both as (1) a binary classifier for the presence or absence of a deleterious TP53 mutation and (2) as a ternary classifier where OE, CA or WT staining was compared to GOF, LOF TP53 mutations or NDM. The sensitivity and specificity results for these comparisons are shown in Figure 4 and Table 3 (see also Supplementary Analytical File). For binary classification, IHC methods 2 and 1 had the highest sensitivity and there was a progressive increase in specificity from method 4 to method 1. For ternary classification, method 1 had the highest sensitivity for GOF mutations and was markedly better for WT predictions (Figure 4a). However, the sensitivity of method 1 for prediction of LOF mutation was markedly reduced compared to other methods. For both binary and ternary classification, method 1 had the highest overall accuracy (Figure 4b).

Secondary analysis
Twenty-three (9%) cases were discordant between the class of TP53 mutation and the IHC staining pattern (supplementary material, Table S2). These cases were subjected to an independent second analysis with repeat sequencing from new DNA samples and IHC staining on whole sections from the original tissue blocks using method 1. From 21 evaluable sequences, the mutation result was revised in two cases from nonsynonymous to NDM, which was in agreement with the IHC staining. The IHC staining was revised in five cases from staining whole sections: WT to CA (N = 1), WT to OE (N = 1) and CA to WT (N = 3) (supplementary material, Figure S3). The revised data for method 1 was then used to reestimate the classifier performance for method 1. The accuracy of the binary classifier to predict any TP53 mutation increased from 0.94 to 0.97 (Figure 4b) with sensitivity of 0.96 and specificity of 1.00. The accuracy to predict GOF and NDM increased (Table 3) but the sensitivity to detect LOF mutations remained low (0.76).
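As a rough consistency check, the revised binary figures can be reproduced from counts reported elsewhere in this article (169/171 HGSOC and 7/80 EC with deleterious mutations, and seven HGSOC with WT staining despite a LOF mutation). The sketch below is a reconstruction, not the authors' confusion matrix: it assumes that all 251 cases were assessable and that those seven HGSOC were the only false negatives. The exact counts are in the Supplementary Analytical File.

```python
# Reconstructed counts (assumption: all 171 HGSOC + 80 EC were assessable).
mutated = 169 + 7                 # deleterious TP53 mutations in HGSOC + EC
unmutated = (171 + 80) - mutated  # cases with no detectable mutation
false_negatives = 7               # HGSOC with WT staining but a LOF mutation
false_positives = 0               # implied by the reported specificity of 1.00

true_positives = mutated - false_negatives
true_negatives = unmutated - false_positives

sensitivity = true_positives / mutated
specificity = true_negatives / unmutated
accuracy = (true_positives + true_negatives) / (mutated + unmutated)

print(f"{sensitivity:.2f} {specificity:.2f} {accuracy:.2f}")  # prints "0.96 1.00 0.97"
```

Under these assumptions the arithmetic agrees with the reported sensitivity, specificity and accuracy to two decimal places.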

Discordant cases
Comparison of the data from primary and secondary analysis showed that 13 HGSOC cases that were predicted to have LOF mutations did not show the expected CA pattern (Table 4). The observed staining patterns (OE, N = 6; WT, N = 7) were consistent across multiple experiments, suggesting that these results did not simply arise from mistakes in interpretation.
Comparison of the mutation and staining data provided partial explanations for why p53 protein was detectable (Figures 1a and 5). For example, two unrelated cases had an in-frame p.I255del1 mutation that did not alter the reading frame. The observed OE pattern in these cases suggests this mutation may give rise to a nonsynonymous conformational change. Other LOF mutations predicted mutant p53 protein with lengths >250 amino acids (Figure 5). WT staining was observed in nine HGSOC (5.3%) of which two (1.2%) did not have a detectable TP53 mutation. Both cases expressed WT1 as evidence of serous cell lineage and showed non-specific solid architecture and moderate nuclear atypia by morphology (supplementary material, Figure S4).

Cytoplasmic staining
Cytoplasmic staining (CY) without nuclear overexpression of p53 was observed in four (2.3%) cases of HGSOC and these results were confirmed on full section staining (supplementary material, Figure S5). For two cases, CY staining was also confirmed in independent specimens collected at recurrence. Other nuclear markers (PAX8, WT1, ER) assessed on all four cases did not show any evidence of abnormal staining, excluding the possibility that CY staining for p53 could have arisen artefactually from

Discussion
These results show that use of an optimized p53 IHC assay is an accurate predictor of the presence and class of TP53 mutations in ovarian carcinoma with high specificity and sensitivity for prediction of GOF mutations and NDM status. The observed TP53 mutation frequency for HGSOC cases was 99% compared to 94-97% in previous studies [1,42]. This is likely to be a result of the higher sensitivity of targeted NGS [9] and improvements in experimental design in this study (including stringent sample selection, manual review of mutation calls and secondary analysis of discordant cases). We observed significant differences in accuracy for the four clinical IHC assays tested (Figure 4). Method 1 had the highest intensity of staining (Figure 3) and had less misclassification. Previous studies have focused on OE as the most important determinant of abnormal p53 staining and p53 IHC has not been optimized for the lower cut-off needed to distinguish WT from CA. Across the different methods, weakly stained WT cases were frequently misclassified as CA, causing false positive mutation predictions. Weakly staining assays performed without intrinsic controls cannot reliably distinguish CA from WT. Therefore, we propose that use of intrinsic control cells provides an internal reference for IHC scoring. Despite a common belief that p53 IHC cannot detect p53 wild type protein, the DO7 antibody used in Method 1 robustly detects p53 expression in normal cells including stromal fibroblasts and lymphocytes when used with recent improvements in polymer-based IHC detection systems. It is possible that p53-positive intraepithelial lymphocytes in a CA case could be falsely read as p53 WT tumour cells, particularly since some BRCA1 and BRCA2 HGSOC can have a very dense intraepithelial lymphocytic infiltrate [43].
Yet we believe that a titration towards a stronger staining IHC assay is preferred, not only for better interpretation at the lower cut-off, but also because of better distinction of OE from high WT cases at the upper cut-off. Stronger staining moves the interpretation away from a cut-off defined by the percentage of positive tumour cells and towards a pattern-based interpretation, in which virtually all tumour cells stain strongly in OE whereas WT shows variable intensity with some negative tumour cells. Our data strongly support the contention that further assay comparison and training in interpretation are needed for p53 IHC to be used as a diagnostic and predictive test [44].
Our data show that optimized p53 IHC can have 100% specificity for binary classification of pathogenic TP53 mutation in ovarian carcinoma. This is a remarkable increase compared to 38% reported previously [20]. In the prior study, 7/30 cases with OE and 6/17 cases with CA had NDM, which probably resulted from only sequencing exons 4-9 of TP53 or differences in IHC interpretation. We reduced the number of false positives (cases in which p53 IHC predicts mutation but mutation is absent) for three main reasons: first, we improved identification of TP53 mutation through sensitive NGS of the entire coding sequence, second, we improved identification of p53 IHC patterns corresponding with the TP53 mutation by optimized IHC, and third, we improved    In the initial sequencing analysis, we already performed quality control for cases failing sequencing and HGSOC with NDM. We repeated DNA extraction from new cores taken from FFPE blocks. By doing so, we avoided the pitfall of simply not evaluating tumour tissue. This resulted in the additional identification of six TP53 mutations in eight initially NDM HGSOC in a second round of NGS. Short-read NGS, which is thought to have lower sensitivity for indel detection, performed well. Only one splice acceptor mutation (c.356-2delA) was missed by NGS and only detected by Sanger sequencing (but was CA by IHC and therefore would have been caught in the secondary analysis). The secondary analysis was performed for NGS-IHC discordant cases and included DNA re-extraction, re-sequencing and full section IHC. After knowing the mutation and p53 IHC status, the interpretation of TP53 sequencing data was revised in 2/21 cases and p53 IHC data in 5/23 cases. Three EC with NDM showed CA on the tissue microarray cores. We did not consider the intrinsic control in the initial interpretation, which was absent in these cases.
On full section, the central areas including the TMA punch holes also showed CA without intrinsic control but there was WT towards the edges of the section. Heterogeneous p53 expression is likely caused by antigen degradation due to delayed fixation of the tissue centre. To avoid such pitfalls in interpretation, we recommend that only cases with intrinsic control present should be interpreted. The secondary analysis increased the accuracy of p53 IHC as binary or class predictor of the TP53 mutation by 3.6% each. These examples illustrate how both methods can complement each other. A discordance forces re-evaluation of both assays. For the most accurate assessment of mutation status currently within the realm of clinical trials, we recommend a combination of NGS and IHC.
Although the secondary analysis resolved issues with discordant GOF cases or NDM cases, it did not improve performance for the LOF mutation class where 13 cases (24%) did not stain as CA. This limits the overall sensitivity of p53 IHC for the binary TP53 mutational status to 96% and the overall accuracy for the class of mutation to 94.7%, although the latter is still higher than the 83% reported in a previous study [20]. Detailed review of these cases shows that the majority have detectable p53 expression owing to 3′ mutations. Stopgains associated with CA occur before amino acid 213 while stopgains associated with WT occur after amino acid 245. Because the DO7 antibody used in method 1 recognizes the N-terminal region between amino acids 19 and 26, we speculate that early stopgains are subjected to nonsense-mediated RNA decay while later stopgains result in expression of truncated p53 protein. In other cases indels or splicing mutations resulted in OE. An in-frame indel is likely to have the same conformational effect as nonsynonymous mutations. In two cases, splice site mutations in direct proximity result in CA and OE. It has been reported that slight changes in location of mutation can have different effects on the alternative splicing process leading to expression of alternative splicing variants [45]. p53 IHC is therefore an essential additional method to sequencing to understand the functional effects of TP53 mutations. p53 IHC further subclassifies LOF into true LOF with CA versus truncating mutations with WT versus putative LOF with OE that may be better classified as GOF.
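The proposed sub-classification of LOF-class mutations by staining pattern can be written as a simple lookup. This is an illustrative sketch only: the function name, the consequence labels and the wording of the returned labels are hypothetical, and it is not a validated reporting scheme.

```python
from typing import Optional

# Consequence types treated as LOF-class in this study (labels are hypothetical).
LOF_TYPES = {"stopgain", "indel", "splicing"}

# Sub-classes of LOF-class mutations by nuclear staining pattern,
# paraphrasing the three groups described in the text.
LOF_SUBCLASS = {
    "CA": "true LOF (no detectable protein)",
    "WT": "truncating mutation with detectable protein",
    "OE": "putative LOF, possibly better classified as GOF",
}


def subclassify_lof(consequence: str, ihc_pattern: str) -> Optional[str]:
    """Return the sub-class for a LOF-class mutation, or None if not applicable.

    Cytoplasmic (CY) staining is not one of the three sub-classes and
    therefore also returns None.
    """
    if consequence not in LOF_TYPES:
        return None
    return LOF_SUBCLASS.get(ihc_pattern)


print(subclassify_lof("stopgain", "WT"))
```

A lookup of this kind needs both the sequencing result and the staining pattern as inputs, since neither alone determines the sub-class.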
We have also identified a wider pattern of abnormal p53 expression as we observed CY staining in four (2.3%) of HGSOC with complete interobserver agreement. Since OE cases can show minor amounts of CY it is important to note that CY should only be reported in the absence of strong nuclear expression. Hence, only cases for which the question is WT versus CA that show prominent cytoplasmic staining should be considered for CY. Although CY staining can be artefactual this is an unlikely interpretation of our data as (1) the same p.R306X mutation was detected in two unrelated cases that both showed CY and not in any other case, (2) CY was observed in paired primary and recurrence specimens and (3) CY was confirmed on full sections. Cytoplasmic localization of mutated p53 has been reported before in colorectal carcinoma [46]. All four mutations associated with CY in our series were indels and stopgains resulting in predicted p53 proteins of 292-306 aa in length, truncated before the nuclear localization domain. However, we observed WT staining for other truncating mutations that resulted in similar protein lengths and there may be alternative mechanisms for cytoplasmic localization. It has been reported that a p.K382fs mutation resulting in a 420 aa protein showed cytoplasmic localization owing to impaired binding to importin. This occurred because of conformational changes from the additional 27 aa and not from any alteration in the nuclear localization domains [47]. In addition, specific p53 mutants may undergo post-translational modifications that can stimulate nuclear transport and/or mitochondrial association, promoting cytoplasmic accumulation. It is important to note that p53 normally shuttles between the nucleus and the cytoplasm, and cytoplasmic functions of p53 are well documented. However, cytoplasmically sequestered p53 cannot exert its nuclear function [48] and CY is likely to be indicative of LOF effects.
For diagnostic pathology, identifying TP53 status in ovarian carcinoma has critical clinical utility: distinguishing HGSOC from low-grade serous carcinoma on small tissue biopsies before commencing neoadjuvant chemotherapy [10], identification of STIC [49] and sub-classification of ovarian carcinomas for inclusion in histotype-specific clinical trials [19]. Our results are transferable to other tumour sites, for example endometrial carcinomas or adenocarcinomas of the gastroesophageal junction [50,51], although the interpretation of p53 IHC may not be straightforward for tumours showing longer periods of terminal differentiation, which allow for degradation of nonsynonymously mutated p53 protein, and interpretation rules may have to be adjusted [52]. For current clinical practice, which relies mostly on IHC without access to sequencing, a diagnostic limitation for p53 IHC should be kept in mind. Nine (5.3%) of HGSOC in our revised series showed WT staining and 7 (4.1%) cases harboured an underlying LOF mutation. Importantly, this means that the finding of WT p53 IHC, particularly in small biopsies, cannot solely be used to diagnose low-grade serous tumours. NGS should always be considered in WT IHC with morphological features suggesting HGSOC. However, the major strength of p53 IHC as a clinical test is its high negative predictive value as abnormal p53 IHC virtually excludes the possibility of a low-grade serous tumour [10].
Only two HGSOC remained wild type by both sequencing and IHC after secondary analysis, suggesting that TP53 wild type HGSOC are rare (1%). Some authorities even question a diagnosis of HGSOC if there is no evidence for TP53 mutation [53]. The classification of our wild type HGSOC cases remains uncertain. These tumours represent a rare subset which should be studied to establish whether other mechanisms such as MDM2 amplification can lead to an alternative pathway of HGSOC oncogenesis [1]. WT staining in these cases effectively excludes the possibility of homozygous deletion of TP53. The prevalence of TP53 mutations in EC was 8.8% in our series, which is similar to reports for endometrioid carcinomas of the ovary (7%) [54] and endometrium (9%) [55] but remarkably lower compared to 51% (N = 37/72) from previous reports that included high-grade carcinomas, which are now classified as HGSOC [56]. This further underscores the importance of accurate disease classification for study inclusion, which we performed using diagnostic IHC marker panels [19].
Our results show that an optimized p53 IHC assay, when interpreted correctly, can be a useful surrogate for TP53 mutation status. The combination of p53 IHC and sequencing should be considered the gold standard in assessing the p53 functional status for clinical trial inclusion.