Positron emission tomography/computed tomography outperforms MRI in the diagnosis of local recurrence and residue of nasopharyngeal carcinoma: An update evidence from 44 studies

Abstract Studies on nasopharyngeal carcinoma (NPC) in five electronic databases were systematically searched from inception to June 5, 2018. The quality of the included studies was assessed using the updated Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). Sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio, and their 95% confidence intervals (CIs) were pooled using a bivariate random-effect model. Forty-four studies with 61 groups of data and a total of 3369 patients were included in the qualitative and quantitative synthesis. The overall estimated sensitivity and specificity of positron emission tomography/computed tomography (PET-CT) and magnetic resonance imaging (MRI) taken together for local recurrent/residual NPC were 0.90 and 0.85, respectively. The pooled area under the curve (AUC) of the summary receiver operator characteristic curve for PET-CT/MRI was 0.94. Subgroup analysis showed that MRI had lower sensitivity (0.83 vs 0.92) and specificity (0.78 vs 0.89) than PET-CT. The AUCs of MRI and PET-CT were 0.87 and 0.96, respectively, and their 95% CIs did not overlap (0.87-0.90 vs 0.94-0.98). Meta-regression showed that the examination method (PET/CT vs MRI) was a potential source of heterogeneity. PET/CT and MRI both showed high overall ability in diagnosing local recurrent/residual NPC, but the subgroup analysis indicated that PET-CT was superior to MRI in diagnosing local recurrence and residue of NPC after radiotherapy. The examination method contributed to the heterogeneity among studies.

suppressor genes. [2][3][4][5] Owing to its specific anatomical structure and position, NPC is preferentially treated by radiotherapy, 6,7 which has greatly improved the remission rate and raised the overall 5-year survival rate to over 70%. 8 However, residues, local recurrence, and metastasis still worsen the prognosis of NPC patients and limit further improvement in survival. Thus, it is of great importance to identify the residues and recurrence of NPC early and accurately.
However, side effects such as edema, inflammation, fibrosis, and scarring appear after radiotherapy. 9 The resulting morphological changes can make traditional examination methods such as computed tomography (CT) and magnetic resonance imaging (MRI) insensitive to recurrence and residues and cause false positive or false negative diagnoses. 10 In recent years, 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET)/CT has been implemented. The combination of CT morphological imaging and PET functional metabolic imaging has increased the sensitivity and specificity for lesions. The overall diagnostic value of MRI and PET/CT in diagnosing local residual and recurrent NPC has been summarized previously, but that review included only 14 studies. 11 In the current study, we systematically searched several online databases and included 44 studies involving 61 groups of data in order to estimate more accurately the diagnostic ability of PET/CT and MRI for local recurrent and residual NPC.

| METHODS
This study follows the Cochrane Handbook for Systematic Reviews and the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA, Data S1). 12 Ethical approval was not required for this secondary study based on previously published articles.

| Search strategy
PubMed, Web of Science, EMBASE, China National Knowledge Infrastructure, and Wanfang were systematically searched from inception to June 5, 2018. The following medical subject heading terms and keywords were used: ("nasopharyngeal carcinoma" OR "nasopharynx cancer" OR "NPC") AND ("positron emission tomography" OR "PET" OR "PET/CT" OR "PET-CT" OR "18-fluoro-2-deoxyglucose positron emission tomography" OR "18F-FDG PET/CT" OR "MRI" OR "magnetic resonance imaging" OR "nuclear magnetic resonance scanner" OR "magnetic resonance angiography") in combination with the following keywords: recurrent or recurrence, residue, diagnosis or diagnostic (sensitivity and specificity), receiver operating curve or ROC. The reference lists of relevant reviews and articles were also checked to identify potentially eligible studies. Languages were restricted to Chinese and English.

| Study selection
Two authors independently scanned and screened the titles, abstracts, and full texts of the initially retrieved studies. Disagreements were resolved by explicit consensus. The inclusion criteria were as follows: (a) recurrence and residues of NPC were confirmed by the golden standard (biopsy or follow-up); (b) the aim was to assess the diagnostic ability of PET/CT or MRI or both for recurrence or residues or both of NPC; (c) enough data were provided for further pooling analysis, including true positive (TP), false positive (FP), false negative (FN), and true negative (TN). For duplicates, the latest publication was used. Studies with duplicated or unqualified data, and studies with an animal or experimental design, were excluded. Reviews, comments, letters, and case reports were also excluded.

| Data extraction
Two authors independently extracted data and resolved discrepancies by mutual discussion. From each included study, the following information was extracted: surname of first author, year of publication, country, examination method (PET/CT vs MRI), study design (prospective vs retrospective), age (range, mean or median), time of examination, golden standard (biopsy vs follow-up), sample size, fourfold table data (TP, FP, FN, TN), sensitivity, and specificity. The extracted data were entered into a standardized Excel sheet.
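As an illustration of how the extracted fourfold data relate to the accuracy measures used later in the analysis, the following minimal sketch derives sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from one study group. The class name `FourfoldTable` and the counts are hypothetical and are not taken from any included study; tables with zero cells would need a continuity correction, which this sketch does not handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourfoldTable:
    """2x2 counts of one study group (hypothetical container for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of diseased patients with a positive test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased patients with a negative test.
        return self.tn / (self.tn + self.fp)

    def positive_lr(self) -> float:
        # PLR = sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())

    def negative_lr(self) -> float:
        # NLR = (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity()) / self.specificity()

    def diagnostic_odds_ratio(self) -> float:
        # DOR = (TP * TN) / (FP * FN), equivalent to PLR / NLR.
        return (self.tp * self.tn) / (self.fp * self.fn)


# Made-up counts, for illustration only.
example = FourfoldTable(tp=45, fp=5, fn=5, tn=45)
print(example.sensitivity(), example.specificity(), example.diagnostic_odds_ratio())
```

These per-study values are only the inputs to the pooling step; the pooled estimates reported in this study come from a bivariate random-effect model, not from simple averaging.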

| Assessment of quality
Quality of the included studies was assessed using the updated Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2), which consists of two parts: risk of bias and applicability concerns. The risk of bias includes four items: patient selection, index test, reference standard, and flow and timing. Each item has three options: high, unclear, and low. A study with ≥1 item scored "high" is considered as high risk of bias, whereas a study with no item scored "high" is treated as low or unclear risk of bias. The applicability concerns are likewise rated as high, low, or unclear. 13

| Statistical analysis
Statistical analyses were performed in Stata 13 (StataCorp LP, College Station, TX, USA), and quality was assessed in Review Manager 5 (Nordic Cochrane Centre, Cochrane Collaboration, 2014). First, the threshold effect was evaluated using the Spearman correlation coefficient; a significant value indicates the existence of a threshold effect. [14][15][16] Heterogeneity was evaluated using the chi-square test and the I² statistic, with significance set at P < 0.05 or I² > 50%. 17 Sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and their 95% confidence intervals (CIs) were pooled using a bivariate random-effect model. 18 The summary receiver operator characteristic curves (SROCs) were also plotted. Subgroup analyses were conducted by population (China vs other countries), sample size (≤45 vs >45), examination method (MRI vs PET/CT), study design (prospective vs retrospective), and golden standard (biopsy, follow-up, or both). The potential influencing factors of heterogeneity were explored through meta-regression involving the variables of publication year, country, examination method, study design, golden standard, and sample size. Publication bias was evaluated via Deeks' linear regression 19 with the significance level at P < 0.05.
After 503 duplicates were removed, the remaining 1171 records were screened by title and abstract, which excluded 1086 records because they were on an unrelated topic or were reviews, comments, case reports, or animal or experimental studies. The remaining 85 full-text articles were assessed for eligibility, and 41 were excluded: five duplicates, seven case reports, nine reviews, comments or letters, 12 studies unrelated to diagnostic value, and eight studies with insufficient data. Finally, 44 studies involving 61 groups of data and 3369 patients were included in the qualitative and quantitative synthesis (Data S2).
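To make the heterogeneity criterion from the statistical analysis (I² > 50%) concrete, the sketch below computes Cochran's Q and I² for a set of study-level effects using inverse-variance weights. It is a generic univariate illustration, not the Stata routine used in this study; the function name `cochran_q_and_i2` and the input values are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2 in percent) for study effects with known variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2 = (Q - df) / Q, truncated at zero, expressed as a percentage.
    if q <= df:
        return q, 0.0
    return q, (q - df) / q * 100.0


# Made-up log diagnostic odds ratios and variances, for illustration only.
q_stat, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(q_stat, i2)
```

An I² above 50% under this definition would be read as substantial heterogeneity, which is the situation that motivates a random-effect model.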

| General characteristics of included studies
The general characteristics of the included studies are presented in Table 1. These studies were published from 1991 to 2016. The sample sizes ranged from 17 to 179 patients. Of the 61 groups of data, 55 came from China, two from Turkey, two from Singapore, one from Italy, and one from Saudi Arabia; 23 used MRI and 38 used PET/CT; and 29 were prospective and 32 retrospective. The golden standard of the 61 groups of data was biopsy (10), follow-up (8), or both (43). The median sample size was 45, with ≤45 and >45 patients in 31 and 30 groups of data, respectively. The sensitivity and specificity of the included studies ranged from 55.6% to 100.0% and from 15.4% to 100.0%, respectively.

| Assessment of quality
Data S3 and S4 summarize the details of risk of bias. Overall, the quality of the included studies was good, and the proportion of studies with high risk of bias was very low. The main issue was flow and timing (it was unclear whether there was an appropriate interval between index test and reference standard). In total, five and 24 studies were categorized as low and unclear risk of bias, respectively, because of flow and timing. Two studies were categorized as unclear risk of bias in index test and three studies as unclear risk of bias in reference standard.

| Pooled results
The estimated diagnostic ability of PET-CT/MRI for local recurrent and residual NPC is shown in Table 2. Random-effect models were used because of the high heterogeneity (I² > 50%). The overall estimates were sensitivity = 0.90 [95% CI: 0.86-0.93, Figure 2], specificity = 0.85 [95% CI: 0.81-0.89, Figure 3], and AUC = 0.94 [Figure 4A], which indicated a high diagnostic ability. Table 1 also presented the results of subgroup analyses by population (China vs other countries), sample size (≤45 vs >45), examination method (MRI vs PET/CT), study design (prospective vs retrospective), and golden standard (biopsy, follow-up, or both). No significant difference was found among different golden standards, populations, sample sizes, or study designs, and sensitivity and specificity were similar among these subgroups. Similar pooled AUCs were found among golden standards (Figure 4; Table 2). The AUCs of MRI and PET-CT were 0.87 and 0.96, respectively, and their 95% CIs did not overlap (0.87-0.90 vs 0.94-0.98). The pooled sensitivity and specificity forest plots and Fagan's nomogram of the examination methods are presented in Data S5. The overall Fagan's nomogram is presented in Figure 5. If the pre-test probability was 30%, the post-test probability would reach about 73% with a PLR of 6.
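The Fagan's nomogram reading above is an application of Bayes' theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The following minimal sketch reproduces that arithmetic; the function name `post_test_probability` is hypothetical, and the inputs are the rounded values quoted in the text.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")

    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded inputs quoted in the text: pre-test probability 30%, PLR of 6.
positive_result = post_test_probability(0.30, 6.0)
print(round(positive_result, 2))
```

With the rounded PLR of 6 this gives a post-test probability of roughly 0.72, close to the reported value of about 73%; the small gap presumably reflects the unrounded pooled PLR, which is not given in the text here.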

| Meta-regression analysis
Considering the high heterogeneity among studies, we conducted meta-regression to explore the potential influencing factors. The meta-regression results indicated that the examination method (PET/CT vs MRI) was a potential source of heterogeneity (P < 0.001; Table 3). The analysis showed that the diagnostic ability of MRI was slightly weaker than that of PET/CT.

| Publication bias
Publication bias was assessed using Deeks' linear regression plot. The X- and Y-directions were effective sample size and diagnostic odds ratio, respectively. The angle between the regression line and the X-direction was close to zero, that is, the regression line was almost parallel to the X-direction, which indicates no publication bias (P = 0.954, Data S6). Begg's test likewise did not indicate publication bias (Z = 1.200, P = 0.230), whereas Egger's test suggested some publication bias (t = 5.430, P < 0.001). Overall, the publication bias of the current study appeared limited.
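For readers unfamiliar with this kind of regression test, the sketch below shows one common formulation of Deeks' approach: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and a slope near zero suggests no small-study asymmetry. This is an assumption about the general method rather than a description of the exact routine or axis layout used in this study; the function names and counts are hypothetical, and the P value of the slope is omitted.

```python
import math
from typing import Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, FN, TN)


def effective_sample_size(tp: int, fp: int, fn: int, tn: int) -> float:
    """ESS = 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)."""
    diseased = tp + fn
    non_diseased = fp + tn
    return 4.0 * diseased * non_diseased / (diseased + non_diseased)


def log_dor(tp: int, fp: int, fn: int, tn: int) -> float:
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero."""
    cells = [float(tp), float(fp), float(fn), float(tn)]
    if 0.0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log((a * d) / (b * c))


def deeks_regression(studies: Sequence[Counts]) -> Tuple[float, float]:
    """Weighted least squares of lnDOR on 1/sqrt(ESS); returns (intercept, slope)."""
    weights = [effective_sample_size(*s) for s in studies]
    xs = [1.0 / math.sqrt(w) for w in weights]
    ys = [log_dor(*s) for s in studies]

    total_w = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total_w
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total_w
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    if sxx == 0.0:
        raise ValueError("all studies have the same effective sample size")

    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Made-up studies that share the same DOR, so the fitted slope is close to zero.
intercept, slope = deeks_regression([(18, 2, 2, 18), (45, 5, 5, 45), (90, 10, 10, 90)])
print(intercept, slope)
```

In practice the slope would be tested against zero with its standard error, which is what the reported P value summarizes.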

| DISCUSSION
PET/CT and MRI both show relatively high overall accuracy in diagnosing local recurrence and residue of NPC, but PET-CT is superior to MRI according to the subgroup analyses. Meta-regression suggests that the examination method is a potential source of heterogeneity. This is the largest study so far and presents a more accurate estimation of the ability of PET-CT and MRI to diagnose recurrent and residual NPC.
Two other studies also compared 18F-FDG PET/CT, MRI, and single-photon emission computed tomography (SPECT) in diagnosing local residual/recurrent NPC, 20,21 but these studies had several limitations. First, their results were reported in 2007 and 2016, respectively, but their search periods extended only from 1990 to 2014, after which many new studies were reported. Our study includes 27 new studies. Although Wei's report included 17 studies, fewer than 10 focused on MRI or PET/CT. Second, our subgroup analysis by gold standard (biopsy, follow-up, or both) showed no significant diagnostic differences, which argues against the verification bias mentioned by the two studies. Third, although they reported that PET/CT and SPECT were superior to MRI in distinguishing recurrent NPC from fibrosis after radiotherapy, the supplementary data indicated that the SROCs of SPECT and MRI overlapped, which makes the significance of that difference doubtful. Finally, the two studies and the present study all found high heterogeneity, but our meta-regression analysis identified the examination method as one of the heterogeneity sources. Moreover, the latest version of the methodological quality assessment tool was used in the present study.
Whether there is local residue or recurrence is extremely important for NPC staging and treatment planning. As reported, NPC patients with local residue had poorer prognosis and a higher risk of recurrence. 22 MRI was previously considered the golden standard for evaluating local therapy efficacy in NPC. 23 However, the inflammatory changes after radiotherapy interfered with image interpretation and lowered the specificity (range from 44% to 83%). 24 In contrast, PET/CT shows strong ability in efficacy evaluation and lesion discrimination (specificity: 93.4%). Some studies compared PET/CT and MRI in distinguishing residual/recurrent NPC, but the results were inconsistent. 25 Most studies reported that PET/CT was superior to MRI in diagnosing local recurrence and residue of NPC. [26][27][28] However, a retrospective study involving 63 consecutive patients showed that MRI had slightly, but not significantly, higher overall accuracy than PET/CT in diagnosing residual and/or recurrent NPC (92.1% vs 85.7%). 29 This difference from other studies may be attributed to overestimation of the overall diagnostic accuracy due to the small sample size. Our results with a larger sample indicated that PET/CT had higher overall diagnostic accuracy than MRI in terms of sensitivity (92% vs 83%), specificity (89% vs 78%), and AUC of the SROC (0.96 vs 0.87).
The differences in overall accuracy between PET/CT and MRI may be attributed to the imaging principles. It is generally agreed that MRI outperforms CT in detecting residual and recurrent NPC. 30 MRI can efficiently distinguish tumor lesions from normal tissues and differentiate fibrosis from tumor recurrence after local radiotherapy. The tissue-specific signals of MRI clearly outline the scope, size, and depth of tumor invasion and localize the nasopharyngeal mass, involved areas (especially the parapharyngeal space), perineural skull infiltration, skull damage, and intracranial invasion. With its wide clinical application, MRI has become an important method for the pretreatment examination and post-radiotherapy efficacy assessment of NPC. However, MRI still has limitations in identifying swollen lymph nodes, since the diagnosis depends on lymph node size. The pathological patterns of lymph nodes remain unclear on MRI, which may lead to misdiagnosis or missed diagnosis. In contrast, 18F-FDG PET/CT, with its unique metabolic imaging features, can characterize lymph nodes more accurately.
PET/CT, which has higher overall diagnostic accuracy for recurrent NPC, generally consists of a PET scanner, a high-resolution spiral CT scanner, and an operating system that combines the two types of scan images. PET and CT images can be obtained in one scan. PET/CT images combine the metabolic imaging characteristics of PET with the anatomical imaging characteristics of CT, which compensates for the imprecise localization of PET and the low accuracy of CT. Given the biological characteristics of tumor tissues and the imaging characteristics of PET/CT, PET/CT has significant advantages in differentiating post-radiotherapy fibrosis from local tumor recurrence or lymph node metastasis in NPC. Currently, the most commonly used tracer is 18F-FDG, which distinguishes benign from malignant lesions mainly according to the difference in glucose metabolism between normal and tumor tissues. The principle is that 18F-FDG, after entering malignant tumor cells, is phosphorylated by hexokinase into 18F-FDG-6-phosphate, which cannot be metabolized further and therefore accumulates in the tumor cells, so that tumor tissues with high metabolic activity show increased 18F-FDG uptake. However, because 18F-FDG is not a tumor-specific imaging agent, its uptake in the irradiated area can also be increased by inflammatory changes. 31 Therefore, PET/CT still yields some false positives and false negatives.
The present study has several limitations. First, the heterogeneity among studies is quite high, which was addressed here in two ways. The subgroup analyses found a significant difference in overall accuracy only between examination methods, but not by golden standard, population, study design, or sample size. Then, multivariate meta-regression including the above factors indicated that the examination method may be associated with heterogeneity. Second, some unmeasured or unreported study characteristics, such as age, gender, and stage, could not be obtained for further subgroup analysis, which may lead to overestimation or underestimation of the overall pooled results. The reason is that the sample size in each study was too small for further subgroup analysis. Third, the golden standard was mixed (biopsy, follow-up, or both), although biopsy would be preferable. However, the subgroup analysis did not indicate a significant difference among the three types. Moreover, MRI and PET/CT evolved enormously during the long search period from 1991 to 2018. However, the meta-regression indicated that publication year seemingly had no effect on the estimates.
In conclusion, PET/CT and MRI both show high overall diagnostic ability for local recurrence/residue of NPC, but the subgroup analyses indicate that PET-CT is superior to MRI in diagnosing local recurrent and residual NPC after radiotherapy. The examination method contributes to the heterogeneity among studies. The present study provides stronger evidence for clinical practice.

ACKNOWLEDGMENT
LZZ and SLF designed this study and contributed substantially to the design of the search strategy. LZZ and LYY searched and selected the trials and extracted data. LZZ and LN performed the analysis and interpreted the data. LZZ wrote the manuscript. LZZ and LYY critically reviewed the manuscript. LZZ and LYY participated in the data extraction and critically revised it. LZZ and SLF proofread the final version. All authors read and approved the final manuscript.