Imaging modalities for characterising T1 renal tumours: A systematic review and meta‐analysis of diagnostic accuracy

Abstract

Objectives: International guidelines recommend resection of suspected localised renal cell carcinoma (RCC), with surgical series showing benign pathology in 30%. Non-invasive diagnostic tests to differentiate benign from malignant tumours are an unmet need. Our objective was to determine diagnostic accuracy of imaging modalities for detecting cancer in T1 renal tumours.

Methods: A systematic review was performed for reports of diagnostic accuracy of any imaging test compared to a reference standard of histopathology for T1 renal masses, from inception until January 2023. Twenty-seven publications (including 2277 tumours in 2044 participants) were included in the systematic review, and nine in the meta-analysis.

Results: Forest plots of sensitivity and specificity were produced for CT (seven records, 1118 participants), contrast-enhanced ultrasound (seven records, 197 participants), [99mTc]Tc-sestamibi SPECT/CT (five records, 263 participants), MRI (three records, 220 participants), [18F]FDG PET (four records, 43 participants), [68Ga]Ga-PSMA-11 PET (one record, 27 participants) and [111In]In-girentuximab SPECT/CT (one record, eight participants). Meta-analysis returned summary estimates of sensitivity and specificity for [99mTc]Tc-sestamibi SPECT/CT of 88.6% (95% CI 82.7%-92.6%) and 77.0% (95% CI 63.0%-86.9%) and for [18F]FDG PET 53.5% (95% CI 1.6%-98.8%) and 62.5% (95% CI 14.0%-94.5%), respectively. A comparison hierarchical summary receiver operating characteristic (HSROC) model did not converge. Meta-analysis was not performed for other imaging due to different thresholds for test positivity.

Conclusion: The optimal imaging strategy for T1 renal masses is not clear. [99mTc]Tc-sestamibi SPECT/CT is an emerging tool, but further studies are required to inform its role in clinical practice. The field would benefit from standardisation of diagnostic thresholds for CT, MRI and contrast-enhanced ultrasound to facilitate future meta-analyses.


| INTRODUCTION
Increasing use of cross-sectional imaging has resulted in a rise in detection of incidental renal tumours. The current standard of care for T1 renal tumours, as defined by the Union for International Cancer Control, 1 is surgical resection. 2 However, not all renal tumours are cancer, with up to 30% of partial nephrectomy specimens being benign. 3 Partial or radical nephrectomy represents overtreatment of benign renal tumours and can be avoided if the distinction is made accurately before surgery.
Despite high diagnostic accuracy of renal tumour biopsy, it has not been widely adopted due to concerns about bleeding, tumour seeding, non-diagnostic samples, difficulties in accessing anatomically complex tumours and assessment of only localised areas within the tumour. 4 Diagnostic imaging therefore overcomes several important limitations of biopsy.
A recent descriptive review of novel imaging techniques for renal tumours concluded that [99mTc]Tc-sestamibi SPECT/CT and radiolabelled girentuximab are the closest to clinical adoption. 5 However, the lack of quantitative analysis of diagnostic accuracy and of how these techniques compare to existing imaging limits the conclusions that can be drawn from the review.
In order to address the evidence gap, this systematic review was performed to determine and compare the diagnostic accuracy of various imaging modalities for detecting cancer in renal tumours.

| Protocol and registration
The protocol was developed according to PRISMA-DTA 6 and principles outlined in the Cochrane Handbook for systematic reviews of diagnostic accuracy v2, 7 and prospectively registered with PROSPERO (CRD42022303473). Protocol deviations are summarised and justified in the protocol.

| Eligibility criteria
Primary research articles evaluating the diagnostic accuracy of any imaging modality to characterise T1 renal tumours as malignant or benign, as defined by a histopathological reference standard from surgery or biopsy, were included. Prospective and retrospective studies were included. Studies that did not report sufficient diagnostic accuracy data, that is, the number of true and false positives and true and false negatives, were excluded. Studies that included participants with renal tumours of any stage were included if measures for T1 tumours could be extracted separately. Case-control studies were excluded as they are at high risk of bias. Full manuscripts and conference abstracts with sufficient information to meet the inclusion criteria were included.

| Information sources
Comprehensive searches of electronic databases MEDLINE, EMBASE, Science Citation Index, The Cochrane Library, ClinicalTrials.gov and WHO trials register were performed from inception to 12 January 2023.

| Search
Individual search strategies are detailed in Appendix S1. Due to the high number of texts during scoping searches (>40 000) we used a sensitivity-maximising diagnostic filter to limit the results to a feasible number to review. 8,9 No language restrictions were applied.
Returned articles from each database were combined and duplicates removed using systematic review management software Covidence (available at covidence.org).

| Study selection
Titles and abstracts were screened independently by two authors followed by full-text screening in the same manner (JBF, VM, PI, VWSC, EZ or HW). Disagreements were discussed with a third author to reach consensus. Multiple publications from the same authors and institution with an overlapping recruitment period were managed by excluding the report with the smaller sample size. Reasons for exclusions were recorded. Hand searches of reference lists of included studies were performed to identify additional relevant literature. Non-English language texts were translated to allow for screening and data extraction.
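The rule for overlapping publications described above can be sketched as follows. This is only an illustration: the `Report` fields, the `group` key standing in for "same authors and institution", and the use of whole years for recruitment periods are all assumptions, not part of the review's actual workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: str
    group: str        # hypothetical key for "same authors and institution"
    start_year: int   # start of recruitment period (assumed granularity)
    end_year: int     # end of recruitment period
    n_tumours: int    # sample size used to choose between overlapping reports


def periods_overlap(a: Report, b: Report) -> bool:
    """Two closed intervals overlap if each starts before the other ends."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


def drop_overlapping_reports(reports):
    """Keep the largest report among overlapping reports from the same group."""
    kept = []
    # Visit the largest reports first so that smaller overlapping ones are dropped.
    for report in sorted(reports, key=lambda r: r.n_tumours, reverse=True):
        clashes = any(
            k.group == report.group and periods_overlap(k, report) for k in kept
        )
        if not clashes:
            kept.append(report)
    return kept
```

Sorting by sample size before filtering keeps the logic to a single pass; ties and chains of partially overlapping reports would need an explicit rule that the text does not specify.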

| Data collection process
Data extraction was carried out independently by two authors from the research team (JBF, VM or PI) using a pre-prepared and piloted form.
Disagreements were reviewed and resolved by a third author (HW).
Further information was sought from study authors where necessary.

| Definitions for data extraction
The following data were extracted: study characteristics (authors, year of publication, institution, single or multi-centre, country, language of publication, study period, study design, number of patients enrolled), patient characteristics (age, gender, ethnicity, number of tumours, lead tumour size, lead tumour volume), index test(s) (modality, manufacturer, model, specific settings, number of interpreters, presence of consensus interpretation, interpreter experience), reference standard(s) (modality, diagnostic criteria, number of interpreters, presence of consensus interpretation, interpreter experience), number of true positives, false positives, true negatives and false negatives. If data from multiple interpreters were presented, the results were averaged or the results from the authors' primary analysis were used. If results were reported at multiple thresholds, the diagnostic accuracy measures at each threshold were collected and the threshold used for the authors' primary analysis was used in our analysis. If studies explicitly stated that they had classified a malignant subtype of renal cell carcinoma (RCC) as benign due to its indolent nature, we treated those tumours as malignant in this review.

| Risk of bias and applicability
Risk of bias and applicability concerns were assessed by two independent review authors (JBF, VM or PI) using the QUADAS-2 and QUADAS-C tools. 10,11 Both tools were customised to be relevant for this review (Appendix S2). Differences were resolved by a third author (HW).

| Diagnostic accuracy measures
Sensitivity and specificity were reported as the principal measures of diagnostic accuracy. The unit of assessment was per lesion.
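As a minimal sketch of these two measures, the following computes sensitivity and specificity from the four cells of a per-lesion 2x2 table. The Wilson score interval is used here purely for illustration and is an assumption; it is not necessarily the interval method applied in the review, and the function names are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int):
    """Return (estimate, lower, upper) for sensitivity and for specificity.

    Malignant lesions are the target condition, so sensitivity uses TP and FN
    and specificity uses TN and FP.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, *wilson_interval(tp, tp + fn)),
        "specificity": (spec, *wilson_interval(tn, tn + fp)),
    }
```

With very small studies the interval becomes wide regardless of the method chosen, which is worth keeping in mind when reading estimates from studies with only a handful of lesions.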

| Synthesis of results and meta-analysis
Study estimates of sensitivity and specificity were plotted on forest plots and in receiver operating characteristic (ROC) space to explore between-study variation in performance of each test. For imaging modalities with measures of diagnostic accuracy reported at the same threshold, bivariate analysis was attempted but convergence was not obtained. Therefore, a univariate fixed-effect model (determined by the model fit) was fitted to calculate summary point estimates of sensitivity and specificity at that threshold. 12 Comparison of these tests was attempted using a hierarchical summary receiver operating characteristic (HSROC) model, but convergence was not obtained. For imaging modalities reported at different positive thresholds, meta-analysis was not performed as the result is clinically uninterpretable. 7 When meta-analysis was not performed, we reported the sensitivity and specificity with 95% confidence intervals from the individual studies, calculated with Review Manager version 5.4.1 (The Cochrane Collaboration, Software Update, Oxford, UK). Statistical analyses were performed with SAS v.9.4. The data and the code used for meta-analysis are available from Appendix S3.
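The authors' SAS code is in Appendix S3 and is not reproduced here. As a hedged illustration of what a univariate fixed-effect summary can look like, the sketch below assumes an intercept-only logistic (binomial) model, which amounts to pooling counts across studies and forming a Wald interval on the logit scale. The function name and the example counts in the usage note are hypothetical.

```python
import math


def pooled_logit_estimate(events, totals, z: float = 1.959964):
    """Fixed-effect pooled proportion with a logit-scale Wald interval.

    events: per-study counts (e.g. true positives for sensitivity)
    totals: per-study denominators (e.g. true positives + false negatives)
    Assumes one common proportion across studies (no between-study variance).
    """
    x = sum(events)
    n = sum(totals)
    if x == 0 or x == n:
        raise ValueError("logit undefined: all or no events; a correction is needed")
    logit = math.log(x / (n - x))
    se = math.sqrt(1 / x + 1 / (n - x))  # standard error of the pooled logit

    def inv_logit(v: float) -> float:
        return 1 / (1 + math.exp(-v))

    return inv_logit(logit), inv_logit(logit - z * se), inv_logit(logit + z * se)
```

For example, `pooled_logit_estimate([30, 25, 40, 20, 35], [34, 28, 45, 24, 39])` pools five made-up studies. A fixed-effect summary of this kind ignores heterogeneity, which is why bivariate random-effects models are usually preferred when they converge.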

| Study selection
The search identified 5350 unique records following removal of duplicates. Of these, 5065 were excluded on title and abstract screening.
An additional 23 references were identified through scanning reference lists of the identified studies, related search function and citing reference search. Of the resulting 308 references, 281 were excluded following full-text review, with reasons stated in Figure 1. Twenty-seven studies including 2277 tumours in 2044 patients were included.
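The counts in the selection flow above are internally consistent, which can be checked with a few lines of arithmetic. The function and key names below are illustrative only; the numbers are those reported in the text.

```python
def prisma_flow(unique_records: int,
                excluded_on_screening: int,
                added_from_other_sources: int,
                excluded_on_full_text: int) -> dict:
    """Reproduce the record counts at each stage of study selection."""
    full_text = unique_records - excluded_on_screening + added_from_other_sources
    included = full_text - excluded_on_full_text
    return {"full_text_assessed": full_text, "included": included}


# Counts reported in the study selection paragraph.
flow = prisma_flow(
    unique_records=5350,
    excluded_on_screening=5065,
    added_from_other_sources=23,
    excluded_on_full_text=281,
)
```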
Nine studies with 314 lesions in 306 participants were included in the meta-analysis of diagnostic accuracy of [99mTc]Tc-sestamibi SPECT/CT and [18F]FDG positron emission tomography (PET).
Participant demographics were as follows: mean age 59 years, 63% male, mean lesion size 3.2 cm, prevalence of renal malignancy 69% (IQR 50%-78%). For comparison, population-level age-specific incidence of kidney cancer is highest in those aged over 65 years, and 62% of kidney cancer cases occur in men. 13 [16] For comparison, US census data reports population-level ethnicity to be 76% White, 14% Black, 6% Asian and 4% mixed/other. 17 Hispanic origin, considered a distinct concept to race, is 19% (of any race). 17 34

| Risk of bias and applicability
Overall, there was a high or uncertain risk of bias for at least one domain in all included studies (Figure 2).

| Participant selection
Patient selection was heterogeneous across studies, with the majority of participants included based on management strategy, including partial nephrectomy, 31 nephrectomy, 40 any surgical resection, 15,16,18,24-26,29,32,34 ablation 14 or CT-guided biopsy. 30 Other 20,22,27,28,32,33,36-39 One study included patients referred for CEUS when CT, MR or US was indeterminate. 21 Two small studies included all-comers, 23,41 and in one study, the criteria for case selection were unclear. 35 We considered surgical-only populations to have high applicability concerns. Surgical patients are likely to be younger and fitter than surveillance populations, 42 reflected in the study population of this review being younger on average than population-level data for kidney cancer. Younger patients are more likely to have benign tumours, 43 which is reflected in the high proportion of benign tumours in this review and raises applicability concerns for the wider population of patients with renal masses. 37

| Index test
Criteria for a positive CEUS or MRI test were set at different thresholds in each study, or the threshold was not reported. For CECT, contrast enhancement was generally included in the description of a positive test, with 40 or without 29-31 a defined increase in Hounsfield units between pre- and post-contrast phases. Alternative criteria were also described, 16,24 or the threshold was not defined. 28 All five studies reporting diagnostic accuracy of [99mTc]Tc-sestamibi SPECT/CT used the same threshold of absent radiotracer uptake in the tumour to signify malignancy. 15,18,36,38,39 [99mTc]Tc-sestamibi SPECT/CT images were reported by two clinicians in collaboration to reach consensus, limiting the applicability to clinical practice where most diagnostic imaging is reported by a single clinician.
Four small studies, each with 4-15 participants, reported the diagnostic accuracy of [18F]FDG PET 25,27,32,33 with a common positive threshold of FDG uptake in the tumour greater than the surrounding renal parenchyma.

| Reference standard
Generally, there was poor reporting of reference standard conduct and therefore unclear risk of bias. However, where histology was performed as part of standard care, we deemed applicability concerns to be low in all but one study that described pathologic diagnosis made solely on morphology, 40 when the addition of immunohistochemistry is a minimum standard. Diagnostic criteria used to identify the target condition were not reported for 23 studies, 14,16,18-25,27-29,32-36,38-41 one study reported International Society of Urological Pathology 15 and four studies reported the 2004 26,30,31 or 2016 37 editions of the World Health Organisation classification system. Ten studies stated that the reference test was interpreted without knowledge of the results of the index test.

FIGURE 2 Risk of bias and applicability concerns summary: review authors' judgement about each domain for each included study.

| Results of individual imaging modalities
Forest plots of estimates of sensitivity and specificity along with the 95% confidence intervals for each included study are presented in Figure 3.

| CECT
Seven studies including 1118 patients with 1320 renal lesions reported estimates of sensitivity and specificity for CECT to detect malignancy in T1 renal tumours ranging from 71% to 100% and 44% to 98%, respectively (Figure 3A). One study was an outlier in forest plots and ROC space, 40 likely due to the study population of 23 participants with end-stage renal failure with 222 renal lesions, mostly uncomplicated renal cysts, thus overestimating measures of diagnostic accuracy. Another study reported diagnostic accuracy of a model including clinical and radiomic data (i.e. artificial intelligence-guided data characterisation) from CT and was therefore not comparable. 16 The remaining studies used different thresholds to define a positive test, so meta-analysis was not performed. 12

| CEUS
Seven studies including 197 patients with 504 renal lesions reported estimates of sensitivity and specificity for CEUS to detect malignancy in T1 renal tumours ranging from 35% to 100% and 0% to 100%, respectively (Figure 3B). These studies used different thresholds to define a positive test, so meta-analysis was not performed. 12

| [99mTc]Tc-sestamibi SPECT/CT
Five studies including 271 renal lesions in 263 patients reported estimates of sensitivity and specificity for [99mTc]Tc-sestamibi SPECT/CT to detect malignancy in T1 renal tumours (Figure 3C). All included studies reported measures of diagnostic accuracy at the same positive threshold, namely radiotracer uptake in the tumour less than the surrounding renal parenchyma. Meta-analysis using a univariate fixed-effect regression model, chosen because of sparse data and best model fit, returned summary estimates of sensitivity and specificity for [99mTc]Tc-sestamibi SPECT/CT to detect malignancy of 88.6% (95% CI 82.7%-92.6%) and 77.0% (95% CI 63.0%-86.9%), respectively (Figure 4).
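Because the proportion of malignant tumours differs between settings, it can help to see how a fixed sensitivity and specificity translate into predictive values at a given prevalence. The sketch below applies Bayes' theorem; predictive values are not reported in this review, so any output is a derived illustration under the stated inputs rather than a finding, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test at a given prevalence of malignancy."""
    for value in (sensitivity, specificity, prevalence):
        if not 0 <= value <= 1:
            raise ValueError("inputs must be proportions between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Summary estimates for [99mTc]Tc-sestamibi SPECT/CT and the 69% prevalence
# of malignancy reported for the included studies.
ppv, npv = predictive_values(sensitivity=0.886, specificity=0.770, prevalence=0.69)
```

Re-running the function with a lower prevalence shows the positive predictive value falling and the negative predictive value rising, which is one reason applicability to surveillance populations cannot be assumed.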

| [18F]FDG PET
Four studies including 43 patients with 43 lesions reported estimates of sensitivity and specificity for [18F]FDG PET/CT to detect malignancy in T1 renal tumours (Figure 3D). All included studies reported measures of diagnostic accuracy at the same positive threshold of radiotracer uptake in the tumour relative to the surrounding renal parenchyma. Meta-analysis using a univariate mixed-effects regression model, chosen because of sparse data and best model fit, returned summary estimates of sensitivity and specificity for [18F]FDG PET to detect malignancy of 53.5% (95% CI 1.6%-98.8%) and 62.5% (95% CI 14.0%-94.5%), respectively (Figure 5). An HSROC model to compare diagnostic accuracy of [18F]FDG PET/CT with [99mTc]Tc-sestamibi SPECT/CT was attempted but did not converge.

| Findings in the context of existing evidence
Previous meta-analyses have reported diagnostic performance of CEUS versus CECT and/or MRI for renal tumours. 44,45 In the event of different positive thresholds across studies, an HSROC model is recommended to produce a summary curve rather than point estimates for sensitivity and specificity, 7 which was not the statistical approach adopted in either review. 44,45 Further, these reviews chose to include imaging follow-up as a reference standard. While a period of initial surveillance provides helpful information on the trajectory of a renal lesion, growth rate does not differentiate benign from malignant disease as many cancers remain stable in size 46 and benign tumours can exhibit growth. 47 In our own review, we excluded studies where the reference test included imaging surveillance.
There have been two previously published systematic reviews of [99mTc]Tc-sestamibi SPECT/CT, by Wilson et al. in 2020 and Basile et al. in 2023. 48,49 These reviews reported higher estimates of sensitivity (90%-91%) and specificity (86%) than our own, albeit with overlapping confidence intervals. Several new studies have been published since the former, and the latter is limited by inclusion of case-control studies that are at high risk of bias, exclusion of histological subtypes other than RCC, oncocytoma or angiomyolipoma, and classification of hybrid oncocytic/chromophobe tumours (HOCT) as benign. While misclassifying HOCT as benign is likely of little clinical consequence given their indolent nature, the World Health Organisation defines them as malignant and has recently included them in the emerging entity of 'low-grade oncocytic tumours' (LOT). 50 Furthermore, both reviews differed from our own by including tumours of all T stages and therefore had a higher proportion of malignant histology (78%-83% vs. 69%). The ability to differentiate benign from malignant tumours is most relevant in the T1 setting where clinicians report higher willingness to manage benign tumours conservatively. 51 Further prospective studies of [99mTc]Tc-sestamibi SPECT/CT are awaited, 52 and work evaluating its role as a replacement test for biopsy, add-on test, or triage test is needed.
An MRI-based 'clear cell likelihood score' has shown pooled estimates of sensitivity and specificity of 80% (95% CI 75%-85%) and 74% (95% CI 65%-81%) to detect clear cell RCC in a systematic review and meta-analysis of six studies including 825 T1a renal masses. 53 Additionally, [89Zr]DFO-girentuximab PET/CT has been reported in a conference abstract to have sensitivity of 86% [80%, 90%] and specificity of 87% [79%, 92%], also for detecting clear cell RCC, with the full manuscript awaited. 54 These studies were not included in our review as it was not possible to extract diagnostic accuracy data for benign versus malignant lesions. Clinically, these tests may have a triage role supporting active treatment for patients with a positive test for clear cell RCC; however, patients with a negative test would still require further diagnostics.
Radiomics has received growing interest, including in the setting of renal tumours. 5,55 Advanced computing may allow extraction of quantitative spatial information from medical imaging to detect differences imperceptible to the human eye. Only one manuscript including radiomics from CT was of sufficient quality for inclusion in this review and reported area under the curve of 0.77 (95% CI 0.69-0.85) for a model including radiomics and clinical factors. 16 No comparison was made with radiologist reporting of imaging.

| Limitations
We applied diagnostic filters in our search strategy to limit the returned texts to a feasible number to screen.The filters used have a sensitivity of 98.6% for MEDLINE 9 and 100% for Embase, 8 so the risk of having omitted relevant studies is low.
A limitation of our review is that most participants underwent surgical resection or diagnostic biopsy, due to our inclusion criteria necessitating a histopathological reference standard.In doing so, we limit the applicability of our results to patients on surveillance without histopathological diagnosis.
Eighty-four studies were excluded from our review because they included all stages of renal tumour and it was not possible to extract diagnostic accuracy data for T1 tumours alone.We advocate future diagnostic accuracy studies reporting measures of diagnostic accuracy for each tumour stage to facilitate future reviews.
We chose per-lesion rather than per-participant analysis as information at the level of the lesion is important for clinical decision making. For example, if a patient had multiple synchronous renal lesions (some malignant and others benign), urologists would favour treating the malignant tumours and not the benign ones in an effort to preserve renal function. However, this approach assumes independence of the lesions in a single participant, and therefore, measures of diagnostic accuracy are likely overestimated for studies that included participants with multiple lesions. 56
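One common way to gauge how much the independence assumption matters for precision is the design effect for clustered data, 1 + (m - 1) x ICC, where m is the mean number of lesions per participant and ICC is the intra-cluster correlation. The sketch below is an assumption-laden illustration, not part of the review's analysis: the ICC values are hypothetical and the function names are invented for this example.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from analysing clustered lesions as independent."""
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("ICC must be between 0 and 1")
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_lesions: int, n_participants: int, icc: float) -> float:
    """Number of independent lesions that would carry the same information."""
    mean_cluster_size = n_lesions / n_participants
    return n_lesions / design_effect(mean_cluster_size, icc)


# Example: a study with 222 lesions in 23 participants, with a hypothetical ICC.
ess = effective_sample_size(n_lesions=222, n_participants=23, icc=0.3)
```

When each participant contributes one lesion the design effect is 1 and nothing changes; the further the mean cluster size rises above 1, the more a per-lesion analysis overstates the available information.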

| Deviations from the protocol
We revised our original protocol from including only T1a to all T1 renal tumours due to sparse data for T1a lesions alone. The protocol change was registered with PROSPERO. T1a renal tumours have the highest prevalence of benign histology when compared to tumours of greater size and T stage, 43 and extended eligibility to larger tumours has likely resulted in a higher prevalence of malignant histology, although the mean size of included tumours was 3.2 cm. For all imaging modalities, it is conceivable that the diagnostic accuracy increases with increasing tumour size due to both resolution limits and less signal contamination in the tumour volume from normal surrounding renal parenchyma.

| CONCLUSIONS
Imaging-based diagnostics for risk stratifying renal tumours is an unmet need. Currently, the optimal imaging strategy to characterise T1 renal tumours is not clear because of heterogeneity and sparse data as well as a lack of direct comparisons. [99mTc]Tc-sestamibi SPECT/CT is an emerging tool, but further studies are required to inform its role in clinical practice. We advocate future diagnostic accuracy studies reporting performance at each tumour stage and standardisation of the diagnostic threshold used to consider CT, MRI and CEUS positive for cancer.

AUTHOR CONTRIBUTIONS
The study was conceived and designed by HW, ME, MGBT and KG.

FIGURE 1 PRISMA flow diagram showing study selection process and reasons for exclusion from the meta-analysis.

TABLE 1 Individual characteristics of included studies.

| Risk of bias in the comparison
For the single study that included a direct comparison of CEUS versus CECT, 28 risk of bias in the comparison was unclear for patient selection, conduct or interpretation of the index test, and conduct or interpretation of the reference standard, and low for flow and timing.

FIGURE 3 Forest plot of estimates of sensitivity and specificity of (A) contrast-enhanced computed tomography, (B) contrast-enhanced ultrasound, (C) [99mTc]Tc-sestamibi SPECT/CT, (D) [18F]FDG PET, (E) multiparametric magnetic resonance imaging, (F) [68Ga]Ga-PSMA-11 PET and (G) [111In]In-girentuximab SPECT/CT for the diagnosis of tumour malignancy. CI, confidence interval; FN, false negatives; FP, false positives; TN, true negatives; TP, true positives.

FIGURE 4 Summary receiver operating characteristic curve of five included studies reporting the diagnostic accuracy of [99mTc]Tc-sestamibi SPECT/CT to detect malignancy in patients presenting with T1 renal tumours. Markers show the estimate from each individual study, the summary estimate (•) and the 95% confidence region. Summary estimates of sensitivity and specificity to detect cancer are 88.6% (95% CI 82.7%-92.6%) and 77.0% (95% CI 63.0%-86.9%), respectively.
The protocol was developed and published (HW, JBF, VWSC, RH, VK, ME, MGBT, KG). Study screening and data extraction were performed by HW, JBF, VM, PI, VWSC and EZ. Analysis was performed by HW with support from KG. The manuscript was written by HW, with revisions from JBF, RH, VK, MGBT and KG. All co-authors approved the final version of the manuscript.