Variability in prostate cancer detection among radiologists and urologists using MRI fusion biopsy

Abstract
Objectives The aim of this study is to evaluate the impact of radiologist and urologist variability on detection of prostate cancer (PCa) and clinically significant prostate cancer (csPCa) with magnetic resonance imaging (MRI)‐transrectal ultrasound (TRUS) fusion prostate biopsies.
Patients and methods The Prospective Loyola University MRI (PLUM) Prostate Biopsy Cohort (January 2015 to December 2020) was used to identify men receiving their first MRI and MRI/TRUS fusion biopsy for suspected PCa. Clinical, MRI and biopsy data were stratified by radiologist and urologist to evaluate variation in Prostate Imaging‐Reporting and Data System (PI‐RADS) grading, lesion number and cancer detection. Multivariable logistic regression (MVR) models and area under the curve (AUC) comparisons assessed the relative impact of individual radiologists and urologists.
Results A total of 865 patients (469 biopsy‐naïve) were included across 5 urologists and 10 radiologists. Radiologists varied with grading 15.4% to 44.8% of patients with MRI lesions as PI‐RADS 3. PCa detection varied significantly by radiologist, from 34.5% to 66.7% (p = 0.003) for PCa and 17.2% to 50% (p = 0.001) for csPCa. Urologists' PCa diagnosis rates varied between 29.2% and 55.8% (p = 0.013) and between 24.6% and 39.8% (p = 0.36) for csPCa. After adjustment for case‐mix on MVR, a fourfold to fivefold difference in PCa detection was observed between the highest‐performing and lowest‐performing radiologist (OR 0.22, 95%CI 0.10–0.47, p < 0.001). MVR demonstrated improved AUC for any PCa and csPCa detection when controlling for radiologist variation (p = 0.017 and p = 0.038), but controlling for urologist was not significant (p = 0.22 and p = 0.086). Any PCa detection (OR 1.64, 95%CI 1.06–2.55, p = 0.03) and csPCa detection (OR 1.57, 95%CI 1.00–2.48, p = 0.05) improved over time (2018–2020 vs. 2015–2017).
Conclusions Variability among radiologists in PI‐RADS grading is a key area for quality improvement significantly impacting the detection of PCa and csPCa. Variability for performance of MRI‐TRUS fusion prostate biopsies exists by urologist but with less impact on overall detection of csPCa.

KEYWORDS
magnetic resonance imaging, practice variation, prostate biopsy, prostate cancer, prostate cancer detection

1 | INTRODUCTION
An estimated 248 530 new cases of prostate cancer were diagnosed in 2021, with an increasing role played by magnetic resonance imaging (MRI) to aid in identification and localisation of clinically significant prostate cancer (csPCa). 1 The PROMIS trial proposed the use of MRI as a triage test that could theoretically avoid up to 27% of primary biopsies with 93% sensitivity for csPCa. 2 The PRECISION and PRECISE trials demonstrated improved detection of csPCa by between 5% and 12% with MRI-targeted biopsy over transrectal ultrasound (TRUS) template biopsy and improved exclusion of Gleason Grade 1 (GG1) disease by 12%. 3,4 A combined technique with MRI-targeted and standard template biopsy in a real-world cohort improved detection of csPCa by 10%, although the reduction in GG1 diagnoses was minimal at 0.5% over standard template biopsy, while the Trio study showed that the addition of targeted biopsy upgraded 12.7% of cases to GG ≥ 2. 5,6 Use of MRI with combined targeted and template biopsy in biopsy-naïve patients is now a guideline recommendation. 7-10
The Prostate Imaging-Reporting and Data System (PI-RADS) grading system provides a standardised interpretation paradigm to predict the presence of csPCa for a specific lesion in the prostate gland, with the intent to improve both diagnostic performance and reproducibility between radiologists. 11 Despite widespread adoption of PI-RADS grading, radiologists' interpretations of prostatic lesions appear to vary widely. Furthermore, the positive predictive value of MRI for csPCa varies significantly across institutions, between 27% and 48% (interquartile range) for PI-RADS scores ≥3. 12 Therefore, the primary goal of this study was to assess variability in PI-RADS classification across radiologists and the relative impact of variability between radiologists and urologists on prostate cancer detection.

| Study procedures
MRI was performed on 3-Tesla scanners (Siemens Magnetom Trio and Verio). In rare instances, 1.5-Tesla MRI (GE Optima MR450W) was employed when use of the 3-T coil was contraindicated. An endorectal coil was used for cases prior to 2019, after which it was routinely omitted for most patients. Postimage processing used DynaCAD software (Philips Healthcare, Best, Netherlands). MRI images were graded by experienced but nondedicated faculty using PI-RADS version 2.0 or 2.1. 11 All MRI-TRUS fusion biopsies were performed transrectally by experienced urologists using the UroNav system (Invivo, Philips Healthcare). Transperineal biopsy was introduced after the study period. Standard template biopsies included two biopsy cores taken from each sextant region. Targeted biopsies were obtained at the discretion of the urologist; PI-RADS 3-5 lesions were routinely biopsied with two cores, while PI-RADS 2 lesions were rarely sampled.

| Statistical analyses
Highest PI-RADS lesion categorisation and number of lesions on MRI were stratified by radiologist and by urologist to evaluate the grading and distribution of lesions among patients undergoing biopsy. Radiologists' and urologists' performance and variability in prostate cancer detection (positive predictive value) were assessed by the detection of PCa (GG ≥ 1) and of csPCa (GG ≥ 2) by either template or targeted biopsy among men with PI-RADS 3-5 lesions. The Chi-square test was used to evaluate for significant unadjusted differences between individuals.
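As an illustration of the unadjusted comparison described above, the following minimal Python sketch computes a positive predictive value per reader and the Pearson chi-square statistic for a reader-by-outcome table. The reader labels and counts are hypothetical and are not study data, and the function names are chosen here for illustration only; the study itself used STATA.

```python
from typing import Dict, Tuple

# Hypothetical counts (NOT study data):
# reader -> (biopsies positive for cancer, biopsies negative for cancer)
counts: Dict[str, Tuple[int, int]] = {
    "reader_A": (30, 30),
    "reader_B": (20, 40),
    "reader_C": (40, 20),
}


def ppv(positive: int, negative: int) -> float:
    """Positive predictive value: share of biopsied patients with cancer."""
    return positive / (positive + negative)


def chi_square(table: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an R x 2 table."""
    rows = list(table.values())
    row_totals = [pos + neg for pos, neg in rows]
    col_totals = (sum(r[0] for r in rows), sum(r[1] for r in rows))
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if detection did not depend on the reader.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


ppv_by_reader = {name: ppv(*c) for name, c in counts.items()}
statistic, dof = chi_square(counts)
```

The statistic is then referred to a chi-square distribution with the returned degrees of freedom to obtain a p value; that step is left to a statistics package.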
Multivariable logistic regression (MVR) models evaluated the associations of clinical and MRI parameters with detection of PCa and of csPCa found on biopsy. The respective impact of radiologists and urologists on prostate cancer diagnosis was then evaluated by adding these variables to the base MVR models. Individual providers were evaluated relative to each other, with odds ratios generated using (1) the highest performing peer as reference for statistical significance and (2) the median performing peer as reference for bar graph visualisations. As an alternative measure of performance, observed probabilities were compared with predicted probabilities by individual provider.
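The choice of reference provider changes how the odds ratios read but not the relative ordering of providers. The sketch below, in Python, re-expresses a set of hypothetical logistic regression coefficients against the highest performing and the median performing provider, and then inverts the reported OR of 0.22 (95% CI 0.10-0.47) to show why it corresponds to a fourfold to fivefold difference. The provider labels, the coefficients and the function names are assumptions made for illustration, not study estimates.

```python
import math
from statistics import median

# Hypothetical log-odds coefficients for five providers from a logistic model
# (NOT study estimates); P1 is an arbitrary baseline with coefficient 0.
log_odds = {"P1": 0.0, "P2": -0.4, "P3": -0.9, "P4": -1.5, "P5": 0.3}


def odds_ratios(coefs, reference):
    """Odds ratio of every provider relative to the chosen reference provider."""
    ref = coefs[reference]
    return {name: math.exp(b - ref) for name, b in coefs.items()}


def invert_or(odds_ratio, ci_low, ci_high):
    """Flip the direction of comparison; the CI bounds swap when inverted."""
    return 1 / odds_ratio, 1 / ci_high, 1 / ci_low


# (1) Highest performer as reference: every other odds ratio is below 1.
best = max(log_odds, key=log_odds.get)
vs_best = odds_ratios(log_odds, best)

# (2) Median performer as reference: odds ratios fall on both sides of 1.
median_value = median(log_odds.values())
median_provider = next(n for n, b in log_odds.items() if b == median_value)
vs_median = odds_ratios(log_odds, median_provider)

# The reported OR of 0.22 (95% CI 0.10-0.47), read in the opposite direction.
fold, fold_low, fold_high = invert_or(0.22, 0.10, 0.47)
```

Here `fold` is 1/0.22, or roughly 4.5, with inverted bounds of roughly 2.1 to 10.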
Impact of experience and case volume was evaluated by comparing the highest volume performers with the lowest based on median case volumes. Area under the curve (AUC) comparisons between MVR models assessed the relative impact of variability on discrimination for the outcome of PCa detection. All statistics were performed using STATA version 15.0 (STATA Corp, College Station, TX). 13

3 | RESULTS

| Baseline characteristics
A total of 865 patients were included who underwent their first MRI with subsequent TRUS-guided fusion prostate biopsy between 2015 and 2020 (Table 1). Of these men, 132 (15.3%) identified as African-
Patients presenting to urologists for biopsy were similar in their PI-RADS assessment, as the distribution of PI-RADS grades did not demonstrate significant variability (p = 0.577). The number of PI-RADS lesions per patient was also similar across urologists (p = 0.459).

4 | DISCUSSION
Variability in radiologist interpretation is a common critique of MRI use for prostate cancer diagnosis. 2,3 Although we found that the overall distribution of PI-RADS grading between radiologists was similar, PCa and csPCa detection rates varied significantly between individual radiologists, with positive predictive values ranging from 34.5% to 66.7% for any PCa and 17.2% to 50.0% for csPCa. Furthermore, MVR revealed a small but significant increase in AUC when including radiologists as an independent factor impacting PCa detection, while inclusion of the urologist performing biopsy had less impact. Additionally, practice case volume as a measure of experience did not appear to impact performance. These findings suggest that interreader variability remains a critical opportunity for improvement in prostate cancer diagnosis.
PI-RADS 3 grading may represent a significant area of variability between radiologists that translates to differences in prostate cancer detection. A meta-analysis of interreader agreement using PI-RADS v2 demonstrated only moderate agreement for PI-RADS ≥3 (pooled κ = 0.57) but substantial agreement for PI-RADS ≥4 (pooled κ = 0.61). 8 Previous MVR studies found that PI-RADS 4 and 5 grades remained the greatest predictors of cancer diagnosis with biopsy, yet PI-RADS 3 did not predict PCa detection. 3,10,12 In this study, despite the lack of a statistically significant difference in the overall distribution of PI-RADS grades, there was a relatively broad distribution of PI-RADS 3 lesions across radiologists (15.4% to 44.8% of PI-RADS ≥3 lesions) and differences in relative cancer detection rates (0% for three individuals yet 21.6% for another). These findings implicate PI-RADS 3 as an area of disagreement or inconsistency, despite the efforts of PI-RADS v2 to standardise radiologists' interpretation and emerging predictive tools. 14
Inconsistency in PI-RADS 3 grading is largely responsible for concerns over the use of MRI alone as a triage test in the PCa diagnostic pathway. While the PROMIS trial reported high sensitivity of MRI for csPCa, Sonn et al. reported that 24% (22/90) of patients graded as PI-RADS 1-2 harboured csPCa on biopsy and that the individual false negative rate varied between 13% and 60% across radiologists. 2,10 Unfortunately, many patients at our institution with negative MRI may not have undergone prostate biopsy, limiting their representation in our sample, which focused primarily on fusion biopsy. Importantly, some patients excluded from biopsy may still harbour undetected csPCa and have a delayed diagnosis attributable to MRI interpretation. Early studies have suggested that dedicated reader education improves accuracy and confidence in prostate MRI interpretation, while computer aided diagnosis may improve sensitivity in detecting peripheral zone lesions. 15,16 At our centre, regular quality improvement sessions reviewing MRI and feedback after biopsy between radiologists and urologists were conducted during the early experience with prostate MRI, which may have contributed to the improved PCa and csPCa detection we observed in recent years.
Other considerations such as advanced serum and tissue biomarkers, prior biopsy status and validated risk calculators may all play a role in improving decisions regarding prostate biopsy to optimise detection of csPCa. 17,18
The impact of individual urologists on PCa detection appears to be significantly less pronounced than that of radiologists. While there was demonstrable variability in the detection of any PCa, only one urologist underperformed relative to the highest performer; additionally, there was no statistically significant difference in the potentially more important outcome of csPCa detection. Case volume experience between urologists also did not appear to significantly impact PCa detection. Stabile et al. suggested that the learning curve to perform MRI-fusion biopsies was between 60 and 80 cases. 19 Interestingly, the use of MRI-fusion in their study was associated with significantly higher rates of csPCa detection compared with cognitive-fusion (57% vs. 36%, p = 0.002). 19 While all urologists in our study were experienced in prostate biopsy, the use of MRI-fusion was introduced during the study period such that the learning curve of the targeted biopsy technique is captured within the data. The lack of significant performance variability in csPCa detection suggests that real-time image guidance may standardise performance and reduce the learning curve.
As this study was performed at a tertiary academic referral centre, generalisations may be limited. However, the presence of impactful interreader variability among radiology specialists underscores that greater variability may exist within low-volume centres and may produce a proportional impact on PCa detection. Similarly, the study contained relatively few urologists with a broad range of practice volumes. Small provider groups likely reflect the reality for most practices, where wide distribution of patients may amplify differences in cancer detection between high and low performers. Regarding study design, only single radiologist interpretations were performed for MRI, limiting the ability to study interobserver agreement. This is also largely a study of patients with positive MRI findings and does not sufficiently study potential cancer detection in patients with negative MRI. Despite its limitations, the study provides a robust analysis of radiologists' interpretation of MRI for PI-RADS and lesions as well as the downstream impact on PCa detection. It also identifies variability in urologists' biopsy execution, although the impact on PCa detection was driven more by radiologists' interpretation. The measured effect on PCa detection underscores the need to implement quality improvement efforts through continued evaluation of PI-RADS grading, educational interventions and feedback through relationships between urologists and radiologists to further improve care for patients with a clinical suspicion for PCa.

5 | CONCLUSIONS
Variability among radiologists in PI-RADS grading significantly impacts the detection of PCa and csPCa based on MRI-TRUS fusion prostate biopsy, making it a key area for quality improvement efforts. While there is variability across urologists in performance of prostate biopsy, the impact on PCa detection is minimal compared with MRI interpretation. Notably, cancer detection was improved in more recent years.

TABLE 3 Multivariable logistic regression model for prostate cancer detection.
FIGURE 2 Adjusted relative variation in prostate cancer detection by (A) radiologist and (B) urologist. Median individual is set as the reference for visualisation.
The study included men who underwent their first MRI for clinical suspicion of prostate cancer followed by transrectal MRI-TRUS fusion prostate biopsy between January 2015 and December 2020. Patient records were abstracted retrospectively from the Prospective Loyola University Multiparametric MRI (PLUM) prostate biopsy cohort. All cases were performed at a single tertiary academic referral centre. Men who failed to have a biopsy or who had a prior prostate cancer diagnosis were excluded. Cases were grouped by the individual radiologist interpreting MRI and the urologist performing prostate biopsy; individuals with <10 cases were included in the sample but excluded from performance comparisons. The Institutional Review Board approved the research protocol with informed consent waived for participants. The primary study outcome was prostate cancer detection stratified by (1) radiologists performing PI-RADS grading on prostate MRI and (2) urologists performing the prostate biopsy. Secondary outcomes examined the relative distribution of PI-RADS lesions and number of lesions on MRI.
Clinical and prostate MRI characteristics.
FIGURE 1 Variation in Prostate Imaging-Reporting and Data System (PI-RADS) grading by (A) radiologists and variation in prostate cancer detection by (B) radiologists and (C) urologists.
TABLE 2 a Radiologist numbering is based on case volume in this table and does not necessarily correspond to prostate cancer detection position in Table 3 to preserve anonymity.
patients (supporting information Figure S2A,B). Of note, three radiologists had no cases of prostate cancer identified from PI-RADS 3 lesions. Prostate cancer detection by urologist varied from 29.2% to 55.8% (p = 0.01) (Figure 1C), but csPCa detection demonstrated less variability (24.6% to 39.8%, p = 0.36) (supporting information Figure S1B). PCa detection stratified by PI-RADS was 9.1% to 15.7% for PI-RADS 3, 31.8% to 49.1% for PI-RADS 4, and 57.1% to 85.0% for PI-RADS 5. Stratified by biopsy status, the rate of PCa detection for urologists ranged from 41.7% to 65.1% for biopsy-naïve patients and 26.4% to 34.8% for prior negative biopsy patients (supporting information Figure S2C,D).

3.4 | Multivariable analysis
1.45, p = 0.79) nor when the top two urologists were compared with the bottom 3 (OR 1.33, 95% CI 0.89-1.98, p = 0.16). As an alternative measure of performance, observed probabilities were compared with predicted probabilities from the baseline model for each provider. The degree and rank order of variation was similar to the model in Table 3, ranging from an absolute difference of +2.9% to −8.2% for PCa detection among urologists (relative +5.9% to −21.9%) and +12.6% to −6.9% for PCa detection among radiologists (relative +24.8% to −14.8%) (supporting information Table S1).
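The observed-versus-predicted comparison can be sketched as follows in Python. This assumes that the relative difference is the absolute difference divided by the mean predicted probability for a provider's patients; that is an assumption about how the relative figures were derived, and the outcomes, predicted probabilities and function name below are hypothetical rather than study data.

```python
def observed_vs_predicted(outcomes, predicted_probs):
    """Compare a provider's observed detection rate with the model's expectation.

    outcomes: 1 if cancer was detected on biopsy, else 0 (one entry per patient).
    predicted_probs: baseline-model probability of detection for the same patients.
    Returns (observed rate, mean predicted rate, absolute difference, relative difference).
    """
    if not outcomes or len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predicted_probs must be non-empty and equal in length")
    n = len(outcomes)
    observed = sum(outcomes) / n
    predicted = sum(predicted_probs) / n
    absolute = observed - predicted
    # Assumed definition: difference expressed as a share of the predicted rate.
    relative = absolute / predicted
    return observed, predicted, absolute, relative


# Hypothetical provider with ten patients (NOT study data).
outcomes = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
predicted_probs = [0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5, 0.6]

observed, predicted, absolute, relative = observed_vs_predicted(outcomes, predicted_probs)
```

For this made-up provider the observed rate (30%) falls 10 percentage points below the predicted rate (40%), which is a relative shortfall of 25%.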
Relative to the highest performing provider; radiologist numbering is based on prostate cancer detection in this table and does not necessarily correspond to case volume in Table 2 to preserve anonymity.