Prostagram magnetic resonance imaging in a screening population: Prostate Imaging‐Reporting and Data System or Likert?

To compare biopsy recommendation rates and accuracy of the Prostate Imaging‐Reporting and Data System, version 2 (PI‐RADSv2) with the Likert scale for detection of clinically significant and insignificant prostate cancer in men screened within the Imperial Prostate 1 Prostate Cancer Screening Trial Using Imaging (IP1‐PROSTAGRAM).


Introduction
Two widely recognised scoring systems exist for reporting of multiparametric MRI (mpMRI) scans in men with a clinical suspicion of prostate cancer (PCa): the Prostate Imaging-Reporting And Data System (PI-RADS) and the Likert scale [1,2]. Although both systems employ a 5-point category scale for suspicion of cancer, they have key conceptual differences. Updated in 2019, the PI-RADS version 2 (PI-RADSv2) system aims to standardise the acquisition and image-only reporting of mpMRI. Despite its validation in several large multicentre trials, it has limitations related to intermediate specificity [3][4][5]. As a result, UK guidelines recommend a Likert scale that takes into account clinical parameters and does not require a specific sequential review of MRI images [6].
Population screening with PSA is not recommended because of concerns regarding overdiagnosis of clinically insignificant PCa (ciPCa) [7]. The underdiagnosis of clinically significant PCa (csPCa) is also a recognised issue in secondary care [8]. The Imperial Prostate 1 Prostate Cancer Screening Trial Using Imaging (IP1-PROSTAGRAM) study recently highlighted the potential benefits of a short biparametric MRI (bpMRI) scan (without gadolinium-containing contrast medium) over PSA for screening [9]. However, a specific scoring system for screening MRI scans has yet to be developed. We aimed to compare the biopsy recommendation rates and accuracy of PI-RADSv2 with the Likert scale for the detection of csPCa and ciPCa in men screened with bpMRI in the IP1-PROSTAGRAM study.

The IP1-PROSTAGRAM
The IP1-PROSTAGRAM study is a prospective, blinded, population-based screening study for PCa approved by the UK National Research Ethics Committee [9]. Men aged 50-69 years were invited for PCa screening from October 2018 to May 2019. All participants underwent screening with a PSA test, Prostagram MRI, and ultrasonography (B-mode and shear wave elastography). Both imaging tests were reported on a validated 5-point scale of suspicion. If any test result was positive, a systematic 12-core transperineal biopsy was taken. Additional targeted biopsies were taken if MRI or ultrasonography results were positive. For fusion-targeted biopsies, lesions were marked on validated PI-RADS sector maps and then contoured onto the image-fusion software by a third party prior to biopsy to preserve blinding [1]. This study focuses on MRI reporting, so ultrasonography is not discussed henceforward.

Prostagram MRI
Men underwent abbreviated bpMRI (denoted henceforward as 'Prostagram MRI') consisting of T2 and diffusion-weighted images with multiple b-values (0, 150, 400, 1000, and 1500 s/mm²) across two imaging centres. The average acquisition time was ~15 min. The protocol did not include gadolinium contrast-medium enhancement and was performed using one 1.5-T and one 3.0-T scanner (Aera 1.5T or Verio 3T; Siemens Healthineers, Erlangen, Germany) with pelvic phased-array coils at two geographic sites. The complete MRI protocol is  1. Each site's MRI scans were centrally reviewed and optimised for quality before patient scanning. Furthermore, scans with poor-quality images were repeated, and if the quality of the diffusion-weighted imaging sequences was compromised by air, participants were offered a rectal flatus tube to decompress the rectum. An antispasmodic agent was administered to all participants to reduce motion artefacts caused by bowel peristalsis.
Prostagram MRI scans were independently reported using both the PI-RADSv2 (revised for scoring without the contrast-enhanced sequence) and Likert scale [2,10]. They were interpreted by two expert uro-radiologists (one at each site; H.T. and H.S.) who were blinded to the demographic (excluding age) and clinical information, including PSA level. In all, 20% of the MRI scans, stratified by score to ensure a representative sample, were randomly selected for review by a second reporter (A.P.). Biopsy was performed based on a 'positive' Prostagram MRI, defined as a PI-RADSv2 or Likert score of ≥ 3. Clinically significant PCa was defined as the presence of any Gleason Grade 3 + 4 or greater (International Society of Urological Pathology [ISUP] Grade Group [GG] ≥ 2) based on either targeted or systematic biopsy. Clinically insignificant PCa was defined as the presence of Gleason 3 + 3 only (ISUP GG1).
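The definitions above can be written as a small classification sketch. The function names are illustrative assumptions and not part of the study protocol; the mapping from Gleason patterns to Grade Groups follows the standard ISUP scheme.

```python
def isup_grade_group(primary: int, secondary: int) -> int:
    """Map a Gleason pattern pair (e.g. 3 + 4) to its ISUP Grade Group (1-5)."""
    total = primary + secondary
    if total <= 6:
        return 1                          # Gleason 3 + 3
    if total == 7:
        return 2 if primary == 3 else 3   # 3 + 4 is GG2, 4 + 3 is GG3
    if total == 8:
        return 4
    return 5                              # Gleason 9-10


def is_screen_positive(score: int, threshold: int = 3) -> bool:
    """A 5-point PI-RADSv2 or Likert score at or above the biopsy threshold."""
    return score >= threshold


def classify_cancer(primary: int, secondary: int) -> str:
    """Label biopsy-detected cancer as csPCa (GG >= 2) or ciPCa (GG1)."""
    return "csPCa" if isup_grade_group(primary, secondary) >= 2 else "ciPCa"
```

Passing `threshold=4` to `is_screen_positive` corresponds to the stricter screen-positivity threshold that is also analysed in this study.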

Outcome Measures
The primary outcome of this study was the proportion of patients recommended for biopsy. Secondary outcomes were the number of GG ≥ 2 and GG1 cancers identified by the scoring systems, and the accuracy of Likert, PI-RADSv2, and both systems combined.

Statistical Analysis
Descriptive statistics were calculated for the baseline characteristics of the cohort. After testing for normality, Spearman's rank correlation coefficient was calculated to test the strength of the correlation between Likert and PI-RADSv2 for reporting of Prostagram MRI. The proportion of patients with a screen-positive Prostagram MRI finding was compared overall and on a per-score basis (scores 3-5). MRI thresholds of ≥ 3 and ≥ 4 for screen-positivity were analysed. Detection rates for GG ≥ 2 and GG1 cancers were also calculated and compared on an overall and per-score basis. Statistical comparisons were made using a two-tailed z test or chi-squared test unless the variable contained fewer than five data points, in which case Fisher's exact test was used. Receiver operating characteristic (ROC) analysis was performed and the area under the ROC curve (AUROC) was calculated for each scoring system and for the two systems combined to estimate accuracy. The DeLong method was used to compare the AUROC [11]. The Wilson score method was used to obtain 95% CIs. All statistical analyses were conducted using R Statistical Software (version 4.2.0; R Foundation for Statistical Computing, Vienna, Austria).
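As an illustration of the Wilson score method, the following Python sketch computes the interval for a single proportion. The text does not state whether a continuity correction was applied; the sketch assumes one by default (as R's `prop.test` does), and under that assumption it appears to reproduce the 24.1% (95% CI 20.1%-28.7%) reported in the Results for 98/406. The function name and signature are illustrative, not taken from the study code.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95, correct: bool = True):
    """Wilson score interval for a proportion.

    With correct=True the continuity-corrected form is used
    (an assumption; the paper does not specify this).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for a 95% interval
    z2 = z * z
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z2)
    centre = 2 * n * p + z2
    if not correct:
        half = z * sqrt(z2 + 4 * n * p * q)
        return (centre - half) / denom, (centre + half) / denom
    # Continuity-corrected limits; boundary cases are pinned to 0 and 1.
    lower = 0.0 if successes == 0 else max(
        0.0, (centre - 1 - z * sqrt(z2 - 2 - 1 / n + 4 * p * (n * q + 1))) / denom)
    upper = 1.0 if successes == n else min(
        1.0, (centre + 1 + z * sqrt(z2 + 2 - 1 / n + 4 * p * (n * q - 1))) / denom)
    return lower, upper


lo, hi = wilson_ci(98, 406)
print(f"{98 / 406:.1%} ({lo:.1%}-{hi:.1%})")
```

Setting `correct=False` gives the uncorrected Wilson interval, which is slightly narrower for the same counts.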

Ethics
The IP1-PROSTAGRAM was approved by the UK National Research Ethics Committee and was conducted in accordance with the Good Clinical Practice guidelines and the Declaration of Helsinki [12]. All the participants provided written informed consent. All data were pseudo-anonymised. The study was managed by the Imperial Clinical Trials Unit and overseen by an independent trial steering committee.

Results
A total of 408 men were screened, of whom 406 underwent Prostagram MRI. The baseline characteristics of the cohort are presented in Table 1.

Analysis of Negative Patients
In all, 30 men underwent biopsy based on a positive score in only one of the two systems, Likert or PI-RADSv2 (Fig. 2). Four of the 72 men above the PI-RADSv2 biopsy threshold were below the Likert threshold; none of them had GG ≥ 2 cancer and one had GG1. A total of 26 of the 94 men above the Likert threshold for biopsy were below the PI-RADSv2 threshold; one of them had GG ≥ 2 and three had GG1 lesions.
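The counts in this paragraph can be cross-checked with simple set arithmetic. The short Python sketch below uses only the figures quoted in the text (72, 94, 4, 26, and the 406 men scanned); the variable names are illustrative.

```python
# Counts quoted in the text
pirads_positive = 72   # men at or above the PI-RADSv2 biopsy threshold
likert_positive = 94   # men at or above the Likert biopsy threshold
pirads_only = 4        # PI-RADSv2 positive, Likert negative
likert_only = 26       # Likert positive, PI-RADSv2 negative
scanned = 406

# Men positive on both systems, derived two ways as a consistency check
both_positive = pirads_positive - pirads_only
assert both_positive == likert_positive - likert_only

discordant = pirads_only + likert_only        # positive on one system only
either_positive = both_positive + discordant  # positive on at least one system

print(both_positive, discordant, either_positive)
print(f"combined biopsy recommendation rate: {either_positive / scanned:.1%}")
```

The derived total of men positive on at least one system matches the 98/406 quoted for combined reporting in the ROC analysis.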
Three patients with GG ≥ 2 were missed by MRI (both Likert and PI-RADSv2 systems). One patient had GG ≥ 2 identified by PSA and systematic biopsy, and two patients were positive on ultrasonography with csPCa identified on targeted biopsy.

The ROC Analysis
The overall accuracy of the PI-RADSv2 and Likert systems for diagnosis of GG ≥ 2 was moderate, with no statistically significant difference between the two (AUROC 0.64 vs 0.65, P = 0.95). Combining the systems did not lead to a significant increase in accuracy (AUROC 0.67, P = 0.71; Fig. 3). Combined reporting led to the detection of one more GG ≥ 2 cancer and three further GG1 cancers than PI-RADSv2 alone, and one more GG1 cancer than Likert alone, with a biopsy recommendation rate of 24.1% (95% CI 20.1%-28.7%, 98/406).
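For readers less familiar with ROC analysis of ordinal scores, the Python sketch below computes the AUROC as the probability that a randomly chosen man with csPCa has a higher score than a randomly chosen man without it, with ties counting as one half. The data are hypothetical toy values, not study data, and the combined score is taken as the higher of the two scores, which is an assumption about how the systems were combined. The DeLong comparison of two AUROCs is not implemented here.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative cases (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for eight men (NOT from the study)
pirads = [1, 2, 3, 5, 2, 4, 1, 3]
likert = [2, 3, 3, 4, 2, 5, 1, 4]
cspca = [0, 0, 0, 1, 0, 1, 0, 1]   # 1 = GG >= 2 on biopsy

# Assumed combination rule: the higher of the two scores
combined = [max(a, b) for a, b in zip(pirads, likert)]

for name, scores in [("PI-RADSv2", pirads), ("Likert", likert), ("Combined", combined)]:
    print(name, round(auroc(scores, cspca), 3))
```

Because a 5-point scale produces many tied scores, the half-credit for ties matters; ignoring it would understate the AUROC.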

Discussion
In this pre-planned analysis of the IP1-PROSTAGRAM screening study, we compared the two most widely recognised scoring systems for reporting prostate MRI, Likert and PI-RADS, in a screening population. The Likert scale led to a greater number of patients undergoing biopsy when using a threshold of ≥ 3 but without a meaningful increase in cancer detection. Further, PI-RADSv2 scores were more likely to be scored 5/5 (very high likelihood of cancer) whereas Likert scores were more likely to be scored 3/5 (indeterminate for the presence of cancer). Both scoring systems demonstrated moderate accuracy for detection of csPCa. A modest improvement may have been evident when implementing a biopsy pathway based on reporting of the highest score from both systems. However, this came at the expense of greater recommendation for biopsy, more detection of insignificant cancer, and more negative biopsies.
In the at-risk population, mpMRI is pivotal in detecting high-risk cancers whilst minimising detection of low-risk cancers [13]. Structured reporting of these scans is vital for the clinician in deciding whether to perform prostate biopsy, a procedure which has significant implications for both patients and healthcare systems [14,15]. The Likert and PI-RADSv2 structured reporting systems demonstrate good diagnostic performance with high cancer detection rates but are conceptually different [16][17][18]. The Likert scale allows the radiologist to give an overall impression of a report that is not lesion specific, allows for the experience of the radiologist, and can include clinical and biochemical data [2]. By comparison, PI-RADS incorporates fixed criteria in a sequential, zone-specific approach based on the image sequences only [10]. Whilst the cancer detection rate for the Likert scale was comparable to the PI-RADSv2 system in this study, a greater proportion of men were recommended for biopsy and more lesions were scored as indeterminate. The lack of specific criteria, from which the Likert scale derives its flexibility, may have contributed to its reduced clinical utility when employed to report abbreviated bpMRI in a screening setting.
We showed the accuracy of PI-RADSv2 and Likert as determined by the AUROC to be 0.64 and 0.63, respectively. These figures are significantly lower than reported in our previous comparison of the two systems in secondary care populations, where the AUROC for PI-RADSv2 was 0.77 and for Likert was 0.70 [18]. In IP1-PROSTAGRAM, we screened the general population with Prostagram MRI, where the pre-test probability of csPCa was much lower than in the at-risk population. Results from the IP1-PROSTAGRAM study demonstrate the potentially better performance of Prostagram MRI over PSA screening alone, but reducing unnecessary biopsy remains a key concern. Whilst PI-RADSv2 was superior to Likert in this regard, nearly one fifth (18%) of the patients were still recommended for biopsy. An imaging-based screening pathway may still be feasible, but alternative strategies combining MRI with PSA, biomarker combinations such as PSA density, or interval imaging pathways must be explored. In our study, increasing the biopsy threshold to an MRI score of ≥ 4 did reduce the biopsy recommendation rates for both Likert (from 23% to 9%) and PI-RADSv2 (from 18% to 11%) whilst maintaining adequate cancer detection rates.
This study has some limitations. First, as a secondary but planned analysis from a completed trial, it is underpowered to distinguish differences in cancer detection between the PI-RADS and Likert reporting systems. Despite this, and to our knowledge, it is the first prospective study that directly compares Likert and PI-RADS for detection of PCa in a population screened with bpMRI. It should form the basis of future studies aiming to identify the most acceptable trade-off between cancer detection and biopsy reduction in prostate screening with MRI. Second, PI-RADS was designed to identify cancer based on the presence of a dynamic contrast-enhanced sequence, which is absent in bpMRI. There is no current consensus on the optimal scoring method of MRI in the absence of intravenous contrast medium, although alternative scoring systems have been proposed [19]. The definitive study aiming to determine the clinical utility of bpMRI compared to mpMRI (IP7-PACIFIC randomised controlled trial, NCT05574647) is currently underway.

Conclusion
Using a Prostagram MRI for screening in the community, the PI-RADSv2 and Likert scoring systems were moderately accurate, although the Likert scale led to a greater number of men undergoing biopsy without increasing PCa detection rates. To improve reporting of Prostagram MRI, either a modified PI-RADS or Likert scale or a standalone scoring system should be developed.

Fig. 2 Venn diagram illustrating number of men with positive bpMRI by scoring system.

Table 1
Baseline characteristics of the cohort.

Table 2
Screen positivity for PI-RADS vs Likert scores on a per-score basis.

Table 3
Per-score comparison of detection rates of csPCa and ciPCa between PI-RADS and Likert scales.