To determine the reproducibility of 3D proton magnetic resonance spectroscopic imaging (1H-MRSI) of the human prostate in a multicenter setting at 1.5T.
Fourteen subjects were measured twice with 3D point-resolved spectroscopy (PRESS) 1H-MRSI using an endorectal coil. MRSI voxels were selected in the peripheral zone and combined central gland at the same location in the prostate in both measurements. Voxels with approved spectral quality were included to calculate Bland–Altman parameters for reproducibility from the choline plus creatine to citrate ratio (CC/C). The repeated spectroscopic data were also evaluated with a standardized clinical scoring system.
A total of 74 voxels were included for reproducibility analysis. The complete range of biologically interesting CC/C ratios was covered. The overall within-voxel standard deviation (SD) of the CC/C ratio of the repeated measurements was 0.13. This value is equal to the between-subject SD of noncancer prostate tissue. In >90% of the voxels the standardized clinical score did not differ relevantly between the measurements.
Repeated measurements of in vivo 3D 1H-MRSI of the complete prostate at 1.5T produce equal and quantitative results. The reproducibility of the technique is high enough to provide it as a reliable tool in assessing tumor presence in the prostate. J. Magn. Reson. Imaging 2012;35:166-173. © 2011 Wiley Periodicals, Inc.
REGARDLESS OF MANY IMPROVEMENTS in diagnosis and treatment of prostate cancer that were achieved in the last decades, it is still estimated to be the second leading cancer-related cause of death in American men in 2010 (1). Prostate cancer can be suspected with either a positive traditional digital rectal examination (DRE) and/or when high levels of prostate-specific antigen (PSA) are present in the blood. For conclusive evidence of tumor presence the histopathologic confirmation determined from a biopsy specimen is required.
In order to improve the noninvasive detection of cancer in the prostate, its location, heterogeneous extent, grade, and stage, several techniques have been and are being explored. Among these proton magnetic resonance spectroscopic imaging (1H-MRSI) in combination with conventional magnetic resonance imaging (MRI) has shown promising results in several single-site studies and in one multicenter study (2–6). However, another multicenter trial did not report a benefit of combined 1H-MRSI and MRI in the localization of prostate tumors (7). Lack of reproducibility data, a more qualitative rather than quantitative spectroscopic evaluation, and the lack of multicenter validation of the used technique obscured the exact reason for this outcome.
With 1H-MRSI 3D metabolic information is obtained in vivo in addition to the morphological data of MRI (8). In clinical MR spectroscopy, knowledge of metabolite levels, their distribution, and the variation within and between subjects is necessary to distinguish significant differences in metabolite levels within the prostate of one patient or between patients and healthy subjects. Metabolites of interest in the prostate are citrate (Ci), choline-containing compounds (Cho), creatine (Cre), and polyamines. The levels of some of these metabolites change in prostate cancer tissue; Ci and polyamine levels are reduced and Cho levels are elevated (9, 10). These changes are frequently combined in metabolite ratios to differentiate between benign and malignant tissue with 1H-MRSI. Both the ratio of Cho plus polyamines and Cre to Ci (CC/C) and Cho to Cre (C/C) can be quantified and were reported to be elevated in cancer (11). Because the spectral peaks of Cho (3.22 ppm), Cre (3.04 ppm), and polyamines (3.1 ppm) often overlap at 1.5T, quantification of the C/C ratio is not always possible.
To be of value in clinical diagnosis, the changes in metabolites as recorded by 1H-MRSI caused by cancer tissue have to be larger than the biological variation in metabolites in noncancer prostate tissue and larger than changes that occur due to imperfections of the measurement method (12). Moreover, to make 1H-MRSI generally applicable the technique should produce homogenous and reproducible results in comparable patient groups of different institutions.
Standardized threshold-based scoring systems that translate the metabolite ratios into a localized cancer likelihood score (range 1–5) have been developed to acknowledge the clinical relevance of differences in metabolite levels in the prostate (5, 13). To our knowledge, however, the measurement error of 1H-MRSI in the prostate has not been investigated yet to validate this approach.
As part of the IMAPS (International Multi-Centre Assessment of Prostate MR Spectroscopy) study, in which 1H-MRSI of the prostate at 1.5T was performed in a multicenter setting to assess its ability to distinguish between cancer and benign prostate tissue, one objective was to estimate the reproducibility of the technique. Data were acquired from different zones of the prostate from subjects suspected of prostate cancer and from healthy volunteers at four different centers using the same protocol. The reproducibility of this measurement technique was quantitatively evaluated based on the measurement error in CC/C ratios and qualitatively evaluated with a clinical scoring system based on both the CC/C and C/C ratio.
Institutional Review Board approval was obtained for all centers of the IMAPS trial and all participating subjects provided written informed consent to undergo two subsequent MRI and spectroscopy examinations with an endorectal coil at 1.5T. To cover a broad range of CC/C ratios, 11 subjects suspected of prostate cancer and three young healthy volunteers were included from four different institutions (Shanghai Changhai Hospital [SHA]: seven subjects; Ghent University Hospital [GENT]: four subjects; Radboud University Nijmegen Medical Center [RUNMC]: two subjects; Loma Linda University Medical Center [LL]: one subject). The age of subjects suspected of prostate cancer ranged from 33–78 years (mean age: 65 ± 12 years). These subjects did not undergo any therapy or treatment before the MR examinations. The median PSA level for the 11 subjects suspected of prostate cancer was 10.24 ng/mL (range 1.78–59.39 ng/mL). Mean time between MR measurements of these subjects was 45 days (range 1–161 days). The volunteers were aged 20, 38, and 38 years. Time between MR measurements of the volunteers was 12, 16, and 223 days, respectively.
MRI and spectroscopic imaging in all subjects was performed on 1.5T MR systems (Magnetom family; Siemens Healthcare, Erlangen, Germany) with a disposable endorectal coil (Medrad, Pittsburgh, PA) for signal reception. Multislice T2-weighted fast spin-echo imaging was performed in three orthogonal planes (measurement parameters: TR = 4000 msec, TE = 129 msec, field of view [FOV] = 140 × 140 mm², resolution = 0.7 × 0.5 × 4 mm). The axial slices were angulated perpendicular to the rectal wall and served as background images for 3D MRSI planning purposes.
Localized 3D 1H MR spectra of the entire prostate were obtained using a 3D PRESS (point-resolved spectroscopy) pulse sequence with slice-selective optimized 180° pulses (TE = 120 msec, TR = 650 msec) and weighted k-space acquisition (14). Water and lipid signals were suppressed with two dual-frequency selective refocusing pulses and crusher gradients (15). The MR signal was received by the endorectal coil only. Additional suppression of surrounding lipid tissues was achieved by seven manually placed saturation slabs. Nominal resolution of the spectroscopic voxels was 6 × 6 × 6 mm³, which was enlarged by apodization of k-space for accurate localization and decreased voxel bleed. The true voxel size could best be approximated by a sphere with a diameter of 10.7 mm and a volume of 0.64 cm³. Total acquisition time of the 1H-MRSI sequence was between 10 and 12 minutes, depending on the exact number of phase-encoding steps and averages. All measurements were performed by local experts in the technique.
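As a quick check of the quoted effective voxel size, the volume of a sphere with the stated diameter can be worked out directly. The comparison with the nominal voxel volume is added here only for illustration.

```latex
V_{\text{sphere}} = \frac{\pi}{6}\, d^{3} = \frac{\pi}{6}\,(10.7\ \text{mm})^{3} \approx 641\ \text{mm}^{3} \approx 0.64\ \text{cm}^{3},
\qquad
V_{\text{nominal}} = 6 \times 6 \times 6\ \text{mm}^{3} = 216\ \text{mm}^{3} \approx 0.22\ \text{cm}^{3}
```

Under this approximation the effective voxel volume is roughly three times the nominal one, which is why neighboring grid voxels carry strongly overlapping signal.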
All data were anonymized and sent to one institute (RUNMC) for postprocessing and evaluation. A spectroscopist (8 years of experience), blinded to the MR spectra, assigned between 8 and 20 voxels of interest in the different prostate zones based on the axial T2-weighted images and an overlay of the MRSI voxel matrix. Voxels of interest had to be located either completely within the peripheral zone (PZ), within the central gland (CG, as a combination of the transition zone and central zone), or in the periurethral area (U) and had to lie at the same location in the repeated measurements. Care was taken not to choose neighboring voxels in the same tissue to prevent correlation from voxel-to-voxel signal overlap. As the apparent spatial resolution of the spectroscopic grid was much higher than the true spatial resolution, voxels largely overlapped and regridding was not necessary to select a certain location in the prostate.
The spectra of all selected voxels were evaluated and quantified with the PRISMA software package (University of Bremen and Siemens Healthcare, Erlangen, Germany). This program fits the spectral data in the time domain with quantum-mechanically simulated model functions for the citrate, creatine, and choline signals (16). The curve fit was used to calculate the CC/C ratio by dividing the sum of the integrals of the fitted choline and creatine by the integral of the fitted citrate.
A visual quality control was performed by the spectroscopist, which consisted of inspection of the original spectrum together with the curve fit and residual plot produced by PRISMA. An incorrect curve fit would result in unreliable CC/C ratios. Voxels with a correct automatic frequency alignment of the resonances, without lipid signal contamination and baseline distortions around the resonances of interest, and minimal intensity in the residual plots passed this quality check.
Voxels with a CC/C ratio more than four standard deviations (SDs) above the mean normal value in noncancer prostate tissue can, according to a 5-step standardized clinical evaluation system for prostate spectroscopic imaging, be interpreted as definitely malignant (5, 13). When citrate levels decrease close to noise levels in the spectrum, the exact value of the CC/C ratio (with denominator close to zero) in these voxels may vary widely in the range beyond mean +4 SD. The exact CC/C ratios are clinically of no importance, since all these high numbers indicate malignant tissue changes. Therefore, a zone-specific (PZ or CG) cutoff value of 5 SD above the mean noncancer value of the prostate tissue was used to truncate CC/C ratios that exceeded this value.
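The ratio calculation and truncation step can be illustrated with a short sketch. This is not the PRISMA implementation: the function names and the peak integrals are hypothetical, and the peripheral zone mean (0.31) and SD (0.13) are assumed values chosen to be consistent with the PZ thresholds in Table 1.

```python
def cc_c_ratio(choline, creatine, citrate):
    """CC/C ratio from fitted peak integrals: (choline + creatine) / citrate."""
    return (choline + creatine) / citrate


def truncate_ratio(ratio, mean_noncancer, sd_noncancer, n_sd=5.0):
    """Cap a CC/C ratio at mean + n_sd * SD of noncancer tissue for its zone."""
    cutoff = mean_noncancer + n_sd * sd_noncancer
    return min(ratio, cutoff)


# Assumed peripheral zone reference values (cutoff = 0.31 + 5 * 0.13 = 0.96).
PZ_MEAN, PZ_SD = 0.31, 0.13

# Hypothetical fitted integrals (arbitrary units).
low_citrate = cc_c_ratio(choline=4.0, creatine=2.0, citrate=0.5)  # ratio of 12.0
normal = cc_c_ratio(choline=1.0, creatine=0.5, citrate=5.0)       # ratio of 0.3

print(truncate_ratio(low_citrate, PZ_MEAN, PZ_SD))  # capped at the cutoff
print(truncate_ratio(normal, PZ_MEAN, PZ_SD))       # left unchanged
```

A voxel with citrate near the noise floor yields an arbitrarily large ratio, so capping it keeps the value in the "definitely malignant" range without letting it dominate later statistics.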
Data of the IMAPS study (99 patients) of noncancer prostate tissue (6) were used to define thresholds for the 5-step standardized scoring system for both PZ and CG. Since CC/C ratios in CG and U did not differ significantly, the thresholds determined for CG were also applied for voxels located in U. With this system, the CC/C ratios of the repeated measurements were assessed for their corresponding standardized clinical score. Adjustments to the score were made by the assessment of the C/C ratio if available, as described before (5) (Table 1). The percentage of agreement per clinical score of the first to the second measurement was calculated and compared to the percentage of chance agreement based on the average score distribution of the repeated measurements.
|Score and score definition||PZ CC/C ratio||CG CC/C ratio||C/C ratio adjustment|
|1: Definitely benign tissue||≤0.44||≤0.53||If C/C ratio ≥2, then: adjust 3 and 2 into 4.|
|2: Probably benign tissue||0.45-0.57||0.54-0.68|
|3: Possibly malignant tissue||0.58-0.70||0.69-0.83||If C/C ratio < 2, then: adjust 5 into 4 and 4 into 3.|
|4: Probably malignant tissue||0.71-0.83||0.84-0.98|
|5: Definitely malignant tissue||≥0.84||≥0.99|
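The scoring rules of Table 1 can be expressed as a small lookup. The sketch below is one literal reading of the table, not the authors' software: the function names are hypothetical, ratios are assumed to be rounded to two decimals before scoring, and the C/C adjustment is applied exactly as listed and skipped when no C/C ratio is available.

```python
# Upper bounds of scores 1-4 from Table 1; anything above the last bound is score 5.
THRESHOLDS = {
    "PZ": (0.44, 0.57, 0.70, 0.83),
    "CG": (0.53, 0.68, 0.83, 0.98),  # also applied to periurethral (U) voxels
}


def base_score(cc_c, zone):
    """Map a CC/C ratio to a 1-5 score using the zone-specific thresholds."""
    for score, upper in enumerate(THRESHOLDS[zone], start=1):
        if cc_c <= upper:
            return score
    return 5


def adjusted_score(cc_c, zone, c_c=None):
    """Apply the C/C adjustment as listed in Table 1; skip it if C/C is unavailable."""
    score = base_score(cc_c, zone)
    if c_c is None:
        return score
    if c_c >= 2 and score in (2, 3):
        return 4  # scores 2 and 3 are raised to 4
    if c_c < 2 and score in (4, 5):
        return score - 1  # 5 becomes 4, 4 becomes 3
    return score


print(base_score(0.60, "PZ"), base_score(0.60, "CG"))
print(adjusted_score(0.60, "PZ", c_c=2.5), adjusted_score(0.90, "PZ", c_c=1.0))
```

The same CC/C ratio of 0.60 falls in different categories for PZ and CG, which illustrates why the zone assignment of each voxel matters for the clinical score.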
Voxels with truncated CC/C ratios in both measurements were not included in the statistical analysis. If in only one measurement the CC/C ratio of a voxel exceeded the zone-specific cutoff value, the truncated value was used for statistical analysis. Of all selected voxels that passed the quality check in both measurements, a maximum of eight voxels were randomly selected per patient to reduce dependency of the statistical measures on the number of voxels per patient. For the remaining voxels the differences in CC/C ratio between the two separate measurements (Dk) were calculated and plotted against the mean of the two measurements in a Bland–Altman-type plot. A Kruskal–Wallis test was performed to check for significant differences in Dk among institutes.
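The per-patient cap on voxels could be implemented along the following lines. This is only a sketch under stated assumptions: the function name, the subject labels, and the voxel identifiers are hypothetical, and the fixed seed is there solely to make the example repeatable.

```python
import random


def subsample_voxels(voxels_by_subject, max_per_subject=8, seed=0):
    """Randomly keep at most max_per_subject voxels for each subject."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    selected = {}
    for subject, voxels in voxels_by_subject.items():
        if len(voxels) > max_per_subject:
            selected[subject] = rng.sample(voxels, max_per_subject)
        else:
            selected[subject] = list(voxels)
    return selected


# Hypothetical voxel identifiers for two subjects.
example = {"S01": list(range(15)), "S02": list(range(3))}
kept = subsample_voxels(example)
print({subject: len(voxels) for subject, voxels in kept.items()})
```

Subjects with eight or fewer qualifying voxels contribute all of them, so only subjects with many good-quality voxels are downsampled.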
To check for a relation between the difference and the mean, the Kendall rank correlation coefficient was calculated (17). The mean difference and SD of the differences (σdiff) were determined to indicate how well the first and second measurement agreed on average. The measurement error, defined as the within-voxel SD (σwithin) for all repeated measurements, was calculated as (17):
$$\sigma_{\text{within}} = \sqrt{\frac{1}{2n}\sum_{k=1}^{n} D_k^{2}}$$
where k is the voxel number and n is the total number of voxels. If the mean difference between the repeated measurements is zero, σdiff is √2 times σwithin. The measurement error was calculated for all voxels together and compared to the between-subjects SD of noncancer PZ and CG tissues (σbetween, PZ and σbetween, CG) that were obtained in the complete IMAPS study (99 patients) (6).
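The reproducibility statistics described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code. It assumes the standard paired-repeat form of the within-voxel SD (square root of the summed squared differences divided by 2n), takes differences as first minus second measurement, and correlates absolute differences with the voxel means using a simple Kendall tau-a without tie correction. The CC/C values are made up.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def bland_altman(first, second):
    """Return mean difference, SD of differences, and within-voxel SD."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sigma_within = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    return mean(diffs), stdev(diffs), sigma_within


def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a, no tie correction)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up CC/C ratios for four voxels measured twice.
first = [0.30, 0.42, 0.55, 0.80]
second = [0.34, 0.37, 0.61, 0.70]

mean_diff, sigma_diff, sigma_within = bland_altman(first, second)
voxel_means = [(a + b) / 2 for a, b in zip(first, second)]
abs_diffs = [abs(a - b) for a, b in zip(first, second)]
tau = kendall_tau_a(voxel_means, abs_diffs)

print(mean_diff, sigma_diff, sigma_within, tau)
```

In this toy data set the absolute difference grows with the mean for every pair of voxels, so tau is at its maximum; real data would typically show a much weaker relation.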
Anatomical T2-weighted image slices and overlaying spectral maps from two measurements of the prostate of one subject illustrate the similarities of repeated measurements (Fig. 1). Both the shape and quality of the spectra and the CC/C ratios of most voxels in both MRSI matrices are highly comparable. The spectroscopist selected between 8 and 20 independent voxels per subject that met the inclusion criteria concerning the location, resulting in a total of 205 selected voxels. The spectra of 124 voxels (60%) in the first measurement and 144 voxels (70%) in the second measurement passed the quality control; 99 voxels (48%) passed the quality control in both measurements. Per subject, 1 to 15 voxels passed the checks, but the restriction of a maximum of eight voxels per subject resulted in 84 voxels selected for reproducibility analysis. Of these voxels, 33 were assigned to PZ, 44 to CG, and 7 to U. For all analyses, CG and U voxels were taken together, since in the complete IMAPS study no significant differences in CC/C were found between these two tissues.
In 10 voxels assigned to CG, the CC/C ratio was higher than the cutoff value in both measurements; these voxels were excluded from statistical analysis. In six voxels assigned to CG and two voxels assigned to U, the CC/C ratio was higher than the cutoff in one measurement; these eight voxels were included in statistical analysis. Due to truncation of the high values to the mean +5 SD CC/C of noncancer CG, the CC/C ratios ranged from 0.05–1.13 (median 0.38). For the 74 voxels remaining for statistical analysis the differences between the two repeated measurements per voxel were visualized in a Bland–Altman-type plot against the mean CC/C ratio of the voxels (Fig. 2). A weak but significant relationship was found between the differences and the mean CC/C of the two measurements (Kendall's τ = 0.21, P = 0.008). The mean difference was 0.004 and σdiff was 0.19. The measurement error for all voxels together, σwithin, was 0.13.
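As a consistency check on these numbers: when the mean difference is close to zero, the SD of the differences is expected to be approximately √2 times the within-voxel SD. Using the rounded values reported above:

```latex
\sqrt{2}\,\sigma_{\text{within}} \approx 1.41 \times 0.13 \approx 0.18 \approx \sigma_{\text{diff}} = 0.19
```

The small gap is compatible with rounding of σwithin to two decimals and with the n − 1 denominator usually used for the SD of the differences.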
As a reference for these reproducibility numbers, mean CC/C and SDs of noncancer prostate tissue (σbetween, PZ and σbetween, CG) from the IMAPS study (99 patients) are given in Table 2 (6). For all voxels together, σwithin of the repeated measurements was equal to σbetween, PZ and slightly lower than σbetween, CG.
No significant differences in Dk seemed to exist between the institutes (P = 0.45, Fig. 3).
|Mean CC/C (noncancer, n=99)||0.31||0.41||0.37|
|σbetween (noncancer, n=99)||0.13||0.15||0.14|
CC/C ratios of the first versus the second measurement for peripheral zone (PZ) voxels and central gland (CG) voxels are depicted together with the zone-specific thresholds in Fig. 4. The clinical evaluation with the standardized threshold approach resulted in the following numbers (Fig. 5): 63 voxels (75.0%) were classified with the same standardized score in the repeated measurements, in 13 voxels (15.5%) the standardized score shifted by one, in two voxels by two (2.4%), and in six voxels by three units (7.1%). The percentage of agreement in clinical scores was the highest for an initial score of one (Table 3). The observed reproducibility in clinical score is well above chance agreements for all scores, which were calculated from the averaged observed clinical score distribution of the repeated measurements.
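|Initial clinical score||No score difference, observed (%)||No score difference, chance (%)||Score difference 0, ±1, observed (%)||Score difference 0, ±1, chance (%)|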
|1: Definitely benign||94.4||64.9||96.3||68.5|
|2: Probably benign||33.3||3.6||100.0||73.3|
|3: Possibly malignant||66.7||4.8||66.7||24.5|
|4: Probably malignant||37.5||16.1||81.3||31.6|
|5: Definitely malignant||37.5||10.7||75.0||26.8|
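The chance agreements in Table 3 can be reproduced from the average score distribution. The sketch below assumes that the second and fourth value columns of Table 3 are the chance agreements, and that the chance of an exact match for a given initial score equals the fraction of voxels with that score in the averaged distribution. The function name is hypothetical.

```python
# Chance agreement for "no score difference", read from Table 3 (percent, scores 1-5).
# Under the assumption stated above, these values are the average score distribution.
score_distribution = [64.9, 3.6, 4.8, 16.1, 10.7]


def chance_agreement(distribution, score, tolerance=0):
    """Chance (percent) that a repeat lands within `tolerance` of `score` (1-based)."""
    low = max(1, score - tolerance)
    high = min(len(distribution), score + tolerance)
    return sum(distribution[low - 1:high])


for score in range(1, 6):
    exact = chance_agreement(score_distribution, score)
    within_one = chance_agreement(score_distribution, score, tolerance=1)
    print(score, round(exact, 1), round(within_one, 1))
```

Summing the distribution over the neighboring scores gives 68.5, 73.3, 24.5, 31.6, and 26.8, which matches the chance values listed for a score difference of 0 or ±1 and supports this reading of the table.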
In this study we assessed the reproducibility of 3D 1H-MRSI of the prostate using an endorectal coil at 1.5T in healthy volunteers as well as subjects suspected of prostate cancer in a multicenter setting. A high reproducibility of calculated metabolite ratios is one of the prerequisites for widespread application of 1H-MRSI of the prostate in clinical practice. To detect metabolic changes caused by prostate cancer, the measurement error of the technique has to be smaller than these changes. We chose to calculate reproducibility on the basis of selected (quality-controlled) voxels at the same location in two examinations, rather than trying to analyze all voxels in every subject, to avoid bias because of largely overlapping voxels in every measurement. Only pure PZ or CG or U tissue voxels were selected to avoid partial volume effects from differences in metabolite ratios between these tissues.
It has been shown that prostate cancer is characterized by a combination of elevated choline, reduced citrate, and reduced polyamines (9). Different single-site MR spectroscopic imaging studies investigated these changes quantitatively and reported significant differences in CC/C between healthy and cancer tissue in the peripheral zone (8) and the transition zone of the prostate (11). Standardized clinical scoring systems for MRSI are based on the calculated CC/C ratios, with adjustments to the score according to the C/C ratio. These scoring systems utilize the variation in CC/C ratios in noncancer prostate tissue, expressed in SD, to define threshold values for five risk categories. With this definition, the SD-based thresholds do not present a definite percentage of likelihood, but do give an indication of tumor probability. Other MRSI studies discriminate benign and malignant tissue by defining one threshold value, which is also based on the mean and SD of CC/C ratios in noncancer prostate tissue (2, 18). Both methods resulted in moderate to high tumor localization accuracies of 67%–88% (2, 4, 13, 18).
As could be expected, this study showed that the mean difference between the two repeated measurements was nearly zero. The measurement error of prostate 3D 1H-MRSI at 1.5T determined in this study is almost equal to the SD of CC/C values of noncancer tissues of different patients, also obtained in a multicenter setting. This variation of the CC/C ratio in normal prostate tissue can therefore not solely be ascribed to biological variation, as might have been assumed in the prostate cancer discrimination studies using thresholds described above. It is not possible to conclude from this study which factor—measurement error or biological variation—predominates in the variation in noncancer tissue. Mean differences in CC/C between cancer and noncancer prostate tissues, however, are much larger than the measurement error or the SD of noncancer tissue (6, 8).
Our reported overall measurement error, or σwithin, is probably overestimated. Measurement errors in CC/C ratios could arise from a manual selection of the voxel location in repeated measurements. The weak positive relation between the individual measurement errors and the mean of the measurements leads to an inflation of σwithin. In spite of careful voxel selection, small subvoxel localization differences could still exist. If regional differences in the CC/C ratio are present, which is more likely to be the case in deviating prostate tissue with elevated CC/C (eg, cancer tissue), localization differences of voxels with larger CC/C explain the larger differences between the repeated measurements in these voxels. The actual values of CC/C ratios of smaller tumor foci with elevated CC/C will therefore be less reproducible in repeated measurements.
In this study it was assumed that no significant change in prostate tissue composition occurred between the two subsequent measurements. In subjects with less than a week between the two measurements this assumption is very plausible, but in subjects with a longer interval between the measurements the assumption could not be verified without histopathological data, which is a limitation of this study. The within-voxel SD of four out of five patients with the longer measurement interval was well within the range of subjects with a short time between the measurements. Therefore, the overall within-voxel SD seems to represent the real measurement error of the technique and not a change in tissue composition.
Both σwithin and σbetween were obtained in a multicenter setting and did not significantly differ between the institutions, so our reproducibility results are not institution-dependent. Spectroscopic data were acquired of the complete prostate, and reproducibility in terms of σwithin was determined for all tissues together, covering the full range of CC/C ratios. We did not distinguish between cancer and noncancer voxels, as this was not relevant for testing the reproducibility of measuring a CC/C ratio, and because no localized histopathological gold standard was available for patients suspected of prostate cancer. Once spectra had passed the quality control, distance from the endorectal coil, and with that, signal-to-noise ratio (SNR) of the spectrum, should not make a difference any more in calculated CC/C ratios in two measurements.
Comparison of our results with clinical MR spectroscopy reproducibility studies of other organs is cumbersome, mainly because of the different nature of our study group. In this study we aimed to investigate reproducibility of 1H-MRSI in the prostate in general, independent of possible pathology. Other reproducibility studies investigate reproducibility in healthy tissue or in tissue with known malignancy or in both but separately. With our study group we described the complete range of biologically interesting CC/C values, without having spatial pathological information. The calculation of the intraclass correlation coefficient (ICC), a measure often used in other reproducibility studies, was not suitable for our mixed study group because of the large differences in CC/C ratios between subjects. The ICC, however, is a descriptive statistic that basically describes the same as what we assessed by comparing σwithin and σbetween. In general, other MR spectroscopic imaging reproducibility studies show highly reproducible results and consider the technique robust and reliable (19–24).
In the quantitative statistical analysis pairs of truncated CC/C ratios (at mean +5 SD), ie, with a difference of zero between the measurements, were not included to avoid artificial improvement of σdiff and σwithin. Without truncation, these ratio values could differ quite a lot, due to uncertainty in the very small denominator in the ratio. So, although not incorporated into σwithin, these values are clinically very important, since they all indicate the highest tumor probability.
The reproducibility of 1H-MRSI in the human prostate was additionally confirmed by the high percentage of selected voxels (90.5%), which obtained the same clinical score in the repeated measurements or a shift in score of only one. The highest reproducibility in clinical score originates from voxels categorized to score 1 in the first measurement (Table 3). The range in CC/C ratio in this category is larger than for the other scores, which makes it easier to reproduce by chance. However, the percentages of observed agreement for all clinical scores are far above the chance agreements.
From Fig. 4, in which the zone-specific CC/C ratios were combined with the zone-specific thresholds, several points can be identified that are close to a threshold line, sometimes resulting in a deviation in standardized score in the repeated measurements. It is clear that a small difference in CC/C between the two measurements due to biological variation or measurement error just around a threshold in the scoring system can result in a step change in clinical score of one. This might have clinical consequences, but does not reflect the real reproducibility of the technique itself. This can be seen as a shortcoming of the standardized scoring system and the effect would be most pronounced at the intermediate scores (2, 3, and 4), because the ranges in CC/C ratios covered in these scores are smaller than for scores 1 and 5. A clinical score change of more than one is likely due to small subvoxel localization differences in deviating prostate tissue with regional differences in CC/C ratio, as described above.
The C/C ratio was not quantitatively analyzed for reproducibility because in voxels representing healthy tissue the spectral resolution of 1.5T is not high enough to separate the Cho and Cre peaks from the polyamine resonance in between. In the reproducibility evaluation with the standardized clinical score system, however, the C/C ratio was taken into account if it was adequately determined.
Efficiency in terms of the number of voxels available for quantitative analysis can be less than in qualitative analysis, since an experienced spectroscopist can also identify suspicious patterns in data masked by relatively large noise, lipid contamination, or baseline distortions. Lipid contamination of the spectra might occur more frequently close to the endorectal coil and close to the border of the volume of interest. In our data 65% of the correctly located voxels had adequate quality for quantification analysis, which is better than reported with the introduction of the standardized scoring system (60%, peripheral zone only (13)), but worse than data from 3T analyzed in a similar way (74%, PZ and CG (25)) (5). In this study reproducibility in terms of agreement between readers was not investigated, but other quantitative prostate MRSI studies show a high interobserver agreement (5, 13).
In conclusion, this study assessed the reproducibility of 1H-MRSI of the prostate at 1.5T in two ways. By analyzing the data in both a quantitative and a more clinical way, the theoretical and the practical point of view in reproducibility assessment were covered. Differences in the CC/C value of repeated measures are equal to the SD of the CC/C value of noncancer tissues of a large patient population. As mean differences in CC/C between cancer and noncancer tissues are much larger than this SD, the reproducibility of 1H-MRSI at 1.5T supports the use of this technique as a reliable tool in assessing tumor probability in the prostate.
The authors thank Dr. Oyen (Leuven University); Prof. Bachert and Dr. Baudendistel (DKFZ); Dr. Broome (Loma Linda University); Mr. Chao, Dr. Haller, and Dr. van Velthoven (Bordet Institute); Dr. Lichy (University of Tübingen, currently at Siemens Healthcare); Dr. Praet (Ghent University); Medrad Inc.; and the Belgian FWO WOG on Advanced NMR Applications.