In recent years, the use of positron emission tomography (PET) has become widespread for the staging and follow-up of several malignancies. In the current study, the authors conducted a metaanalysis of the published literature to evaluate the diagnostic performance of 18F-2-deoxy-2-fluoro-D-glucose PET (FDG-PET) in the staging of patients with lymphoma.
The authors conducted a systematic MEDLINE search of articles published between January 1995 and June 2004. Studies that evaluated FDG-PET with a dedicated camera and that reported sufficient data to permit the calculation of sensitivity and specificity were included in the analysis. Two reviewers independently reviewed the eligibility of the studies and abstracted data (sample population; characteristics of FDG-PET; and the number of true-positive results, true-negative results, false-positive results, and false-negative results). The authors estimated the pooled sensitivity, false-positive rate, and maximum joint sensitivity and specificity.
Twenty studies were eligible for the metaanalysis. Fourteen studies included patient-based data, comprising a sample size of 854 subjects, and 7 studies included lesion-based data, totaling 3658 lesions. Among those studies with patient-based data, the median sensitivity was 90.3% and the median specificity was 91.1%. The pooled sensitivity was 90.9% (95% confidence interval [95% CI], 88.0–93.4) and the pooled false-positive rate was 10.3% (95% CI, 7.4–13.8). The maximum joint sensitivity and specificity was 87.8% (95% CI, 85.0–90.7). The pooled sensitivity and false-positive rate appeared to be higher in patients with Hodgkin disease compared with those with non-Hodgkin lymphoma.
In the U.S., lymphoma is a common cancer. Non-Hodgkin lymphoma is 1 of the 10 leading cancers diagnosed in the U.S., and in general is reported to have a worse prognosis than Hodgkin disease. The survival rates of Hodgkin disease and non-Hodgkin lymphoma reportedly vary widely by cell type and stage of disease.1 Improvements in staging methods and the monitoring of patients can significantly improve the prognosis of patients with lymphoma. In recent years, the use of positron emission tomography (PET) has become widespread for the staging and follow-up of several malignancies.2, 3 The use of 18F-2-deoxy-2-fluoro-D-glucose PET (FDG-PET) in the staging of lymphoma patients offers advantages over other conventional imaging techniques. FDG-PET provides information regarding the metabolic activity of tumors that can complement the anatomic information provided by other imaging methods and, because FDG-PET can survey the entire body in a single scan, it can be particularly useful for determining the extent of the disease.
Recent studies have indicated that FDG-PET is an accurate method for the staging of patients with lymphoma.4–7 In addition, a study evaluating the impact of FDG-PET on the staging and management of patients from the clinician's perspective showed that PET results led to changes in the clinical stage in 44% of patients8; 21% of the patients were upstaged and 23% were downstaged. Furthermore, changes between treatment modalities (e.g., from surgery to radiation therapy) were reported in 42% of the patients. The purpose of the current study was to conduct a systematic review of the published literature to evaluate the diagnostic accuracy of FDG-PET in the staging of lymphoma, to address whether the diagnostic accuracy is similar for Hodgkin disease and non-Hodgkin lymphoma, and to evaluate the impact of FDG-PET on patient management.
MATERIALS AND METHODS
Data Sources and Eligibility
Published studies of the accuracy of FDG-PET in the staging of lymphoma were identified by systematic searches of MEDLINE, supplemented by a manual search of the references listed in original and review articles. Searches included the following keywords: lymphoma OR Hodgkin disease OR non-Hodgkin's lymphoma, staging, positron emission tomography, sensitivity, specificity, diagnostic accuracy, and test performance. Searches were limited to the period between January 1995 and June 2004, and were performed with the assistance of a professional librarian.
Eligibility criteria included the use of FDG, the use of a dedicated PET camera, and the reporting of sufficient data to allow for the calculation of sensitivity and specificity. Studies were not excluded based on sample size or the language of the publication. Because the validity of the individual studies may affect the interpretation of a metaanalysis, we adapted the criteria for study quality reported by Gould et al.9 and the Society of Nuclear Medicine Guidelines for performing FDG-PET studies.10 The criteria for assessing study quality are listed in Table 1.
Table 1. Criteria for Assessing Study Quality
Positive test results defined according to specific criteria
Technical quality and application of the reference test or tests: description of reference standard
Independence of test interpretation: FDG-PET readers blinded to the results of the reference test or tests
Clinical characteristics of the study sample described: age, gender, and number of patients enrolled; reason for performing PET
Participants enrolled prospectively
Individual patient used as unit of data analysis
Information extracted from each study included authors; year of publication; sample size; age of subjects; reference standard; unit of analysis (patients or lesions); technical characteristics of PET; use of attenuation correction; method of image interpretation (qualitative or quantitative); and the number of true-positive results, false-positive results, true-negative results, and false-negative results. Data were extracted by two of the investigators (P.L. and C.I.) and any differences were resolved by consensus. Data abstraction was not blinded to authors, institution, or source of publication.
We calculated the true-positive rate (sensitivity) and the false-positive rate (1–specificity) for each study. In addition, we estimated the summary (pooled) true-positive rate (sensitivity) and false-positive rate (1–specificity). Summary receiver operating characteristic (sROC) curves were computed using random effects methods.10 We calculated the maximum joint sensitivity and specificity, Q*, as a global measure of diagnostic accuracy (the point on the sROC curve at which the sensitivity and specificity are equal).11, 12 The maximum joint sensitivity and specificity has a similar interpretation to the area under the ROC curve, and its values range from 0.5 (no diagnostic value) to 1.0 (perfect test). The presence of heterogeneity between studies was examined using the chi-square test. Several subgroup analyses were performed to explore the presence of heterogeneity, based on the use of attenuation correction, visual interpretation of scans, whole-body scans, and blinding, among other characteristics. The effect of study characteristics on parameters of diagnostic accuracy was evaluated using regression methods. Furthermore, because studies reporting only lesions as the unit of analysis may bias the estimates of diagnostic accuracy, we conducted separate analyses for studies with patient-based data and those with lesion-based data. The presence of publication bias was evaluated with a funnel plot and the Begg test. All data analyses were performed using Stata software (StataCorp, College Station, TX).13
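The quantities defined above can be illustrated with a short Python sketch. It is not a reproduction of the Stata analysis: pooling here simply sums the 2×2 counts across studies, and Q* is taken from an unweighted Moses-Littenberg regression line, which is a swapped-in simplification rather than the random effects method described in the text. The function names and the example counts are hypothetical.

```python
import math

def rates(tp, fp, tn, fn):
    """Return (sensitivity, false-positive rate) for one 2x2 table."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity, false_positive_rate

def pooled_rates(tables):
    """Pool studies by summing their counts (assumption: simple pooling,
    not the random effects approach used in the study)."""
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    tn = sum(t[2] for t in tables)
    fn = sum(t[3] for t in tables)
    return rates(tp, fp, tn, fn)

def logit(p):
    return math.log(p / (1 - p))

def q_star(tables):
    """Q* from an unweighted least-squares line D = a + b*S, where
    D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR)."""
    d_vals, s_vals = [], []
    for tp, fp, tn, fn in tables:
        # 0.5 continuity correction keeps the logits finite for zero cells
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(tables)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx if sxx > 0 else 0.0
    a = d_mean - b * s_mean
    # Sensitivity equals specificity where S = 0, so D = a and logit(TPR) = a/2
    return 1 / (1 + math.exp(-a / 2))

# Hypothetical (tp, fp, tn, fn) counts for three studies
example_tables = [(40, 5, 45, 4), (30, 2, 20, 6), (55, 8, 60, 3)]
for table in example_tables:
    print(rates(*table))
print(pooled_rates(example_tables))
print(q_star(example_tables))
```

A random effects analysis would additionally weight each study and model between-study variance, so the values printed here should be read only as an illustration of the definitions.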
RESULTS
The literature search yielded 47 potentially relevant studies, 27 of which were excluded. The reasons for exclusion were insufficient data (n = 15 studies),14–28 use of a coincidence gamma camera (n = 5),29–33 and no evaluation of staging performed (n = 7).34–40 Twenty studies were eligible for inclusion in the metaanalysis. Of these, 13 studies reported patient-based data, 6 studies reported lesion-based data, and 1 study reported both. For the metaanalysis, the studies with patients as the unit of analysis comprised a total sample size of 854 subjects. The studies with lesions as the unit of analysis totaled 3658 lesions.
The sample size of the 20 studies ranged from 15–93 subjects (median, 50.5 subjects) (Table 2). The age of the subjects ranged from 7–90 years. The percentage of male participants in these studies ranged from 44.6–67.8% (median, 55.6%). Five of the studies included only patients with Hodgkin disease, 3 studies enrolled only patients with non-Hodgkin lymphoma, and 12 studies included patients with both Hodgkin disease and non-Hodgkin lymphoma. Among the studies including both Hodgkin disease and non-Hodgkin lymphoma patients, the percentage of Hodgkin disease patients included in the sample ranged from 6.5–70% (median, 47%). Among the studies including non-Hodgkin lymphoma patients, 13 reported the histological grade: 6 studies included patients with low-grade, intermediate-grade, and high-grade lymphoma; 6 studies included patients with low-grade and high-grade lymphoma; and 1 study included only patients with low-grade lymphoma. Seven studies reported that FDG-PET was performed as part of the staging workup. One study had histologic results as the reference standard, 6 studies used clinical follow-up, and 13 studies used both histologic results and clinical follow-up as the reference standard. Among the studies using clinical follow-up, 10 indicated the follow-up period, which ranged from 3–72 months. Thirteen of the 20 studies compared FDG-PET with other imaging methods (computed tomography [CT] in 4 studies, gallium-67 in 2 studies, 11C-methionine PET in 1 study, PET/CT in 1 study, CT plus ultrasound in 1 study, bone scan in 1 study, and “conventional imaging” in 3 studies).
Table 2. Characteristics of Studies evaluating FDG-PET for the Staging of Patients with Lymphoma (January 1995–June 2004)
Among the 20 eligible studies, 7 were performed prospectively. Fasting was reported in 19 studies, and the fasting period was reported in 18 of these 19 studies. In 13 of the studies the fasting period was 6 hours or less, and was 4 hours or less in 5 studies. Eight studies reported measuring glucose levels prior to PET, and seven of these studies indicated that hyperglycemic patients were excluded. The dose of FDG and the uptake period were reported in 17 of the 20 studies. The reported uptake period ranged from 15–90 minutes. The acquisition time was reported in 16 of the 20 eligible studies. Fifteen studies reported using attenuation correction, and in 11 of these studies attenuation correction was performed in the entire sample population. The method of image reconstruction was reported in all studies but 1, with 11 studies reporting the use of an iterative reconstruction method, 5 studies reporting the use of a filtered back-projection method, and 4 studies reporting the use of both methods.
Readers were blinded to the results of the reference standard in 12 of the 20 studies, and 4 studies did not specify whether readers were blinded. The definition of a positive PET scan was clearly stated in 17 studies. Scans were interpreted visually in 15 of the 20 eligible studies.
Ten studies reported that the PET findings led to changes in the staging of patients. The percentage of patients who were upstaged ranged from 7.7–17.4% (median, 13.2%), and the percentage of patients who were downstaged ranged from 2.3–23.4% (median, 7.5%). Six of the 20 eligible studies reported changes in patient management as a result of PET findings.
Diagnostic Accuracy of FDG-PET
Among the studies with patient-based data, the median sensitivity was 90.3% (range, 70.6–100%) and the median specificity was 91.1% (range, 50–100%) (Table 3). The summary (pooled) true-positive rate (sensitivity) was 90.9% (Table 4) and the summary false-positive rate was 10.3%. The maximum joint sensitivity and specificity, a global measure of diagnostic accuracy, was 87.8%. The test for homogeneity indicated the absence of statistical heterogeneity. One of the studies reported a very low specificity (50%)41 and another study demonstrated low sensitivity (70.6%).42 The study with the lowest specificity included only patients with Hodgkin disease, and found two true-negative results and two false-positive results. The study with low sensitivity evaluated PET in the detection of bone marrow involvement only, and found 12 true-positive results and 5 false-negative results. When these 2 studies were excluded from the analysis, the pooled sensitivity and the maximum joint sensitivity and specificity increased to 91.8% and 89.6%, respectively, whereas the false-positive rate decreased to 9.5%. Figure 1 presents the sROC after the exclusion of these outliers.
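To show how little information the two outlying studies carry, the following Python sketch computes their rates together with Wilson score intervals. The 12 true-positive and 5 false-negative results come from the text; the 50% specificity is assumed here to correspond to two true-negative and two false-positive results. The choice of the Wilson interval and the function name are illustrative assumptions, not the method used in the study.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z ** 2 / (4 * total ** 2)
    ) / denom
    return centre - half_width, centre + half_width

# Bone marrow study: 12 true-positive and 5 false-negative results
sensitivity = 12 / (12 + 5)
print(round(100 * sensitivity, 1), wilson_interval(12, 17))

# Hodgkin disease study: assumed 2 true-negative and 2 false-positive results
specificity = 2 / (2 + 2)
print(round(100 * specificity, 1), wilson_interval(2, 4))
```

With so few patients contributing to each estimate, both intervals are wide, which is consistent with treating these studies as outliers in a sensitivity analysis.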
Table 3. True-Positive Rate and False-Positive Rate of FDG-PET in the Staging of Patients with Lymphoma: Patient Based Data
Excluding studies with lowest sensitivity and lowest specificity
Excluding study with lowest specificity
Nine studies provided enough information to conduct a separate metaanalysis for Hodgkin disease5, 41, 43, 44, 45, 46 and non-Hodgkin lymphoma patients6, 44, 45, 47, 48 (Table 4). Among patients with Hodgkin disease, the median sensitivity and specificity were 93.2% (range, 85.7–100%) and 87.7% (range, 50–100%), respectively. The pooled sensitivity and false-positive rate were 92.6% and 13.4%, respectively, and the maximum joint sensitivity and specificity was 89.4%. Among patients with non-Hodgkin lymphoma, the median sensitivity and specificity were 87.5% (range, 81.5–97.6%) and 93.8% (range, 80–100%), respectively. The pooled sensitivity and false-positive rate were 89.4% and 11.4%, respectively, and the maximum joint sensitivity and specificity was 85.0%.
Among the studies with lesion-based data, the median sensitivity was 96.6% (range, 92.1–99.3%) and the median specificity was 99.1% (range, 33.3–100%) (Table 5). The pooled estimates of the true-positive rate and the false-positive rate were 95.6% and 1%, respectively. The maximum joint sensitivity and specificity was 95.6% (Table 4). The test of homogeneity indicated the presence of statistical heterogeneity. One of the studies was found to have very low specificity41; when this study was excluded from the analysis, the pooled estimates of diagnostic accuracy did not vary, but the heterogeneity disappeared.
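A chi-square test of homogeneity of the kind mentioned above can be sketched in Python as follows. The text does not specify the exact form of the test, so this sketch assumes a Pearson chi-square comparison of each study's proportion with the pooled proportion; the function names and the example counts are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, computed from
    the series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a = df / 2
    t = x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def homogeneity_test(events, totals):
    """Return (statistic, degrees of freedom, p value) for the hypothesis
    that all studies share one underlying proportion. Assumes the pooled
    proportion is strictly between 0 and 1."""
    p = sum(events) / sum(totals)
    stat = sum(
        (x - n * p) ** 2 / (n * p * (1 - p)) for x, n in zip(events, totals)
    )
    df = len(events) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical true-negative counts and numbers of disease-free lesions
true_negatives = [98, 150, 1, 210]
disease_free = [100, 151, 3, 212]
print(homogeneity_test(true_negatives, disease_free))
```

In this hypothetical example a single small study with a very low specificity can drive the statistic, which mirrors the observation that heterogeneity disappeared once one study was excluded.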
Table 5. True-Positive Rate and False-Positive Rate of FDG-PET in the Staging of Patients with Lymphoma: Lesion Based Data
The effect of the study design characteristics on the parameters of diagnostic accuracy was explored in subgroup analyses and by regression methods. The pooled estimates of the true-positive rate and the false-positive rate were found to be higher when the readers were not blinded and when the studies were conducted retrospectively (Table 6). In addition, the false-positive rate was higher when PET was performed as part of the staging workup compared with studies in which patients underwent PET scans because of equivocal findings. Using regression methods, the method of scan interpretation (visual vs. nonvisual), blinding, and the reason for patient referral to PET scanning were found to be significant predictors of the true-positive rate. Blinding and the reason for PET referral were found to be significant predictors of the false-positive rate (Table 7). The funnel plot and the Begg test did not indicate the presence of publication bias.
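The rank correlation idea behind the Begg test can be sketched in Python as shown below. This is a simplified illustration, not the Stata routine used in the study: it omits the continuity correction and any adjustment for ties, and the use of the log diagnostic odds ratio as the effect measure is an assumption. All names and example values are hypothetical.

```python
import math

def log_dor(tp, fp, tn, fn):
    """Log diagnostic odds ratio and its variance, with 0.5 added to each cell."""
    tp, fp, tn, fn = (v + 0.5 for v in (tp, fp, tn, fn))
    effect = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / tn + 1 / fn
    return effect, variance

def begg_test(effects, variances):
    """Kendall rank correlation between standardized effects and variances.
    Returns (S, z, two-sided p) using the normal approximation without ties."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standardized = [
        (e - pooled) / math.sqrt(v - 1 / sum(weights))
        for e, v in zip(effects, variances)
    ]
    n = len(effects)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (standardized[i] - standardized[j]) * (
                variances[i] - variances[j]
            )
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value

# Hypothetical (tp, fp, tn, fn) counts for four studies
tables = [(40, 5, 45, 4), (30, 2, 20, 6), (55, 8, 60, 3), (18, 1, 12, 2)]
pairs = [log_dor(*t) for t in tables]
print(begg_test([e for e, _ in pairs], [v for _, v in pairs]))
```

A strong positive correlation between effect size and variance would suggest that small studies with large effects are overrepresented, which is the asymmetry a funnel plot displays visually.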
Table 6. Summary True-Positive Rate and False-Positive Rate by Study Characteristic in Patient Based Data
PET referral (PET as part of staging work-up vs. equivocal findings/other reasons)
False-positive rate (1–specificity)
Study design (prospective vs. retrospective)
Interpretation of scans (visual vs. non-visual)
Reference standard (clinical follow-up only vs. pathology plus clinical follow-up)
Blinding (readers blinded vs. no blinding)
PET referral (PET as part of staging work-up vs. equivocal findings/other reasons)
DISCUSSION
The results of this metaanalysis indicate that FDG-PET has a high diagnostic accuracy for the staging and restaging of lymphoma patients. The summary true-positive rate (sensitivity) was 91% and the summary false-positive rate was 10% using a patient-based analysis, whereas the maximum joint sensitivity and specificity was 88%. The summary true-positive rate and summary false-positive rate were found to be slightly higher in patients with Hodgkin disease compared with patients with non-Hodgkin lymphoma. Although changes in staging and management prompted by FDG-PET results are clinically relevant, not all the studies reported them. Among the studies reporting changes in staging, 8–17% of the patients were upstaged, whereas 2–23% were downstaged. Changes in management were reported in 30% of the studies. Similar to what has been reported previously in the literature,9 the pooled parameters of diagnostic accuracy were higher in the lesion-based analysis than in the patient-based analysis.
Treatment decisions in patients with lymphoma are made based on the stage of the disease; patients with Stage I or II disease (according to the Ann Arbor/Cotswold classification) would receive short courses of chemotherapy whereas patients with more advanced stage disease would be eligible for more extended courses of chemotherapy and radiotherapy.49, 50 Therefore, the accuracy of staging procedures is critical for such decisions. In this way, FDG-PET represents a valuable addition to the staging procedures already available. To our knowledge, too few studies comparing FDG-PET with other imaging or staging procedures have been published to date to perform a concurrent metaanalysis. However, these studies suggest a relative advantage for FDG-PET over other imaging modalities. Stumpe et al.44 compared FDG-PET with CT for the staging and restaging of patients with Hodgkin disease and non-Hodgkin lymphoma. In the study by Stumpe et al., FDG-PET was found to have a higher sensitivity and specificity than CT in both Hodgkin disease and non-Hodgkin lymphoma patients; the overall accuracy for FDG-PET was 94% in patients with Hodgkin disease and non-Hodgkin lymphoma, whereas the overall accuracy of CT was 60% in patients with Hodgkin disease and 73% in patients with non-Hodgkin lymphoma. Similarly, Freudenberg et al.4 reported a higher sensitivity and specificity for FDG-PET when compared with CT in the restaging of lymphoma patients. The accuracy of FDG-PET was reported to be 95% and the accuracy for CT was 84%. Their study also evaluated the diagnostic accuracy of FDG-PET in combination with CT, and found an improvement in sensitivity and specificity over each imaging modality alone.
There are several potential limitations to conducting a metaanalysis of diagnostic tests. The presence of clinical heterogeneity (heterogeneity arising from the inclusion of patients at different stages of disease and with other differing clinical characteristics) affects the generalizability of the results51, 52 and is not necessarily ruled out by the lack of statistical heterogeneity.53 It is important to note that the majority of the studies included a mix of patients with Hodgkin disease and patients with non-Hodgkin lymphoma of different cell types. Furthermore, due to the nature of this disease, biopsy results were available in only a few studies; the majority had to rely on clinical follow-up, including a variety of imaging modalities and clinical examinations, not all of which were performed in the same manner in all the studies. The use of an imperfect reference standard, together with variability in the quality of the primary studies, introduces important limitations for the interpretation of this metaanalysis. In addition, the verification bias potentially present in the primary studies cannot be fully addressed in a metaanalysis. Nevertheless, despite these limitations, metaanalytic techniques have been very useful for demonstrating the significant role of FDG-PET imaging in the diagnosis and staging of several malignancies.54–57
The results of this metaanalysis suggest that the diagnostic accuracy of FDG-PET may be higher in patients with Hodgkin disease than in those with non-Hodgkin lymphoma. The summary sensitivity and the maximum joint sensitivity and specificity were found to be higher among patients with Hodgkin disease; however, the false-positive rate also was higher in this group compared with non-Hodgkin lymphoma patients. These findings should be interpreted with caution because they are based on a small number of studies. In addition, non-Hodgkin lymphoma is a highly heterogeneous disease that includes a large number of distinct entities; therefore, the diagnostic accuracy of FDG-PET may differ within the group of patients with non-Hodgkin lymphoma. Unfortunately, there was not enough information available in the primary studies to address this issue in the current metaanalysis. In recent years, there has been an increased trend toward the use of PET/CT as a routine procedure, and it has been suggested that this practice will improve the sensitivity and specificity of PET. However, the use of PET/CT was not addressed in the current metaanalysis because of a lack of currently available data.
The results of the current metaanalysis demonstrated that FDG-PET is a very accurate imaging modality for the staging and restaging of patients with lymphoma, with a high sensitivity and high specificity reported. Clinicians should consider adding FDG-PET to the routine staging workup of patients with lymphoma.
The authors thank Ms. Karen Sorensen for her assistance with the MEDLINE searches.