Bone marrow biopsy superiority over PET/CT in predicting progression‐free survival in a homogeneously‐treated cohort of diffuse large B‐cell lymphoma

Abstract Several studies have reported uneven results when evaluating the prognostic value of bone marrow biopsy (BMB) and PET/CT as part of the staging of diffuse large B‐cell lymphoma (DLBCL). The heterogeneity of the inclusion criteria and not taking into account selection and collinearity biases in the analysis models might explain part of these discrepancies. To address this issue we have carried a retrospective multicenter study including 268 DLBCL patients with a BMB and a PET/CT available at diagnosis where we estimated both the prognosis impact and the diagnostic accuracy of each technique. Only patients treated with R‐CHOP/21 as first line (n = 203) were included in the survival analysis. With a median follow‐up of 25 months the estimated 3‐year progression‐free survival (PFS) and overall survival (OS) were 76.3% and 82.7% respectively. In a multivariate analysis designed to avoid a collinearity bias with IPI categories, BMB‐BMI [bone marrow involvement](+) (HR: 3.6) and ECOG PS > 1 (HR: 2.9) were independently associated with a shorter PFS and three factors, age >60 years old (HR: 2.4), ECOG PS >1 (HR: 2.4), and abnormally elevated B2‐microglobulin levels (HR: 2.2) were independently associated with a shorter OS. In our DLBCL cohort, treated with a uniform first‐line chemotherapy regimen, BMI by BMB complemented performance status in predicting those patients with a higher risk for relapse or progression. In this cohort BMI by PET/CT could not independently predict a shorter PFS and/or OS.


Introduction
In the up-front workup of B-cell Non-Hodgkin lymphoma (B-NHL), evaluating the extension of the disease is crucial because of its impact on prognosis and management. Therefore, assessing bone marrow involvement (BMI), an extranodal site, is critical to establish stage and therefore outcome [1,2].
The impact of both 18-F fluorodeoxyglucose (FDG) positron emission tomography with computed tomography (PET/CT) and bone marrow biopsy (BMB) in evaluating BMI in newly diagnosed B-cell high grade lymphoma patients is at the moment a matter of intense debate [3][4][5]. Given the lack of a "gold-standard" to examine the most accurate test to assess BMI at diagnosis, the prognostic value of each technique emerges as a key factor to determine the correct management at baseline. Recently, several studies have reported different results when evaluating the prognostic value of BMB and PET/ CT in the staging of diffuse large B-cell lymphoma (DLBCL) [6][7][8][9]. The retrospective nature of the studies, the heterogeneity of the inclusion criteria, and not taking into account the collinearity/multicollinearity of variables in the multivariate regression analysis might explain part of these discrepancies.
We have extended our previously reported DLBCL multicenter series with the aim of further evaluating the impact of both techniques on progression-free and overall survival, refining the assessment criteria, and re-checking accuracy [10].

Patient populations
Two hundred and sixty-eight consecutive patients with a diagnosis of DLBCL between August 2007 and June 2015 were retrospectively enrolled in this study from 12 tertiary centers of Spain. Patients 18 years old and older were included in the study if both BMB and PET/ CT were performed simultaneously (time interval between both procedures less than or equal to 30 days) as part of the routine pre-therapy staging for newly diagnosed DLBCL. Patients had received neither chemotherapy nor corticosteroids and no concomitant malignancy was known to be present at the time of both procedures. Pathology and PET/CT results were unknown to each other specialist.

Bone marrow biopsy
So far, in Spain, unilateral posterior iliac crest trephine biopsy and marrow aspirate are recommended pretherapeutically in newly diagnosed patients with NHL according to GELTAMO guidelines.
BMBs were evaluated by an experienced hematopathologist in each hospital; results were obtained from the individual reports and were not reviewed thereafter. Data from bone marrow aspirate, flow cytometry, or molecular analysis were not used for the analysis in this work.

PET/CT imaging and analysis
PET/CT studies were carried out using the following PET/ CT devices: Gemini TF64, Gemini GXL, and Gemini TF16 (all three Gemini devices from Philips), Discovery LS (GE Healthcare), and either Biograph mCT 20 Flow, Biograph TP16 and Biograph 6 (the last three from Siemens). Procedure, quality control, and interpretation guidelines are commented in detail in our previous work [10]. Briefly, the low-dose CT components of the PET/CT were used for both co-localization and attenuation correction of the PET emission data.
Coronal, sagittal, and transversal PET/CT projections were reconstructed by iterative methods and analyzed using the manufacturers' software. Image interpretation was performed by qualitative (visual) analysis, considering the presence or absence of BMI using glucose activity in the liver as a reference. If present, bone marrow lesions were characterized as focal lesions or diffuse uptake exceeding that of the liver. Semiquantitative analysis was performed by means of the maximum standardized uptake value (SUVmax), normalized to body weight, as the voxel with maximum uptake in a region or volume of interest.

Data analysis and ethics
Overall survival (OS) was defined as the time from diagnosis to death of any cause or the last follow-up date.
>60 years old (HR: 2.4), ECOG PS >1 (HR: 2.4), and abnormally elevated B2-microglobulin levels (HR: 2.2) were independently associated with a shorter OS. In our DLBCL cohort, treated with a uniform first-line chemotherapy regimen, BMI by BMB complemented performance status in predicting those patients with a higher risk for relapse or progression. In this cohort BMI by PET/CT could not independently predict a shorter PFS and/or OS. Progression-free survival (PFS) was defined as the time from diagnosis to first relapse or progression, death of any cause or last follow-up date.
OS and PFS were analyzed using the Kaplan-Meier estimate survival curves with the log-rank test according to either BMB or PET/CT bone marrow status. Multivariate Cox proportional hazards regression models were performed including sex, B2microglobulin upper limit of normal (>2.5 mg/L), presence of bulky mass (≥10 cm), BMB and PET/CT bone marrow status, and IPI factors [age, LDH upper limit of normal (>250 UI/L), ECOG performance status (>1 score), and number of extranodal sites and Ann Arbor stage assessed by PET/CT and BMB] [11]. To avoid collinearity in the Cox models, we deconstructed the IPI composite by not considering the BMI in the score of the extranodal site and Ann Arbor Stage categories. We used a P value threshold of <0.150 in the univariate analysis for a factor to enter the multivariate model, where P values less than 0.05 were considered significant.
The final diagnosis of BMI by lymphoma was achieved in cases of positive PET/CT or positive BMB [6]. No guided biopsies were considered for the analysis. All performance tests were calculated as previously stated [10]. Statistical analysis was done using SPSS software (IBM SPSS Statistics 21, IBM Corporation, Chicago, IL) and Epidat 3.1 (http://dxsp.sergas.es). Data were obtained by review of medical records. This study was approved by the Hospital Morales Meseguer Ethics Committee and thereafter by each of the involved hospitals.

Patient characteristics
A total of 268 DLCBL patients were initially identified from all institutions. Performance of BMB and PET/CT was assessed in this population (see below). Main characteristics at baseline are referred in Table 1.
The remaining 203 DLBCL patients, homogeneously treated with R-CHOP/21 showed a median age at diagnosis of 61 years old (range 18-85), with a balanced gender distribution (102 females/101 males). They received a median number of 6 cycles (range 1-8). Twenty-eight patients (13.8%) had an IPI score of 4-5. Approximately, three-quarters of patients (74.9%) were scored as stage III or IV according to Ann Arbor classification and 131 (64.5%) had elevated lactate dehydrogenase (LDH). The baseline characteristics of patients included in the global series and the subgroup who were treated with R-CHOP/21 as first-line treatment are shown in Table 1.

Performance of PET/CT and BMB
In the global series (N = 268), 35 patients (13.1%) had BMI on BMB, whereas 60 (22.4%) had BMI according to PET/CT findings. Seventy-one patients (26.5%) had BMI according to either BMB or PET/CT. Concordant BMI by means of both techniques was present in 24 (8.9%) patients. In Table 3, we specified the distribution of uptake according to BMB and PET/CT in the global series and in the R-CHOP/21 cohort. In patients with BMI according to PET/CT, 15 (5.6%) had a diffuse pattern and 45 (16.8%) focal lesions. Among the 60 PET/ CT positive patients, the median maximum standard uptake value (SUVmax) was 9.5 (p25-p75; 5.9-13.5); out of these, 45 and 15 showed a focal [median SUVmax, 9.8  (Table 1). Within these 60 patients, 3 had SUVmax ≤ 4 with focal uptake and 1 with a diffuse pattern.
Among the 36 patients with BMI by PET/CT and negative BMB, 2 with diffuse and 12 with focal uptake were upstaged from stage II to stage IV, respectively, among the 11 patients with negative PET/CT and positive BMB, 5 were upstaged from stage II to stage IV.

Discussion
In the setting of B-NHL, and particularly in the DLBCL category, the use of PET/CT and/or BMB for staging purposes is presently one of the most debated issues as different strategies have been proposed according to the divergent results of a number of distinct series or metaanalysis [3][4][5][6][7][8][12][13][14][15], or experience-based recommendations from experts in the field [4,5,[16][17][18][19][20]. With an endeavor to avoid susceptibility and collinearity biases in our study design, we show here the superiority of BMB over PET/CT in predicting PFS in a homogeneously-treated cohort of 203 DLBCL patients.  Diagnostic performance has been considered an argument to abrogate the routine use of BMB in this setting due to a claimed superiority of PET/CT. However, we and others have shown before that the performance of PET/CT in terms of sensitivity and accuracy was not as high as stated in other series [10,12,13]. To this regard, it has been proposed that this could rely in histologically unproven cases that could have been inappropriately considered true positives [12,19,21]. Our present results regarding accuracy are very close to our previous ones and reinforce the suboptimal performance of both PET/ CT and BMB. Regarding the raw numbers relating stage, 11 out of 268 DLBCL patients would have been understaged if using only PET/CT and this would have been the case for 36 patients if only BMB would have been used.
One consistent criticism regarding previous works addressing only diagnostic performance, including ours, was that independently of accuracy, PET/CT did not impact survival [3,5,19]. However, in our view, this criticism was based on conflicting data as a number of papers have shown that lack of prognostic value [6,7], whereas others did find it [8,9]. Interestingly, the largest series study in the setting of DLBCL so far sustains the abandon  of the routine use of BMB based on the impact of PET/ CT on both OS and PFS, and the unlikelihood of BMB upstaging or driving clinically significant changes [9]. The inconsistency of data in the current literature can be explained by some methodological issues. A multivariable analysis is the most favored approach when assessing associations between risk factors and disease end points. However, the efficiency of multivariable models depends on the correlation architecture among potentially predictive variables. The situation in which what a regressor explains about the response is overlapped by what another regressor does is called collinearity, which ultimately leads to biased estimations [22]. Adding BMI by BMB and/or PET/CT to all the component clinical indicators from the IPI in a multivariate model would inarguably suffer from this bias, as BMI is a part of the extranodal site and Ann Arbor Stage IPI clinical indicators. To our knowledge, only Khan et al. and Berthet et al. have addressed this building-block bias by studying the impact of BMI on prognosis exclusively in stage IV patients and separately assessing each IPI factor prognostic value, respectively [6,8]. We chose to "deconstruct" the IPI composite by not considering the BMI in the extranodal site and Ann Arbor Stage categories. This is, in our view, the most accurate procedure for a correct assessment of the prognostic weight of each BMI detection technique.
We recognize that our current results have modified our view regarding the up-front workup in the DLBCL setting. Though in our previous work, and in line with others [12,17], we suggested that a positive PET/CT could abrogate the need for a BMB, according to our present results, we now believe that BMB should be performed in all DLBCL patients at baseline. In the absence of a feasible gold standard for a methodologically indisputable assessment of the sensibility and specificity of both techniques, we now agree with Adams, et al. when stating that the lack of prognostic implications of bone marrow involvement detected by FDG-PET/CT strongly suggests there is a considerable proportion of false-positive FDG-PET/CT cases in patients with DLBCL [23]. In addition,  it could seem contradictory that BM involvement based on BMB had prognostic impact on PFS but not on OS. However, in our view, fewer end-point events and the inclusion of all causes of death could account for this apparent contradiction.
We acknowledge that our study design (multicenter series and retrospective nature) confers both strength and weakness to our study. We have not discriminated between germinal-like and activated subtypes, an issue than sooner or later should be addressed in this setting. Nevertheless, the coherence of the present data with our previous work and others′ in terms of performance suggests that our results reflect "real life" clinical practice. In addition, it could be argued a relative heterogeneity regarding PET/ CT procedures, particularly the lack of a unique strict protocol. In this regard, it is of note that all the NM units involved in this study are carrying out PET/CT lymphoma staging since, at least, 2006 and that analysis has been performed in all cases by experienced senior NM specialists following the European Association of Nuclear Medicine recommendations for visual and semiquantitative analysis of tumor PET imaging as we stated before. In this regard, it must be underlined that there are still no guidelines or agreement for the up-front baseline studies of DLBCL, as stated by the uneven recommendations stated in the Lugano meeting in 2015 [16,17], and the updated NCCN guidelines.
In summary, in our DLBCL cohort, treated with a uniform first-line chemotherapy regimen, BMI by BMB complemented IPI in predicting those patients with a higher risk for relapse or progression, while IPI defined a subset of patients with a worse survival. In this cohort, BMI by PET/CT could not independently predict for a shorter PFS and/or OS. At the present moment, both techniques seem to offer different and complementary clinical information.