A low total metabolic tumor volume independently predicts for a longer time to first treatment in initially observed, low tumor burden follicular lymphoma

Watchful waiting is an acceptable management strategy for patients with advanced‐stage, low tumor burden (LTB) follicular lymphoma (FL). However, the prediction of how long this treatment‐free observation period will last remains imperfect. We explored whether total metabolic tumor volume (TMTV) and other positron emission tomography parameters were predictive of time to first treatment (TTFT). We analyzed 97 grade 1–3A advanced‐stage LTB FL patients and found that a high TMTV was associated with other tumor burden features at diagnosis. Patients with a TMTV above our established cutoff of 50 mL had a significantly shorter median duration of observation (2.6 vs. 8.8 years; p = 0.001). At 5 years, 77% of patients with a high TMTV and 46% of patients with a low TMTV required treatment. In the multivariable analysis, a high TMTV was the only independent factor predicting TTFT (hazard ratio = 2.09; p = 0.017). Overall, TMTV is a strong predictor of the duration of observation in LTB FL patients. Upon validation of our cutoff in external series and standardization of the methodology, the TMTV could become an additional factor to consider when deciding whether to defer or initiate treatment in otherwise LTB patients.


| INTRODUCTION
Follicular lymphoma (FL), the most common indolent B-cell lymphoma, is considered an incurable malignancy. Although some patients relapse early (progression of disease within 24 months, POD24) 1 or develop histological transformation (HT), 2 both of which dramatically worsen patients' outcomes, most individuals experience a protracted disease history, with long remissions and recurring relapses, and a median overall survival (OS) now exceeding 20 years. 3 It is also acknowledged that around 10% of FL patients can experience spontaneous regression. 4 Comparable outcomes between watchful waiting (WW) and immediate treatment (in terms of OS and the risk of HT) 5-7 have made it customary in many countries to conservatively manage asymptomatic patients.
In a population characterized by its advanced age and the presence of comorbidity, 8 sparing therapy and thus toxicity seems more than reasonable. Individuals with localized disease can be treated with anti-CD20 immunotherapy and/or radiation, while advanced-stage cases are generally divided into those with and without high tumor burden features, which are an indication for starting treatment. 9 Various high tumor burden criteria have been proposed, 10,11 with those of the Groupe d'Étude des Lymphomes Folliculaires (GELF) 12 among the most widespread. Requirements for WW include the patient's willingness to undergo such a strategy (some individuals poorly tolerate having cancer and not receiving any therapy) and the absence of lymphoma-related symptoms, large lymphoid masses, and lymphoma-related organ dysfunction (including bone marrow (BM)). GELF criteria only consider CT-derived morphological parameters measured in a single plane. Because CT is limited in its ability to define tumor limits, difficult-to-measure lesions such as those located in the spleen, BM, and pleura are not included in the assessment.
For LTB patients undergoing observation, the median time to first treatment (TTFT) has been set at around 3 years. 4,7,13,14 However, the identification of specific factors predicting treatment initiation in these individuals remains elusive. In a recent single-center study, 15 the Follicular Lymphoma International Prognostic Index (FLIPI) score, Ki67 index, and the proportion of CD4+ and FOXP3+ cells were predictive of WW discontinuation. Of note, the FLIPI score only evaluates tumor burden by CT (enlarged lymph nodes in a single plane), without considering the size of non-measurable splenic and other extranodal lesions.
Follicular lymphoma is considered to be fluorodeoxyglucose (FDG)-avid, and [18F]-FDG positron emission tomography/computed tomography (PET/CT) imaging is recommended before first and subsequent lines of therapy. 16 Besides its usefulness in staging, 17 for guiding biopsy toward the most active lesion, and for assessing response, semiquantitative PET calculations allow for a whole-body volumetric tumor burden measurement that predicts progression-free survival (PFS) 18 and has been incorporated into novel prognostic indexes. 19 In the setting of initially observed LTB FL, two small single-center studies, one Chinese (n = 38) 20 and one Italian (n = 54), 21 demonstrated that the maximum standardized uptake value (SUVmax), total lesion glycolysis (TLG) and total metabolic tumor volume (TMTV) predicted outcomes, with the caveat that cutoffs were obtained by receiver operating characteristic (ROC) curve analysis, thus treating TTFT as a categorical variable (need for treatment within 2 years from diagnosis).
The aim of our study was to further explore the potential of semiquantitative PET/CT parameters to predict TTFT (performing a time-to-event analysis) in a larger bicentric cohort of LTB, initially observed FL patients.

| Patients
We retrospectively identified 97 grade 1-3A FL patients (43 females, All patients underwent PET/CT staging within 3 months of FL diagnosis, were considered to have LTB disease (i.e., not fulfilling criteria for initiating treatment as per GELF, 12 i.e., no bulky masses, no involvement of ≥3 nodal sites, each with a diameter >3 cm, no systemic symptoms, no symptomatic splenomegaly, no compression syndrome, no tumor effusions, no overt leukemic involvement, no disease-related cytopenia) and were observed without treatment for

| Image analysis
Using the same software for all patients, images were visually assessed by two nuclear medicine physicians, and by an independent specialist in conflicting cases, reaching a final image interpretation consensus in all cases. Image analysis was blinded to outcome. Tumor contours were segmented semiautomatically using MIM Software version 7.2.1 (Cleveland, OH). The segmentation threshold was set at an SUV ≥2.5 (Figure 1). All included contours that did not correspond to tumor activity (i.e., physiological uptake or concomitant inflammatory/infectious processes) were then manually removed. In the BM, only focal tracer uptakes were considered pathological. Both focal and diffuse splenic uptakes were included (>150% of the liver background). Volumetric parameters such as TMTV (defined as the sum of the metabolic volumes of all lesions) and TLG (defined as the sum, over all lesions, of each lesion's metabolic tumor volume (MTV) multiplied by its mean SUV), as well as SUVmax, were then obtained.
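As an illustration of the definitions above, the following minimal Python sketch derives TMTV, TLG and SUVmax from a list of already segmented lesions and assigns the TMTV group using the 50 mL cutoff reported in this study. It is not the MIM Software workflow: the `Lesion` record, the function names and the example values are hypothetical, and segmentation itself (SUV ≥2.5 thresholding and manual editing) is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

TMTV_CUTOFF_ML = 50.0  # cutoff reported in this study (TMTV hi if above)


@dataclass
class Lesion:
    """One segmented lesion (hypothetical record, for illustration only)."""
    mtv_ml: float    # metabolic tumor volume of the lesion, in mL
    suv_mean: float  # mean SUV within the lesion contour
    suv_max: float   # maximum SUV within the lesion contour


def pet_summary(lesions: Iterable[Lesion]) -> Tuple[float, float, float]:
    """Return (TMTV, TLG, SUVmax) for one patient."""
    lesions = list(lesions)
    # TMTV: sum of the metabolic volumes of all lesions
    tmtv = sum(l.mtv_ml for l in lesions)
    # TLG: sum over lesions of MTV multiplied by the lesion's mean SUV
    tlg = sum(l.mtv_ml * l.suv_mean for l in lesions)
    # SUVmax: highest uptake across all lesions (0.0 if nothing was segmented)
    suvmax = max((l.suv_max for l in lesions), default=0.0)
    return tmtv, tlg, suvmax


def tmtv_group(tmtv_ml: float, cutoff_ml: float = TMTV_CUTOFF_ML) -> str:
    """Dichotomize TMTV; values above the cutoff are labeled 'TMTV hi'."""
    return "TMTV hi" if tmtv_ml > cutoff_ml else "TMTV lo"


if __name__ == "__main__":
    patient = [Lesion(12.0, 3.5, 6.1), Lesion(40.0, 4.0, 9.8)]
    tmtv, tlg, suvmax = pet_summary(patient)
    print(tmtv, tlg, suvmax, tmtv_group(tmtv))
```

Returning 0.0 for a patient without segmented lesions is a design choice of this sketch, so that cases with no measurable tumor mass can still be assigned to a TMTV group.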

| Clinical endpoints
The main endpoint of the study was TTFT, defined as the interval between FL diagnosis and the initiation of frontline therapy. Although this parameter is influenced by several clinical factors, we consider it a relevant outcome in indolent malignancies, such as chronic lymphocytic leukemia or FL, in which WW is a common strategy. It is an indirect but eloquent measure of quality of life, since it is a period of time in which the patient is free from disease- and treatment-related complications. Besides, unlike PFS and OS, TTFT remains independent from the specific therapeutic strategy, which makes it a good indicator of the natural history of disease.
Response to frontline treatment was assessed using standard criteria. 22 PFS was calculated from frontline treatment to relapse or death from any cause. Early progressors (POD24) were patients who relapsed within 24 months of initial treatment. Overall survival was calculated from diagnosis to last follow-up or death from any cause. Survival from treatment was calculated from frontline treatment initiation to last follow-up or death from any cause.
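The endpoint definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the event coding (1 = frontline treatment started, 2 = death during observation, 0 = censored at last follow-up) are hypothetical conventions, a year is approximated as 365.25 days, and "within 24 months" is read as an interval of at most 2 years.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # approximation used to express intervals in years


def years_between(start: date, end: date) -> float:
    """Length of the interval from start to end, in years."""
    return (end - start).days / DAYS_PER_YEAR


def ttft_record(diagnosis: date,
                last_follow_up: date,
                treatment_start: Optional[date] = None,
                death: Optional[date] = None) -> Tuple[float, int]:
    """Return (time in years, event code) for the TTFT analysis.

    Event codes (a convention assumed for this sketch):
    1 = frontline treatment started, 2 = died while under observation,
    0 = still observed at last follow-up (censored).
    """
    if treatment_start is not None:
        return years_between(diagnosis, treatment_start), 1
    if death is not None:
        return years_between(diagnosis, death), 2
    return years_between(diagnosis, last_follow_up), 0


def is_pod24(treatment_start: date, relapse: Optional[date] = None) -> bool:
    """True if the patient relapsed within 24 months of initial treatment."""
    if relapse is None:
        return False
    return years_between(treatment_start, relapse) <= 2.0


if __name__ == "__main__":
    time, event = ttft_record(date(2010, 1, 1), date(2020, 1, 1),
                              treatment_start=date(2012, 1, 1))
    print(round(time, 2), event, is_pod24(date(2012, 1, 1), date(2013, 6, 1)))
```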

| Statistical analysis
The method of maximally selected rank statistics (maxstat package, R software, Vienna, Austria) was used to calculate the best TMTV, TLG and SUVmax cutoffs to predict TTFT. The χ2 or Fisher's exact test were used to compare categorical variables. For TTFT, where a possible competing event exists, the primary event was the initiation of treatment and the competing event was death during WW. Cumulative incidence was then calculated (cmprsk R package) and Gray's test 23 was used for comparisons between both groups. For the estimation of hazard ratios in the uni- and multivariable analyses, Cox and Fine-Gray regression models were used. For the calculation of odds ratios in uni- and multivariable analyses, logistic regression was employed. We plotted Kaplan-Meier survival curves and used the log-rank test to explore PFS and OS differences based on the TMTV. Statistical significance was defined as a p value < 0.05.
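To make the competing-risks approach concrete, the sketch below implements the standard nonparametric cumulative incidence estimator in plain Python. It is not the cmprsk code used in the study and does not implement Gray's test; the function name, the event coding (0 = censored, 1 = treatment initiation, 2 = death during WW) and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(times: Sequence[float],
                         events: Sequence[int],
                         cause: int = 1) -> List[Tuple[float, float]]:
    """Nonparametric cumulative incidence of `cause` with competing risks.

    events: 0 = censored, 1 = treatment initiation, 2 = death during WW
    (coding assumed for this sketch). Returns (time, cumulative incidence)
    pairs at each time where any event occurs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Group all patients sharing the same observed time
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        if d_any > 0:
            # Increment: event-free probability just before t times the
            # cause-specific hazard at t
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


if __name__ == "__main__":
    # Toy data: five patients, follow-up in years
    times = [1, 2, 3, 4, 5]
    events = [1, 2, 0, 1, 0]
    for t, ci in cumulative_incidence(times, events):
        print(t, round(ci, 3))
```

In this toy example the cumulative incidence of treatment reaches 0.5 at 4 years, whereas one minus the Kaplan-Meier estimate with the death treated as a censored observation would give 0.6, which illustrates why the two approaches can diverge when patients die untreated.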

| Baseline features and volumetric parameters
Total metabolic tumor volume, TLG and SUVmax were determined for all patients, including three cases in which no tumor mass was
MOZAS ET AL.

FIGURE 1 Illustrative calculation of the total metabolic tumor volume (TMTV) [semiautomatic segmentation using a standardized uptake value (SUV) ≥2.5 threshold] in a low tumor burden (LTB), TMTV lo patient (left) and in an LTB, TMTV hi patient (right).
TABLE 1 Time to first treatment (TTFT), frontline therapy modalities, and reasons for initiating therapy, globally and according to the total metabolic tumor volume (TMTV). Note: Statistically significant associations are highlighted in bold.
a Only calculated for treated patients.
Since TTFT was the endpoint of the study, we dichotomized the radiomic parameters according to their ability to predict TTFT.For TMTV, a cutoff of 53.17 mL was obtained (Supplementary Figure S1).
We performed a 100,000-sample bootstrap validation of the maxstat-obtained cutoff (mean, 54.99 mL). This cutoff was then rounded to 50 mL for the sake of practicality and to favor external validity by limiting overfitting. This same approach was used for SUVmax and TLG and the resulting cutoff values were 5 (unitless parameter) and 500 SUVbw*mL, respectively. We found no significant association between the distribution of TMTV, SUVmax or TLG and the center of origin (Supplementary Table S1).
Sixty-four patients (66%) had a TMTV above the established cutoff of 50 mL (TMTV hi; Table 2). The distribution of patients according to TMTV, SUVmax and TLG is depicted in Figure 3. SUVmax was the parameter identifying the highest percentage of patients at risk (88% of patients had an SUVmax >5), while 49% of cases had a TLG >500 SUVbw*mL.
As expected, TMTV hi patients had a more advanced stage, more frequent BM involvement by biopsy, more extensive nodal and extranodal disease and higher β2-microglobulin levels. No differences were seen with regard to age, sex, histological grade, LDH or hemoglobin levels. TMTV hi patients showed a trend toward a higher-risk FLIPI score, although this difference was not statistically significant.
Twenty-four patients exhibited splenic involvement: 18 diffuse, 4 focal
FIGURE 2 Probability of receiving frontline therapy for all patients of the series (A), according to the total metabolic tumor volume (TMTV, B) and to the Follicular Lymphoma International Prognostic Index (FLIPI, C).

TABLE 2 Baseline features of the 97 patients with initially observed, low tumor burden (LTB) follicular lymphoma (FL), globally and according to the total metabolic tumor volume (TMTV).
TMTV hi patients had a significantly shorter median TTFT than TMTV lo patients (2.6 vs. 8.8 years; p = 0.001; Table 1 and Figure 2B), with a 5-year probability of initiating treatment of 77% and 46% for TMTV hi and TMTV lo patients, respectively. This difference was also seen in the proportion of patients receiving treatment during follow-up (78% vs. 49% for TMTV hi and TMTV lo, respectively; p = 0.005). Of note, the reasons to start therapy were comparable between both groups (lymph node growth in 79% and 81% of TMTV lo and TMTV hi patients, respectively).

To assess baseline features predicting the duration of observation, we built univariable Cox regression models for TTFT (Table 3) and found that an older age was predictive of a lower probability of initiating frontline therapy (HR = 0.59; p = 0.033). Factors predicting a higher likelihood of starting treatment were the presence of ≥2 extranodal sites (HR = 1.69; p = 0.027), an intermediate/high-risk FLIPI score (HR = 1.77; p = 0.05, Figure 2C), and a TMTV hi (HR = 2.48; p = 0.0017).
Considering the statistically significant factors from the univariable analyses, a multivariable model for TTFT was built, excluding the variables included in the FLIPI score, to avoid redundancy. In a model with 94 cases and 66 events, also including extranodal involvement and the FLIPI score, a TMTV hi was the only factor retaining statistical significance (HR = 2.09, CI: 1.14−3.82; p = 0.017; Table 3; Supplementary Figure S2). We also investigated the factors predicting the need for treatment within five years of diagnosis (Supplementary Table S3).
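For readers who wish to check how a hazard ratio, its confidence interval and the p value relate to one another, the following sketch back-calculates the standard error of the log hazard ratio from the reported interval. It assumes a 95% Wald-type interval that is symmetric on the log scale and a normal approximation for the test statistic; the function name is hypothetical and this is not a reproduction of the Fine-Gray model fitted in the study.

```python
import math
from typing import Tuple


def p_from_hr_ci(hr: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Back-calculate (SE of log HR, z statistic, two-sided p value).

    Assumes `lower` and `upper` are the bounds of a 95% Wald interval,
    i.e. exp(log(hr) -/+ z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided p value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


if __name__ == "__main__":
    # Multivariable estimate for TMTV hi reported above
    se, z, p = p_from_hr_ci(2.09, 1.14, 3.82)
    print(round(se, 3), round(z, 2), round(p, 3))
```

Applied to the multivariable estimate for TMTV hi (HR = 2.09, CI: 1.14−3.82), this approximation gives a p value of about 0.017, in agreement with the reported figure.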
We then explored the potential of two other imaging parameters, SUVmax and TLG, to predict TTFT.
TABLE 3 Univariable and multivariable analyses for the cumulative incidence of receiving frontline therapy using Fine-Gray competing risk regression. Abbreviations: B2M, β2-microglobulin; BM, bone marrow; CI, confidence interval; FLIPI, Follicular Lymphoma International Prognostic Index; HR, hazard ratio; LDH, lactate dehydrogenase; TMTV, total metabolic tumor volume; ULN, upper limit of normal.

| Frontline treatment, response, and survival
Most patients (68%) were treated with immunochemotherapy (R-CHOP, R-bendamustine, R-CVP). For the entire cohort, the proportion of patients achieving a complete response after frontline treatment was 69%, without significant differences based on the initial TMTV (Supplementary Table S6). Fifteen patients (16%) died during follow-up, three of whom had not received treatment for FL. Five-year PFS and OS estimates were 67% and 91%, respectively.
Although a trend toward a lower PFS and OS was seen for patients with a TMTV hi at diagnosis, differences did not reach statistical significance.As expected, the TMTV could not predict survival from treatment (Supplementary Figure S4).

| DISCUSSION
Due to the incurable nature and prolonged survival of FL, observation is an acceptable strategy for most patients with advanced-stage, LTB disease. 9 Several motivations lie behind the interest in predicting the duration of WW, such as the psychological tolerance of younger patients. Although some factors and indexes (FLIPI, 24 FLIPI2 25 ) anticipate TTFT, 27 predictions remain imperfect. Semiquantitative PET/CT parameters are strong predictors of survival in FL patients in need of treatment, 19,28 but efforts to apply them to LTB patients have been scarce. We evaluated the potential of TMTV, TLG and SUVmax to foresee the duration of observation in 97 patients from two Spanish institutions who did not require treatment 12 at the time of diagnosis.
Two thirds of patients in our cohort had a high TMTV (>50 mL), which was associated with tumor burden features and more extensive nodal and extranodal disease. With a median follow-up of almost 7 years, the median TTFT was 3.1 years, which is in line with previous studies. 4 The main finding of our research was that the median duration of observation was significantly shorter for TMTV hi (2.6 years) as compared with TMTV lo patients (8.8 years). In the multivariable analysis, we found that TMTV hi was the only factor predicting a shorter TTFT (HR = 2.09), while extranodal involvement and the FLIPI score were not.
Long-term data of the randomized trial comparing single-agent rituximab with WW in LTB patients were recently presented. 7 With a median follow-up of 12.3 years, rituximab monotherapy was highly effective at prolonging time to next treatment, and outcomes with subsequent lines of treatment were not inferior compared with those of patients undergoing initial observation. Our data could help identify a subset of asymptomatic patients (LTB, high TMTV) who could benefit most from single-agent rituximab, although this hypothesis remains to be proven in the setting of prospective clinical trials.
We also analyzed SUVmax and TLG, and found that they can both predict TTFT. Total metabolic tumor volume and SUVmax have been postulated as parameters reflecting different cell compartments. While TMTV best reflects the malignant B-cell burden, intratumoral T cells influence SUVmax, and this can be dependent on the treatment regimen. 29 Due to the small number of patients with an SUVmax <5, the absence of independent impact of TLG on TTFT, the more consolidated role of TMTV in other settings in FL 19 and the contradicting results regarding TLG and SUVmax in previous studies, 30 we focused our analysis on the impact of TMTV.
Two previous small series 20,21 have used PET parameters for TTFT prediction. In both of them, however, cutoffs for such variables were calculated using receiver operating characteristic (ROC) curves, treating the need for therapy as a categorical variable. We believe that a time-to-event analysis is more appropriate, since TTFT constitutes a dynamic clinical endpoint. Besides, the Kaplan-Meier method is not entirely appropriate for assessing TTFT, since it disregards cases who died without having received treatment. This can in turn be solved by calculating the cumulative incidence of initiating treatment, with death as a competing risk. 31
In the Leccisotti study 21 the median TMTV was much lower than in ours (7.1 vs. 138.08 mL), as was the TMTV cutoff (14 vs. 50 mL), which might be explained by different inclusion criteria (i.e., more stringent criteria to undertake a WW approach) and different segmentation methods (PERCIST instead of an SUV ≥2.5 threshold). That study also showed that TMTV and TLG predicted TTFT independently of FLIPI. However, the presence of extranodal disease, which we consider an important factor guiding treatment initiation, is not accounted for by the FLIPI score, nor was it included in the multivariable analysis. A combined FLIPI and TMTV risk-stratification tool was also proposed by the authors. As much as we believe in the potential of radiomics to improve prognostication, we find the proposal of a new score premature at this time, due to the small cohort size and the lack of a validation series.
In contrast with other clinical endpoints such as PFS or OS, TTFT (the duration of WW) has the peculiarity of deriving from a clinical decision-making process. The interpretation of the so-called high tumor burden features 10-12 is subject to significant variability among clinicians. Besides, other factors that are not accounted for by those criteria, such as patient preferences, age, or comorbidities, are integrated in the decision of starting treatment or continuing observation. These facts can lead to initially puzzling observations in our cohort, such as an older age (>60 years) being predictive of a longer TTFT (HR = 0.59). This is in all likelihood explained by a greater reluctance to administer therapy to an older, more comorbid individual, and not by a more indolent biological behavior.
The diversity of PET parameters, segmentation algorithms, thresholds and manual contouring methods can be overwhelming. Besides, relevant TMTV cutoffs may significantly differ across histologies (e.g., FL and diffuse large B-cell lymphoma) and clinical situations. For instance, we found a TMTV of 50 mL to be predictive of TTFT in LTB patients, while a TMTV >510 mL anticipated a shorter PFS in the Meignan study, 19 in which all patients had a high tumor burden. We believe that the definitive incorporation of radiomics into lymphoma prognostication calls for international standardization and a solid methodological consensus for each clinical scenario.
One of the considerations regarding PET parameters is whether they substantially improve the prognostic information provided by CT scan data alone. Some risk scores include the extent of nodal involvement measured by CT, in the form of the number 24 or size 19,25,32 of involved lymph node areas. Indeed, the GELF criteria only consider lymph node size to recommend treatment. In our view, tumor volume measurement using PET/CT has clear advantages over morphological imaging techniques, especially in the case of lesions that are not measurable by CT (e.g., spleen infiltration without splenomegaly, and bone, pleural or peritoneal infiltration), where delineating tumor contours becomes challenging, due to the contiguity with vascular, nervous and muscular structures (Supplementary Figure S4).
We have to acknowledge several limitations of our work. First, the number of patients is modest. Second, the retrospective nature of the study makes it vulnerable to inherent flaws. Third, due to the lack of a validation cohort, the TMTV cutoff definition might be subject to overfitting. Fourth, the application of PET calculations in clinical practice might not be straightforward, since it is time-consuming. Lastly, although decisions were taken in a multidisciplinary team, using similar criteria for the past 20 years, we cannot deny the subjectivity of deciding when to initiate treatment. Despite all that, our data arise from a well-annotated clinical database, with a mature follow-up, employing semiquantitative PET measurements performed by two independent nuclear medicine physicians and robust statistical methods for cutoff calculation and TTFT analysis.
It could be argued that limiting the patients included in our study to those not fulfilling the GELF criteria, which are themselves a measure of tumor burden and were empirically defined, might diminish the relevance of our conclusions. However, we focused on this subset of cases in order to identify patients without any of the classical high tumor burden features who might not benefit from WW for a long time. As mentioned before, the decision to initiate treatment derives not only from tumor burden features, but also from personal factors related to the physician and the patient.
In our exploratory study, we found that a high TMTV is a strong independent predictor of the duration of WW in initially observed, LTB FL patients. Although we failed to find a TMTV threshold identifying a subset of patients with an extremely low long-term probability of requiring treatment, we did recognize a third of LTB FL patients with a low TMTV who had a median treatment-free survival beyond 8 years.
Upon validation of our cutoff and standardization of segmentation methods, the information provided by PET/CT could become an additional factor to consider when deciding whether to defer or initiate treatment (such as single-agent rituximab) in asymptomatic patients.
In a logistic regression analysis of the factors predicting the categorical event of needing treatment within five years of diagnosis, TMTV was the only one of 12 clinically relevant variables predicting this endpoint [OR = 4.29 (CI: 1.52−12.66); p = 0.0067].

When categorized according to the previously obtained cutoffs, both an SUVmax >5 [HR = 3.21 (CI: 1.34−7.70); p = 0.0089] and a TLG >500 SUVbw*mL [HR = 2.17 (CI: 1.32−3.57); p = 0.0022] anticipated a shorter duration of WW (Supplementary Figure S3), although only SUVmax retained statistical significance in a multivariable model also including the FLIPI score and extranodal involvement.
FIGURE 3 Distribution of the patients of the series according to their maximum standardized uptake value (SUVmax), total metabolic tumor volume (TMTV) and total lesion glycolysis (TLG).

Risk category: TMTV hi (>50 mL). Cumulative incidence of receiving frontline therapy (94 cases, 66 events). Univariable analysis: HR = 2.48 (CI: 1.41−4.37), p = 0.0017. Multivariable analysis: HR = 2.09 (CI: 1.14−3.82), p = 0.017.
Note: Statistically significant findings are highlighted in bold. NI, not included in the multivariable model due to absence of statistical significance in the univariable analysis (*) or to avoid redundancy with the FLIPI score (§).