Pre‐diagnostic blood immune markers, incidence and progression of B‐cell lymphoma and multiple myeloma: Univariate and functionally informed multivariate analyses

Recent prospective studies have shown that dysregulation of the immune system may precede the development of B‐cell lymphomas (BCL) in immunocompetent individuals. However, to date, these studies have been restricted to a few immune markers, which were considered separately. Using a nested case–control study within two European prospective cohorts, we measured plasma levels of 28 immune markers in samples collected a median of 6 years before diagnosis (range 2.01–15.97) in 268 incident cases of BCL (including multiple myeloma [MM]) and matched controls. Linear mixed models and partial least square analyses were used to analyze the association between immune marker levels and the incidence of BCL and its main histological subtypes, and to investigate potential biomarkers predictive of the time to diagnosis. Linear mixed model analyses identified associations linking lower levels of fibroblast growth factor‐2 (FGF‐2, p = 7.2 × 10⁻⁴) and transforming growth factor alpha (TGF‐α, p = 6.5 × 10⁻⁵) with BCL incidence. Analyses stratified by histological subtype identified inverse associations for the MM subtype, including FGF‐2 (p = 7.8 × 10⁻⁷), TGF‐α (p = 4.08 × 10⁻⁵), fractalkine (p = 1.12 × 10⁻³), monocyte chemotactic protein‐3 (p = 1.36 × 10⁻⁴), macrophage inflammatory protein 1‐alpha (p = 4.6 × 10⁻⁴) and vascular endothelial growth factor (p = 4.23 × 10⁻⁵). Our results also provided marginal support for already reported associations between chemokines and diffuse large BCL (DLBCL) and between cytokines and chronic lymphocytic leukemia (CLL). Case‐only analyses showed that granulocyte‐macrophage colony stimulating factor levels were consistently higher closer to diagnosis, which provides further evidence of its role in tumor progression. In conclusion, our study suggests a role of growth factors in the incidence of MM and of chemokine and cytokine regulation in DLBCL and CLL.


Introduction
B-cell lymphomas (BCLs) are the most common hematopoietic cancers in both men and women in the developed world. 1,2 The strongest and most consistent risk factors are related to altered immunity conditions including HIV infection or iatrogenically induced immune suppression after transplantation. 1 However, the prevalence of these conditions is too low to explain the majority of BCL cases. This has led to the hypothesis that minor perturbations in immune function among otherwise immunocompetent individuals could be related to future BCL risk.
In our study, we use blood samples collected years before clinical diagnosis to interrogate the relationship between prediagnostic blood levels of a large panel of cytokines, chemokines and growth factors and future risk of BCL and its main histological subtypes. We investigate the marginal relationship of each inflammatory biomarker separately through univariate analyses, and consider the potential for a joint effect of these markers through penalized multivariate models (in our case, partial least square [PLS]). Exploiting the prospective nature of our biosamples, we also seek (combinations of) biomarkers that would be indicative of the time elapsed between protein measurement and clinical onset.

What's new? B-cell lymphomas (BCL) are frequent in immunocompromised individuals, but most BCL cases are thought to occur as a consequence of minor immune perturbations in otherwise immunocompetent individuals. Here the authors prospectively examined a panel of immune markers in the blood from 268 patients afflicted with BCL and paired controls. The data uncover a functional role for growth factors (i.e. FGF-2, TGF-alpha) in the incidence and progression of multiple myeloma, a BCL subtype, and underscore the importance of chemokine and cytokine regulation in diffuse large B-cell lymphoma and chronic lymphocytic leukemia.

Study subjects
The EnviroGenoMarkers (EGM) study 15 is based on participants from two existing prospective cohorts: the Italian component of the European Prospective Investigation into Cancer and Nutrition (EPIC-Italy) 16 and the Northern Sweden Health and Disease Study (NSHDS). 17 In both cohorts, blood samples were (prospectively) collected from healthy subjects at enrolment. This study was approved by the committee on research ethics at the relevant institutions.
The EPIC project is a European network of prospective cohorts that was set up to examine relationships of cancer risk with nutrition and metabolic risk factors. 18 Between 1993 and 1998, EPIC-Italy completed the recruitment of 47,749 volunteers (15,171 men and 32,578 women, ages 35-70 years) in five different areas covered by cancer registries. 16 After participants provided informed consent, a blood sample was collected, together with detailed information on dietary and lifestyle habits obtained using standardized questionnaires (http://epic.iarc.fr/research/quest.php). Incident primary cancers were identified by automated linkages to cancer and mortality registries, population offices of all municipalities where participants reported to be residing, hospital discharge systems and periodic personal contacts (in Naples).
The NSHDS cohort contains three sub-cohorts, of which we solely used samples from the Västerbotten Intervention Program. A total of 80,000 healthy individuals aged 40-60 years were recruited between 1985 and 2008 and were asked to complete a self-administered questionnaire collecting demographic, medical and lifestyle information as well as a separate self-administered food frequency questionnaire. Informed consent was obtained from all participants and a medical examination was conducted during which a blood sample was taken. Incident cancers occurring among cohort members during the study period were identified by linkage with the Swedish Cancer Registry and the local Northern Sweden Cancer Registry.
For both cohorts, within 2 hours of blood collection, blood samples were processed for the isolation of buffy coats and other fractions, which were placed in cold storage (liquid N2 in EPIC-Italy and −80°C in NSHDS). Samples were transported on dry ice to the laboratory and stored for a short period at −80°C before analyses.
For each incident BCL case identified within the two cohorts, one random control was selected among all cohort members alive and free of cancer at the time of diagnosis of the index case, matched by cohort, center, gender, date of blood collection (± 6 months) and age at recruitment (± 2.5 years). Information from the two studies was integrated into a single database and standardized. Lymphoma cases were classified into subtypes according to the SEER ICD-O-3 morphology codes. 19 We considered multiple myeloma (MM) together with all other BCL subtypes and hereafter in the text BCL includes MM.
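The control-selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and every field name (`cohort`, `center`, `blood_date`, `cancer_date`, and so on) are hypothetical, the ± 6 month window is approximated as 183 days, and the toy records are invented rather than taken from the cohorts.

```python
import random
from datetime import date

# Hypothetical record layout: every field name below is illustrative.
def is_eligible(case, member):
    """True if `member` meets the matching criteria for `case`."""
    if member["id"] == case["id"]:
        return False
    same_strata = all(member[k] == case[k] for k in ("cohort", "center", "gender"))
    # Blood collection within +/- 6 months, approximated here as 183 days.
    close_blood = abs((member["blood_date"] - case["blood_date"]).days) <= 183
    # Age at recruitment within +/- 2.5 years.
    close_age = abs(member["age"] - case["age"]) <= 2.5
    # Alive and free of cancer when the index case was diagnosed.
    dx = case["diagnosis_date"]
    at_risk = all(member[k] is None or member[k] > dx
                  for k in ("cancer_date", "death_date"))
    return same_strata and close_blood and close_age and at_risk

def pick_control(case, cohort, rng):
    """Draw one random control from the eligible pool, or None if it is empty."""
    pool = [m for m in cohort if is_eligible(case, m)]
    return rng.choice(pool) if pool else None

def person(pid, age, blood, cancer=None, death=None, dx=None):
    """Build one invented cohort record (same cohort, center and gender)."""
    return {"id": pid, "cohort": "NSHDS", "center": "A", "gender": "F",
            "age": age, "blood_date": blood, "cancer_date": cancer,
            "death_date": death, "diagnosis_date": dx}

case = person(1, 52.0, date(1995, 3, 1), cancer=date(2001, 6, 1), dx=date(2001, 6, 1))
cohort = [
    case,
    person(2, 53.0, date(1995, 5, 1)),                           # eligible
    person(3, 58.0, date(1995, 3, 10)),                          # age too far
    person(4, 52.5, date(1995, 2, 1), cancer=date(1999, 1, 1)),  # cancer before index date
]
control = pick_control(case, cohort, random.Random(0))
print(control["id"])
```

Because eligibility is evaluated at the diagnosis date of the index case, a cohort member who later develops cancer can still serve as a control for an earlier case in this sketch.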
We first analyzed samples from 100 case-control pairs (phase 1), which were subsequently supplemented with an additional 181 case-control pairs to increase the power of the study (147 cases in NSHDS, 34 cases in EPIC-Italy) (phase 2). 19 After further subtype characterization and review, 11 cases were reclassified (Hodgkin's lymphoma [n = 6]; T-cell lymphoma [n = 1]; and unknown [n = 4]) and excluded from further analysis along with their matched controls. Moreover, two cases without suitable control samples were excluded. The final number of successfully analyzed samples was 268 BCL cases and 268 controls (Table 1). Median time between blood collection and diagnosis of BCL was 6.2 years (range, 2.01-15.97) in NSHDS and 5.4 years (range, 2.03-11.47) in EPIC-Italy.
For 224 case-control pairs (from both cohorts), full-resolution DNA methylation data were also available from the Illumina Infinium Human Methylation 450 platform, obtained using a standard protocol and preprocessing/normalizing steps as described elsewhere. 20 From these profiles, using an established deconvolution approach, 21 we estimated the proportion of the following six blood components: CD4, CD8 and natural killer T-cells, B-cells, monocytes and granulocytes.
Cancer Epidemiology
Vermeulen et al.

Laboratory personnel were blinded with regard to case-control status. Cases and controls were assayed next to each other on the same plate in the same batch and a single quality control sample was run in duplicate with the case-control sets in each plate. Samples of phase 1 were run once due to sample volume limitations while samples in phase 2 were run in duplicate. Four analytes (IL12, IL1-RA, sIL2-RA and Flt3-ligand) were excluded from further statistical analyses due to a high rate of non-detects (>75%). Median intra-batch coefficients of variation (CV) for all cytokines based on the quality control duplicates were 14.8 and 5.7 and median inter-batch CVs were 7.7 and 13.3 for phase 1 and 2, respectively (Supporting Information Table S1). The median intraclass correlation coefficient (ICC) of the measured analytes was 0.87, and ICCs were above 0.5 for all analytes except MDC and FGF-2 (0.16 and 0.43, respectively) (Supporting Information Table S1). Cytokine levels measured out of range of the calibration curve (either too low: <limit of detection (LOD), or too high) and missing values for covariates (body mass index [n = 8], smoking status [n = 14], education [n = 16], alcohol intake [n = 41], physical activity [n = 2]) were imputed based on a maximum likelihood estimation method which was informed by the observed correlation structure within the data. 22 Imputation of samples <LOD was carried out using the empirical LOD across all plates as the upper bound. For imputation of samples with a concentration exceeding the calibration curve, we used a value of twice the highest observed concentration that was not out of range as the upper bound. Seventy percent of the retained markers had <30% imputed values (Supporting Information Table S1). In all analyses, levels of cytokines were log-transformed to normalize their distributions. Differences between cases and controls in baseline continuous covariates were assessed using a paired Student's t test, and a χ² test was used for categorical variables.
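To make the quality-control summaries concrete, the sketch below computes a median coefficient of variation from duplicate readings and log-transforms concentrations. It rests on assumptions that the text does not confirm: the CV of a pair is taken as the sample standard deviation divided by the pair mean and expressed as a percentage, the log is the natural log, and the readings are invented for illustration.

```python
import math
import statistics

def duplicate_cv(a, b):
    """Percent CV of one duplicate pair: sample SD divided by the pair mean."""
    return 100.0 * statistics.stdev([a, b]) / statistics.fmean([a, b])

def median_cv(pairs):
    """Median percent CV across a collection of duplicate pairs."""
    return statistics.median([duplicate_cv(a, b) for a, b in pairs])

# Made-up duplicate readings of a quality-control sample (not study data).
qc_pairs = [(10.0, 12.0), (100.0, 100.0), (50.0, 60.0)]
cv = median_cv(qc_pairs)

# Natural-log transform applied to strictly positive concentrations.
log_levels = [math.log(x) for pair in qc_pairs for x in pair]

print(round(cv, 2))
```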

Linear mixed models
As proposed elsewhere, 19 linear mixed models were used to investigate the relationship between each of the immune marker levels separately and the disease outcome. The general formulation of the mixed model for a given protein (continuous variable) observed in participant i ($Y_i$) can be described as follows:

$$Y_i = \alpha + \beta X_i + \gamma^{\top} FE_i + u_{A_i} + \varepsilon_i$$

where $\alpha$ is the intercept, $\varepsilon_i$ is the residual error and $X_i$ is a binary variable indicating whether individual i is a BCL case or not, with $\beta$ the corresponding regression coefficient. $FE_i$ is a vector of fixed effect observations for individual i (with coefficient vector $\gamma$), including the matching criteria (age, gender and country), the experimental phase (1 or 2) and, as potential confounders, body mass index (BMI, continuous, in kg/m²), education (categorical: none, primary, technical/professional, secondary, university/college), physical activity (categorical: inactive, moderately inactive, moderately active, active), smoking status at enrolment (categorical: non-smokers, former smokers, smokers) and alcohol intake at enrolment (continuous, in g/day). Nuisance variation due to differences between microtiter plates was modeled through a random intercept $u_{A_i}$ (where $A_i$ denotes the plate on which sample i was assayed).
The strength of the association between BCL (or histological subtype) case/control status and each protein level was inferred using a likelihood ratio test comparing the model with the disease status ($X_i$) variable to the one without it. The model was fitted on all markers separately, and we accounted for multiple testing using a Bonferroni correction, controlling the family-wise error rate (FWER) below 5%. To investigate potential confounding by blood cell count differentials, the models (for full BCL and all histological subtypes) were further adjusted for the estimated cell proportions for 5 (of the 6) cell types (CD8, CD4, natural killer T cells, B cells and monocytes) for the 224 pairs in which that information was available.
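As a rough illustration of the testing step, the sketch below turns the log-likelihoods of two nested models into a one-degree-of-freedom likelihood ratio p-value and applies a Bonferroni threshold. The number of tests (28, one per analyte) is an assumption made here for illustration, and the function names are hypothetical; the p-values are the two pooled-BCL values quoted in the abstract.

```python
import math

def lrt_p_value(loglik_full, loglik_reduced):
    """P-value of a likelihood ratio test with one degree of freedom."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square(1) survival function: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the FWER at or below alpha."""
    return alpha / n_tests

N_MARKERS = 28  # assumption: one test per analyte
threshold = bonferroni_threshold(0.05, N_MARKERS)

# P-values quoted in the abstract for the pooled BCL analysis.
reported = {"FGF-2": 7.2e-4, "TGF-alpha": 6.5e-5}
significant = {name: p < threshold for name, p in reported.items()}

print(threshold, significant)
```

Under that assumption the per-test threshold is 0.05/28, about 1.8 × 10⁻³, so both quoted p-values fall below it.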
Proteins that were found to be differentially expressed between cases and controls were further investigated through unconditional logistic regression (ULR), where, for a given subtype, cases were compared to all controls, adjusting for country, gender, age at recruitment, phase and microtiter plate number. Quartiles (Q) of plasma cytokine concentrations were calculated based on the distribution in controls.
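The quartile construction can be illustrated as follows: cut points are derived from controls only and then applied to every subject. This is a minimal sketch with invented values; the interpolation rule of `statistics.quantiles` and the handling of values equal to a cut point (assigned to the upper quartile here) are choices made for the example, not details reported in the text.

```python
import statistics
from bisect import bisect_right

def control_cut_points(control_levels):
    """Three cut points (25th, 50th and 75th percentiles) from controls only."""
    return statistics.quantiles(control_levels, n=4)

def assign_quartile(level, cut_points):
    """Quartile number (1 to 4) of a level relative to the control cut points."""
    return bisect_right(cut_points, level) + 1

# Made-up log-transformed marker levels in controls (not study data).
controls = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cuts = control_cut_points(controls)

# The same cut points are then applied to cases.
case_quartiles = [assign_quartile(x, cuts) for x in (0.5, 3.0, 5.0, 9.0)]
print(case_quartiles)
```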

PLS analyses
To evaluate the potential for a joint inflammatory signal that would be related to BCL or any of its histological subtypes, we performed a series of PLS-DA analyses in relation to case/control status. To facilitate interpretability, we performed variable selection by penalizing the loadings coefficients as proposed in the sparse PLS-DA models (sPLS-DA). 23 As recently proposed, 24 we also accounted for a functional grouping of the proteins into cytokine, chemokine and growth factor classes to inform the model. We ran a series of sparse group PLS-DA (sgPLS-DA) analyses to select the most relevant protein groups in relation to disease status and imposed sparsity within the selected groups. In all PLS-DA analyses, the number of components was set to 1, and calibration of the penalty and, when applicable (sgPLS-DA), of the number of selected groups was done via fivefold cross-validation repeated 100 times. Calibration parameters were chosen to minimize the average misclassification rates, using univariate and bi-dimensional grids for sPLS-DA and sgPLS-DA, respectively, exploring all possible values of the number of selected variables and, if applicable, groups.
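The calibration loop (repeated fivefold cross-validation, choosing the number of selected variables that minimizes the average misclassification rate) can be sketched as below. The classifier is deliberately a stand-in, not sPLS-DA: it keeps the k variables with the largest absolute case-control mean difference and classifies by nearest centroid. All function names and the simulated data are assumptions for illustration.

```python
import random
import statistics

def top_k_variables(X, y, k):
    """Indices of the k variables with the largest |case mean - control mean|."""
    diffs = []
    for j in range(len(X[0])):
        cases = [row[j] for row, lab in zip(X, y) if lab == 1]
        ctrls = [row[j] for row, lab in zip(X, y) if lab == 0]
        diffs.append(abs(statistics.fmean(cases) - statistics.fmean(ctrls)))
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)[:k]

def fit_centroids(X, y, selected):
    """Class centroids restricted to the selected variables."""
    return {lab: [statistics.fmean([row[j] for row, l in zip(X, y) if l == lab])
                  for j in selected] for lab in (0, 1)}

def predict(row, centroids, selected):
    """Label of the nearest centroid (squared Euclidean distance)."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(selected, centroids[lab]))
    return min((0, 1), key=dist)

def repeated_cv_error(X, y, k, n_folds=5, n_repeats=100, seed=0):
    """Average misclassification rate over repeated k-fold cross-validation.

    Assumes both classes are present in every training fold.
    """
    rng = random.Random(seed)
    n = len(y)
    rates = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        wrong = 0
        for f in range(n_folds):
            test_idx = order[f::n_folds]
            held_out = set(test_idx)
            train = [i for i in order if i not in held_out]
            Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
            # Selection and fitting use the training fold only.
            selected = top_k_variables(Xtr, ytr, k)
            centroids = fit_centroids(Xtr, ytr, selected)
            wrong += sum(predict(X[i], centroids, selected) != y[i] for i in test_idx)
        rates.append(wrong / n)
    return statistics.fmean(rates)

def calibrate(X, y, grid, **kwargs):
    """Grid value with the lowest average cross-validated error."""
    return min(grid, key=lambda k: repeated_cv_error(X, y, k, **kwargs))

# Simulated data: variable 0 carries signal, variables 1 to 3 are noise.
data_rng = random.Random(1)
y = [1] * 20 + [0] * 20
X = [[data_rng.gauss(3.0 if lab == 1 else 0.0, 1.0)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(3)] for lab in y]

best_k = calibrate(X, y, grid=[1, 2, 3, 4], n_repeats=10)
print(best_k)
```

Repeating the variable selection inside each training fold, rather than once on the full data, keeps the held-out error from being optimistically biased.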
We also adopted an sPLS approach to investigate the relationship between prediagnostic levels of inflammatory markers in BCL cases and the time elapsed between the measurement and the clinical onset. We ran these analyses for BCL cases only and calibrated the penalty using the same cross-validation procedure to minimize, in that case, the mean square error of prediction.
For all PLS-DA and PLS analyses, we conducted a series of stability analyses randomly sub-sampling (N = 10,000 times) 80% of the study population, and running the PLS models for each subsample. As a measure of stability, we report, for all investigated values of the calibration parameters, the number of times each variable was selected across the 10,000 subsamples.
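The subsampling logic behind these stability analyses can be sketched as follows. The selector is a placeholder (top-k variables by absolute mean difference) rather than a PLS model, individuals are subsampled without regard to case-control pairing, and the data are simulated; all of these are assumptions of the example.

```python
import random
from collections import Counter

def select_top_k(X, y, k):
    """Placeholder selector: k variables with largest |case mean - control mean|."""
    n1 = sum(y)
    n0 = len(y) - n1
    diffs = []
    for j in range(len(X[0])):
        mean1 = sum(row[j] for row, lab in zip(X, y) if lab == 1) / n1
        mean0 = sum(row[j] for row, lab in zip(X, y) if lab == 0) / n0
        diffs.append(abs(mean1 - mean0))
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)[:k]

def selection_proportions(X, y, k, n_subsamples=1000, fraction=0.8, seed=0):
    """Proportion of subsamples in which each variable is selected."""
    rng = random.Random(seed)
    n = len(y)
    size = int(round(fraction * n))
    counts = Counter()
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        counts.update(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return [counts[j] / n_subsamples for j in range(len(X[0]))]

# Simulated data: variable 0 carries signal, variables 1 to 3 are noise.
data_rng = random.Random(1)
y = [1] * 20 + [0] * 20
X = [[data_rng.gauss(3.0 if lab == 1 else 0.0, 1.0)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(3)] for lab in y]

props = selection_proportions(X, y, k=1, n_subsamples=200)
print(props)
```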
To adjust results from all PLS analyses for technically induced variation, and as already proposed, 25,26 we inferred de-noised data from the linear mixed model presented above by subtracting from the observations the estimated random effects.
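The de-noising step amounts to subtracting, from each observation, the estimated random intercept of the plate on which it was assayed. A minimal sketch, assuming the plate effects have already been estimated by the mixed model; the function name, plate labels and numbers are invented for illustration.

```python
def denoise(observations, plates, plate_effects):
    """Subtract each sample's estimated plate random intercept from its value."""
    return [obs - plate_effects[plate] for obs, plate in zip(observations, plates)]

# Made-up log-levels, plate assignments and estimated plate effects.
levels = [1.0, 2.0, 1.5]
plates = ["P1", "P2", "P1"]
estimated_effects = {"P1": 0.5, "P2": -0.5}

clean = denoise(levels, plates, estimated_effects)
print(clean)
```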
Linear mixed models were fitted using the lme4 R package, and all PLS and PLS-DA analyses were performed using the R package sgPLS, in the R 3.4.0 language and environment (The R Foundation for Statistical Computing, Vienna, Austria). Conditional logistic regressions and ULRs were performed using SAS (ver. 9.2, SAS Institute).

Results
Of all BCL cases, 16.4% were diagnosed with diffuse large BCL (DLBCL) (n = 44), 14.6% with follicular lymphoma (FL) (n = 39), 15.6% with CLL (n = 42) and 28.4% with MM (n = 76). Distributions of BCL subtypes and gender across phases and countries are shown in Supporting Information Table S2. Each phase includes subjects from both cohorts and both genders. Characteristics of the study population are summarized in Table 1. Supporting Information Tables S3 and S4 show the median, minimum and maximum levels of all cytokines stratified by case-control status, country, phase of study and BCL subtypes. Median concentration of most immune markers was higher among control subjects, phase 1 and NSHDS subjects compared with cases, phase 2 and EPIC-Italy subjects, respectively.

Linear mixed model analyses
In a first set of analyses, all BCL cases were pooled together and multivariate analyses revealed generally lower levels of inflammatory markers among cases compared with controls (Fig. 1a, Supporting Information Tables S3 and S5). Among the 28 analytes, 20 showed an inverse association with disease status. Of these, only two reached the Bonferroni significance level (Supporting Information Table S5): blood levels of FGF-2 (β = −0.50, p = 7.2 × 10⁻⁴) and TGF-α (β = −0.68, p = 6.5 × 10⁻⁵). Models adjusted for white blood cell (WBC) differentials provided consistent results (Supporting Information Table S6), and one borderline significant association involving fractalkine (β = −0.47, p = 1.84 × 10⁻³) emerged.

Analyses by phase showed that the results for MM were consistent between Phases 1 and 2, providing technical replication of our results (Supporting Information Tables S7a and S7b). Given that the two associations found in the pooled BCL analyses correspond to the two strongest MM-specific associations, one can hypothesize that these findings are driven by the MM subtype. This was confirmed by additional analyses of pooled BCL excluding MM cases, where no significant associations were observed (Supporting Information Fig. 1).
Linear mixed models were further stratified by median time-to-diagnosis: before 6 years (Supporting Information Table S8) and after 6 years (Supporting Information Table S9) for all BCL and for the CLL, MM and DLBCL subtypes. Results showed largely similar associations for cases diagnosed within 6 years and more than 6 years after blood collection.
Results from the multivariable ULR models (including all lymphoma controls) for MM were consistent with the linear regression analyses. These identified an inverse association between risk of MM (Table 2) and blood levels of FGF-2

Table footnote: Quartiles of (log-transformed) plasma levels of immune markers were calculated based on the distribution in control subjects. Models adjusted for age, sex, country, body mass index, smoking status, education, physical activity, alcohol intake and phase. Tests for trend were calculated using the quartile number as continuous variable. Noise variance due to plate was removed before analyses. Abbreviations: FGF-2, fibroblast growth factor-2; TGF-α, transforming growth factor alpha; MCP-3, monocyte chemotactic protein-3; MIP-1α, macrophage inflammatory protein-1 alpha; VEGF, vascular endothelial growth factor; EPIC, European Prospective Investigation into Cancer and Nutrition; NSHDS, the Northern Sweden Health and Disease Study.

Table S10). Moreover, analyses with minimal adjustment (excluding some weak confounders, such as smoking status and physical activity) did not change the presented results.

Multivariate analyses: PLS-DA models
Calibration of the sPLS-DA analysis of all BCL cases selected 14 variables as yielding the optimal balance between discriminatory performance and sparsity. Of these, 12 showed negative loadings coefficients, suggesting consistently lower levels of inflammatory markers in BCL cases (Fig. 2a). Stability analyses for models of size 14 showed that 8 of the 14 proteins were selected in >80% of the (N = 10,000) subpopulations.
In particular, stability analyses extended to all possible numbers of selected variables in the sPLS-DA component (Fig. 2b) showed that both FGF-2 and TGF-α were the first variables to be consistently selected, even in sparse models: both proteins showed selection proportions above 70% in (sub-optimal) models selecting five variables.

granulocytes, eosinophils etc.). These analyses did not change the results presented here (not shown).
Misclassification rates yielded by the calibrated sPLS-DA and sgPLS-DA analyses for all BCL cases (Fig. 3c) were rather high, indicating a moderate predictive value of these combinations of markers (AUCs from ROC analyses were below 62% for both models). However, our results clearly suggest that, irrespective of the variant of the PLS-DA model, the model fitted on all BCL cases showed lower misclassification rates for the MM cases (Fig. 2c).
Calibration of the subtype-specific sPLS-DA and sgPLS-DA models was restricted to case-control pairs within each subtype (N = 42, 44, 39 and 76 pairs for CLL, DLBCL, FL and MM, respectively), and models for FL yielded poor discriminatory performance, with misclassification rates higher than 58%. For this reason, we do not report FL-related results.
As depicted in Figure 3a, sets of selected variables showed limited overlap across histological subtypes. For DLBCL, two variables were selected by the sPLS-DA (Fig. 4a): MDC (negative loadings) and eotaxin (positive loadings). Including a group structure in the model (Fig. 3b) resulted in only the chemokines being selected to discriminate DLBCL cases and controls, and within chemokines, MDC and eotaxin were selected with highest loadings (in absolute value) along with MCP-1 and IP10. Stability analyses (Supporting Information Fig. 2a) suggested that these two proteins were the only ones stably selected in sparse models.

Figure caption: Models were fitted on subtype-specific sets of cases and controls. Loadings coefficients are presented for DLBCL, CLL and MM separately for the sPLS-DA model (a), and the sgPLS-DA (b) models. Models for FL are not reported as they yielded poor predictive performances. Abbreviations: IL, interleukin; INF-α, interferon alpha; INF-γ, interferon gamma; GMCSF, granulocyte-macrophage colony stimulating factor; TNF-α, tumor necrosis factor alpha; EGF, epidermal growth factor; FGF-2, fibroblast growth factor 2; GCSF, granulocyte colony-stimulating factor; GRO, melanoma growth stimulatory activity/growth-related oncogene; IP10, INF-γ-induced protein 10; MCP-1, monocyte chemotactic protein-1; MCP-3, monocyte chemotactic protein-3; MDC, macrophage derived chemokine; MIP-1α, macrophage inflammatory protein 1 alpha; MIP-1β, macrophage inflammatory protein 1 beta; sCD40L, soluble CD40 ligand; VEGF, vascular endothelial growth factor; TGF-α, transforming growth factor alpha.

For CLL, five variables were selected by sPLS-DA (Fig. 3a): IL4 and TGF-α (positive loadings coefficients), and sCD40L, eotaxin and MCP-1 (negative loadings coefficients). Stability analyses (Supporting Information Fig. 2b) showed that IL4, eotaxin and sCD40L had a high selection probability (>70%) for models including five or more variables. Sparse group PLS-DA models for CLL selected a single group (cytokines) and, within the group, sCD40L and IL4 (Fig. 3b).
MM analyses selected eight variables (all negative loadings coefficients, Fig. 3a). Of these, five showed higher absolute values of the loadings coefficients and high selection proportion (>75% selection proportion for models of size 8; Supporting Information Fig. 2c): FGF-2, TGF-α, MCP-3, fractalkine and VEGF. Calibration of the sgPLS-DA analyses of the MM subtype selected a single group, growth factors (Fig. 3b), and within this group EGF, FGF-2, VEGF and TGF-α were selected.
To explicitly model the relationship between our 28 inflammatory markers and the time to diagnosis, we ran a series of sPLS analyses in cases only (all BCL cases and subsequently DLBCL, CLL and MM separately), relating prospective blood levels of all inflammatory markers and the observed time to diagnosis. The resulting sPLS models (Fig. 4a) selected a single variable for all BCL (GMCSF, negative loadings coefficient), for DLBCL (sCD40L, negative loadings coefficient), for MM (IL8, positive loadings coefficient) and for CLL (MDC, positive loadings coefficient). Stability analyses (Fig. 4b) showed that while each of these subtype-specific variables was the most frequently selected, especially for all BCL analyses (selection proportion of GMCSF >40%), other variables were competing to predict time to diagnosis. For MM both IL8 and GMCSF were selected in 20% of the subsamples. For CLL, both MDC and MCP-1 were selected with proportions around 20%, and for DLBCL, sCD40L showed a selection proportion around 25%, while it was around 15% for MCP-1.

Figure caption: Results are presented for all BCL cases and for cases of DLBCL, CLL and MM separately. Loadings coefficients obtained for the calibrated models are presented for each set of cases considered (a). Results from stability analyses using 10,000 subsamples of the full set of cases are represented in (b) by the per-variable proportion of selection across all independent subsamples. Abbreviations: IL, interleukin; INF-α, interferon alpha; INF-γ, interferon gamma; GMCSF, granulocyte-macrophage colony stimulating factor; TNF-α, tumor necrosis factor alpha; EGF, epidermal growth factor; FGF-2, fibroblast growth factor 2; GCSF, granulocyte colony-stimulating factor; GRO, melanoma growth stimulatory activity/growth-related oncogene; IP10, INF-γ-induced protein 10; MCP-1, monocyte chemotactic protein-1; MCP-3, monocyte chemotactic protein-3; MDC, macrophage derived chemokine; MIP-1α, macrophage inflammatory protein 1 alpha; MIP-1β, macrophage inflammatory protein 1 beta; sCD40L, soluble CD40 ligand; VEGF, vascular endothelial growth factor; TGF-α, transforming growth factor alpha.

Discussion
In our study of plasma levels of circulating immune markers and subsequent risk of BCL and its main histological subtypes, growth factors, and in particular FGF-2, TGF-α and VEGF, were consistently and inversely associated with incidence of MM. These associations seemed to persist among cases sampled more than six years before diagnosis, and none of these disease-associated inflammatory markers showed an association with time to diagnosis in cases only.
For CLL, DLBCL and FL, no significant association with circulating immune markers was observed in univariate analyses. However, adopting a group PLS approach, which better accounts for the possible pleiotropic and complex effects of immune markers on BCL incidence, chemokines as a group were found to be related to DLBCL and cytokines to CLL. Additional variable selection within each selected group identified individual markers driving the link between the group and the outcome: sCD40L and IL4 for CLL; and MDC, eotaxin, MCP-1 and IP10 for DLBCL. These links were not detected using the univariate approach. These results indicate that the use of group and sparse-group PLS may enhance the analyses of interrelated biological markers.
No direct evidence from our analyses provided support for the previously reported associations between TNF, 9 IL13 8 and IL10 9,10,14 and BCL; TNF-α, 11 IL6 14 and IL10 9,14 and FL; TNF-α, 11 IL5 and IL10 9 and DLBCL and TGF-α 11 and TNF 9 and CLL. However, the direction of most of the previously reported associations is consistent with our findings. In particular, as Gu et al. 8 reported (among 92 B-NHL cases and 184 matched controls), we find a decreased risk of B-NHL with increasing levels of IL13 and IL5 and an increased risk of B-NHL for TNF-α. As previously reported in a study involving 491 B-NHL cases and 491 controls, 9 we also find a positive correlation between levels of TNF-α, IL10 and the incidence of all BCL, FL and DLBCL. The associations reported by Purdue et al. 10,11 and Conroy et al. 14 (272 NHL cases and 541 matched controls) linking TNF-α and FL, TGF-α and CLL and IL10 and BCL and FL were also in the same direction as ours. As such, our study does provide some potential meta-analytical support for reported associations involving blood levels of immune markers and specific BCL subtypes.
Most previous prospective studies on immune markers and lymphoma did not include MM. In the pooled analyses of the MM Cohort Consortium, which included 493 MM cases and 978 controls from 8 cohorts, IGF-1 was found to be associated with an increased MM risk within 3 years of blood collection while soluble IL-6 receptor was associated with MM in the first 6 years after blood collection. 13 Therefore, these markers are likely to reflect the tumor and/or its microenvironment. In contrast, our study revealed several markers, mostly growth factors, inversely related to long-term risk of MM. This may be of importance as the average 5-year survival rate for MM patients remains low (45%). 27 If results of our study are replicated in other studies and extended to clinical studies in patients with monoclonal gammopathy of undetermined significance (MGUS), or smoldering MM (SMM), this could lead to the identification of patients at higher risk of progressing to MM, and, in the long term, could improve individualized surveillance strategies.
We also performed sparse PLS analyses to identify (combinations of) biomarkers that would be indicative of the time elapsed between protein measurement and clinical onset in a case-only setting. These analyses revealed that GMCSF plasma levels were increasing closer to diagnosis for all BCL cases, and in particular MM cases. A clinically relevant aspect of the interactions of MM plasma cells in the bone marrow microenvironment is neovascularization, which is central in disease progression. 28 Myeloma plasma cells induce angiogenesis via recruitment and activation of stromal inflammatory cells such as macrophages and mast cells. When these cells are activated, they secrete angiogenic factors including GM-CSF, which contribute to enhanced tumor neovascularization. 29 Recently GMCSF has also been found to be involved in homing circulating endothelial precursor cells, which contribute to the "angiogenic switch" and tumor progression. 29
Our study has a number of strengths, including its prospective nature, which limits reverse causation bias that may occur when variation in blood level of cytokines is induced by the disease itself, cancer treatments or lifestyle changes after cancer diagnosis. Moreover, compared to most previous prospective studies, especially on MM, we measured a larger panel of immune markers. The availability of two cohorts allowed for independent confirmation of the observed signals. Conversely, different media for blood samples (citrate in EPIC-Italy and EDTA in NSHDS) might have introduced differences in cytokine levels between the two cohorts, which may cause bias in unconditional analyses through incomplete correction for cohort status in the model. Although the use of different anticoagulants results in absolute differences in levels of immune markers, measurements in split samples simultaneously treated with heparin, citrate and EDTA have been shown to be highly correlated. 15,30 Similarly, bias may arise from cytokine measurements of study subjects in two phases despite adjustment in multivariate analyses. However, stratified analyses by cohort and phase showed overall similar trends for the identified markers despite reduced power in these analyses. Furthermore, FGF-2, for which a significant association with future MM risk was found, presented a relatively low ICC (0.43). The consequence of a low ICC would be an underestimation of the effect and would not introduce a false-positive association. As such, we may have underestimated the predictive power of FGF-2, which warrants follow-up in future studies.
We measured the immune markers at a single time point to determine future risk of BCL, which may not accurately reflect the long-term immune status of an individual. However, several studies have provided evidence of a reasonable between-to-within person variability ratio (ICC) suggesting temporal stability for panels of cytokines. 30-35 Finally, blood cytokines are produced not only by those cell types considered to play pivotal roles in the immune system and in inflammatory responses, including lymphocytes, monocytes and mast cells, but also by macrophages and, for some cytokines, also fibroblasts, neutrophils and endothelial cells. It should therefore be noted that plasma levels of cytokines may not necessarily reflect activity in the target tissue.
Our study provided evidence for a strong link between FGF-2 and TGF-α levels and incidence of MM. Several clinical studies have reported that the plasma concentrations of FGF-2 were elevated in patients with active MM compared to patients with inactive disease, and this correlates with increased bone marrow angiogenesis and lymphangiogenesis. 36-39 MM patients who respond to chemotherapy (an immunosuppressed condition) show a significant decrease in serum FGF-2 levels, whereas non-responders do not. 39 TGF-α is an important mitogen that binds to the EGF receptor and has been studied in many other malignancies, but data on MM are limited and no prospective data are available. 40-42 In the sgPLS analyses, we also observed some moderate support for an effect of VEGF on MM incidence. Similar to FGF-2, clinical studies have shown that increased serum levels of VEGF are associated with more advanced disease stages and with poor prognosis in BCL and MM cases. 36,37,39 VEGF and its ligands and receptors have a central role in physiological regulation of angiogenesis. 43 Moreover, there is a growing list of nonvascular roles of VEGF including recruitment of inflammatory cells and autocrine and intracrine production of hematopoietic stem cells. 43 A recent nested case-control study within the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial showed a significant association between elevated blood levels of soluble VEGFR-2 (sVEGFR-2) and risk of BCL. 11 Although the biologic function of sVEGFR-2 is unclear, it has been shown that sVEGFR-2 binds the lymphangiogenic growth factor VEGF-C and thus inhibits VEGF-C-induced activation of VEGFR-3, consequently inhibiting lymphatic endothelial cell proliferation. 44 On the other hand, sVEGFR-2 can bind VEGF and may act as a VEGF inhibitor. These studies support a possible role of the growth factors (VEGF and FGF-2) in the pathogenesis of MM. Given their interrelationship and cyclic response, more in-depth monitoring of the VEGF and FGF-2 growth factors and their soluble receptors is needed to clarify their possible pre-diagnostic role in MM.
Although the markers identified in our study have been identified previously in clinical studies of MM or its precursor states (i.e., MGUS and SMM), 45-50 the direction of our findings is in general opposite to observations among subjects diagnosed with MM, where higher concentrations of these markers seem to be related to generally poorer disease outcome. The reason for this difference in direction of the effect is not known but may hint toward a preclinical deregulation of these important biological systems in subjects developing MM later in life, which at the time of clinical manifestation reverses into overexpression. However, we cannot exclude bias related to limited statistical power and design-related sources of variability in our findings.
In conclusion, our study showed that several immune markers, in particular growth factors, are associated with MM incidence in preclinical blood samples taken many years before clinical diagnosis. In addition, we provide marginal support for some of the previously reported associations between several immune markers and subtypes of BCL, in particular chemokines being related to DLBCL and cytokines to CLL. We also showed a consistent link between blood levels of GMCSF and time-to-diagnosis in all BCL and MM cases. These results need to be extended and replicated in independent prospective cohorts to clarify the relationship with BCL risk for these markers.