Association between anthropometry and lifestyle factors and risk of B‐cell lymphoma: An exposome‐wide analysis

Abstract To better understand the role of individual and lifestyle factors in human disease, an exposome‐wide association study was performed to investigate, within a single study, anthropometry measures and lifestyle factors previously associated with B‐cell lymphoma (BCL). Within the European Prospective Investigation into Cancer and Nutrition study, 2402 incident BCL cases were diagnosed among 475 426 participants who were followed up for an average of 14 years. Standard and penalized Cox regression models as well as principal component analysis (PCA) were used to evaluate 84 exposures in relation to BCL risk. Standard and penalized Cox regression models showed a positive association between anthropometric measures and BCL and multiple myeloma/plasma cell neoplasm (MM). The penalized Cox models additionally showed associations between several exposures from the categories of physical activity, smoking status, medical history, socioeconomic position and diet, and BCL and/or its subtypes. PCA confirmed the individual associations and also yielded additional observations. PC5, which included anthropometry, was positively associated with BCL, diffuse large B‐cell lymphoma (DLBCL) and MM. There was a significant positive association between consumption of sugar and confectionary (PC11) and follicular lymphoma risk, and an inverse association between fish and shellfish and Vitamin D (PC15) and DLBCL risk. PC1, which included features of the Mediterranean diet and a diet with a lower inflammatory score, showed an inverse association with BCL risk, while PC7, which included dairy, was positively associated with BCL and DLBCL risk. Physical activity (PC10) was positively associated with DLBCL risk among women. This study provided informative insights on the etiology of BCL.

Epidemiological studies have shown that the risk of BCL is associated with anthropometry measures and with lifestyle, viral, environmental and occupational factors (collectively called the exposome). [2][3][4][5][6][7][8] Moreover, in the last two decades, reports from epidemiological studies have suggested differences in risk among BCL subtypes for a wide range of risk factors. 2 To better understand the role of risk factors in the occurrence of BCL, it would be preferable to study a large set of lifestyle factors (exposome) in a single study. Few methods are available to comprehensively evaluate the association of specific risk factors with disease. Recently, a study design analogous to genome-wide association studies, the exposome-wide association study or, equivalently, environment-wide association study (EWAS), has been proposed to search for and validate exposures associated with complex diseases. Instead of testing one or only a few associations at a time, EWAS evaluates multiple exposures for association, with proper adjustment for multiplicity of comparisons and collinearity among exposures. EWAS techniques have recently been used to assess environmental factors in relation to chronic diseases (eg, Type 2 diabetes, high blood pressure and peripheral arterial disease) and mortality. [9][10][11] In this study, we aimed to use an exposome-wide approach to evaluate multiple lifestyle exposures and determine both their independent and combined roles (using a multivariable penalized regression algorithm and principal component [PC] approaches) with respect to the risk of BCL and major subtypes using data from the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort.

| MATERIALS AND METHODS
What's new? The "exposome" includes all non-genetic exposures (e.g. diet, viral, environmental, etc.), with the goal of understanding how those exposures may affect an individual's health. In this study, the authors used a technique called "EWAS" (exposome-wide association study) to identify multiple factors that are associated with B-cell lymphoma (BCL) risk. Their results confirm both previously reported risk factors and protective factors. In addition, they identify several previously unknown associations. These new insights, gained via the analysis of multiple exposures, suggest that traditional single-factor approaches may be suboptimal compared with an EWAS approach.
The EPIC study is a prospective cohort involving 23 centers from 10 European countries (Denmark, France, Germany, Greece, the Netherlands, Italy, Norway, United Kingdom, Spain and Sweden). The rationale and study design have been described previously. 12,13 In brief, 521 324 subjects, mostly aged 30 to 70 years, were recruited between 1992 and 2000. Ethical review boards from the International Agency for Research on Cancer (IARC) and local participating centers approved the study and all participants gave their written informed consent. Of the 521 324 EPIC cohort participants, we excluded prevalent cancer cases at baseline (n = 25 184), subjects with missing follow-up information (n = 4148), subjects with incomplete information on diet or lifestyle questionnaires (n = 6259), those with extreme caloric intake (top and bottom 1% of the total energy intake to energy requirement ratio) (n = 9573) and incident cases of non-BCL lymphomas (n = 734). This left a cohort of 475 426 subjects, including 2402 incident BCL cases and 473 024 participants free of cancer.
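The exclusion counts above can be checked against the reported cohort size with a few lines of arithmetic. The sketch below assumes that the exclusions were applied sequentially and do not overlap, which the reported totals are consistent with; the variable and function names are illustrative only.

```python
# Cohort flow check using the counts reported in the text.
EPIC_RECRUITED = 521_324

# Exclusions, in the order they are listed in the paragraph above.
# Assumption: each count refers to subjects not already excluded earlier.
exclusions = {
    "prevalent cancer at baseline": 25_184,
    "missing follow-up information": 4_148,
    "incomplete diet or lifestyle questionnaire": 6_259,
    "extreme energy intake ratio (top/bottom 1%)": 9_573,
    "incident non-BCL lymphoma": 734,
}


def apply_exclusions(start, steps):
    """Return (reason, n_excluded, n_remaining) after each sequential step."""
    remaining = start
    flow = []
    for reason, n in steps.items():
        remaining -= n
        flow.append((reason, n, remaining))
    return flow


flow = apply_exclusions(EPIC_RECRUITED, exclusions)
analytic_cohort = flow[-1][2]
incident_bcl = 2_402
non_cases = analytic_cohort - incident_bcl

for reason, n, remaining in flow:
    print(f"{reason}: -{n:,} -> {remaining:,}")
print(f"analytic cohort: {analytic_cohort:,}; non-cases: {non_cases:,}")
```

The final two figures reproduce the 475 426 subjects and 473 024 non-cases stated in the text.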
Validated country-specific questionnaires were used to collect information on the usual diet during the year before recruitment; namely through self-administered semi-quantitative food frequency questionnaires or diet history questionnaires administered through a personal interview, and semi-quantitative food-frequency questionnaires combined with a food record. 13,14 Lifestyle questionnaires were used to obtain information on sociodemographic characteristics, physical activity, medical history and alcohol and tobacco consumption. Anthropometric measures were also ascertained at recruitment. 15

| Follow-up and outcome assessment
Primary incident lymphoma cases were identified by linkage with national cancer registries in Denmark, Italy, the Netherlands, Norway, Spain, Sweden and the United Kingdom. A combination of methods was used in France, Germany and Greece, including cancer registries, health insurance records, and active follow-up by contacting participants or their next-of-kin. Mortality data were retrieved from regional or national mortality registries. The follow-up period was defined from the age at recruitment to the age at first lymphoma diagnosis, death or last complete follow-up (December 31, 2013), whichever occurred first.
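As an illustration of the follow-up definition, the sketch below derives the entry age, exit age and case status for one participant on the age time scale. The function name, its arguments and the tie-breaking rule (a diagnosis at the same age as a censoring event counts as a case) are assumptions made for this example and are not taken from the study protocol.

```python
from typing import Optional, Tuple


def follow_up_interval(
    age_recruitment: float,
    age_end_of_follow_up: float,
    age_lymphoma_diagnosis: Optional[float] = None,
    age_death: Optional[float] = None,
) -> Tuple[float, float, bool]:
    """Return (entry_age, exit_age, is_case) for one participant.

    Exit is the earliest of first lymphoma diagnosis, death, or the age
    reached at the last complete follow-up date.
    """
    # Each candidate exit is (age, counts_as_event).
    candidates = [(age_end_of_follow_up, False)]
    if age_death is not None:
        candidates.append((age_death, False))
    if age_lymphoma_diagnosis is not None:
        candidates.append((age_lymphoma_diagnosis, True))
    # Earliest age wins; on a tie the diagnosis is preferred (assumption).
    exit_age, is_case = min(candidates, key=lambda c: (c[0], not c[1]))
    return age_recruitment, exit_age, is_case


# Hypothetical participants, ages in years.
print(follow_up_interval(50.0, 70.0, age_lymphoma_diagnosis=62.0, age_death=65.0))
print(follow_up_interval(50.0, 70.0, age_death=66.0))
print(follow_up_interval(50.0, 70.0))
```

A diagnosis recorded after the end of complete follow-up is ignored by construction, so such a participant is censored at the end-of-follow-up age.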
Diagnoses of primary incident lymphoma cases were based on the International Classification of Diseases for Oncology, third edition and grouped according to recommendations of the InterLymph Pathology Working Group. 1 In the current analysis, only mature B-cell lymphomas (Table S1) were considered, which were further categorized into diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), chronic lymphocytic leukemia (CLL) (including small lymphocytic leukemia), multiple myeloma/plasma cell neoplasm (MM), and "other" entities (ie, those cases in which the B-cell lymphoma subtype is unknown or does not fall within the above-mentioned subtypes).

| Exposures assessment per category
Anthropometry measures included height, weight, hip circumference, waist circumference, body mass index (BMI, kg/m²) and waist to hip ratio. Participants' height, weight, hip circumference and waist circumference were measured at baseline, except for France, Oxford and Norway, where self-reported measures were obtained via questionnaire. [15][16][17]
Smoking status included ever smoking, current smoking, currently smoking cigarettes, currently smoking cigars, duration of smoking, duration of cigarette smoking, currently smoking >15 cigarettes/d and currently smoking >25 cigarettes/d.
Alcohol intake: Data on alcohol intake were collected through the dietary questionnaire (alcohol intake over 12 months prior to recruitment) and in the lifestyle questionnaire (consumption of alcoholic beverages at different ages in the past) and expressed in grams of ethanol per day (g/d).
Medical history: Participants reported whether they had ever had myocardial infarction, stroke, hypertension, diabetes or a cardiovascular problem.
Physical activity: The assessment of physical activity measures is described in detail elsewhere. [18][19][20] Current occupational physical activity was based on employment status and on the level of physical activity at current work, which was later coded in categories (sedentary occupation, standing occupation, manual work, heavy manual work and unemployed).
Information on housework, do-it-yourself work, gardening and climbing stairs was combined to estimate the overall household activities and walking, cycling and sport activities were combined to determine the overall recreational activities. Subsequently, energy expenditure using metabolic equivalent values was calculated, according to the Compendium of Physical Activities. 21 Sex-specific total physical activity index, Cambridge physical activity index (1-4 levels) and IARC physical activity score (1-3 levels) were also included in current analyses.
Diet: The 16 main food groups included were potatoes and other tubers; vegetables; legumes; fruits (including nuts and seeds); dairy products; cereal and cereal products; meat and meat products; fish and shellfish; egg and egg products; fat; sugar and confectionary; cakes and biscuits; nonalcoholic beverages; alcoholic beverages; condiments and sauces; soups and bouillons.
Nutrient: Nutrient values of all items from the 24-hour dietary recalls were standardized to build the EPIC Nutrient Database. 22 For each nutrient, the nutrient density was calculated by dividing the caloric value of that nutrient by the total caloric intake. For nutrients without caloric value (vitamins, flavonoids, cholesterol, calcium), the weight of the nutrient was divided by the total caloric intake.
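The nutrient density definition can be sketched as follows. The energy conversion factors are the general Atwater factors and are an assumption of this example; the text does not state which factors were applied in the EPIC Nutrient Database. The function and dictionary names are hypothetical.

```python
# General Atwater energy factors in kcal per gram (assumed for illustration).
KCAL_PER_GRAM = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0, "alcohol": 7.0}


def nutrient_density(nutrient, amount_per_day, total_kcal_per_day):
    """Nutrient density relative to total energy intake.

    Energy-providing nutrients (amount in grams): share of total energy.
    Other nutrients (amount in any weight unit): amount per kcal.
    """
    if total_kcal_per_day <= 0:
        raise ValueError("total energy intake must be positive")
    if nutrient in KCAL_PER_GRAM:
        return amount_per_day * KCAL_PER_GRAM[nutrient] / total_kcal_per_day
    return amount_per_day / total_kcal_per_day


# Hypothetical participant consuming 2000 kcal/d.
fat_share = nutrient_density("fat", 80.0, 2000.0)           # fraction of energy from fat
calcium_per_kcal = nutrient_density("calcium", 1000.0, 2000.0)  # mg of calcium per kcal
print(fat_share, calcium_per_kcal)
```

Expressing intakes relative to total energy makes them comparable between participants with different overall energy intake.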
Dietary pattern: The Mediterranean diet (MD) score was assessed using the adapted relative MD (arMED) score. 23,24 In brief, the arMED is a 16-point linear score that incorporates eight key dietary components: six components presumed to reflect the MD [fruit (including nuts and seeds), vegetables, legumes, fish (including seafood), olive oil and cereals] and two components consumed in low quantity in the MD (dairy products and meat). The sum of these points was used to define the arMED score, which ranged from 0 to 16 (from the lowest to the highest adherence). 24 The dietary inflammatory potential was assessed by means of an inflammatory score of the diet (ISD), calculated using 28 dietary components and their corresponding inflammatory weights. 25,26 Overall, the ISD is a relative index that categorizes individuals' diets from maximally anti-inflammatory (corresponding to lower scores) to maximally pro-inflammatory (higher values).
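A 0 to 16 range over eight components is consistent with 0 to 2 points per component. The sketch below assumes tertile-based scoring (0, 1 or 2 points for the bottom, middle or top tertile of intake), reversed for dairy and meat. That scoring rule, the uniform treatment of olive oil and the cut-point inputs are simplifying assumptions of this example; the cited references describe the actual arMED construction.

```python
BENEFICIAL = ("fruit", "vegetables", "legumes", "fish", "olive_oil", "cereals")
DETRIMENTAL = ("dairy", "meat")


def tertile_points(value, lower_cut, upper_cut):
    """Return 0, 1 or 2 for the bottom, middle or top tertile of intake."""
    if value <= lower_cut:
        return 0
    if value <= upper_cut:
        return 1
    return 2


def armed_like_score(intakes, cuts):
    """Sum component points; higher totals mean higher adherence (0 to 16).

    intakes: component -> intake; cuts: component -> (lower_cut, upper_cut).
    """
    score = 0
    for component in BENEFICIAL:
        score += tertile_points(intakes[component], *cuts[component])
    for component in DETRIMENTAL:
        # Reverse scoring: low intake of dairy and meat earns more points.
        score += 2 - tertile_points(intakes[component], *cuts[component])
    return score


# Hypothetical cut points (same for every component, purely for illustration).
cuts = {c: (10.0, 20.0) for c in BENEFICIAL + DETRIMENTAL}
high_adherence = {**{c: 25.0 for c in BENEFICIAL}, **{c: 5.0 for c in DETRIMENTAL}}
low_adherence = {**{c: 5.0 for c in BENEFICIAL}, **{c: 25.0 for c in DETRIMENTAL}}
print(armed_like_score(high_adherence, cuts), armed_like_score(low_adherence, cuts))
```

In practice the cut points would be derived from the intake distribution of the cohort rather than fixed in advance.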
Socioeconomic position (SEP): Educational level (primary school or less, technical/professional school, secondary school, longer education including university degree) was used as an indirect measure of SEP. A particular advantage of investigating education is avoiding reverse causation bias: diseases may lead to downward occupational mobility and reduced income, but generally will not affect educational status achieved by early adulthood. Table S2 shows the list of 84 exposure variables included in the study.

| Statistical analyses
Exposure variables with <25% missing data (Table S3) were imputed based on a maximum likelihood estimation method which was informed by the observed correlation structure within the data. To understand the structure of our data and see which exposures are related to each other, we calculated the Spearman rank correlation between each pair of variables, adjusted for age, sex and country. Spearman correlation coefficients were visualized with a heatmap in which variables were arranged using a hierarchical clustering algorithm. The larger the correlation between a pair of variables, the closer they appear in the heatmap. Absolute correlations below 0.2 were omitted and the remaining correlations were plotted in a "circos" plot.
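The pairwise screening step can be sketched in plain Python as below. This is a minimal illustration on toy data: it computes unadjusted Spearman correlations and keeps pairs at or above the 0.2 threshold. The adjustment for age, sex and country, the hierarchical clustering and the plotting are not reproduced, and all variable names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(data, threshold=0.2):
    """Keep variable pairs whose absolute correlation is >= threshold."""
    kept = {}
    for a, b in combinations(sorted(data), 2):
        rho = spearman(data[a], data[b])
        if abs(rho) >= threshold:
            kept[(a, b)] = rho
    return kept


# Toy data for five hypothetical participants.
data = {
    "weight": [60, 72, 85, 90, 101],
    "waist": [70, 80, 95, 92, 110],
    "fish": [50, 10, 20, 30, 40],
}
kept = correlated_pairs(data)
print(kept)
```

Only the weight and waist pair survives the threshold in this toy data set; the two pairs involving fish fall below 0.2 in absolute value and are dropped.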
Univariate and age-, sex-, and country-adjusted Cox proportional hazards models were used to examine the association between each exposure and BCL and its subtypes. Subsequently, to examine country heterogeneity, we fitted each model per country (age and sex adjusted) and pooled the estimates by conducting a random-effects meta-analysis. The coefficient of inconsistency I² was used as a metric to assess heterogeneity between countries, with a P value <.05 regarded as statistically significant evidence of between-country heterogeneity. Cox analyses were stratified by sex and median age at recruitment (≤55, >55 years). Sensitivity analyses were performed by excluding cases with less than 2 years of follow-up (n = 176 cases), centers with self-reported anthropometry data (France, Oxford and Norway) and centers without comprehensive physical activity data (Norway, Umea).
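The pooling of per-country estimates can be illustrated with a short sketch. It uses the DerSimonian-Laird estimator of the between-country variance, which is an assumption of this example since the text does not name the estimator used. The P value for heterogeneity (from Cochran's Q against a chi-square distribution) is omitted to keep the example dependency-free, and the input numbers are hypothetical.

```python
from math import sqrt


def random_effects_pool(betas, ses):
    """Pool log hazard ratios with a DerSimonian-Laird random-effects model."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    # Method-of-moments estimate of the between-country variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each within-country variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_star = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum_w_star
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled_beta": pooled,
        "pooled_se": sqrt(1.0 / sum_w_star),
        "q": q,
        "tau2": tau2,
        "i2_percent": i2,
    }


# Hypothetical per-country log hazard ratios and standard errors.
result = random_effects_pool([0.0, 0.5], [0.1, 0.1])
print(result)
```

When the per-country estimates agree closely, the between-country variance is estimated as zero and the random-effects result coincides with the fixed-effect one.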
Given the correlation between exposures, univariate regression analysis is prone to increased false positive results. 27 Therefore, we used the least absolute shrinkage and selection operator (LASSO) technique, a multivariable penalized regression algorithm, 28,29 to identify exposures associated with BCL. The LASSO technique performs two main tasks: regularization and feature selection. To do so, the method applies a shrinking (regularization) process in which the coefficients of the regression variables are penalized, thus shrinking some of them to zero. During the feature selection process, the variables that still have a nonzero coefficient after shrinkage are selected to be part of the model. The optimal tuning parameter λ, which controls the strength of the penalty, was obtained by 5-fold cross-validation. All exposures were standardized. Dummy variables were defined for country and, together with age and sex, were forced into the "regularized" Cox model by setting their penalty factor to zero. We also applied an Elastic-Net approach 29 that combines the penalties of ridge and LASSO regressions. This method shrinks coefficients (as in ridge regression) and sets some coefficients to zero (as in LASSO). As the Elastic-Net results were similar to those of LASSO, we present the LASSO results.
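The penalized Cox model described above can be written compactly. The formulation below is one common parameterization and is given as an assumption, since the text does not state the exact objective; scaling constants differ between software packages. Here \ell(\beta) is the Cox partial log-likelihood, \delta_i is the event indicator, R(t_i) is the risk set at the event time of subject i, \gamma_j is the penalty factor of variable j (0 for age, sex and the country dummies, 1 for the standardized exposures), and \alpha mixes the two penalties (\alpha = 1 gives LASSO, 0 < \alpha < 1 gives Elastic-Net).

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta}\;
    \Bigl\{ -\tfrac{1}{n}\,\ell(\beta)
      + \lambda \sum_{j=1}^{p} \gamma_j
        \Bigl[ \alpha\,\lvert\beta_j\rvert
             + \tfrac{1-\alpha}{2}\,\beta_j^{2} \Bigr] \Bigr\},
\qquad
\ell(\beta)
  = \sum_{i:\,\delta_i = 1}
    \Bigl[ x_i^{\top}\beta
         - \log \sum_{k \in R(t_i)} \exp\bigl(x_k^{\top}\beta\bigr) \Bigr]
```

Setting \gamma_j = 0 removes a variable from the penalty, so it is never shrunk to zero; this corresponds to forcing age, sex and country into the model. The value of \lambda is then chosen by cross-validation.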
Finally, PC analysis (PCA) was applied to reduce the spectrum of the exposures into a smaller number of clusters of related exposures.
[...] between exposures were lower than 0.4 (Figure S1). As expected, correlated exposures were mostly within the same category (Figure S2).
Next, Cox regression analyses were performed for each country independently and per-country estimates were pooled (random-effects meta-analysis). Among anthropometry variables, height was associated with increased risk of BCL after multiple testing corrections (Figure 3). We found similar results for MM without significant heterogeneity between countries, while the association between consumption of sugar and confectionary and FL risk was no longer statistically significant (β = .004, P value = .01, P heterogeneity = .06) (Figure S3, [...]
Note: Country, age and gender were forced into the penalized Cox models. All variables were standardized. Penalty parameter, lambda, was derived using 5-fold cross-validation. Regression coefficients were obtained at lambda minimum (the value of λ at the lowest cross-validation error). Positive coefficients indicate that an exposure is associated with higher risk of lymphoma, and vice versa for negative coefficients. See Table S2 [...]
The stratified analyses revealed that the associations between BCL and the subtypes and certain PCs slightly differ between males and females (Table 4) and by age (Table S10). These analyses showed a positive association between PC5, PC7, PC10, PC11, PC12 and PC13 and DLBCL and between PC9, PC11, PC13 and FL among women.
Exclusion of cases diagnosed in the first 2 years of follow-up (n = 176) did not materially alter the estimates for individual exposures and PCs. Moreover, sensitivity analyses excluding centers with self-reported anthropometry data and centers without comprehensive physical activity data (n = 178 106) did not change the reported association between anthropometry and BCL and MM, and between physical activity and BCL and DLBCL (data not shown).

| DISCUSSION
In this large prospective cohort study, several anthropometric measures and lifestyle factors were associated with BCL and/or subtypes, with strong evidence for a positive association of anthropometric measures.
In our study, we used a new exposome-based approach to find [...] subtypes are scarce. This study provides more support for the role of nutrition in lymphoma. In our study, consumption of sugar, confectionary, and carbohydrate products was associated with increased risk of FL, particularly among female participants. The majority of previous prospective and case-control studies on total carbohydrates or the main food sources of carbohydrates and lymphoma risk were null, except for a positive association reported between high consumption of white bread or pasta and non-Hodgkin lymphoma (NHL). 35 Although sugar intake was not associated with NHL, 36 in a follow-up study, women who frequently consumed cakes or pies had an elevated risk of NHL. 37 Further studies of dietary glycemic index and glycemic load are warranted to clarify this association. 38
We found that consumption of dairy products, calcium, riboflavin (B2) and phosphorus may increase the risk of BCL and DLBCL, in particular among females. Previous studies suggested a positive association between dairy products and risk of NHL, 39,40 particularly for DLBCL. 41 Milk is a source of fat and protein, which are both thought to be risk factors for NHL, 39 as well as calcium, riboflavin and Vitamin A. 35 The positive association between dairy consumption and risk of BCL may be attributed to the effects of dietary calcium and phosphorus, largely found in dairy products, which decrease levels of [...]
Note: Model included age, gender, country and all PCs; Figure 2 and Table S8 shows [...]
[...] and polychlorinated biphenyl compounds that have been associated with increased risk of NHL. Thus, adverse health effects related to their high content in some fish may diminish the otherwise protective effects conferred by fish consumption. 39 We found a possible association between the consumption of animal fats (positive association) and polyunsaturated fatty acids (inverse association) and risk of BCL.
Many studies suggest that high-fat diets are linked to the etiology of NHL. A recent meta-analysis showed a significant association between total fat consumption and increased risk of NHL and DLBCL, but not of CLL and FL. 46 The authors found that only high animal fat consumption increased the risk of NHL, with no association for vegetable fat consumption. 46 A more recent large prospective study reported an increased risk of NHL associated with intakes of total, animal, saturated and trans fat with 14 years of follow-up. However, these associations did not persist with longer follow-up. 47 Animal fats are composed of saturated fatty acids and unsaturated fats, whereas vegetable fat has a higher concentration of unsaturated fatty acids. A diet high in polyunsaturated fatty acids has been shown to reduce the levels of pro-inflammatory markers such as interleukin (IL)-6, IL-1 receptor antagonist, tumor necrosis factor, and C-reactive protein, and to increase the levels of anti-inflammatory factors such as IL-10 and transforming growth factor. 48 On the other hand, saturated fats can modulate immune function by enhancing nuclear factor-κB activation and antiapoptotic behavior in T cells, in addition to increasing expression of proinflammatory agents such as IL-6, cyclooxygenase-2 and inducible nitric oxide synthase. 39 A few studies suggest that the link might be related to changes in serum levels of leptin and adiponectin that stimulate proliferation and inhibit apoptosis through PI3K/AKT activation. 46
We previously showed that BCL risk was associated with a higher ISD and a lower adherence to MD, 24,26 which was confirmed in the present study. The roles of inflammation, immune dysregulation and autoimmunity in the pathogenesis of lymphoma are known. 49 Recent studies further support the inflammatory potential of diet 50 [...]
Our study also suggests that heavy smoking (>25 per day) may increase the risk of DLBCL.
A pooled analysis of case-control studies within the InterLymph consortium showed that current smoking was associated with a significant 30% increased risk of FL, but not of NHL overall or other NHL subtypes. 56 Moreover, a meta-analysis of seven prospective studies 57 did not show an association between cigarette smoking and NHL. Several factors may explain these inconsistencies, including methodological differences between the studies (ie, study design, data collection, categorizations and residual confounding) and differences between populations in terms of ethnicity, socioeconomic status, and disease prevalence. Therefore, further research into this association is warranted. One promising direction for future investigation is refining our understanding of the carcinogens in cigarette smoke and their biological effects that could plausibly contribute to lymphomagenesis. 58
An inverse association between alcohol intake and BCL has been consistently observed in both large case-control 2 and prospective studies. 59 However, it has been hypothesized that this association is driven by unknown confounders. Unlike previous large observational studies, our study did not show a protective effect of alcohol intake on BCL. In contrast, our findings suggested a positive association between alcohol intake (PC9) and FL among women after adjustment for other PCs (HR = 1.24, 95% CI = 1.04-1.46) (Table 4). Because of the insufficient power of the stratified analyses, this finding should be interpreted with caution, and further prospective studies are necessary.
The LASSO analyses showed an inverse association between level of education and risk of BCL. The published literature on educational level or other SEP indicators and lymphoma risk is limited and contradictory. 60,61 In line with our findings, the InterLymph study showed a lower risk of lymphoma and DLBCL among highly educated people. 2 Populations with low SEP may be more exposed to hazardous occupational exposures, air pollution, smoking and infections, which can increase the risk of lymphoma.
Penalized Cox models revealed an inverse association between history of hypertension and BCL. We also recently reported, in EPIC, an inverse association of hypertension and of both systolic and diastolic blood pressure levels with all lymphomas combined and with the NHL subgroup. 62
The strengths of this study include its prospective design, long follow-up, and large size, which allowed us to carry out analyses by BCL subentities. Limitations of our study should be considered when interpreting the results, including potential measurement errors derived from questionnaires, which could lead to systematic and random errors. We cannot rule out that they have affected risk estimates.
Our study lacked data on other exposures such as medication, occu- [...]
In conclusion, our systematic evaluation confirmed several previously reported risk factors (anthropometric measures, animal fat, dairy and sugar intake) as well as protective factors (MD, diet with lower inflammatory score, fish and Vitamin D, SEP) of BCL and/or subtypes.
While our study did not support the previously reported protective effect of alcohol intake on BCL, it revealed several previously unknown associations (increased risk of DLBCL with smoking and a beneficial effect of condiments and sauces intake for BCL and DLBCL). In this study, we applied a comprehensive approach for conceptualizing the roles and relationships of multiple exposures in the etiology of BCL, which generated new insights into BCL risk factors. This highlights that traditional approaches of testing a single association at a time could be suboptimal compared with an EWAS approach.