Agnostic Cys34‐albumin adductomics and DNA methylation: Implication of N‐acetylcysteine in lung carcinogenesis years before diagnosis

Although smoking and oxidative stress are known contributors to lung carcinogenesis, their mechanisms of action remain poorly understood. To shed light on these mechanisms, we applied a novel approach using Cys34‐adductomics in a lung cancer nested case–control study (n = 212). Adductomics profiles were integrated with DNA‐methylation data at established smoking‐related CpG sites measured in the same individuals. Our analysis identified 42 Cys34‐albumin adducts, of which 2 were significantly differentially abundant in cases and controls: the adduct of N‐acetylcysteine (NAC, p = 4.15 × 10⁻³) and that of cysteinyl‐glycine (p = 7.89 × 10⁻³). Blood levels of the former were associated with methylation levels at 11 smoking‐related CpG sites. We detect, for the first time in prospective blood samples, and irrespective of time to diagnosis, decreased levels of the NAC adduct in lung cancer cases. Altogether, our results highlight the potential role of these adducts in the oxidative stress response contributing to lung carcinogenesis years before diagnosis.


Introduction
Lung cancer is the leading cause of cancer mortality worldwide, 1,2 and is causally linked to smoking. [3][4][5][6] Using high throughput DNA-methylation assays, recent studies have identified epigenetic biomarkers of smoking exposure and smoking history. 4,[7][8][9] Some of these markers were also found to be associated with lung cancer risk prospectively, 3,10,11 suggesting the involvement of DNA methylation changes in the mediation of the adverse effect of smoking.
However, in the absence of information functionally linking methylation levels at these CpG sites to general biological pathways, these results only provide a fractional insight into the molecular mechanisms involved in (smoking-induced) lung carcinogenesis. 12 Oxidative stress (OS) has been reported to be related to smoking 13 and is an established contributor to carcinogenesis, 14,15 notably through reactive oxygen species (ROS) whose overproduction can activate OS, and ultimately lead to cell damage, carcinogenesis and chemotherapy resistance. 16,17 As highly reactive species, ROS are rapidly degraded and cannot easily be quantified. However, measuring ROS adducts with human serum albumin (HSA) has been shown to be a promising alternative. 18,19 HSA is the most abundant protein in serum and the cysteine 34 (Cys34) site acts as a scavenger of small molecules such as ROS. Recently, an approach based on untargeted high-resolution mass spectrometry (HRMS) has been developed to measure HSA-Cys34 adducts in serum samples, and identified adducts associated with smoking status. 18 In the present work, we propose to apply that agnostic adductomics approach to 212 plasma samples from prospective lung cancer cases (median follow-up 7 years) and healthy controls from the Italian component of the European Prospective Investigation into Cancer and Nutrition (EPIC-Italy) cohort whose blood samples were also used to generate DNA-methylation profiles. We relate adductomic profiles to smoking metrics and disease outcome and integrate these data with DNA-methylation to capture long-term effects of OS on lung carcinogenesis and to explore the role of smoking and its downstream biological consequences in these processes.

Study population
Plasma samples were obtained from study participants recruited between 1993 and 1998 as part of the Italian branch of EPIC. 20,21 Our study population includes 106 prospective lung cancer cases identified through local cancer registries and 106 healthy controls from the centers of Turin and Varese, matched on gender, age, recruiting center, and year and season of blood collection (details are available in Supporting Information). Our study complies with the Declaration of Helsinki and the protocol was approved by the Ethics Committees at the Human Genetics Foundation (IIGM, Turin, Italy).

Exposure data
Detailed information on cigarette smoking was collected at enrollment from questionnaires in the EPIC study and included smoking status, smoking intensity (in cigarettes/day), smoking duration and time since smoking cessation (in former smokers). As previously proposed, 22 the comprehensive smoking index (CSI), which captures the dynamics of lifelong smoking exposure and possible smoking cessation, was calculated from these data. 23
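For illustration only, the sketch below computes a CSI in one commonly cited form, CSI = (1 − 0.5^(dur*/τ)) × 0.5^(tsc*/τ) × ln(int + 1), where dur* and tsc* are the lag-adjusted smoking duration and time since cessation. The text above does not state the exact formula or parameter values used in this study, so the formula choice, the function name, and the half-life (τ) and lag (δ) defaults are all assumptions.

```python
import math


def comprehensive_smoking_index(duration, time_since_cessation, intensity,
                                half_life=1.5, lag=0.7):
    """Illustrative CSI; half_life and lag are hypothetical placeholder values.

    duration, time_since_cessation: years; intensity: cigarettes/day.
    Current smokers have time_since_cessation = 0.
    """
    # Lag-adjusted time since cessation and duration.
    tsc = max(time_since_cessation - lag, 0.0)
    dur = max(duration + time_since_cessation - lag, 0.0) - tsc
    accumulation = 1.0 - 0.5 ** (dur / half_life)   # saturates with duration
    decay = 0.5 ** (tsc / half_life)                # fades after cessation
    return accumulation * decay * math.log(intensity + 1.0)


never = comprehensive_smoking_index(0.0, 0.0, 0.0)
current = comprehensive_smoking_index(30.0, 0.0, 20.0)
former = comprehensive_smoking_index(30.0, 10.0, 20.0)
```

With these placeholder parameters, a never smoker scores 0 and a former smoker scores lower than a current smoker with the same duration and intensity, which is the qualitative behavior the index is meant to capture.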

Adductomics measurements
The 212 plasma samples were analyzed in duplicate following a previously described protocol based on the identification of the third largest tryptic peptide of HSA ("T3," sequence ALVLIAFAQYLQQC34PFEDHVK, mass 2,432 Da), which contains the Cys34 locus, and of its potential modifications. 18,24,25 Plasma samples were processed for digestion and dilution and subsequently analyzed by nano liquid chromatography and HRMS (NanoHPLC-HRMS) with an Orbitrap Elite coupled with an ion trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA) operated in positive ionization mode. Spectral data were preprocessed using an in-house R script (see Supporting Information for details). Peak areas of T3 adducts were normalized to the corresponding peak area of the "housekeeping" (HK) peptide (LVNEVTEFAK) to account for differential amounts of HSA. Adduct abundance was expressed as the peak area ratio (PAR), defined as adduct peak area/HK peptide peak area × 1,000. 18,24,25 Signal missing in both duplicates was considered below the limit of quantitation (LOQ) and set to LOQ/√2. 18 If only one measurement was missing, it was imputed in cases and controls separately, using nearest neighbor averaging. A total of five adduct features were excluded from our analysis because they were missing in more than 60% of the samples. While this cutoff was arbitrary, we ensured that we did not induce any bias in our data or exclude any informative or disease-relevant features by checking that these five excluded features were (i) not differentially missing in cases and controls, (ii) missing across batches and (iii) of poor spectral quality. All adduct PAR levels were log-transformed.
Annotation of assayed adducts was based on the monoisotopic mass added to the T3 peptide and on MS2 spectra, and also drew on the library of adducts constructed in previous work.
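The normalization and below-LOQ handling described above can be sketched as follows. This is a minimal illustration, not the in-house R script: the function names and the numeric LOQ are hypothetical, the natural logarithm is an assumption (the base of the log-transformation is not stated), and the nearest neighbor imputation of a single missing duplicate is deliberately left out.

```python
import math


def peak_area_ratio(adduct_area, hk_area):
    """PAR = adduct peak area / HK peptide peak area x 1,000."""
    return adduct_area / hk_area * 1000.0


def process_duplicates(adduct_areas, hk_areas, loq):
    """Return log-PAR for duplicate injections; None marks a missing signal.

    If the signal is missing in every duplicate, all are set to LOQ/sqrt(2).
    A single missing duplicate stays None here; in the study it is imputed
    separately (nearest neighbor averaging), which this sketch does not do.
    """
    pars = [None if a is None else peak_area_ratio(a, h)
            for a, h in zip(adduct_areas, hk_areas)]
    if all(p is None for p in pars):
        pars = [loq / math.sqrt(2.0)] * len(pars)
    # Natural log is an assumption of this sketch.
    return [None if p is None else math.log(p) for p in pars]


both_missing = process_duplicates([None, None], [1e6, 1e6], loq=0.5)
one_missing = process_duplicates([500.0, None], [1e6, 1e6], loq=0.5)
```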

What's new?
Although smoking and oxidative stress are known contributors to lung carcinogenesis, their action mechanisms remain poorly understood. Here, using human serum albumin Cys-34 adductomics to gauge exposure to reactive oxygen species, the authors detected for the first time lower levels of N-acetylcysteine (NAC) adducts in lung cancer cases years before diagnosis. This variation was associated with smoking and hypomethylation at certain smoking-related CpGs. The results indicate a perturbation in the oxidative stress pathways in future cases and call for further investigation into the role of oxidative stress in lung carcinogenesis and the potential use of NAC as a preventive drug.

Methylation data
DNA-methylation data from the Infinium HumanMethylation450 BeadChip assay were also available for our participants from a previous study. 10 Sample preparation and data processing were described elsewhere 26 and are detailed in Supporting Information. We only considered here the 2,671 CpG sites that were found to be associated with smoking in a recent meta-analysis, 9 of which 2,670 were assayed in our data set after probe filtering.

Statistical analyses
Adductomics profiling. The association of the measured levels of a given adduct and (i) exposure to tobacco smoke, and (ii) prospective lung cancer status was evaluated using linear mixed models where the adduct level was modeled as the response, and smoking exposure or disease status as explanatory variable. To model the repeated measurement design, the participant ID was included as a random intercept in the model. Similarly, nuisance variation was modeled including random intercepts for technical confounders 27,28 (see Supporting Information).
Statistical significance of the effect of the variable of interest was evaluated using the p-value of the likelihood ratio test comparing the models with and without that variable. 27,29,30 We corrected for multiple testing using a permutation-based approach that estimates the per-test significance level to be applied to control the family-wise error rate (FWER). 31,32 Adductomics profiling was conducted for three different measures of exposure to tobacco smoke (smoking status, pack-years and CSI), and for lung cancer status. For lung cancer analyses, the model was further adjusted for smoking status (current, former or never smoker), and was stratified by the most represented histological subtypes in our study (Supporting Information Table S1).
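As a hedged illustration of these two steps, the sketch below computes a likelihood ratio test p-value for a single tested coefficient (1 degree of freedom, an assumption; a three-level factor such as smoking status would use 2) and derives a per-test significance level from the permutation distribution of the minimum p-value. The function names and the `pvalue_fn` callback are hypothetical, and the min-p construction is one standard way to obtain such a threshold rather than a confirmed description of the procedure in refs. 31,32.

```python
import math
import random


def lrt_pvalue(loglik_full, loglik_null):
    """1-df likelihood ratio test: chi-square survival via erfc."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def permutation_min_pvalues(labels, features, pvalue_fn, n_perm=1000, seed=0):
    """Minimum p-value across all features for each label permutation.

    pvalue_fn(labels, feature) is a user-supplied (hypothetical) function
    that refits the model on permuted labels and returns a p-value.
    """
    rng = random.Random(seed)
    minima = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        minima.append(min(pvalue_fn(shuffled, f) for f in features))
    return minima


def per_test_alpha(min_pvalues, fwer):
    """Largest observed threshold whose permutation FWER does not exceed fwer."""
    ordered = sorted(min_pvalues)
    k = int(math.floor(fwer * len(ordered)))
    if k == 0:
        raise ValueError("too few permutations for the requested FWER")
    return ordered[k - 1]


alpha_demo = per_test_alpha([0.04, 0.01, 0.03, 0.02], fwer=0.5)
```

Because adduct levels are correlated, a threshold obtained this way is typically less stringent than a Bonferroni correction over all 42 adducts.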
For all disease-related adducts, we regressed, in cases only, the adduct level (as predictor) against the time to diagnosis (TtD) as defined by the time elapsed from blood collection to lung cancer diagnosis, adjusting for the same variables.
OMICs integration. We investigated if and how adducts that were found to be differentially abundant between lung cancer cases and healthy controls were also related to smoking, by integrating adducts levels and DNA methylation levels at the smoking-related CpG sites. As detailed in the Supporting Information, we ran a series of linear mixed models for each disease-related adduct and each of the 2,670 smoking-related CpG, setting the methylation level as the variable of interest. These models also accounted for technical variation in the methylation data using a two-step strategy first estimating, and second removing technically-induced shifts in measured methylation levels. 22,29,33 Results from these analyses were visualized as a bipartite network where edges were selected based on statistical significance of the pairwise associations, correcting for 2,670 tests.
Receiver operating characteristic (ROC) analyses. In order to quantify the disease-relevant information brought about by the disease-related adducts, we ran a series of unconditional logistic models setting the adduct level as predictor. Results were visualized by means of ROC curves and the corresponding area under the curve (AUC). To prevent overfitted results, we constructed ROC curves using (N = 5,000) independent subsamples (see Supporting Information). We compared models with (i) only smoking (as measured by pack-years and/or principal components (PCs) from the smoking-related CpG sites), (ii) only the disease-related adducts and (iii) the disease-related adducts and smoking metrics.
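The subsampling scheme can be sketched for the simplest case of a single predictor. For a logistic model with one covariate, the held-out AUC depends only on the sign of the fitted slope, which matches the sign of the case-control mean difference in the training set, so this sketch uses that sign instead of fitting the model; models with several predictors would need a real logistic fit. All names are hypothetical, and the 80/20 split follows the legend of Figure 5.

```python
import random


def auc(case_scores, control_scores):
    """P(random case scores above random control); ties count one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def _mean(values):
    return sum(values) / len(values)


def subsampled_auc(x, y, n_rep=5000, test_fraction=0.2, seed=0):
    """Mean, min and max held-out AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = int(round(test_fraction * len(idx)))
    aucs = []
    for _ in range(n_rep):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        tr_cases = [x[i] for i in train if y[i] == 1]
        tr_ctrls = [x[i] for i in train if y[i] == 0]
        te_cases = [x[i] for i in test if y[i] == 1]
        te_ctrls = [x[i] for i in test if y[i] == 0]
        if not (tr_cases and tr_ctrls and te_cases and te_ctrls):
            continue  # skip splits lacking a class in either set
        # Direction of effect learned on the training set only.
        sign = 1.0 if _mean(tr_cases) >= _mean(tr_ctrls) else -1.0
        aucs.append(auc([sign * v for v in te_cases],
                        [sign * v for v in te_ctrls]))
    return _mean(aucs), min(aucs), max(aucs)


# Toy data: cases (y = 1) have lower predictor values than controls.
toy_x = [float(v) for v in range(20)]
toy_y = [1] * 10 + [0] * 10
toy_result = subsampled_auc(toy_x, toy_y, n_rep=200)
```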

Data availability
The data that support the findings of our study are available from the corresponding author upon reasonable request.

Study population and adductomics measurements
Of the 212 participants originally included in the study, 15 (5 controls and 10 cases) were excluded from the analysis due to missing data. Main characteristics of the resulting 101 controls and 96 cases are summarized in Table S1, and show, as expected, similar center, gender, age and body mass index in cases and controls.
Our untargeted adductomics approach located a total of 42 T3-peptide modifications in our samples. Of these 42 adduct features, 25 were annotated as part of this work and 17 remain unknown (see Table S2). Coefficients of variation among duplicate measurements ranged from 0.22 to 0.5 with a median of 0.32 (Table S3). The differential mass of the measured adducts included truncations and additions and ranged from −65.1159 to 462.2047 Da. Annotations were made on the basis of MS and MSn data (an example of the MS and MSn data used for annotation is shown for LC34, N-acetylcysteine, in Fig. 1). Most of the annotated adducts were Cys34 disulfides (cysteine, homocysteine, cysteinyl-glycine [Cys-Gly], glutathione, N-acetylcysteine [NAC], etc.). Other annotations included sulfoxidation products, truncations and additions of cyanide, crotonaldehyde and benzaldehyde. The full description of measured adducts is available in Table S2.
The distribution of all the adducts levels (expressed as PAR) showed right-skewed distributions, and log-transformation attenuated the asymmetric distribution for all adducts (Fig. S1).

Association studies
As summarized in Table S4, we found a generally lower abundance of the assayed T3-peptide adducts in relation to smoking status (Fig. S2). In particular, for cumulative exposure (pack-years) and CSI, the 42 effect size estimates were negative. We found that the levels of 13 adducts were associated with at least one smoking exposure metric at a nominal significance level of 0.05. Only four of these (LC19, benzaldehyde; LC20, unknown; LC34, NAC; and LC35, Cys-Gly) were associated with more than one smoking exposure metric. However, none of these associations survived correction for multiple testing.
Analyses of lung cancer status indicated that the levels of six T3-peptide adducts differed between cases and controls at a nominal significance level of 0.05 (Fig. 2), and for all six adducts, lower levels were observed in cases (Fig. 3 and Table S5). Of these, the adduct of NAC (LC34, p = 4.15 × 10⁻³, Table S5) survived the correction for multiple testing based on our permutation-based per-test significance level (α₀ = 5.4 × 10⁻³) controlling the FWER below 0.1, and the adduct of dehydrated Cys-Gly (LC33, Cys-Gly-(-H₂O), p = 7.89 × 10⁻³) was borderline significant. Upon adjustment for smoking status (Model 2), the association was attenuated for both adducts (Fig. 2 and Table S5), and in particular for the NAC adduct, which, however, remained nominally significant (p < 2.25 × 10⁻²).
In order to account for the histological heterogeneity of lung cancer, we stratified our analyses by the three most common histological subtypes represented in our study: adenocarcinoma (N = 41 cases), large cell carcinoma (LCC; N = 20 cases) and squamous cell carcinoma (N = 15 cases). We did not identify any differentially abundant adducts for squamous cell carcinoma (results not shown). However, we found two T3-peptide adducts with lower levels in adenocarcinomas (p < 0.05). These included LC19 (adduct of benzaldehyde), which was not identified in the analysis of lung cancer as a whole, and Cys-Gly-(-H₂O), which remained nominally associated with adenocarcinoma status upon adjustment for smoking status (Table 1). Analyses restricted to LCC identified five of the six lung cancer-related T3-peptide adducts as being significantly less abundant in cases (p < 0.05). All these associations remained nominally significant when adjusting for smoking status. Of the five adducts found to be less abundant in LCC cases, NAC and LC1, despite the smaller sample size, exhibited larger differences (β < −1.22) and stronger associations (p < 3.5 × 10⁻⁴, Table 1) than those observed in relation to lung cancer (i.e., irrespective of histological subtype).
Plasma levels of the seven adducts found differentially abundant in lung cancer cases did not support any linear trend in relation to TtD (p > 0.24, Table S6). In particular, the levels of the NAC adduct were found to be particularly stable with TtD for lung cancer as a whole (β < 10⁻³, p > 0.9), and for both histological subtypes separately (p > 0.23). We only identified a significant (and decreasing) trend in relation to TtD in prospective LCC cases for plasma levels of the adducts of Cys-Gly-(-H₂O), LC10 (unknown) and, to a lesser extent, Cys-Gly (Table S6).

Integration of DNA methylation data
In order to further explore the role of smoking in the adduct-lung cancer associations, we explored the pairwise associations linking the levels of the six lung cancer-related T3 adducts to methylation levels at each of the 2,670 assayed CpG sites that were recently established as epigenetic markers of smoking status 9 (Fig. 4). From a preliminary PC analysis, we estimated that 110 effective tests were performed for each adduct. Using a Bonferroni-corrected per-test significance level for that number of tests, we identified 34 CpG sites that were associated with the level of at least one of the lung cancer-related adducts. Of these 34 CpG sites, six were associated with smoking status, cumulative smoking exposure and CSI in our study population (Table S7). The levels of LC1, LC35 and NAC adducts were associated with the methylation levels of 17, 15 and 11 smoking-related CpG sites, respectively (including all the CpG sites associated with smoking in our data at a Bonferroni-corrected significance level). Conversely, levels of Cys-Gly-(-H₂O) were not found to be associated with the methylation levels of any of the smoking-related CpG sites (p > 0.001).
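The effective number of tests and the resulting threshold can be illustrated as follows. According to the legend of Figure 4, the effective number is the number of principal components needed to explain more than 90% of the variance of the methylation matrix. The sketch assumes the PCA eigenvalues are already available; the function names and toy eigenvalues are hypothetical.

```python
def n_components_for_variance(eigenvalues, fraction=0.90):
    """Smallest number of leading PCs explaining more than `fraction` of variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total > fraction:
            return k
    return len(eigenvalues)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for n_tests (effective) tests."""
    return alpha / n_tests


# Toy eigenvalues: cumulative shares are 0.5, 0.8, 0.9 and 1.0.
toy_k = n_components_for_variance([5.0, 3.0, 1.0, 1.0], fraction=0.90)
# Threshold implied by the 110 effective tests reported above.
threshold = bonferroni_threshold(0.05, 110)
```

With 110 effective tests, the per-test level is 0.05/110, roughly 4.5 × 10⁻⁴, compared with about 1.9 × 10⁻⁵ for a Bonferroni correction over all 2,670 CpG sites.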

ROC analyses
We restricted our ROC analyses to the two adducts found to be most differentially abundant between lung cancer cases and controls. For both the Cys-Gly-(-H₂O) (Fig. 5a) and NAC adducts (Fig. 5b), we found that the variation in adduct levels modestly contributed to lung cancer prediction with an average AUC around 0.6. The 76 PCs (explaining more than 80% of the variance of the 2,670 smoking-related CpG sites) yielded an AUC of 0.63, and jointly including the adducts and the 76 PCs from the smoking-related CpG sites did not affect model performance (mean AUC around 0.63 for both adducts). Models including only cumulative smoking exposure yielded better predictive performance (mean AUC = 0.71 [0.68-0.73]) and the inclusion of the adduct levels of either NAC (Fig. 5a) or Cys-Gly-(-H₂O) (Fig. 5b) slightly improved the performance of the model, with an average AUC over 0.74. The model including levels of both adducts and smoking exposure yielded similar performance (mean AUC = 0.74, ROC curves not shown). Logistic models restricted to LCC (Fig. 5c) show better predictive performance of the plasma levels of NAC (mean AUC = 0.68) and smoking exposure (mean AUC = 0.73). Including both variables in the model further improved its predictive performance and the corresponding mean AUC reached 0.82.
Conversely, analyses of the adenocarcinoma subtype (Fig. 5d) suggest, as expected, a moderate predictive ability of smoking exposure (mean AUC = 0.66) and of the Cys-Gly-(-H₂O) level (mean AUC = 0.61), and a modest performance improvement when considering both variables in the model (mean AUC = 0.68).

Discussion
In the present work, we use a recently established technique for agnostic Cys34 adductomics 18 in order to discover adducts potentially involved in lung carcinogenesis. Based on 212 biobanked samples of prospective lung cancer cases and matched controls, we successfully measured 42 adducts. Association studies relating adduct levels to smoking exposure showed a systematically lower abundance of most adducts in relation to smoking exposure. However, none of these associations reached statistical significance after correcting for multiple testing. Previous analyses have detected ethylene oxide and acrylonitrile in 34 pooled blood samples from (n = 158) smokers. 18 Consistent with what has been observed in a follow-up of that study using individual blood samples, 25 our study, including individual samples of 67 smokers, did not detect ethylene oxide and acrylonitrile. The majority of our results are consistent with a previous study 18 reporting inverse associations of the Cys34 oxidation products (Cys34-Gln crosslink and the sulfinic and sulfonic acids) with smoking. Our results suggest an inverse association of the NAC adduct with smoking exposure, which contradicts results from Grigoryan et al., 18 but is in line with recent results from Lu et al. 25 considering exposure to smoky coal. This association is consistent with the inverse association that we also detected in relation to lung cancer.
Our analysis of lung cancer outcome identified six differentially abundant adducts in prospective cases compared to controls (p < 0.05). The adduct of NAC showed significantly lower levels according to our permutation-based per-test significance level ensuring a FWER < 0.1. The adduct of Cys-Gly-(-H₂O) was also found to be lower in cases, and only the NAC adduct was also nominally associated with smoking exposure metrics (p < 0.05).
Integrative analyses of DNA methylation identified 11 (of the 2,670) smoking-related CpG sites that were associated with the NAC adduct, whereas none of the CpG sites were associated with plasma levels of the adduct of Cys-Gly-(-H₂O). The smoking-related CpG sites associated with the NAC adduct (p < 0.05) were all directly associated with adduct levels (except for cpg04095045, which showed an inverse association), suggesting that a reduced adduct level was associated with hypomethylation at these CpG sites. Previous findings have already shown hypomethylation at these sites in smokers. 4 These associations could indicate a potential shared smoking-induced pathway resulting in or involving both hypomethylation at these specific sites and reduced adduct levels.
Table 1. Results from the association study of the log-transformed levels of the (N = 42) T3 adducts in relation to the two most represented histological subtypes of lung cancer in our study. Results are summarized by the effect size estimate (regression coefficient β) and the corresponding p-value testing the null hypothesis of no association (β = 0). We report results for the model adjusting for technical confounders, age, gender and BMI (Model 1), and for the model additionally adjusting for smoking status (Model 2). For readability, we only report the adducts whose levels were found significantly different in cases and controls at a nominal significance level of 0.05 (shown in bold) for at least one histological subtype.
Our ROC analyses suggest that both adducts have modest predictive abilities for lung cancer as a whole but complement disease-related information from smoking exposure. Analyses restricted to LCC cases showed that plasma levels of the NAC adduct reasonably predicted disease outcome and that predictions were improved by including smoking exposure variables.
The main limitation of our study is the small sample size, which hampers our ability to perform refined sensitivity and/or stratified analyses. The technical challenges and costs of the method have not yet permitted the analysis of a validation set. However, our data arise from two different populations (from the Turin and Varese recruiting centers), and as an internal validation approach, we analyzed both data sets separately. Despite the resulting smaller sample sizes, our conclusions remained unchanged in both subpopulations (results not shown). Nevertheless, our study is the largest (in terms of the number of samples analyzed) performed to date using HRMS-based untargeted adductomics.
NAC is a known antioxidant capable of reducing OS through its ability to scavenge ROS. It is a precursor of L-Cysteine, an essential amino acid involved in the synthesis of reduced glutathione. 34 The lower abundance of the NAC adduct in cases may reflect the lower bioavailability of NAC in serum suggestive of (potentially smoking-induced) dysregulation of redox control. Several studies have indicated that elevated levels of ROS and perturbation of redox homeostasis are hallmarks of carcinogenesis. 35,36 Therefore, maintaining ROS balance appears important to ensure normal cell growth and development. Studies in mice have indicated the ability of NAC supplementation to hinder the carcinogenicity of cigarette smoke after in utero exposure. 37 Supplementation with prenatal NAC was shown to inhibit genomic and postgenomic alterations in the lung due to cigarette smoke exposure as a consequence of OS, hence implying the potential utility of such a supplementation in the at-risk population. In a multicenter intervention study, NAC supplementation had no effect on former or current smokers with lung, head or neck cancer. 38 In another study on healthy smoking volunteers, NAC supplementation was associated with lower levels of DNA adducts, of micronuclei in the mouth floor and of urinary excretion of mutagens. 39 Our findings are in line with these results and are the first to report, using prospective samples from an observational study, in vivo decreased NAC adduct levels in future lung cancer cases, years before onset.
Figure 4. Bipartite network representation of the pairwise associations between each of the six lung cancer-related T3 adducts and the methylation levels at the 2,670 CpG sites found associated with smoking in Joehanes et al. and assayed in our study population. Pairwise association was evaluated using a linear mixed model setting the log-transformed adduct levels as outcome and the methylation M-values as predictor. Results are adjusted for technical confounders, age, gender and BMI. In order to correct for multiple testing, we only represent edges if the p-value for the regression coefficient linking the adduct level and the DNA methylation M-value is below 0.05/110, where 110 is the number of principal components required to explain more than 90% of the variance of the original (2,670 dimensional) DNA methylation matrix. For clarity, we only represent CpG sites that are found associated with at least one T3 adduct. The CpG sites are colored according to the p-value measuring their association with smoking status in our study population, and edges are colored according to the direction of the adduct-CpG site association: red for direct associations (β > 0), and blue for inverse associations (β < 0). [Color figure can be viewed at wileyonlinelibrary.com]
Figure 5. […] a and b), or restricting the cases to LCC for NAC adduct (c) and to adenocarcinomas for Cys-Gly-(-H₂O) adduct (d). In each case, we ran three different models: one including smoking exposure as measured by pack-years (in red), one including the level of the adduct of interest (in green), one including both the adduct level and the smoking exposure (in light blue). For the analyses of lung cancer (all subtypes), we also investigated a model including the 76 principal components explaining >90% of the variance of the 2,670 smoking-related CpG sites (orange), and the model including both these components and the adduct level (dark blue). We used a subsampling procedure (repeated independently 5,000 times) of 80% of the study population as training set and report the ROC curves and corresponding AUC in the validation set as defined by the remaining 20% of the population. The plain ROC curve (and the point estimate of the AUC) corresponds to the average performance yielded across the 5,000 subsamples, and the colored areas (and AUC ranges) reflect the extreme performances yielded across the subsamples.
The peptide Cys-Gly is known to be involved, as a precursor, in the glutathione (GSH) pathway. The γ-glutamyl transferase catalyzes the conversion of GSH to Cys-Gly, which is subsequently converted to glycine and cysteine, the latter being reused to resynthesize glutathione. The lower level of the Cys-Gly-(-H₂O) adduct in cases, therefore, also points to a perturbation in redox biology, with possible depletion of GSH. Interestingly, our results suggest that these modifications in the Cys-Gly-(-H₂O) adduct are not directly related to smoking exposure or its downstream consequences.
Both our findings are consistent with a perturbation in redox pathways and consequently a potential imbalance in ROS levels and elimination in future lung cancer cases. The perturbation created in the NAC pathway seems to be related to smoking, whereas the effects expressed by the modifications of Cys-Gly do not seem directly related to smoking.
Altogether, our approach provides new insights into lung carcinogenesis that are consistent with results from previous trials establishing NAC as a potential preventive drug for OS in smokers, and provides grounds for further investigation of these preventive treatments. Our results highlight the ability of Cys34 adductomics to explore the biological processes by which OS and redox balance dysregulation may contribute, years before clinical manifestations, to carcinogenesis and to investigate the role of smoking in these processes.