Integrative multi-omics analysis of genomic, epigenomic, and metabolomics data leads to new insights for Attention-Deficit/Hyperactivity Disorder

The evolving field of multi-omics combines data across omics levels and provides methods for simultaneous analysis. We integrated genomics (transmitted and non-transmitted polygenic scores), epigenomics and metabolomics data in a multi-omics framework to identify biomarkers for ADHD and investigated the connections among omics levels. We trained single-and multi-omics models to differentiate between cases and controls in 596 twins (cases=14.8%) from the Netherlands Twin Register (NTR) demonstrating reasonable in-sample prediction through cross-validation. Out-of-sample prediction in NTR participants (N=258, cases=14.3%) and in a clinical sample (N=145, cases=51%) did not perform well (range misclassification: 0.40-0.57). The multi-omics model selected 30 PGSs, 143 CpGs, and 90 metabolites. We confirmed previous associations with ADHD such as with glucocorticoid exposure and the transmembrane protein family TMEM , show that the DNA methylation of the MAD1L1 gene associated with ADHD has a relation with parental smoking behavior and present novel findings including associations between indirect genetic effects and CpGs of the STAP2 gene. The results highlighted connections between omics levels, with the strongest connections between indirect genetic effects, CpGs, and amino acid levels. Our study shows that multi-omics designs considering interrelated omics levels can help unravel the complex biology underlying ADHD.


Introduction
Attention-Deficit/Hyperactivity Disorder (ADHD) in children is a highly heritable (75-91%) complex neurodevelopmental disorder that is characterized by high levels of hyperactivity, inattention, and impulsiveness (Roehr, 2013).The prevalence of ADHD in children is 5-8%, and a majority of these children continue to display symptoms in adulthood (Derks et al., 2008;Faraone et al., 2018;Levy et al., 1997;Luo et al., 2019).Persons with ADHD are more likely to be extraverted and entrepreneurs but also have increased lifetime risk of harmful outcomes (Faraone et al., 2021;Martel et al., 2010).
Several single-omics studies involving genomics, epigenomics, and metabolomics have been performed to discover the etiology of ADHD.A large (N = 55,374) Genome-Wide Association (GWA) study identified the first 12 genome-wide significantly associated loci for ADHD and reported a SNPheritability of 21.9% (Demontis et al., 2019).Demontis et al. compared these results with a genetic study of continuous measures of ADHD symptoms in the general population (Middeldorp et al., 2016), concluding that the same genetic variants that give rise to an ADHD diagnosis also affect inattention and impulsivity in the general population.
The biological basis of ADHD has also been explored in Epigenome-Wide Association (EWA) and metabolomics studies.Epigenomics constitute a link between genetic and environment exposures by changes in gene expression with processes such as DNA methylation (Barros & Offenbacher, 2009).
Methylation level variation is associated with exposures such as stressful life events, nutritional factors, and toxins, and is partly explained by genetic factors (Moore et al., 2013).Interestingly, it has been hypothesized that the decrease in hyperactivity with age might be due to epigenetic factors regulating the expression of genes involved in hyperactivity (Elia et al., 2012).EWA studies that measured DNA methylation in cord blood at birth, in saliva from older children, or in blood from adults provide emerging evidence for DNA methylation sites associated with ADHD symptom trajectories and ADHD clinical diagnosis in children and adults (Goodman et al., 2020;Mooney et al., 2020;Neumann et al., 2020;Rovira, Sanchez-Mora, et al., 2020;Walton, 2019).EWA studies based on blood samples from adolescents and adults identified multiple new candidate loci involved with immune and neuronal functioning associated with adult ADHD symptoms in population-based samples (van Dongen et al., 2019) and loci involved in cholesterol signaling in a comparison of participants with persistent ADHD and controls (Meijer et al., 2020).
Metabolomics studies utilize a profile of small molecules derived from cellular metabolism (Liu & Locasale, 2017).The first studies of metabolomics and ADHD identified multiple metabolites associated with ADHD in blood plasma (Wang et al., 2020).For instance, metabolites involved in oxidative stress pathway(s) were positively associated with ADHD in children.Meta-analyses showed that across all ages, proteins involved in dopamine metabolism and the fatty acid docosahexaenoic acid (DHA) in different tissues were negatively associated with ADHD (Bonvicini et al., 2016(Bonvicini et al., , 2018)).
However, sample sizes of the metabolomics studies concerning ADHD remain relatively small and it is likely that other metabolites and biological pathways influencing ADHD are still to be identified (Bonvicini et al., 2018).
The increasing numbers of single-omics studies for ADHD indicate that information across multiple omics levels, including genomics, epigenomics, and metabolomics data is increasingly collected in large cohorts and analyzed in relation to health and behavioral outcomes.Single-omics analyses contribute to an understanding of the etiologies for the complex outcomes, but fail to include the interactions between the different omics levels.A next step is to combine multiple omics levels into multi-omics approaches to provide a basic understanding of how different omics levels are associated with specific phenotypes such as ADHD (Hasin et al., 2017), but also how they are interconnected.
Associations and correlations across omics levels have been reported between the genome and epigenome (van Dongen et al., 2016;Min et al., 2021), the genome and the metabolome (Hagenbeek, Pool, et al., 2020) and the epigenome and the metabolome (Gomez-Alonso et al., 2021).To optimally analyze multi-omics data in association and etiological studies, dedicated statistical treatment of simultaneous omics influences is required (Durufle et al., 2020), with simultaneous modeling completing approaches in which separate analyses are applied to each omics level.Such innovative multi-omics analyses may result in novel insights and uncover new biological pathways underlying traits and diseases (Rajasundaram & Selbig, 2016).For example, in cancer research, multi-omics findings hold promise to be one of the keys leading to improved personalized medicine by identifying disease markers (Chakraborty et al., 2018).
In our study, we applied an integrative multi-omics approach to further elucidate biological mechanisms underlying ADHD and identify potential biomarkers.We integrated genetic data comprising transmitted and non-transmitted (e.g., genetic nurture; Kong et al., 2018) polygenic scores (PGSs) for 15 traits, which are genetically correlated with ADHD, with genome-wide methylation data, and urinary metabolomic data for 854 twins (N cases = 125, N controls = 729, mean age = 9.5 years [SD = 1.9], % female = 50.7,359 complete twin pairs) originating from the Netherlands Twin Register (NTR; Ligthart et al., 2019).The ADHD status was based on dichotomization of T-scores based on the mother-rated ADHD DSM-oriented scale of the Achenbach System of Empirically Based Assessment (ASEBA) Child Behavior Checklist (CBCL).The omics data were analyzed to 1) assess their association with ADHD in multivariate single-omics analyses; 2) assess the pairwise cross-omics connections among the variables selected by the single-omics analyses across the different omics levels; and 3) obtain multi-omics biomarkers from a predictive model for ADHD and explore the connections of those biomarkers.First, we build the single-and multi-omics models in 70% of the NTR data.Next, we predict ADHD status in the remaining 30% of the NTR data.We further explored prediction in an independent Dutch clinical cohort of 145 child participants, of which 74 ADHD cases and 71 controls which had another psychiatric diagnosis (mean age = 10.1 years [SD = 1.7], % female = 25.5).The three phases of the analytic design are summarized in Figure 1.

Study population and procedures
Participants were selected from the Young Netherlands Twin Register (van Beijsterveldt et al., 2013), and children referred to a youth psychiatry clinic (LUMC-Curium, the Netherlands), as part of the ACTION (Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies) Biomarker Study (Barters et al., 2018;Hagenbeek, Roetman, et al., 2020).
The ACTION project collected first-morning urine samples and buccal-cell swabs from 1494 twins (747 complete pairs) and 189 children from the clinical cohort with standardized protocols (see http://www.action-euproject.eu/content/data-protocols).The first-morning urine samples were stored in a freezer at the child's home (T ≈ −18 • C) and were transported to the lab in a mobile freezer unit at T = −18 • C. In the lab, urine samples were stored at T = −80 • C until further processing.The buccal swabs were collected during two consecutive days: twice in the morning (before breakfast) and twice in the evening (before dinner).In the twin cohort, buccal-cell swabs were also collected from parents and siblings of the twins.Participants were excluded if no ADHD status at the time of biological sample collection was available (N = 209), if they were assigned a control status while their co-twin was assigned a case status for ADHD (N = 51), if the collected urine was not the first-morning urine (e.g., parent-reported time of urine collection was after 12:00 in the afternoon, N = 13), or if any of the omics data were not available because of poor quality (N = 367).From the twin cohort, 854 out of 1494 participants and from the clinical cohort, 185 participants out of 189 had complete phenotype and omics data across all omics levels (genomics, epigenomics, and metabolomics), see Supplementary Table 1.

Computation of principal components
From the best-guess 1000 genomes imputed data, a complete list of all platform-genotyped SNPs on both AXIOM as well as Illumina GSA were extracted from the NTR and LUMC-Curium cohorts.This makes a single dataset without large SNP missingness where most genotypic information from both platforms is used.These SNPs were also extracted from the 1000G reference panel genotype data with additional filters for MAF (>0.05), and call rate (>0.98).In the 1000G population alone, the SNPs were then LD pruned (PLINK1.9;--indep 50 5 2), and long-range LD regions were removed (Abdellaoui et al., 2013).Then the NTR, LUMC-Curium, and 1000 genomes sets were merged for all SNPs that passed the above QC criteria and were present in all sets.Twenty principal components were subsequently calculated using the 1000G populations and then projected on the data (SMARTPCAv7; Price et al., 2006).

Transmitted and non-transmitted alleles and calculation of polygenic scores
The ACTION Biomarker Study included twins and siblings of twins with two genotyped parents, for whom allele transmission could be established based on the imputed best guess data.SNPs were removed from the imputed data based on the following criteria: 1) MAF (< 0.01); 2) HWE p-value (< 1×10 -5 ); 3) call rate (> 0.98); 4) SNPs with duplicate positions; 5) SNPs with 3 or more alleles; and 6) non-ACGT SNPs on the autosomes.All the remaining SNPs were used to create the transmitted alleles dataset.The non-transmitted alleles dataset was created by generating a single transmissiondisequilibrium test (TDT) pseudo-control genotype for each child (given the two parents; Plink-tucc option) after defining all children as being cases (Clayton, 1999).To determine the maternal and paternal transmission of haplotypes, the transmitted and non-transmitted alleles datasets were phased (SHAPEITv2.r904).The resulting haplotypes were converted into mother and father non-transmitted homozygous haploid genotypes for polygenic score analyses of non-transmitted alleles per parent.
PGSs were calculated for transmitted and non-transmitted alleles based on multiple discovery GWA meta-analyses, that omitted NTR from the discovery meta-analysis to avoid overlap between the discovery and target samples.In a GWA study by Ip et al. (2021) of childhood aggression the genetic correlation with ADHD was 1.0028 (se = 0.0732, p = 9.39x10-43) and we included 14 traits, which showed significant (p < 0.02) high genetic correlations (≤ -0.40 or ≥ 0.40) with ADHD and aggression.
The linkage disequilibrium (LD) weighted betas for all traits (see Table 1), were estimated (LDpredv0.9,causal fraction = 0.50), in randomly selected 2,500 2 nd degree unrelated individuals from the NTR as a reference population to obtain LD patterns (Vilhjalmsson et al., 2015).For all phenotypes we obtained a transmitted and two non-transmitted (1 non-transmitted for the father and 1 for the mother) PGSs from the LD corrected betas (PLINKv1.9).Thus, in total, 45 PGS were specified for each child.The effects of sex, age at biological sample collection, genotype platform, and the first 10 genetic principal components (PCs) were regressed on the standardized (mean of 0 and standard deviation of 1) PGSs and these residuals were included in the analyses.

Metabolomics
Urinary metabolomics data were generated by the Metabolomics Facility of Leiden University (Leiden, the Netherlands) on three platforms: 1) a liquid chromatography-mass spectrometry (LC-MS) platform targeting amines; 2) a LC-MS platform targeting steroid hormones; and 3) a gas chromatography (GC) MS platform targeting organic acids.Subjects were randomized across batches, whilst retaining twin pairs on the same plate.Each batch included a calibration line, QC samples (pooled aliquots from all urine samples; every 10 samples), sample replicates, and blanks.In-house developed algorithms were applied, using the pooled QC samples, to compensate for shifts in the sensitivity of the mass spectrometer across batches.Metabolites were reported as 'relative response ratios' (target area/area of internal standard) after QC correction, and metabolites with a relative standard deviation of the QC samples larger than 15% were excluded (6 amines, 3 steroids, and 1 organic acid).Metabolite measurements that fell below the limit of detection/quantification were imputed with half of the value of this limit, or when this limit was unknown with half of the lowest observed level for this metabolite.
The effects of sex and age were then regressed from the sample-median normalized and inverse normal rank transformed urinary metabolites.

Metabolomics platforms
Ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) was used to measure 66 amines and 13 steroids (pre-QC).To measure the amines, methanol was added to 5 μL of spiked (with internal standards) urine for protein precipitation and centrifugation of the supernatant.Next, the sample was reconstituted in a borate buffer (pH 8.5) with AQC reagent after sample evaporation (speedvac).Chromatic separation was achieved by an Agilent 1290 Infinity II LC system ( 1290Multicolumn Thermostat and 1290 High-Speed Pump; Agilent Technologies, Waldbronn, Germany) with an Accq-Tag Ultra column (Waters Chromatography B.V., Etten-Leur, The Netherlands).The UPLC was coupled to electrospray ionization on an AB SCIEX quadrupole-ion trap (QTRAP; AB Sciex, Massachusetts, USA).Analytes were monitored in Multiple Reaction Monitoring (MRM) using nominal mass resolution and detected in the positive ion mode.
To measure the steroid metabolites, internal standards were added to 90 μL of urine and samples were filtered with a 0.2μm PTFE membrane.Chromatographic separation was achieved by UPLC (Agilent 1290, San Jose, CA, USA), using an Acquity UPLC CSH C18 column (Waters), with a flow of 0.4 mL/min over a 15 min gradient.Samples were then transferred to a triple quadrupole mass spectrometer (Agilent 6460, San Jose, CA, USA) with electrospray ionization.By switching, positive and negative ion mode analytes were detected in MRM using nominal mass resolution.
For the reporting the levels of 21 organic acids, GCMS was applied.Liquid-liquid extraction with ethyl acetate was applied twice to 50 μL of spiked (with internal standards) urine to extract the organic acids and remove urea.Online derivatization procedures were performed in two steps: 1) oxidation with methoxamine hydrochloride (MeOX, 15 mg/mL in pyridine); and 2) N-Methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) silylation.Chromatic separation was performed on a 25m (HP-5MS UI) film thickness 30 x 0.25m ID column, with helium as a carrier gas (1,7 mL/min).The mass spectrometer (Agilent Technologies, Waldbronn, Germany), using a single quadrupole with electron impact ionization (70 eV) was operated in SCAN mode (mass range 50-500), using 1 μL of the sample.
In total, 21 organic acids were successfully measured.

Overview
To examine molecular variation associated with ADHD for PGSs, DNA methylation, and metabolomics, and to build a predictive model for ADHD, we applied an integrative multi-omics method called sparse multi-block supervised analysis (Rohart et al., 2017;Singh et al., 2019).All analyses were performed in R (version 4.1.1)mainly using the mixOmics package (version 6.16.3;Rohart et al., 2017).The analytical design consisted of three steps as previously described in Hagenbeek et al (2022; under review): 1) three single-omics analyses; 2) two sets of pairwise crossomics analyses; and 3) a multi-omics analysis (Figure 1).The twin sample was randomly split at twin pair level to create two subsets: 70% of the data were used for model training (training data), and 30% of the data for model testing (test data; Table 2).The performance of final single-and multi-omics models were explored in a follow-up clinical sample.
Step 1: single-omics analyses Principal Component Analysis (PCA) was run separately for the 45 PGSs, the 78,772 CpGs in the DNA methylation data, and all 90 metabolites in the training data to obtain a first insight into the dimensionality in each omics level.Next, Partial Least Square Discriminant Analyses (PLS-DA) was applied in the training data to assess the ability of each of the three omics levels to correctly classify ADHD status.We determined the maximum number of components for to include in the PLS-DA models based on the PCA results using the elbow method to obtain the optimal number of principal components (Supplementary Table 2; Supplementary Figure 1).The optimal number of components to include in each PLS-DA model was determined by 5-fold cross-validation (CV) with 50 repeats (mixOmics perf function; Supplementary Table 3; Supplementary Figure 2).To reduce the number of variables in each omics layer that contribute to each component we applied sparse PLS-DA (sPLS-DA; mixOmics tune function with 5-fold CV, 50 repeats).We first determined the optimal number of components (at least two) and assessed the performance of the final sPLS-DA model through CV (mixOmics perf function with 5-fold CV, 50 repeats; Supplementary Table 3; Supplementary Figure 3-4).Three types of distance measurements were assessed; 1) centroid distance; 2) Mahalanobis distance and 3) maximum distance.The distance that showed the best predictive performance in the training data was selected for the final models.
We evaluated the out-of-sample predictions in the test and follow-up data (mixOmics predict function) based on the best performing model (i.e., the model with the optimal number of components, traits per component, and distance measure) determined during the training of the model (Supplementary Table 3).We applied three methods to evaluate how well the final models classified case-control status.First, we employed a balanced misclassification rate, the balanced error rate (BER; false-negative + false positive rate), that corrects for imbalances in the number of cases and controls.Second, we calculated the sensitivity, specificity, and accuracy of the single-omics models by comparing the predicted cases and controls with the true cases and controls.Third, we assessed the Area Under the Curve (AUC) for each component in both the test and follow-up data with Receiver Operating Characteristic (ROC) analysis.
Step 2: Pairwise cross-omics analyses We constructed Partial Least Squares (PLS) regression models in canonical mode to consider equivalently both datasets in the training data to investigate pairwise cross-omics connections (i.e., PGSs-DNA methylation, PGSs-metabolomics, and DNA methylation-metabolomics).We calculated the correlations among the loading scores of the PLS components for the selected omics traits.We included 2 components for the PGS-DNA methylation model, 2 components for the PGSmetabolomics model, and 5 components for the DNA methylation-metabolomics model.In total, we ran two sets of PLS models: three models for each pairwise combination of omics data that included all 45 PGSs, 78,772 CpGs, or 90 metabolites.These provide insight into the correlations of the loading scores among all omics traits.The next three models for each pairwise combination of omics data included only the 17 PGSs, 486 CpGs, and 90 metabolites that were selected by the single-omics sPLS-DA models (Supplementary Data 2).This second set of models provides insight into the correlations of the loading scores among the omics traits that best contribute to the classification of ADHD cases and controls in the single-omics sPLS-DA models.We performed hierarchical clustering to identify biological relevant clusters using the Ward linkage algorithm on Euclidean distances of the PLS variates of the PLS models that included the omics traits as selected by the single-omics sPLS-DA models, and extracted the two largest clusters for both of the omics levels included in the PLS models ('dendextend' R-package; Galili, 2015).
Step 3: Multi-omics analyses The multi-omics analysis was conducted through multi-block sPLS-DA (MB-sPLS-DA) in the training data including an empirical design matrix, based on the correlations among the loading scores from the pairwise cross-omics PLS models including all omics traits (Supplementary Table 4).We chose the maximum number of components to include in the MB-PLS-DA models based on the results of the single-omics sPLS-DA models (Supplementary Table 3).A five-fold CV with 50 repeats was used to determine the optimal number of components to include in the MB-PLS-DA models (mixOmics perf function), the number of traits to include per component per omics layer (mixOmics tune function), and the performance of the final MB-sPLS-DA model (mixOmics perf function; Supplementary Table 5; Supplementary Figure 5-7).
The accuracy of out-of-sample case-control status prediction was evaluated in the test and follow-up data (mixOmics predict function), using the best performing model (i.e., the model with the optimal number of components, traits per component, and distance measure) as was determined during model training (see Supplementary Table 5).The final multi-omics models were evaluated by their BER, and the sensitivity, specificity, and accuracy of the models were calculated from their confusion matrices.In the multi-omics model, the ROC curves were calculated per component for each omics layer.

Biological characterization
We describe the correlations among the loading scores of the PLS components of the PGSs, CpGs, and metabolites that were selected by the single-omics sPLS-DA and the multi-omics MB-sPLS-DA models to facilitate high correlation patterns suitable for biological interpretation.We performed enrichment analysis for the CpGs selected across our analyses for all phenotypes (618) in the EWAS atlas on the 10th of December 2021 (Xiong et al., 2022).We performed enrichment on 1) the CpGs that were selected into the single-omics sPLS-DA model, 2) the CpGs in the clusters identified by the pairwise cross-omics analyses, 3) the CpGs that were selected for the multi-omics MB-sPLS-DA model, and 4) the CpGs that were included in the high correlation patterns of the multi-omics MB-sPLS-DA.When fewer than 20 CpGs were selected, the trait associations with the CpGs were manually retrieved from the EWAS atlas.We focused the enrichment analysis on the CpGs since there is no similar tool like the EWAS atlas for the metabolites.

Single-omics models for ADHD
Based on the sPLS-DA analyses for PGSs, DNA methylation, or metabolomics data we built singleomics predictions models for ADHD case-control status.The single-omics models included 17 PGSs, 486 CpGs, and all 90 metabolites (Supplementary Trait enrichment analyses for all 486 selected CpGs in the sPLS-DA model showed the strongest enrichment for glucocorticoid exposure (i.e., based on an EWAS of administration of corticosteroid medication and DNA methylation measured in buccal cells (Braun et al., 2019); N Overlap Differently Methylated CpGs (DMC) = 13, OR = 7.66, p = 1.12x10 -14 ), and ancestry (DMC = 15, OR = 2.61, p = 5.73x10 -06 ; Supplementary Table 7).Model performance was sub-optimal in the test and follow-up data for all three omics levels (AUC range = 0.45-0.60),as was also illustrated by the low degree of separation among the cases and controls (Supplementary Table 6; Supplementary Figure 9-10).

PGS-DNA methylation analysis
An average correlation of r = 0.23 (q = 8.92x10 -16 ) was found between the PGSs and DNA-methylation in the loadings scores of the pairwise PLS cross-omics model (Supplementary Table 4).For the pairwise model, we applied hierarchical clustering which identified two clusters for the PGS and DNA methylation data (Figure 2a; Supplementary Data 3).Cluster 1 contains 8 of the PGSs (47.1%) selected by the sPLS-DA model for ADHD, and comprised the transmitted PGSs for ADHD, insomnia, MDD, smoking initiation, and the number of cigarettes per day, the non-transmitted maternal PGSs for insomnia and childhood aggression, and the non-transmitted paternal PGS for ADHD.The 9 PGSs (52.9%) included in cluster 2, comprised the transmitted PGS for EA, the non-transmitted maternal PGSs for age at smoking initiation, childhood IQ, and self-reported health, and the non-transmitted paternal PGSs for age at smoking initiation, ASD, the number of cigarettes per day, and intelligence.

PGSs-Metabolomics analysis
Hierarchical cluster of the pairwise PGSs-metabolomics model, with an average correlation between the loading scores of r = 0.26 (q = 1.02 x10 which are amines (80% of all amines).The remaining 12 amines, as well as all steroids and organic acids, have been included in metabolite cluster 2 (N = 42, 46.7%).Overall, we observed correlations ranging from -0.17 to 0.25 for the PGSs in cluster 1 with metabolites, and from -0.22 to 0.16 for the cluster 2 PGSs with metabolites (Figure 2b; Supplementary Data 5; Supplementary Table 9).For the metabolites in cluster 1 the correlation ranged between -0.22 to 0.16 with PGS, and for cluster 2 between -0.17 to 0.25.
The correlation of the CpGs included in cluster 1 with metabolites range from -0.23 to 0.29 (Figure 2c; Supplementary Data 6; Supplementary Data 7).The CpGs included in cluster 2 have correlations ranging from -0.27 to 0.28 with metabolites.For the metabolites in cluster 1 the correlations with CpGs ranged from -0.26 to 0.25, and for cluster 2 from -0.27 to 0.29.

Multi-omics model for ADHD
We built a multi-omics panel for ADHD based on a multi-block sPLS-  11-12).

Discussion
We present an integrative multi-omics analysis of childhood ADHD based on genomics, epigenomics, and metabolomics data to identify potential multi-omics biomarkers and to investigate the connections among the different omics levels.The single-omics models selected 17 PGSs, 486 CpGs, and 90 metabolites and the multi-omics model selected 30 PGSs, 143 CpGs, and 90 metabolites, and several of the selected variables support known associations with ADHD.The out-of-sample predictions were sub-optimal in both the single-and multi-omics models (BER range: 0.40-0.57and 0.42-0.55,respectively).We evaluate the pairwise variable associations using an approximation of their correlation coefficient as detailed in Gonzalez et al. (2012).The averaged values range from 0.23 to 0.29.The connections of the ADHD biomarkers in the multi-omics model highlighted several possibly involved genomic regions and further supported previously identified markers such as the gene MAD1L1, childhood stress, and glucocorticoid exposure.
The single-omics models selected 17 PGS for 12 different phenotypes, while the multi-omics model selected 30 PGSs comprising all 15 phenotypes.For some traits, only the transmitted PGSs were selected such as ADHD, insomnia, MDD, EA, smoking initiation, and cigarettes per day.These traits showed genetic correlations with ADHD in recent studies (Demontis et al., 2019(Demontis et al., , 2021)).One study that investigated the intergenerational transmission for EA and ADHD showed that transmitted PGSs influenced ADHD behavior at school (de Zeeuw et al., 2020).For traits as aggression, self-reported health, age of smoking initiation, intelligence, childhood IQ, and autism we observed that only nontransmitted PGSs were selected.This may represent an important observation: the genotype of the parents can help predict the ADHD status of their offspring, when we consider their non-transmitted scores across a broad range of traits.We do note that the predictive accuracy of the model is currently low, but may improve when PGSs improve with increasing GWAS sample size.
In the single-and multi-omics models, we observed enrichment for traits involved in diet exposure, such as glucocorticoid and vitamin B12 levels.We also found association for THM exposure, a class of chemical compounds that can arise when water is disinfected with chlorine (Hood., 2005).Still, we have to note that the number of overlapping DMCs with all known DMCs in the EWAS atlas was usually relatively small.The multi-omics model selected two novel CpGs located in the gene MAD1L1 previously linked to ADHD in blood and saliva tissue (Goodman et al., 2020;Mooney et al., 2020;Rovira, Sanchez-Mora, et al., 2020).These CpGs also showed a connection with the non-transmitted paternal cigarettes per day PGS in our study.The connection suggests that the association between ADHD and the methylation around MAD1L1 might be caused by the effects of paternal smoking or that paternal smoking is a confounder between ADHD and the methylation around MAD1L1 (e.g., parental smoking leads to both ADHD and DNA methylation, but there is no connection between the DNA methylation and ADHD).Still, to our knowledge this is the first observation of paternal smoking effects on ADHD and paternal smoking should merit investigation in future studies.An earlier multiomics study identified several genes whose expression or DNA methylation level in (fetal) brain mediates the effects of genetic variants on ADHD, including a gene for a transmembrane protein TMEM125 (Hammerschlag et al., 2020).We observed four CpGs located in three genes of this same gene family (TMEM100, TMEM135, and TMEM140) had a connection with the transmitted childhood IQ PGS.The findings of MAD1L1 and the TMEM family support the results of the previous studies and imply an effect of these genomics regions on ADHD.
Both the single-and multi-omics models selected all 90 metabolites, suggesting that a combination of all metabolites discriminates better between ADHD cases and controls.While we included 90 metabolites across three broad classes of metabolites, much of the metabolome was not covered (Bowling & Thomas, 2014).For instance, sex hormones are hypothesized to play an important role in ADHD, and a multi-omics study of ADHD which combined GWA results with protein coding gene interaction data-sets found a link between ADHD and an androgen antagonist (Camara et al., 2021;Cheng et al., 2020).Our steroid hormone platform, which was developed specifically for steroid assessment in urine samples, covered only 4 sex hormones.The inclusion of additional sex hormones might enhance our insights.
We observed three connection patterns in our multi-omics analysis where the omics variables showed correlation coefficients of r ≥ 0.60.As a results, we can make several observations about these patterns.
First, only amines (mostly essential amino acids) showed high negative correlations with CpGs (r range: -0.71 to -0.60) and we observed no high correlations between the organic acids and the steroids and the other omics levels.Recent studies have linked several of these essential amino acids, such as glycine, serine, leucine, and valine, to childhood ADHD (Anand et al., 2021;Yu et al., 2021).Essential amino acids that come from dietary intake.This suggests a relation between diet, DNA methylation, and metabolites involved in ADHD and that DNA methylation mediates the effect of diet on metabolite levels.However, the reverse might also hold, where the metabolites affect CpG levels, as has been previously shown for lipid levels (Dekkers et al., 2016).A previous study suggested IGF2 methylation mediates the effect of prenatal diet on ADHD symptoms in childhood (Rijlaarsdam et al., 2017).Our observation concerning the amino acids supports hypotheses that essential amino acids affect childhood ADHD, and that diet might help to improve the symptoms of ADHD (Anand et al., 2021;Hamza et al., 2019;Yu et al., 2021), and makes it worthwhile to examine causal relations among diet, DNA methylation, and amino acids.
Another multi-omics connection pattern comprised the transmitted PGSs for ADHD and self-reported health with 6 CpGs and 10 amino acids.The 6 CpGs are located in the STAP2 gene (Signal Transducing Adaptor Family Member 2) on chromosome 19.No SNPs or CpGs in or around this gene were associated with ADHD in the latest GWA or EWA studies, suggesting STAP2 is a novel genomic region that could be involved in ADHD (Demontis et al., 2019(Demontis et al., , 2021;;Goodman et al., 2020;Mooney et al., 2020;Neumann et al., 2020;Rovira, Sanchez-Mora, et al., 2020;Walton et al., 2017Walton et al., , 2019)).The 6 CpGs and the STAP2 gene have been associated with other traits such as smoking behavior, ageing, and Type 2 Diabetes in blood (Xiong al., 2022; Supplementary Data 10).Although we observed no connection between these CpGs and the smoking-related PGSs, it should be taken into account that the smoking PGSs are an imperfect measure of overall smoking exposure.
In contrast to previous multi-omics studies for ADHD, which relied on sequential integration of summary statistics (Cheng et al., 2020;Hammerschlag et al., 2020;Korovila et al., 2017), our multiblock integration approach allowed simultaneous modeling of multiple omics levels and gain insights into the relationships among the different omics levels (Wörheide et al., 2021).Another asset of our study is that by employing an integrative multi-omics design, we reduced the detrimental heterogeneity due to, for example, differences in sample size or study protocols (Wörheide et al., 2021), which are commonly observed in sequential integration methods.In the current study, such undesirable sources of heterogeneity were minimized by our choice of analytical design and by streamlining the collection protocol for biological samples across twins from the NTR and patients from LUMC-Curium.We collected the biological samples in both cohorts in parallel and generated the omics data for both cohorts together.Other than the difference in cohort type, population vs. clinical, the main difference between these cohorts is the absence of genotyping information for the parents of the children included in LUMC-Curium and non-transmitted PGSs could not be assessed in the clinical cohort.
Overall, our study underlines the complexity of ADHD, as it remains a challenge to detect multi-omics biomarkers for the trait.We showed different omics levels to be connected with each other regarding ADHD, leading to new hypotheses about these cross-omics interactions.The connection patterns reproduced previously identified associations with ADHD, such as the MAD1L1 gene and glucocorticoid exposure, while also highlighting new interesting genomic regions such as the STAP2 gene.It is of value to continue improving multi-omics methods since these will become increasingly important in the translation from association studies to clinical applications.The integrated approach we applied in this study requires raw data analysis and requires that all participants are measured across all omics levels.Solving the challenge of missing data in multi-omics approaches will lead increased sample sizes and increase power in future projects.Multi-omics might be especially promising for complex traits such as ADHD, where clinicians have been struggling to find fitting treatments for patients and struggle to predict the persistency of ADHD (Caye et al., 2019).By improving methods and enhancing models, as well as by including more omics levels such as transcriptomics, proteomics, and exposome (Cheng et al., 2020;Liu et al., 2021), the identification of multi-omics biomarkers will eventually aid in diagnosing and improving personalized treatment plans for persons diagnosed with ADHD.
The  We employed an analytical design consisting of three phases: 1) single-omics analyses; 2) pairwise cross-omics analyses; and 3) multi-omics analyses.First, we built single-omics biomarker panels in the twin cohort, with 70% of the twin data for model training (training data), 30% of the twin data for model testing (test data), and the clinical cohort (follow-up data).Second, the overall pairwise crossomics correlations among the loading scores of the PLS components for all omics traits and the selected omics variables in the single-omics models were examined in the training data.Third, using the same data split for model training, testing and follow-up, we built multi-omics biomarker panels and described the multi-omics connections of the selected omics traits.The details about the analyses are provided in the Methods section.The outer ring depicts the PGSs, CpGs, and metabolites in yellow, pink, and green, respectively.For the polygenic scores (PGSs), Educational Attainment is abbreviated as "EA", and the '_NTm' suffix denotes the PGSs non-transmitted by mother.The inner plot depicts the correlations among the omics traits.For the metabolites, the 'amines.' prefix indicates these metabolites were measured on the Liquid Chromatography Mass Spectrometry (LC-MS) amines platform, and the 'OA.' prefix indicates these metabolites were measured on the Gas Chromatography (GC-) MS organic acids platform.Here, only high absolute correlations (r ≥ 0.60) between traits of at least two omics levels are depicted, with blue lines reflecting negative correlations and red lines positive correlations.
Correlations are averaged across all components in the MB-sPLS-DA model.The full correlation matrix is included in Supplementary Data 9 and the correlational patterns are included in Supplementary Data 10.
Tables Table 1.Overview of the discovery genome-wide association studies to calculate polygenetic scores.Mean (SD) age 9.5 (1.9) 9.1 (1.6) 9.5 (1.9) 9.6 (2.0) 9.5 (1.9) 9.6 (1.9) 10.5 (1.7) 9.8 (1.7) 10.1 (1.7) Range age 5.6 -12.9 5.8 -12.5 5.6 -12.9 5.7 -12.7 6.0 -12.2 5.7 -12.7 7.0 -13.3 6.3 -13.4 6.3 -13.4 The ADHD status in twins was based on sex-and age-specific T-scores from the mother-rated ADHD DSM-oriented scale of the ASEBA CBCL as completed at biological sample collection.For twins, Tscores were calculated in the entire NTR cohort of children with a CBCL ADHD measure available at the 9-10 years of age (N = 23,858).In the clinical cohort, parents (90% mothers, 10% fathers) completed the CBCL as part of a standardized clinical assessment a maximum of 6 months before or after biological sample collection.T-scores were calculated with the CBCL software.In the NTR, children with ADHD CBCL T-scores of 65 or higher were classified as cases (N = 125) and children with T-scores below 65 as controls (N = 729).The LUMC-Curium cohort represents a clinical cohort, and children with T-scores between 65 and 69 were excluded to prevent effects of subclinical participants (N = 40).Children with T-scores of 70 or higher were classified as cases (N = 74) and children with T-scores below 65 were classified as controls (N = 71).

Figure 1 .
Figure 1.Workflow of the analyses performed in this study.

Figure 2 .
Figure 2. Clustered heatmaps of the correlations among the loading scores of the Partial Least Squares (PLS) components including only the omics traits as selected by the single-omics sparse Partial Least Squares Discriminant Analyses (sPLS-DA).

Figure 3 .
Figure 3. High cross-omics correlations among the loading scores of the PLS components of the multi-omics traits identified in the 4-component multi-block sparse Partial Least Squares Discriminant Analysis (MB-sPLS-DA) model.

Table 3 ; Supplementary Data 2; Supplementary Figure
8).The 17 selected PGSs in the sPLS-DA model comprised the transmitted PGS for ADHD, smoking initiation, Major Depressive Disorder (MDD), educational attainment (EA), number of cigarettes per day, and insomnia.The non-transmitted maternal PGSs were selected for insomnia, selfreported health, childhood aggression, age of smoking initiation, and childhood IQ.The nontransmitted paternal PGSs were selected for ADHD, intelligence and Autism Spectrum Disorder (ASD), EA, number of cigarettes per day, and the age of smoking initiation (Supplementary Data 2).

Supplementary Table 11; Supplementary Table 6; Supplementary Figures
day, insomnia, loneliness, childhood IQ, self-reported health, and age smoking initiation.The non-transmitted paternal PGSs were selected for ADHD, aggression, cigarettes per day, insomnia, loneliness, age at first birth, intelligence, smoking initiation, age smoking initiation, and ASD.Trait enrichment analyses for all selected CpGs in the MB-sPLS-DA model showed the strongest enrichment multi-omics model comprised the transmitted PGS for ADHD, aggression, cigarettes per day, insomnia, loneliness, childhood IQ, self-reported health, age at first birth, intelligence, smoking initiation, EA, MDD, and wellbeing.The non-transmitted maternal PGSs were selected for aggression, cigarettes per current work is supported by the "Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies" project (ACTION) and the Consortium on Individual Development (CID).ACTION received funding from the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no 602768.NH is supported by the Royal Netherlands Academy of Science Professor Award (PAH/6635) to DIB.We acknowledge the CID Gravitation Program of the Dutch Ministry of Education, Culture, and Science and the .eu/content/data-protocols.The data of the Netherlands Twin Register (NTR) ACTION Biomarker Study may be accessed, upon approval of the data access committee, through the NTR (https://tweelingenregister.vu.nl/information_for_researchers/working-with-ntr-data euproject
Measured with the mother-rated Attention Deficit/Hyperactivity Problems DSM-oriented scale of the Achenbach System of Empirically Based Assessment (ASEBA) Child Behavior Checklist (CBCL).The ASEBA CBCL Attention Deficit/Hyperactivity Problems DSMoriented scale scores in the clinical cohort include 90% mother report and 10% father report. 1