Post‐translational modifications linked to preclinical Alzheimer's disease–related pathological and cognitive changes

Abstract INTRODUCTION In this study, we leverage proteomic techniques to identify communities of proteins underlying Alzheimer's disease (AD) risk among clinically unimpaired (CU) older adults. METHODS We constructed a protein co‐expression network using 3869 cerebrospinal fluid (CSF) proteins quantified by SomaLogic, Inc., in a cohort of participants along the AD clinical spectrum. We then replicated this network in an independent cohort of CU older adults and related these modules to clinically‐relevant outcomes. RESULTS We discovered modules enriched for phosphorylation and ubiquitination that were associated with abnormal amyloid status, as well as p‐tau181 (M4: β = 2.44, p < 0.001, M7: β = 2.57, p < 0.001) and executive function performance (M4: β = −2.00, p = 0.005, M7: β = −2.39, p < 0.001). DISCUSSION In leveraging CSF proteomic data from individuals spanning the clinical spectrum of AD, we highlight the importance of post‐translational modifications for early cognitive and pathological changes.


BACKGROUND
Alzheimer's disease (AD) is the most common form of dementia among adults 65 years and older. 1 The hallmark pathological features of AD are extracellular deposits of the misfolded amyloid beta (Aβ) protein, as well as neurofibrillary tangles composed of hyperphosphorylated tau protein. 2 Protein misfolding causes soluble versions of Aβ and tau to be organized into toxic, fibrillar aggregates and to lose their functional properties. 3,4 As a result, there is growing interest in targeted therapeutic interventions that enhance the biological processes behind protein degradation as a means of combating disease progression. 5 The presence of amyloid plaques begins years, if not decades, before the onset of clinical dementia and has been conceptualized as an asymptomatic preclinical stage of AD. 6,7 Abnormal Aβ accumulation among clinically unimpaired (CU) adults is associated with magnetic resonance imaging (MRI)-based measures of neuronal injury, 8,9 abnormal tau levels, 10 cognitive decline, 11,12 and future progression to mild cognitive impairment (MCI) or dementia. 11 Decreases in cerebrospinal fluid (CSF) levels of Aβ42 and Aβ42/Aβ40 ratios are among the earliest physiological changes that can be used to identify individuals with preclinical AD. 13,14 This initial period along the AD continuum represents a promising target for early therapeutic intervention; 15 however, the physiological drivers of initial disease processes remain poorly understood.
Recent advancements in mass spectrometry, immunoaffinity assays, and aptamer-based microarrays have yielded findings describing proteomic changes beyond amyloid and tau in the context of AD. 16,17 This research has been pivotal to identifying novel biomarkers and disease processes that emerge in parallel to or independent of initial amyloid and tau accumulation. As opposed to focusing solely on individual protein levels, many of these studies have approached high-throughput proteomic data sets using network approaches to uncover communities of proteins (or modules) important to AD pathogenesis.
However, few studies, including those described above, have utilized network techniques to examine proteomic changes associated with preclinical AD, particularly among CU individuals. 28 Abnormal amyloid accumulation during the CU stage is increasingly becoming a target for therapeutic intervention in anti-amyloid clinical trials, yet little is known about the biological pathways underlying in vivo neurodegenerative and cognitive changes at the earliest stages of the AD cascade.
In this study, we identify CSF proteomic co-expression modules and examine their associations with early disease-relevant changes in a large cohort of CU individuals. To accomplish this, we constructed a protein co-expression network among a discovery cohort of CU, MCI, and AD participants, using 3869 proteins quantified in the CSF by modified aptamer technology (SomaScan). We further replicated this network in an independent, deeply phenotyped CU cohort. We discovered modules enriched for post-translational modifications (phosphorylation and ubiquitination) that were associated with abnormal amyloid accumulation, tau aggregation, cognitive performance, and apolipoprotein E (APOE) genotype among CU individuals.
These findings emphasize the importance and multi-faceted role of post-translational modifications as an early driver of AD-related pathophysiology.

Participants
We analyzed CSF samples from 258 research participants recruited from either the Iqbal Farrukh and Asad Jamal Stanford Alzheimer's Disease Research Center (ADRC) and its affiliated clinics (ADRC+) or the Stanford Aging and Memory Study (SAMS). Clinical diagnosis was determined at a clinical consensus meeting by a panel of neurologists and neuropsychologists. ADRC+ participants underwent neurological examination, neuropsychological testing, and neuroimaging, and provided biofluid samples (CSF). Participants diagnosed as CU (CDR = 0 or 0.5), MCI (CDR = 0, 0.5, or 1), or having AD dementia (CDR > 0.5) were used in our analyses and treated as the discovery cohort. 29 SAMS is an ongoing prospective study of CU older adults that seeks to understand how memory performance relates to brain structure, brain function, and AD risk factors.

Cerebrospinal fluid samples
CSF samples were collected via lumbar puncture, which was performed in the morning after an overnight fast. A Sprotte needle inserted between lumbar vertebrae L4 and L5 was used to collect 10 mL of CSF, which was divided into 1.0 or 0.5 mL aliquots and stored in polypropylene tubes at −80 °C until assay. CSF centrifugation and assessment of blood contamination were conducted as described previously. 32

AD biomarker quantification and amyloid status determination
Separate aliquots processed by the Lumipulse G system (Fujirebio US, Inc., Malvern, PA) were used to measure CSF levels of AD biomarkers (phosphorylated tau 181 [p-tau181], Aβ42, and Aβ40) for all 147 SAMS CU and 89 of the 111 ADRC+ participants. 29

SomaLogic protein quantification and quality control
The aptamer-based SomaScan assay platform was used to quantify CSF protein expression levels for further network analysis. 33 This method of protein quantification relies on chemically modified DNA strands whose unique three-dimensional (3D) shapes allow them to bind to their target proteins with high specificity. These "SOMAmers" are quantified by hybridization to a DNA microarray after their protein-aptamer complexes survive sequential streptavidin bead capture, photocleavage, and kinetic capture. This technique provided us with the relative concentration (quantified in relative fluorescence units, or RFUs) of 5284 CSF proteins. SomaLogic, Inc., uses a 96-well plate design with wells devoted to buffer, calibrator, quality control, and biological samples to account for nuisance variation and batch effects. There are three stages of data normalization: (1) [...] well as the entire plate as a whole. Proteins flagged by SomaLogic's internal quality control, as well as samples with normalization factors falling outside the acceptable assay range, were removed before analysis. In addition, we removed outlying samples whose standardized connectivity 34 was more than three standard deviations (SD) from the mean. Finally, for each protein, we constructed a distribution of measurements across buffer samples and assessed whether it differed significantly from each clinical sample's measurement (at a false discovery rate alpha level of 5%). We removed proteins where more than 25% of clinical samples fell within this buffer distribution, resulting in 3869 CSF proteins for subsequent analyses (Figure S1).

MRI imaging
MRI was used to measure structural neuroimaging outcomes within the SAMS CU cohort. Data were acquired on a 3T GE Discovery MR750 MRI scanner (GE Healthcare) using a 32-channel radiofrequency receive-only head coil (Nova Medical). For the current analyses, we processed a whole-brain high-resolution T1-weighted anatomic volume (repetition time [TR] = 7.26 ms, field of view [FoV] = 230 mm × 230 mm, voxel size = 0.9 × 0.9 × 0.9 mm, slices = 186) through FreeSurfer version 7. Subcortical and cortical region of interest (ROI) volumes, including total gray matter, hippocampus, and white matter hypointensity volume, were defined by FreeSurfer's aparc+aseg atlas.

Cognitive composite scores
[...] variant discovery was used to map genome sequencing data to the reference genome (GRCh38) and to produce high-confidence variant calls using joint-calling. 35 APOE genotype (ε2/ε3/ε4) was determined using allelic combinations of the single nucleotide variants rs7412 and rs429358.
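The allelic combinations mentioned above can be illustrated with a minimal sketch. It assumes unphased, forward-strand genotype calls written as two-letter strings, and it relies on the standard definition of the APOE alleles (ε2: rs429358 T with rs7412 T; ε3: T with C; ε4: C with C). The function name and input format are hypothetical and are not taken from the study's pipeline; the rare ε1 haplotype is treated as an error.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Infer the APOE diplotype from unphased genotypes at two variants.

    Assumes forward-strand alleles: rs429358 is T or C, rs7412 is C or T.
    """
    rs429358, rs7412 = rs429358.upper(), rs7412.upper()
    if len(rs429358) != 2 or set(rs429358) - {"T", "C"}:
        raise ValueError("rs429358 genotype must be two alleles from {T, C}")
    if len(rs7412) != 2 or set(rs7412) - {"T", "C"}:
        raise ValueError("rs7412 genotype must be two alleles from {T, C}")

    n_e4 = rs429358.count("C")  # each C at rs429358 marks one e4 allele
    n_e2 = rs7412.count("T")    # each T at rs7412 marks one e2 allele
    n_e3 = 2 - n_e4 - n_e2      # whatever remains is e3
    if n_e3 < 0:
        # Would require the rare e1 haplotype (rs429358 C with rs7412 T).
        raise ValueError("genotype combination is not explained by e2/e3/e4")

    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def e4_allele_count(rs429358: str, rs7412: str) -> int:
    """Number of e4 alleles (0, 1, or 2), as used for allele-count covariates."""
    return apoe_genotype(rs429358, rs7412).split("/").count("e4")


print(apoe_genotype("TC", "CC"))  # e3/e4
```

The doubly heterozygous case (TC at rs429358, CT at rs7412) is resolved here as ε2/ε4, which is the conventional assumption when phase is unknown.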

Differential abundance analyses
We conducted a one-way analysis of variance (ANOVA) followed by a Student's t-test to identify differentially expressed proteins in AD dementia compared to CU individuals within the ADRC+ cohort. A false discovery rate (FDR) correction at an alpha level of 5% was used to account for multiple comparisons and determine significance. Modules with at least one differentially abundant protein were considered relevant to aging and AD dementia, and they became the focus of subsequent analyses.
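The text does not name the FDR procedure that was used. As a hedged illustration, the sketch below assumes the Benjamini-Hochberg step-up method, which is the most common choice; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR correction is Benjamini-Hochberg; the source text
    only says "FDR correction".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# A protein would be called differentially abundant when its adjusted
# p-value falls below the 5% alpha level.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```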

Protein-protein co-expression network
We performed a weighted gene correlation network analysis using the WGCNA package (version 1.72.1) in R. 34 For this, we used a subset of participants from the ADRC+ cohort to ensure a diagnostically balanced sample; specifically, we included only 18 CU, 18 MCI, and 18 AD participants. All cognitively impaired participants (MCI and AD) were amyloid positive.
First, we constructed a matrix of the bi-weight mid-correlations between proteins and transformed this into a signed adjacency matrix using a soft thresholding power of 12 (resulting in a scale-free topology fit above 0.8). This adjacency matrix was then transformed into a topological overlap matrix (TOM), which captures the similarity between nodes in terms of their shared patterns of connections. We performed hierarchical clustering with a 1 − TOM distance measure and used a dynamic tree cutting algorithm (cutreeDynamic, with a minimum module size of 15, deepSplit = 4, and a partitioning around medoids step that respected the dendrogram) to identify modules from the dendrogram.
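The two transformations described above can be sketched in a few lines. This is an illustration under stated assumptions, not the WGCNA implementation: it takes any symmetric correlation matrix as input (the study used bi-weight mid-correlations), applies the standard signed adjacency a_ij = ((1 + r_ij) / 2)^power, and uses the classic unsigned topological overlap formula. WGCNA offers several TOM variants, and the text does not say which one was chosen. Function names are hypothetical.

```python
def signed_adjacency(cor, power=12):
    """Signed adjacency: a_ij = ((1 + r_ij) / 2) ** power."""
    n = len(cor)
    return [[((1.0 + cor[i][j]) / 2.0) ** power for j in range(n)]
            for i in range(n)]


def topological_overlap(adj):
    """Topological overlap matrix (classic formula, assumed variant).

    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where u runs over nodes other than i and j, and k_i is the
    connectivity of node i (sum of its adjacencies, excluding itself).
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal is 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def tom_distance(tom):
    """Dissimilarity (1 - TOM) passed to hierarchical clustering."""
    return [[1.0 - value for value in row] for row in tom]
```

With a power of 12, a correlation of 0 maps to an adjacency of 0.5^12 (about 0.0002), so only strongly positive correlations contribute appreciably to the network.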
The first principal component of each module's protein expression matrix was used to define a module eigenprotein. 34 The degree of module membership for each protein (i.e., its intramodular connectivity [kME] value) was calculated by correlating its expression pattern across all samples with the module eigenprotein. We used kME values to merge highly similar modules. The top 50% of proteins (ranked by kME value) within each module were correlated with every other module; if more than 25% of these proteins had greater membership in another module, the modules were merged.

Gene ontology analysis
We used the g:Profiler R package (version 0.2.1) to identify the Gene Ontology (GO) biological processes (GO:BP) and molecular functions (GO:MF) enriched within our modules at an experiment-wide threshold of α = 0.05. Whenever possible, we used the program's default multiple comparison algorithm (g:SCS), which accounts for the hierarchical relationship between GO terms; we also tested for enrichment against a custom background of all 5284 SomaLogic-quantified proteins. When we were unable to find significantly enriched biological pathways with this approach, we turned to three alternate methods.
First, we used FDR correction for multiple comparisons (still enriched against the custom background of all SomaLogic proteins). If we were still unable to identify significant GO terms, we then used g:SCS correction against a background of all annotated genes. Finally, if we were still unable to identify GO terms, we used FDR correction against a background of all annotated genes.

Module preservation analysis
We used the WGCNA modulePreservation function to calculate the extent to which our modules were preserved in the independent CU cohort (SAMS). This function applied our previously defined modules to SAMS CSF samples and calculated module preservation statistics comparing the strength of interrelationship between nodes (module density) as well as connectivity patterns (module connectivity) in replicated modules versus the original. 36 For each preservation statistic, module labels were permuted 200 times with a random seed set to 1 for reproducibility. Module density and connectivity preservation statistics are captured in a Z Summary measure (i.e., the mean of these two categories of preservation statistics); modules with a Z Summary >10 were considered preserved, as recommended previously. 36 We additionally examined the medianRank of each module, which reflects the relative ranking of each across all preservation statistics and is less influenced by module size than the Z Summary value. 36
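The arithmetic behind the summary measure is simple enough to sketch. The example below assumes that each preservation statistic is standardized against its permutation distribution and that the two category-level Z scores are then averaged, as described above. It takes those category-level scores as given and does not reproduce how WGCNA aggregates the individual statistics within each category. Function names are hypothetical.

```python
from statistics import mean, stdev


def permutation_z(observed, permuted):
    """Z score of an observed statistic against its permutation distribution."""
    return (observed - mean(permuted)) / stdev(permuted)


def z_summary(z_density, z_connectivity):
    """Mean of the density and connectivity preservation Z scores."""
    return (z_density + z_connectivity) / 2.0


def is_preserved(z, threshold=10.0):
    """Apply the Z Summary > 10 criterion used in the text."""
    return z > threshold
```

For instance, a module with a density Z of 12 and a connectivity Z of 20 would have a Z Summary of 16 and would be counted as preserved.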

Module enrichment analyses
To establish whether a module was enriched with a particular characteristic (such as genetic regulators of amyloid pathology or proteins differentially abundant in AD dementia versus CU individuals), we first calculated the average log-transformed p-value for that characteristic across proteins within our module of interest. We then constructed a null distribution of average p-values with 10,000 module-sized random samples (with replacement) and calculated a z score to test whether our module of interest differed significantly from the null distribution.
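A minimal sketch of this resampling test follows. Two details are assumptions rather than statements from the text: the log transform is taken to be −log10, and the random samples are drawn from the p-values of all proteins in the network. The function name and the default seed are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def module_enrichment_z(module_pvalues, background_pvalues,
                        n_draws=10_000, seed=1):
    """Z score of a module's mean -log10(p) against a resampled null.

    module_pvalues: p-values of the proteins in the module of interest.
    background_pvalues: p-values of all candidate proteins (assumed to be
        every protein in the network); all values must be > 0.
    """
    rng = random.Random(seed)

    def score(pvalues):
        return mean(-math.log10(p) for p in pvalues)

    observed = score(module_pvalues)
    size = len(module_pvalues)
    # Null distribution: module-sized random samples, drawn with replacement.
    null = [score(rng.choices(background_pvalues, k=size))
            for _ in range(n_draws)]
    return (observed - mean(null)) / stdev(null)
```

A large positive z score indicates that the module's proteins carry smaller p-values, on average, than randomly drawn sets of the same size.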
To assess for enrichment of amyloid pathology, we used single-nucleotide polymorphism (SNP) summary statistics from the genome-wide association study (GWAS) of amyloid positron emission tomography (PET) data of Raghavan et al. (2020). 37 For enrichment of genetic regulators of clinical AD dementia diagnosis, we used GWAS summary statistics from the International Genomics of Alzheimer's Project (https://www.niagads.org/igap-rv-summary-stats-kunkle-p-valuedata), 38 as well as the study of Bellenguez et al. (2022) (https://www.ebi.ac.uk/gwas/publications/35379992). 39 GWAS summary statistics served as input to the FUMA online platform, which functionally annotates SNPs, maps them onto genes, and calculates the gene-level associations with a given phenotype. 40 The p-values resulting from these gene-level associations were used for our module enrichment analyses of amyloid PET signal and AD dementia genetic risk.
To assess whether a given module was enriched for polyubiquitinated proteins, we used a mapping of the ubiquitylome by Abreha et al. (2018), 41 and to examine enrichment for protein phosphopeptides, we used a mapping of the phosphoproteome by Ping et al. (2020). 42 For these analyses, instead of calculating the average log-transformed p-values, we calculated the average number of ubiquitination sites or protein phosphopeptides within a module of interest or an equally sized random sample.

Multivariate LASSO regression, stability selection, and other statistical analyses
We performed multivariate regression analyses with a least absolute shrinkage and selection operator (LASSO) method to examine whether protein modules of interest could discriminate Aβ− from Aβ+ CU participants, using the glmnet package (version 4.1.6) in R. This approach uses L1 regularization to reduce the number of parameters within a model by shrinking the coefficients of irrelevant and redundant parameters to 0. We selected the tuning parameter, λ, that minimized the mean cross-validated error after 10-fold cross-validation. We also manually assigned observations to folds 1 through 10 using a random sequence. SAMS CU participants with APOE genotype information (n = 124) were divided into an 80/20 train/validation split and used to train and validate our classifiers. We then evaluated the performance of our classifiers in a test set of ADRC+ CU participants who were not included in our network construction process (n = 54). We used the mean and confidence interval of the area under the receiver-operating characteristic (ROC) curve (AUC) to determine the significance and accuracy of each of our classifiers; these calculations and visualizations were performed with the ROCR (version 1.0.11) and pROC (version 1.18.0) R packages.
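For reference, the L1-penalized logistic regression objective that this kind of model minimizes can be written as follows. This is the standard formulation for a binary outcome, given here as an illustration; the notation (y_i for amyloid status coded 0/1, x_i for the vector of predictors, N for the number of training observations, p for the number of predictors) is ours and does not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Big[\, y_i\,\big(\beta_0 + x_i^{\top}\beta\big)
          - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the average negative log-likelihood, and the second is the L1 penalty. Larger values of λ set more coefficients exactly to zero, which is what gives the method its variable selection property.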
For the simplest logistic regression model, we included only age, sex, and ε4 allele count as predictors of amyloid status. We used this as a baseline point of comparison for our LASSO regression model, which additionally included all proteins within a given network module as predictors. The purpose of this was to leverage the variable selection properties of LASSO regression to understand which module proteins and/or demographic characteristics were most influential in predicting amyloid positivity. Thus, we implemented a stability selection approach using the stabs R package (version 0.6.4), which uses subsampling to determine which model features are most likely to be selected across many different LASSO iterations. 43,44 Each subsample contained half of the observations of the original data set, and this process was repeated 50 times. We used a 65% selection probability threshold to identify stably selected LASSO regression model features; these features then served as predictors in a logistic regression model predicting amyloid status among CU individuals.
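The subsampling logic of stability selection can be sketched independently of the model being fitted. In the example below, `select_features` is a hypothetical stand-in for a LASSO fit: any callable that takes a list of observation indices and returns the names of the features with nonzero coefficients. The sketch does not reproduce the stabs package, and treating the threshold as inclusive (at least 65%) is an assumption.

```python
import random
from collections import Counter


def stability_selection(n_obs, select_features, n_subsamples=50,
                        threshold=0.65, seed=1):
    """Estimate selection probabilities by repeated half-sample fits.

    n_obs: number of observations in the full data set.
    select_features: hypothetical callable; maps a list of observation
        indices to an iterable of selected feature names.
    Returns (stable_features, selection_probabilities).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_subsamples):
        # Each subsample holds half of the observations, without replacement.
        subsample = rng.sample(range(n_obs), n_obs // 2)
        counts.update(set(select_features(subsample)))
    probabilities = {name: count / n_subsamples
                     for name, count in counts.items()}
    stable = sorted(name for name, prob in probabilities.items()
                    if prob >= threshold)
    return stable, probabilities
```

Features returned in `stable` would then enter an ordinary logistic regression model, mirroring the two-step procedure described above.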

ADRC+ discovery cohort participant characteristics
We began with data from 111 participants along the AD continuum as our discovery cohort (mean age = 68.6 years, SD = 8.31; 55% women) (Table 1). We used the ADRC+ cohort to identify proteins differentially abundant in AD relative to CU contexts. However, a clinically balanced subset of these participants (n = 54) was used to construct a protein co-expression network (mean age = 68.6 years, SD = 8.32; 61% women). This subset included 18 CU (Aβ+: n = 6), 18 MCI, and 18 AD participants (Table 1).
We constructed a protein co-expression network among a clinically balanced subset from the ADRC+ cohort, using the weighted gene correlation network analysis (WGCNA) algorithm. This network resulted in 25 communities of proteins, or "modules," ranging in size from 15 (M14-oxidative stress response and M23) to 1323 proteins (M22) (Figure 1B,C). Thirteen of these modules contained at least one of the 130 differentially expressed proteins. We considered these modules to be AD relevant and made them the focus of subsequent analyses and interpretations.
A representative eigenprotein was calculated for each module and used in Kruskal-Wallis tests of differences across clinical disease stages (Figure 2).

FIGURE 1 Protein co-expression network construction and module characterization in the discovery ADRC+ cohort. (A) Differential abundance analysis. A volcano plot depicting the results of an ANOVA followed by a Student's t-test to identify differentially abundant proteins in AD dementia compared to CU individuals. This plots the log2-fold change in relative fluorescence units (RFUs) (x-axis) against the negative log10 p-value (y-axis) representing the association between the protein and a clinical AD dementia diagnosis. The p-values were adjusted using an FDR correction for multiple comparisons at an alpha level of 5%; only proteins with -log10 adjusted p-values exceeding this threshold were colorized as teal (decreased abundance in AD) or purple (increased abundance in AD). These proteins were used to restrict the scope of our subsequent analyses of co-expression network modules. (B) WGCNA protein co-expression network construction. A heatmap representing the topological overlap matrix (TOM) based on similarities in protein abundance levels that was used as input for our hierarchical clustering and community detection. Heatmap colors range from light yellow to red, reflecting low to high similarity, respectively. At the top and to the right, the network dendrogram and module color assignments are displayed. (C) Table of module sizes. A table listing all modules in our ADRC+ network by the number of proteins within each module. (D) Gene ontology analysis. Functional annotations derived from gene ontology analyses of the modules containing at least one protein differentially abundant in AD, conducted using g:Profiler. Of these nine modules, only seven contained functional enrichments that exceeded significance thresholds, and they are depicted here. The top three most significant gene ontology (GO) biological process and/or molecular function terms per module are displayed (y-axis) against their respective -log10 p-values (x-axis).

Module preservation within an independent CU cohort (SAMS)
An independent cohort of 147 CU participants was used to examine whether the protein co-expression network was preserved in the absence of cognitive impairment, as well as to relate modules from the network to clinically relevant phenotypes (mean age = 68.7 years, SD = 5.79; 61% women) (Table 2).
We used a module preservation analysis to determine whether the co-expression network could be reproduced within our SAMS CU cohort. Fifteen modules, including nine AD-relevant modules, were highly preserved, with Z Summary values ranging from 12.0 to 31.0 (Figure S4). All remaining modules were weakly preserved, with Z Summary values ranging from 4.0 to 9.4.
To understand how modules might contribute to other phenotypes relevant to aging and AD risk, we focused on the associations between AD-relevant modules and continuous CSF p-tau181, composite cognitive scores, APOE genotype, and structural MRI measures (Figure 3B, Tables S1-S3). Ten of the 13 AD-relevant modules were associated with p-tau181 levels: M1-immune system regulation, M2-axonogenesis, M3-synapse assembly, M4-phosphorylation, M7-ubiquitination, M12-blood coagulation, M22, M23, and M25-axonal guidance. Six of the AD-relevant modules were associated with ε4 allele count: M2-axonogenesis, M4-phosphorylation, M7-ubiquitination, M18-steroid dehydrogenase activity, M22, and M25-axonal guidance. Five modules were associated with executive function composite scores: [...]

FIGURE 3 Module/phenotype relationships. (A) Modules by amyloid status. Box plots illustrating the results of Kruskal-Wallis tests for one-way ANOVA used to calculate module eigenprotein relationships to amyloid status within the independent SAMS CU cohort. From left to right, modules M4-phosphorylation, M7-ubiquitination, and M18-steroid dehydrogenase activity are depicted. (B) Heatmap visualizing module relationships to cognition, AD pathology, genotype, and structural MRI outcomes within the independent SAMS CU cohort. Only module/phenotype relationships significant after multiple comparison correction are depicted. Heatmap colors range from purple to turquoise to red, reflecting the magnitude and direction of standardized beta values. The text within heatmap cells is the unadjusted p-value for each association. Module/trait relationships whose unadjusted p-values change in significance (or are otherwise noteworthy) after controlling for amyloid and/or tau are marked with different superscripts: those that lose significance after controlling for amyloid are marked with "a"; those that gain significance after controlling for amyloid are marked with "b"; those that maintain significance after controlling for amyloid and tau are marked with "c."
There was little difference in results between demographically adjusted (Figure 3B, Table S1) and amyloid-adjusted models (Figure 3B, Table S2). After controlling for continuous Aβ42/Aβ40 values, M18-steroid dehydrogenase activity was additionally associated with CSF p-tau181 levels, whereas M2-axonogenesis was no longer associated with ε4 allele count. Only M7-ubiquitination was associated with executive function composite scores in these analyses.
Finally, the only significant relationship that persisted after controlling for p-tau181 and Aβ42/Aβ40 values was that between ε4 allele count and module M18-steroid dehydrogenase activity (Figure 3B, Table S3). A number of relationships were significant before multiple comparison correction. Specifically, M3-synapse assembly and M7-ubiquitination were associated with executive function; M4-phosphorylation and M7-ubiquitination were associated with ε4 allele count; M10-telomerase RNA activity was associated with hippocampal volume; and M2-axonogenesis and M3-synapse assembly were associated with white matter hypointensity volume (Table S3).
Many of these patterns remained significant even among Aβ− participants alone (Figure S4). All but M1-immune system regulation remained associated with p-tau181 levels, and M18-steroid dehydrogenase activity remained associated with ε4 allele count.

Genetic and cell-type module enrichment
Next, we explored whether any AD-relevant modules were enriched for genetic variants associated with AD risk in GWASs. First, we examined whether they were enriched for proteins expressed by genetic regulators of clinical AD dementia (Kunkle et al. 38; Bellenguez et al. 39) and/or amyloid burden measured with PET (Raghavan et al. 37), as established by various GWAS summary statistics.
We used Z scores to determine how the average log-transformed p-value within our module compared to a distribution of 10,000 module-sized random samples. M15-G protein and oxidoreductase activity (Z = 3.009) was significantly enriched for proteins associated with genetic regulators of clinical AD dementia (Figure 3). In addition, to validate the functional annotations of the M4-phosphorylation and M7-ubiquitination modules, we sought to ensure that these modules were enriched for such post-translationally modified proteins. We also performed cell-type enrichment analyses on our AD-relevant modules, using the internet-based application WebCSEA.
Eight of our 13 modules were enriched for specific cell types after Bonferroni correction: M2-axonogenesis for macrophages, M3-synapse assembly for neurons, M4-phosphorylation for neurons, M7-ubiquitination for excitatory neurons and stromal cells, M10-telomerase RNA activity for enterocytes and red blood cells, M15-G protein/oxidoreductase activity, M18-steroid dehydrogenase activity for epithelial and red blood cells, and M25-axonal guidance for stromal cells (Figure S6).

Proteins within modules M4-phosphorylation and M7-ubiquitination accurately predict amyloid status in an independent CU cohort
Given the relationship between amyloid status and M4-phosphorylation, M7-ubiquitination, and M18-steroid dehydrogenase activity, we sought to understand whether individual proteins within these modules could accurately predict abnormal amyloid accumulation among a test set of 54 ADRC+ CU participants whose data were not used for network construction (Tables 3-5).

DISCUSSION
In this study, we used CSF proteins to construct a co-expression network among a cohort of individuals along the clinical AD continuum and replicated this network in an independent cohort of CU older adults.
We further examined the relationship between protein clusters (or modules) within this network and phenotypes relevant to aging and AD, such as CSF measures of amyloid and tau burden, cognition, structural neuroimaging outcomes, and APOE genotype. This approach allowed us to identify modules relevant to AD biology and evaluate their early functional and physiological consequences among CU individuals.
The modules we observed resembled those described previously in larger-scale proteomic studies. These include modules devoted to axonal development, blood coagulation, RNA activity, synapse assembly, G protein and oxidoreductase activity, myelination, and protein kinase activity. 22,23,26,28,49 Modules M3-synapse assembly, M4-phosphorylation, M10-telomerase RNA activity, and M23 were associated with clinical disease stage after multiple comparison correction.
In addition, modules M4-phosphorylation, M7-ubiquitination, and M18-steroid dehydrogenase activity were associated with abnormal Aβ aggregation within the SAMS cohort. Although not associated with clinical diagnosis, these modules arguably reflect early changes in the AD cascade and are relevant to understanding disease biology. These modules were also associated with p-tau181 levels, particularly after adjusting for amyloid pathology. Modules M4-phosphorylation and M7-ubiquitination showed amyloid-independent associations with APOE ε4 genotype, whereas M18-steroid dehydrogenase activity had a tau- and amyloid-independent association with genotype. Module M7-ubiquitination was further associated with executive function.
We performed enrichment analyses and found that only module M15-G protein and oxidoreductase activity was enriched for proteins associated with genetic regulators of clinical AD dementia. 39 We additionally confirmed that modules M4-phosphorylation and M7-ubiquitination were enriched with such post-translationally modified proteins (i.e., protein phosphopeptides and polyubiquitinated proteins, respectively). Furthermore, we performed cell-type enrichment analyses on our AD-relevant modules and found them to be enriched for neuronal, stromal, macrophage, epithelial, and red blood cell types.
Using LASSO regression analyses, we observed that modules M4-phosphorylation and M7-ubiquitination accurately predicted amyloid status among 54 CU ADRC+ participants who were not included in the network construction process, with AUCs of 0.85 and 0.84, respectively. A logistic regression model that included 14-3-3 protein gamma (YWHAG), a stably selected protein from the M4-phosphorylation module, outperformed one that included APOE ε4 genotype, age, and sex alone (AUC of 0.80 vs an AUC of 0.69).
Our findings underscore the importance of protein posttranslational modification in abnormal amyloid accumulation.The role of post-translational modifications-such as phosphorylation and ubiquitination-in AD have been described in detail. 41,50,51As mentioned, neurofibrillary tangles in AD comprise the hyperphospho-rylated tau protein; in addition, Aβ production can be regulated by the phosphorylation of the amyloid precursor protein [APP]. 50A number of 14-3-3 proteins-including YWHAG, YWHAE, and YWHAB-were members of the M4-phosphorylation module.These phospho-binding proteins regulate a wide range of functions within the brain, including protein kinase activity, apoptosis, cell trafficking, and neuronal plasticity. 52In addition, there is evidence that these proteins interact with tau and can promote its phosphorylation. 53M4-phosphorylation also contained Ca 2+ /calmodulin-dependent protein kinases (CAMK2B and CAMK2D), which had relatively high selection probabilities for an M4-wide LASSO regression model.These calcium-signaling molecules have been linked to both phosphorylation of the tau protein and APP. 54 was enriched for ubiquitination, a post-translational modification mediated by a sequential cascade of enzymes that transfer ubiquitin, a 76 amino acid protein, to lysine residues on target proteins.
Ubiquitin can be assembled into polymeric chains via ubiquitination of one of its seven lysine (K) residues: K6, K11, K27, K29, K33, K48, and K63. 55M7 was enriched specifically for K63-linked ubiquitination, which is involved in non-proteasomal functions, such as protein kinase activation, DNA repair, and autophagy. 56Autophagy is a degradative process mediated by the lysosome and critical to the cellular response to stress, such as nutrient starvation, hypoxia, oxidative stress, and DNA damage. 57,58It degrades misfolded proteinsparticularly long-lived, insoluble, protein aggregates, 59 -as well as damaged organelles. 57M7-ubiquitination contained a number of regulators of autophagy machinery, such as MAP1LC3A, GABARAP, GABARAPL1, and GABARAPL2. 60tophagy induced by nutrient starvation is meant to promote cell survival, by providing cells with internal nutrient supplies and clearing protein aggregates. 58,61However, there is evidence to suggest that autophagy is dysregulated in AD. 62,63 In a 5fXAD mouse model of AD, fasting led to an increase in macroautophagy activity, but did not result in subsequent degradation of intracellular Aβ accumulation that stemmed from increased extracellular uptake. 64Although we cannot establish the direction of causality between autophagy and AD pathology, it is plausible that increased autophagy is detectable in the CU stages preceding clinical impairment.
Studies of aptamer cross-reactivity have also noted that, in this context, aptamers bind to alternative forms of the same protein roughly half of the time. 65,66 In addition, our reliance on network analytical approaches provided an additional safeguard against such concerns.
Our study has several limitations. This work is cross-sectional, and longitudinal studies are needed to understand the time course of these proteomic signatures and the ability of these modules to predict future progression from CU to clinical impairment (MCI and AD dementia).
Furthermore, we were unable to functionally annotate modules with important relationships to AD pathology, such as M22 and M23, possibly due to their extreme module sizes. Finally, our cohort is predominantly non-Hispanic White and highly educated, thereby limiting the generalizability of our findings. Despite these limitations, our work relating CSF protein modules and phenotypes relevant to aging and AD dementia is important given the need to discover mechanisms driving initial disease processes in the absence of clinical impairment.
Overall, our study highlights the important, multi-faceted involvement of ubiquitination in the AD cascade, particularly at its initial stages.
Twenty-two ADRC+ participants who did not have Lumipulse data had Aβ peptides quantified by the Quanterix Neurology 3-plex A assay (Quanterix, MA, USA). Amyloid status was determined with ratios of Aβ42 to Aβ40, and Aβ42/Aβ40 ratios were used both continuously and dichotomously in subsequent analyses. Cutoffs to classify participants into amyloid negative (Aβ−) and amyloid positive (Aβ+) groups were derived in a batch-specific fashion and are described in the Supplementary Methods.

FIGURE 2
Modules by clinical disease stage. Plots illustrating the results of Kruskal-Wallis one-way ANOVA tests used to calculate module eigenprotein relationships to clinical disease stage. Modules M3-synapse assembly, M4-phosphorylation, M10-telomerase RNA activity, and M15-G protein/oxidoreductase activity were significantly associated with disease stage after FDR correction for multiple comparisons.

... in the Stanford Aging and Memory Study (SAMS) and are referred to herein as the SAMS CU cohort. There were 109 amyloid negative (Aβ−; mean age = 68.1 years, SD = 5.52) and 38 amyloid positive (Aβ+; mean age = 70.2 years, SD = 6.34) participants. The mean length of storage time before SomaScan protein quantification was 3.84 years (SD = 1.32).

FIGURE 4
Module enrichment of AD genetic risk factors and post-translationally modified proteins. Histograms representing the bootstrapped null distribution of either the average -log10 p-values, number of ubiquitination sites, or number of protein phosphopeptides. These histograms were derived from randomly sampling a module-sized collection of proteins 10,000 times. Histograms are overlaid with normal distribution curves, and vertical lines represent z-scores capturing the distance between the average p-value within a given module and its bootstrapped null distribution. Only vertical lines with z-scores beyond the 90% confidence interval critical values (−1.645 or 1.645) are labeled. Significant enrichment was observed in module M15-G protein and oxidoreductase activity for gene-level associations with clinical AD dementia from the Bellenguez et al. (2022) 39 GWAS, in module M4-phosphorylation for protein phosphopeptides derived from the Ping et al. (2020) 42 mapping of the phosphoproteome, and in module M7-ubiquitination for polyubiquitinated proteins derived from the Abreha et al. (2018) 41 mapping of the ubiquitylome.
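The bootstrapped enrichment test described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `enrichment_z_score`, its arguments, and the toy values are hypothetical, and sampling without replacement from the full protein set is an assumption, since the caption does not specify the sampling scheme.

```python
import random
import statistics


def enrichment_z_score(module_values, background_values, n_draws=10_000, seed=0):
    """Z-score of a module's mean value against a bootstrapped null.

    module_values: per-protein statistic for proteins in the module
        (e.g., -log10 p-values or counts of modification sites).
    background_values: the same statistic for every quantified protein.
    The null is built by repeatedly drawing a module-sized random set of
    proteins (without replacement, by assumption) and recording its mean.
    """
    rng = random.Random(seed)
    module_size = len(module_values)
    observed = statistics.fmean(module_values)
    null_means = [
        statistics.fmean(rng.sample(background_values, module_size))
        for _ in range(n_draws)
    ]
    null_center = statistics.fmean(null_means)
    null_spread = statistics.stdev(null_means)
    return (observed - null_center) / null_spread


# Toy example with made-up numbers: a module drawn from the top of the range.
background = [float(i % 10) for i in range(500)]
enriched_module = [9.0] * 20
z = enrichment_z_score(enriched_module, background, n_draws=2_000)
print(f"z = {z:.2f}; labeled in the figure if |z| > 1.645: {abs(z) > 1.645}")
```

Fixing the seed makes the null distribution reproducible; in practice the number of draws (10,000 in the caption) trades run time against the stability of the estimated null spread.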
For each of these modules, we performed multivariate LASSO regression on a model derived from module proteins, age, sex, and APOE ε4 allele count. Both M4-phosphorylation (AUC = 0.85, 95% confidence interval [CI] = 0.72-0.97, seven-parameter solution) and M7-ubiquitination (AUC = 0.84, 95% CI = 0.72-0.96, six-parameter solution) predicted amyloid status with high accuracy among the test set of ADRC+ CU participants (Figure 5A). In contrast, a logistic regression containing only APOE ε4 allele count, age, and sex weakly predicted amyloid status among ADRC+ CU participants (AUC = 0.69, 95% CI = 0.52-0.86). Module M18-steroid dehydrogenase activity did not significantly predict amyloid status (AUC = 0.56, 95% CI = 0.38-0.75, 13-parameter solution) (Figure S7A). LASSO regression is a useful method of feature selection. It minimizes a loss function that penalizes the sum of the absolute values of the model's coefficients, shrinking the coefficients of weak and redundant parameters to 0. We used a stability selection procedure to determine which variables were most likely to be selected across many different iterations of LASSO regression, using a selection probability greater than 65% as our cutoff. The stably selected variables included YWHAG for module M4-phosphorylation and SMURF1 and APOE ε4 allele count for module M7-ubiquitination (Figure S7B,C). We used the stably selected proteins as predictors in separate logistic regression models that additionally controlled for APOE ε4 allele count. These models predicted amyloid status with moderate to weak accuracy (YWHAG [M4-phosphorylation]: AUC = 0.80, 95% CI = 0.65-0.94; SMURF1 [M7-ubiquitination]: AUC = 0.71, 95% CI = 0.55-0.88) (Figure 5B).
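The idea behind stability selection can be illustrated with a short sketch. Several simplifications here are assumptions and not the authors' procedure: the LASSO is fit with a squared-error loss by coordinate descent rather than a logistic loss, each resample is a random half of the data, the penalty is fixed rather than tuned, and all names (`fit_lasso`, `selection_probabilities`, `make_toy_data`) and data are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; the source of exact zeros in LASSO."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, penalty, max_sweeps=100, tol=1e-8):
    """Minimize (1/2n)*sum((y - Xb)^2) + penalty*sum(|b_j|) by coordinate descent.

    Columns of X and y are centered internally, so no intercept is returned.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals y - Xb while all b are 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    coefs = [0.0] * p
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coefs[j]
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            change = new_coef - coefs[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * change
                coefs[j] = new_coef
                largest_change = max(largest_change, abs(change))
        if largest_change < tol:
            break
    return coefs


def selection_probabilities(X, y, penalty, n_resamples=50, seed=0):
    """Fraction of random half-samples in which each feature gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        coefs = fit_lasso([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j, coef in enumerate(coefs):
            if coef != 0.0:
                counts[j] += 1
    return [count / n_resamples for count in counts]


def make_toy_data(n_samples, n_features, seed=0):
    """Hypothetical data: only feature 0 carries signal for the binary outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1.0 if row[0] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0.0 for row in X]
    return X, y


X, y = make_toy_data(200, 6, seed=1)
probs = selection_probabilities(X, y, penalty=0.15, n_resamples=40, seed=2)
stable = [j for j, prob in enumerate(probs) if prob > 0.65]  # 65% cutoff from the text
print("selection probabilities:", probs)
print("stably selected feature indices:", stable)
```

Features that survive the penalty in most resamples are treated as stable, which guards against selecting a variable that happens to enter the model in only one particular split of the data.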
In conclusion, by examining module changes in the absence of clinical impairment, our study enabled us to elucidate the critical importance of phosphorylation and ubiquitination for preclinical changes in cognition and pathology. The focus on abnormal amyloid aggregation is particularly relevant because clinical trials are increasingly targeting this earliest stage of disease for therapeutic intervention. By leveraging a large sample of CU participants and cutting-edge protein quantification technology, we were able to identify biological mechanisms associated with amyloid positivity. The SomaScan platform is the largest protein panel available for clinical screening of CSF samples, and its aptamer-based technology enabled high-throughput protein quantification. Some studies have observed the potential for aptamer off-target cross-reactivity with homologous proteins.

... Alzheimer's Disease Research Center and affiliated clinics (ADRC+ cohort). These participants were either enrolled by the Stanford ADRC or recruited from associated clinics at Stanford; we thus refer to these participants as the ADRC+ cohort. Of these participants, 73 were CU, 19 had MCI, and 19 were diagnosed with AD dementia. Based on CSF analyses, there were 42 amyloid negative (Aβ−) and 69 amyloid positive (Aβ+) participants; all clinically impaired participants (diagnosed with either MCI or AD) were amyloid positive. The mean length of storage time of CSF samples before protein quantification by the SomaScan assay platform was 3.94 years (SD = 2.39).

TABLE 2
Demographic information by amyloid status (amyloid negative or amyloid positive) for the clinically unimpaired, independent Stanford Aging and Memory Study cohort (SAMS CU cohort).

Overall, N = 54 | Negative, N = 29 | Positive, N = 25
Demographic information by amyloid status (amyloid negative or amyloid positive) for the clinically unimpaired Stanford Aging and Memory Study (SAMS CU cohort) participants used to train LASSO and logistic regression models.
Demographic information by amyloid status (amyloid negative or amyloid positive) for the clinically unimpaired Stanford Aging and Memory Study (SAMS CU cohort) participants used to validate LASSO and logistic regression models.
Demographic information by amyloid status (amyloid negative or amyloid positive) for the clinically unimpaired participants from the ADRC+ cohort used to test LASSO and logistic regression models.
FIGURE 5
Module-based prediction of amyloid status using LASSO regression with stability selection. (A) Receiver-operating characteristic (ROC) curves depicting the classification performance (sensitivity vs specificity) of models predicting amyloid status among a test set of CU ADRC+ participants. Shown in blue are results from a basic logistic regression model including APOE ε4 allele count, sex, and age (AUC = 0.71). Shown in dark green and green are results from LASSO regression models derived from module M4-phosphorylation and M7-ubiquitination proteins, respectively, along with the previously mentioned demographic and genotype factors. (B) ROC curves similar to those in (A), except that dark green and green show results from logistic regression models including only stable model features, along with APOE ε4 allele count: YWHAG for M4-phosphorylation and SMURF1 for M7-ubiquitination.
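For readers less familiar with the AUC values reported for these ROC curves, the quantity can be computed directly from predicted scores. The sketch below is illustrative only: the function name `roc_auc` and the example labels and scores are hypothetical and do not come from the study data. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen amyloid-positive participant receives a higher score than a randomly chosen amyloid-negative participant.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for amyloid positive, 0 for amyloid negative.
    scores: model-predicted probabilities (or any monotone risk score).
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive-negative pairs are ordered correctly.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the module-based and demographic-only models are compared.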