Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease

Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration. There are no effective treatments for Alzheimer's disease at a late stage, urging for early intervention. However, existing statistical inference approaches of AD subtype identification ignore the pathological domain knowledge, which could lead to ill-posed results that are sometimes inconsistent with the essential neurological principles. Integrating systems biology modeling with machine learning, we propose a novel pathology steered stratification network (PSSN) that incorporates established domain knowledge in AD pathology through a reaction-diffusion model, where we consider non-linear interactions between major biomarkers and diffusion along brain structural network. Trained on longitudinal multimodal neuroimaging data, the biological model predicts long-term trajectories that capture individual progression pattern, filling in the gaps between sparse imaging data available. A deep predictive neural network is then built to exploit spatiotemporal dynamics, link neurological examinations with clinical profiles, and generate subtype assignment probability on an individual basis. We further identify an evolutionary disease graph to quantify subtype transition probabilities through extensive simulations. Our stratification achieves superior performance in both inter-cluster heterogeneity and intra-cluster homogeneity of various clinical scores. Applying our approach to enriched samples of aging populations, we identify six subtypes spanning AD spectrum, where each subtype exhibits a distinctive biomarker pattern that is consistent with its clinical outcome. PSSN provides insights into pre-symptomatic diagnosis and practical guidance on clinical treatments, which may be further generalized to other neurodegenerative diseases.


INTRODUCTION
Alzheimer's disease (AD), one of the common neurodegenerative disorders, causes progressive memory loss, cognitive decline, behavioral disturbance, functional dependence, and ultimately death 1 . No effective treatments for AD have been found so far, urging the need for early diagnosis and intervention. Current diagnostic systems for AD primarily rely upon clinical signs and symptoms. However, neuropsychological assessments alone are inadequate to reflect the underlying pathophysiological progressions considering the long preclinical period and diverse symptoms across individuals [2][3][4] . Additionally, post-mortem histological examination of AD pathology in brain tissue samples often does not align closely with clinical diagnosis 5 .
This body of evidence further underscores the significant gap between symptomatic manifestations and latent development of AD pathology [6][7][8] . Therefore, to facilitate accurate prognosis and prompt intervention of AD, it is critical to stratify the aging population into fine-grained subtypes that are not only closely correlated with clinical outcomes but also characterize biomarker progression of cognitive decline.
According to the 2018 NIA-AA (National Institute on Aging and Alzheimer's Association) research framework 9 , accumulations of beta-amyloid plaques (A) and pathologic tau (T), along with neurodegeneration ([N]), are considered the three pathological hallmarks of Alzheimer's disease progression.
With recent advancement and accessibility of neuroimaging techniques, spatial data for biomarkers and atrophy patterns becomes increasingly available for clinical and research purposes. Striking efforts have been made to understand and explain the heterogeneity in AD progression from neuroimaging data [10][11][12][13] , among which statistical inference, mostly machine learning approaches, have proven its promising potential in AD subtype identification [14][15][16][17] . For example, a Subtype and Stage Inference (SuStaIn) was developed to predict temporal patterns and identify AD subtypes from cross-sectional magnetic resonance imaging (MRI) progression patterns 18 . Vogel et al. extended the SuStaIn to include tau biomarker and identified four distinct progression subtypes 19 . Recently, a semi-supervised clustering via generative adversarial networks (Smile-GAN) was proposed to cluster subjects with distinct progression pathways using brain atrophy measurements 20 . However, those methods including most existing machine learning analyses do not take into account the underlying pathological mechanism of AD, which may result in ill-posed results that are inconsistent with essential biological laws.
Different from statistical inference approaches, mathematical modeling predicts disease progression based on existing knowledge of AD pathology 21 . Several models have been proposed to simulate the propagation pattern of biomarkers. For example, an ordinary differential equation (ODE) system was constructed to mathematically model the temporal progression of amyloid and neuronal dysfunction 22 .
Following the diffusive nature of amyloid and tau, network diffusion models were developed to predict longitudinal patterns of atrophy and metabolism in AD across brain networks 23,24 . This pioneering work has shown the potential of capturing macroscopic properties of disease progression from a systems biology perspective. Hao et al. proposed a PDE-based (partial differential equation) system biology model to investigate the complex molecular pathways underlying AD 25 . But this model was not designed on a real brain domain and was restricted to theoretical analysis due to the lack of real data validation. Recently, a network-guided reaction-diffusion model was proposed to integrate both the interactions between AT [N] biomarkers and the diffusions along structural brain networks, which encodes the prevailing pathological mechanisms of AD and captures biomarker dynamics of individuals 26 . While those studies demonstrated the role of neurobiological factors in heterogeneous trajectories of cognitive decline [27][28][29] , a comprehensive analysis of phenotypic heterogeneity is still lacking, which limits their applicability to the general population.
To disentangle the heterogenous neurodegeneration trajectories, we propose a novel pathology steered stratification network (PSSN) based on the combined power of data-driven deep learning and theory-based biological modeling. Our approach is integrated with existing neuropathological mechanisms and is evaluated on significant longitudinal multimodal imaging scans covering the full spectrum of cognitive states (from cognitive normal to AD), allowing the model to be trained on different progression paths. To the best of our knowledge, the proposed computational framework is the first subtype identification tool that jointly considers the biomarker pathological interactions, brain network diffusions, and clinical assessments for population stratification. Leveraging theory-based biological models and data-driven deep learning, our PSSN: incorporates neuropathological domain knowledge to ensure neurologically consistent results; ameliorates the limitation of sparse longitudinal imaging data by learning the spatiotemporal dynamics of AT[N] biomarkers; stratifies subjects into fine-grained subtypes with distinct neurological underpinnings and phenotypic outcomes; characterizes subtype transition path on AD spectrum, providing insight into pre-symptomatic prognosis and prevention. It is important to note that fully validating the subtypes identified by our model may be challenging. This is primarily due to the absence of a definitive ground truth for subtype assignment in Alzheimer's disease and the inherent heterogeneity present in the AD data. Nevertheless, we believe that our model could provide insights and contributes to the ongoing efforts to understand the complex nature of AD progression. We will continue to process additional temporal neuroimaging data to enhance the representativity of our model and improve its accuracy.

Data Collection and Image Processing
The data used in this study was collected from Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We retrieved (i) demographic information, including age, gender, education, and occupation information; (ii) clinical assessment from a wide range of neuropsychological domains, including memory, executive, language, sociability, attention, etc.; (iii) longitudinal neuroimaging data, including Amyloid-, Tau-, and FDG-PET for AT[N] biomarker burden, and DWI for structural brain networks. Subjects selected here have two or more longitudinal scans and clinical assessment data. 320 subjects were then selected based on three criteria: (i) have available Amyloid-PET, Tau-PET, FDG-PET, T1-weighted MRI, and DWI scans, (ii) have at least one follow-up PET scan of A, T, or [N] biomarkers to track AD progression, and (iii) have a clinical diagnostic label indicating their cognitive status, either as cognitive normal (CN) or Alzheimer's disease (AD), assigned for each PET scan.

Brain Network Construction
DWI images were aligned with each subject's T1-weighted MRI. We first parcellate brains to 148 regions based on the parcellation framework proposed in 40 and then applied the surface seed-based probabilistic fiber tractography in FreeSurfer 41 . The number of fibers connecting two brain regions is counted to measure the strength of the anatomical connectivity of pair-wise regions and stored in the connectivity matrix. A total of 506 structural brain networks were constructed from DWI images.

Regional AT[N] Level Acquisition
2,807 amyloid PET (1,252 subjects), 1,009 tau PET (670 subjects), and 3,590 FDG PET (1,516 subjects) images were parcellated into 148 regions using Destrieux atlas 40 by aligning with each subject's T1weighted MR image. We then calculate the regional standard uptake value ratio (SUVR) for amyloid-, tau-, and FDG-PET to represent the regional value of corresponding biomarkers, which is normalized by the whole cerebellum reference. All the PET data we collected have passed the quality control of the ADNI.

Clinical Assessments
MMSE, ADAS, and CDR scores are indicators of subjects' overall cognition status and are used to guide our PSSN. We use Everyday Cognition Questionnaire (ECog) scores to validate and analyze our results.
ECog is scored on a four-point scale where higher scores represent more severe dementia, based on one global factor and six domain-specific factors: everyday memory (ECogMem), language (ECogLang), visuospatial abilities (ECogVisspat), planning (ECogPlan), organization (ECogOrgan), and divided attention (ECogDivatt). All the above clinical assessment data were collected from the ADNI database.

Reaction-Diffusion Model
Based on existing neuropathological knowledge, we first construct a reaction-diffusion model of Alzheimer's disease to characterize the spatiotemporal interaction and diffusion of AT [N] biomarkers. This biological model can predict subject-specific long-term propagation patterns across brain networks, which enables PSSN to identify AD subtypes following neurological principles.

Model Entity
As shown in Fig. 1(a), the model takes four entities of AD as input: (i) A biomarker level measured from amyloid PET at 148 brain regions, which is a vector of 148×1; (ii) T biomarker level measured from tau PET of size 148×1; (iii) [N] biomarker level measured from FDG PET of size 148×1; Each AT[N] input is a 592×M matrix, where M denotes the number of subjects included in our model. (iv) brain network (graph Laplacian), which is a 148×148 matrix used as the diffusive pathway for AT [N]. These multimodal neuroimaging data jointly provide a comprehensive AD profile of each subject.

Model Design
In Fig. 1(b), the core of our reaction-diffusion model is the amyloid cascade hypothesis [30][31][32][33][34] . (i) Amyloid activates hyperphosphorylation of tau, tau triggers subsequent neurodegeneration, and damaged neurons release more amyloid to the brain (dashed arrows) 35 . (ii) amyloid and tau have constant production (solid arrows) and density-based degradation (hollow arrows). (iii) neuronal resilience moderates the rate of neurodegeneration 36 . (vi) amyloid and tau diffuse along the structural brain network via Laplacian operator 24,37,38 . This model quantitatively characterizes the spatiotemporal dynamics of AT[N] biomarkers ( Fig. 1(c)), laying a solid foundation for AD prognostic subtype identification. Detailed descriptions can be found in 26 .

Pathology Steered Stratification Network
Following our biological model, we construct a deep predictive neural network to disentangle the heterogenous neurodegeneration progression and stratify subjects into fine-grained subtypes within distinct neurobiological underpinnings.  predicts clinical assessments using the subtype-probability vectors.

Data Fusion
Let ! denote the number of clinical assessments for subject . For each subject , we (1)

Stratification Network
We aim to stratify aging brains into a set of subtypes, written as = { ! % | = 1, … , , = 1, … , ! }, where ! % is the subtype index for subject at the %& visit. To distinct neurobiological underpinnings between subtypes as well as preserve the shared symptoms and cognitive decline patterns within each subtype, three sub-networks are proposed as follows.
1. Feature Extraction Network ( Fig. 1(d)). The feature extraction network is a many-to-many Long Short- is the feature representation learning network, and denotes the parameters of the network.
2. Subtype Network ( Fig. 1(e)). The subtype network is a fully connected neural network that learns patterns in the subject pool and then maps each AT[N]-Net feature vector ! % into a subtype-specific feature vector ! % ( ), where ∈ {1,2, … , } and denotes the number of identified subtypes. Also, we include fuzzy assignment by minimizing the entropy-based regularization term to avoid the trivial stratification solution (one dominant cluster). Let denote the parameters of subtype network.
3. Prediction Network ( Fig. 1(f)). We use a fully connected neural network to predict each subject's MMSE, CDR, and ADAS scores based on the subtype assignment probability ! % generated in Subtype Network. A ( -norm loss function is used to measure the prediction error, written as !
This clinical assessment-guided feature allows us to group subjects with similar clinical stages. Let denote the parameters of the prediction network.

Optimization
Three sub-networks are jointly updated by the loss function where ! % allows clinical assessments to guide the stratification process and is a hyperparameter used to control the level of fuzzy assignment. Directly calculating the backpropagation derivative is quite complicated in the Stratification network, as it aims to minimize the divergence between clinical scores and predictions as well as to find the optimal cluster assignments at the same time. Thus, we apply the actorcritic model 39 for optimization, which iteratively optimizes two groups of sub-networks.
Let ∇ -, ∇ ' , ∇ . represent the gradients of the backpropagation process, respectively. We first fix the parameter of the prediction network ( ) to estimate parameters of feature extraction and subtype network ( , ). By fixing , gradients ∇ ' can be given by: Similarly, the gradients ∇ . can be given by: At each stage, we update the subtype assignment by the updated subtype-specific features { ! % }. Then by fixing the feature extraction network and subtype network ( , ), we iteratively update the prediction network whose gradient is given by:

RESULTS
Our PSSN outputs two types of information. First, the PSSN uses a reaction-diffusion model to predict the spatiotemporal dynamics of AT[N] biomarkers for each subject. This model incorporates current neuropathological pathways between these biomarkers and integrates their diffusion across brain networks.
By doing so, we ensure that the prediction results are aligned with known neuropathological pathways. The

Experiment Setup and Parameter Tuning
We conducted our experiments on a multi-core Redhat Linux machine with two 32GB NVIDIA Tesla V100 GPUs. Our model was developed and constructed using TensorFlow, a popular machine learning framework.
To train our PSSN, the data ( and ) is randomly split into training (80%) and validation (20%) sets. Since there is no ground truth for subtype assignments, we determine the optimal number of clusters by jointly considering cluster results from K-means, SuStaIn, and our PSSN. Taking K-means and SuStaIn as baselines, our method attains the lowest cluster variance ratio with respect to both K-means SuStaIn methods.
The hyper-parameters were set to = 6, = 10 /0 , _ = 8 and _ = 0.7, where K represents the number of subtypes and has the most significant impact on both the clustering outcomes and final experimental results; is the parameter for calculating the loss function; _ is the number of layers of the fully connected networks used in both the Subtype Network and the Prediction Network; _ is the probability of data kept at each dropout. to an 11.50% decrease in intra-cluster variance when compared to directly applying K-means.

Stratification Comparison
We use seven ECog scores (Mem, Lang, Visspat, Plan, Organ, Divatt, and Total) to evaluate the subtype stratification results as they reflect the clinical stages of AD progression and are unknown to all models. We first analyze the distribution of subtype classification in three methods. As shown in Table 1, K-means and SuStaIn identified a giant cluster that includes more than 50% of observations despite extensive attempts.
One possible explanation is their incapability to handle high-dimensional spatial data, where our input dimension is 148 brain regions. The elastic feature extraction network in PSSN can adapt to time series data with various lengths. Given the sporadic availability of neuroimaging data in the real world, such a design maximizes the utility of available data.

Intra-cluster Homogeneity
We then investigate the consistency of clinical outcomes within each cluster. For each subtype identified by the PSSN, we calculate the variance of clinical scores to evaluate the quality of subtype classification, where a small variance indicates intra-cluster homogeneity. Fig. 3(a) shows the intra-cluster variance of seven ECog scores by K-means (top), SuStaIn (middle), and PSSN (bottom). Note that each column of the heap map is individually normalized using column-wise z-scores, ensuring that each method is compared using

(a) PSSN vs. K-means (b) PSSN vs. SuStaIn
the same mean and standard deviation values. Our PSSN method achieves superior intra-cluster consistency compared to K-means and SuStaIn in all ECog assessments.

Inter-cluster Heterogeneity
A pairwise -test was performed to check the inter-cluster variance. In Fig. 3(b), we plot a triangle with 1 ( = 15 grids where each grid represents the results of two-sample -tests. As shown in Fig. 3(b), our PSSN method achieves the best performance since it has the least biased subtypes (pink grids, < 5% of the subject pool) and the greatest number of pairs with significant inter-cluster differences (red grids, -value < 0.05).
We further rank each subtype according to the severity of averaged MMSE, CDR, and ADAS scores. It is found that our PSSN has the smallest cross-ranking links, indicating the best consistency within each subtype (Fig. 3(c)). With the lowest intra-cluster variance as well as the highest inter-cluster difference, our PSSN achieves the best coherence of all clinical assessments and sets a solid cornerstone for its clinical implications and applications.

Clinical Insights
Current diagnoses of AD are often delayed due to the wide spectrum of clinical symptoms among individuals. Given the fine-grained subtype identified by our model, we further examine the distribution of ECog scores to explore the representative symptoms of each subtype.

ECog Entropy
To evaluate the inter-cluster difference of ECog scores, we calculate the cluster-wise entropy ( ) in seven ECog scores:

K-means SuStaIn PSSN
The p-value across two subtypes is significant (<0.05) Same as the above, with an additional criterion that at least one of them is a biased subtype Others where refers to the index of ECog scores, and E 2,4 F is the probability of assigning the %& ECog scores to %& subtype. The average entropies of each ECog score are shown in Fig. 4(a). Compared to K-means and SuStaIn, our PSSN achieves the lowest entropies in all seven categories of ECog scores, showing the greatest stability in subtype identification.

Subtype Symptom
In Fig. 4(b), the six domains of ECog (except ECog-Total) are mapped to six brain domains based on functional parcellation (see Fig. 4 for detailed brain region parcellation). Further in Fig. 4(c), we conclude the position of subtypes in the continuous spectrum: CN-like subtypes (#1, #4), MCI-like subtypes (#5, #6), and AD-like subtypes (#2, #3). The observed pattern summarized below offers new insight into the prediagnosis and treatment of AD as it inherently combines physiological and psychological markers.
AD State. As we can see clearly from Fig. 4(b), subtype #3 and subtype #2 present distinct cognitive levels in almost all domains. Subtype #3 has significantly higher scores than (i) subtype #4 in six domains (Organ, Divatt, Visspat, Mem, Lang, and Total), (ii) subtype #1 in domains (Plan, Divatt, Visspat, Mem, Lang, and Total). Subtype #2 is significantly higher than (i) subtype #4 in Divatt, Mem, Lang, Total, (ii) subtype #1 in Divatt, Mem, Lang, Total, Plan, Visspat. This pattern also aligns with the predicted AT[N] level generated by our spatiotemporal model (discussed in Section Neurobiological Evaluation). Based on the consistency of pathological biomarkers and clinical scores, we consider subtypes #3 and #2 as AD types, where highlevel AT[N] accumulates across the brain and multiple symptoms manifest together simultaneously.
Although it is hard to differentiate subtypes #3 and #2 based on reported symptoms, we could observe distinct patterns of pathological burdens Fig. 5(a). Subtype #4 scores are significantly lower in memory, which is also consistent with the brain mapping of AT[N] biomarkers. A further examination shows that subtype #4 has more problems with visual-spatial abilities, which suggests subtype-specific treatments in parietal and occipital lobes and could be used in clinical settings to differentiate subtypes #4 and #1.
MCI State. Subtypes #5 and #6 are intermediate states. Subtype #5 scores significantly higher on memory than subtype #4 and on visual-spatial ability than subtype #1. Subtype #6 has higher scores in almost all cognitive domains than subtypes #1, #4, which implies subtype #6 should be placed closer to the AD end on cognitive continuums. Subtype #5 has close to normal organization and language abilities, while subjects in subtype #6 tend to score higher in planning, language abilities, and total assessments. Such distinction suggests heavier pathologic burdens in Subtype #6's Wernicke's and Broca's areas, which is confirmed in Fig. 5(a).

DISCUSSIONS
Due to the massive heterogeneity between neurobiological examination and clinical diagnosis, we are interested in the associated neurobiological factors for each identified subtype. We compute the subtype transition matrix, analyze the corresponding AT[N] pattern, and generalize an AD evolutionary graph for subtype prognosis.
Transition Matrix. As defined above, each subject is labeled as ! % which is the subtype index (from 1 to is the possibility of subtype # transiting to subtype # .   Fig. 5(a)) and subtypetypical clinical profiles ( Fig. 4(b)).  Fig. 5(b) summarizes hypothetical CN, MCI, AD pathways, which are congruent with the order of subtypes on the continuum in Fig. 4(c), demonstrating the consistency between physiological and psychological signs using our pathology steered method. Clinicians should closely monitor their cognitive abilities and provide prompt intervention if there is a sudden deterioration.
Final Stable State. For the subtype transition matrix , each subtype can be considered as a node in a graph, and the transition probability indicates the directed link between two subtypes. As the initial distribution of subtypes in our subject pool is (@) = {0.11, 0.11, 0.22, 0.38, 0.11, 0.08} and (B) = (@) B , we can get the steady state vector of the transition matrix by taking the infinite limit of , written as = (@) • lim B→D B . Fig. 6 shows the transition matrix at different steps ( = 1, 3, 5) using heatmaps, where darker colors represent higher transition probabilities. When = 0, the heatmap is the initial transition matrix. As we increase the exponent, the matrix approaches the stable state vector . We can calculate by solving • = , which is the eigenvector of the transition matrix corresponding to the eigenvalue = 1. More than 50% of subjects will progress to AD states (subtypes #2 and #3), 33% of subjects will stay in MCI state (subtypes #5 and #6), and only 13% of subjects will stay in cognitive normal states (subtypes #1 and #4).

Fig. 6. Heatmaps of subtype transition matrix at different time steps ( ). The number and shade in entry
%& row and %& column represent the probability of transiting from subtype # to subtype # at a given time.

CONCLUSIONS
In this paper, we propose a pathology steered stratification network to fill the domain knowledge gap in the current study of AD subtype identification. Our proposed method integrates (1) a state-of-art reactiondiffusion model that can identify causality and mechanisms underlying Alzheimer's and characterize the spatiotemporal dynamics of AT[N] biomarkers across brain networks; (2) a deep predictive neural network that can learn representative features of neurodegeneration trajectories and stratify the aging population into fine-grained subtypes with distinct pathological profiles. Compared to other methods like K-means, our PSSN achieves the highest inter-cluster heterogeneity and intra-cluster homogeneity. Importantly, we discovered six distinct subtypes spread across the neurodegeneration spectrum, where each subtype shows consistency between neurobiological burden and symptom profiles, indicating the potential of our PSSN to disentangle AD heterogeneity. An evolutionary disease graph is presented as a general guideline for the probable state transition of subtypes. Altogether, our method provides a systematic way of subtype identification with joint consideration of existing pathological knowledge, physiological imaging data, and psychological examinations, which can assist pre-clinical AD prognosis and may be applied to other neurodegenerative studies facing similar therapeutic challenges.