Longitudinal analysis of brain structure using existence probability

Abstract Introduction We propose a method to evaluate quantitatively the longitudinal structural changes in brain atrophy to provide early detection of Alzheimer's disease (AD) and mild cognitive impairment (MCI). Methods We used existence probabilities obtained by segmenting magnetic resonance (MR) images at two different time points into four regions: gray matter, white matter, cerebrospinal fluid, and background. This method was applied to T1‐weighted MR images of 110 participants with normal cognition (NL), 165 with MCI, and 82 with AD, obtained from the Japanese Alzheimer's Disease Neuroimaging Initiative database. Results We obtained the coefficients of probability change (CPC) for each dataset. We found high area under the receiver operating characteristic curve (ROC) values (up to 0.908 of the difference of ROCs) for some CPC regions that are considered indicators of atrophy. Additionally, we attempted to establish a machine‐learning algorithm to classify participants as NL or AD. The maximum accuracy was 92.1% for NL‐AD classification and 81.2% for NL‐MCI classification using CPC values between images acquired at first and sixth months, respectively. Conclusion These results showed that the proposed method is effective for the early detection of AD and MCI.

characterization of longitudinal structural brain changes, is needed for superior prediction of the onset and treatment of AD.
Several established methods exist to analyze brain volume changes. The software package FreeSurfer longitudinal stream can estimate volume changes within more than 40 volumes of interest (VOIs) using the Bayes estimation and Markov random fields (Fischl et al., 2002(Fischl et al., , 2004. Tensor-based morphometry compares the amount of deformation between two mutually registered images. This method performs global registration by affine transformation, with subsequent local registration by nonlinear transformation. The Jacobian map is obtained as the distribution of shrinkage and extension for each axis in the whole brain (Hua et al., 2010). Other methods focus on changes in the brain surface. Boundary shift integrals evaluate the degree of shrink by assessing changes in pixel values at the boundary between the cortical tissue and cerebrospinal fluid (CSF) (Freeborough & Fox, 1997;Leung et al., 2010Leung et al., , 2012. Using the distance between two corresponding voxels on a contour, structural image evaluation using the normalization of atrophy (SIENA) is another well-known method to assess volume changes (Smith et al., 2002).
Numerous studies have used these longitudinal analyses to evaluate AD in magnetic resonance (MR) images; specifically, information regarding the hippocampus, such as its volume, shape, and structure, is often used to estimate the progress of AD (Ceyhan et al., 2011;Chan et al., 2001;Grundman et al., 2004;Mungas et al., 2002;Reuter et al., 2012;Tang et al., 2015). Lillemark et al. (2014) investigated the relationship and proximity between brain regions to classify individuals into healthy control, MCI, and AD groups. Thompson et al. (2003) generated a map that visualized the rates of local gray matter loss over time. As aforementioned, most of these analyses have used the volume or shape of the whole or local brain regions. Reuter et al. (2012) developed a novel longitudinal image processing framework based on the FreeSurfer pipeline for automatic surface reconstruction and segmentation of brain MR images acquired at arbitrary time points.

This study proposes new indices for early detection of
Alzheimer's disease and MCI by quantitative evaluation of longitudinal structural changes using corresponding changes in the brain tissue. Specifically, we used existence probabilities of the following three different tissues that can be easily obtained by segmentation in SPM8 (Statistical Parametric Mapping) (Ashburner & Friston, 2005): gray matter (GM), white matter (WM), and CSF. All other signals are considered background. The structural changes were considered changes in existence probabilities, and the temporal change in each brain tissue type was estimated from the above probabilities. The results were subsequently analyzed with the SPM software, a widely used tool to analyze brain MR images. This analysis enabled the acquisition of additional information with this method. Moreover, this method outlines both structural and tissue-level changes. This report expounds on the method and its use to differentiate individuals with AD from those with NL, thereby demonstrating its classification efficacy.

| Subjects
We used T1-weighted MR images from the Japanese Alzheimer's Disease Neuroimaging Initiative (J-ADNI) dataset, provided by National Bioscience Database Center in Japan.
Data were acquired using 1.5 T MRI scanners (GE Healthcare, Siemens, and Philips). Scanning parameters are as follows: flip angle, 8°; inversion time, 1,000 ms; field of view, 240 × 240 mm 2 ; slice thickness with no gap, 1.2 mm; repetition time, 2,400 ms for multicoil phased-array head coil and 3,000 ms for birdcage coil for GE GNENESIS SIGNA, SIGNA EXCITE, SIGNA HDx/HDxt, Siemens Avanto, MEGNETOM, VISION, Sonata, Symphony, Symphony Vision, and Symphony Tim, 2,300 ms for multicoil phased-array head coil for Philips Achieva; in plane resolution, 0.9375 × 0.9375 mm 2 , 1.0156 × 1.0156 mm 2 or 1.25 × 1.25 mm 2 ; acquisition plane, Sagittal; phase encoding direction, A/P. All recruited volunteers were divided into the following three groups: (a) AD, (b) NL, (c) and MCI. All participants underwent examinations; these included MRI and cognitive assessments, such as the

Mini-Mental State Examination (MMSE) and the Clinical Dementia
Rating (CDR). The J-ADNI study was a longitudinal study for AD with MRI and cognitive assessments at 6-month or 12-month intervals for 2 or 3 years depending on the target group (for more details, refer to the J-ADNI website). In this study, we used MRI performed at 0 and 6 months because if we can capture brain structural changes at short scanning intervals, it would be a more useful biomarker (Mubeen et al., 2017). Those who underwent an MRI scan with a different scanner at 0 and 6 months of age were excluded. Since we found significant age difference between diagnostic groups on the original J-ADNI dataset, age-matched participants were chosen by stratified random sampling from the three groups.
The MRI data provided by the J-ADNI database were not isotropic.
However, because this is a public database, this omission was outside of our control.
All data were collected after obtaining informed consent from participants and approval from the ethics committee of our hospital. Table 1 shows the detailed demographic information of the participants enrolled in this study.

| Image processing
All MR images were corrected for intensity inhomogeneity using the B1 correction algorithm (Narayana et al., 1988) and a nonparametric nonuniformity intensity normalization (N3) algorithm (Sled et al., 1998). Subsequently, phantom-based distortion correction (Maikusa et al., 2013) was performed to normalize variations between MRI scanners.
We constructed an automated pipeline to calculate the coefficient of probability change (CPC) elements within the VOIs as shown in Figure 1. This pipeline has four steps: VBM segmentation, creation of a single subject template (SST) and symmetrical registration, automatic extraction of the VOI, and calculation of the CPC elements.

| Create single subject template and symmetric registration
For unbiased longitudinal analysis, we first created an SST from individual MR images acquired at 0 and 6 months using the "ants-MultivariateTemplateConstruction2.sh." This script can create population-specific or individual templates by coupling the intrinsic symmetric pairwise registration (Avants et al., 2010) with an optimized shape-based sharpening/averaging template appearance. MR images acquired at two time points were subsequently registered to each SST.

| Segmentation
In the segmentation process, we used the VBM8 toolbox for image segmentation of SST-registered MR images into three types of brain tissues (GM, WM, and CSF) and background. It is assumed that the histogram of image intensity follows a Gaussian mixture model. where T and I indicate the tissue type, including the background and voxel values, respectively. p(T) is the prior probability of the image and can be obtained from a standard template constructed using data from many participants. This template reflects the existence probability for each tissue, which is provided in SPM. p(I | T) is the likelihood, that is, the probability that the voxel has an intensity I when the tissue type is known. This value is easy to obtain because the intensity distribution for each tissue has already been calculated using the Gaussian mixture model. p(I) can also be obtained from the image histogram. The aforementioned four processing steps (realignment, registration, re-slicing, and segmentation) were performed with SPM. We obtained the existence probabilities for the four tissues, including that for the background, at each voxel. These probabilities can be expressed as a vector with four elements.

| Coefficients with probability change
Here, we define the vector p (t) described in (2) at time t (with the first measurement as the reference time) for every voxel. The additional characters g, w, c, and b indicate GM, WM, CSF, and background, respectively. We denoted the vector at each measurement, excluding the reference time, as p (tt) . We assumed a linear relationship between the two vectors, p (t) and p (tt) , and describe the existence probabilities at an arbitrary voxel at two different time points, t and t ′ , as [g (t) w (t) That is, we assume that the existence probabilities at t ′ are defined as summation of the products of the probabilities at the first observation and the coefficients, which reflect the degree of change between the two observations. This relationship is expressed with the following equation: where Here, the elements in matrix F in (2)  Considering that 3 × 3 × 3 voxels were used to obtain the coefficients, we used 108 equations. In addition, we specified the condition that CPC has positive values to ease clinical interpretation; therefore, the resulting non-negative least-squares problem was solved using the incorporated function in Matlab.
When the CPC matrix F is a unit matrix, the probabilities do not show structural changes. CPC in F can be obtained by constructing equations that describe changes in each tissue. As an example of change in GM using probabilities at the first measurement, t, and at another time point, t ′ , we can obtain CPC values corresponding to the gray matter under the aforementioned constraints: The probability of the voxel being a part of the background was obtained by subtracting the total probabilities of GM, WM, and CSF from 1.0. Subsequently, probabilities in p (t) and p (tt) of less than 0.2 were neglected. Hence, if quantities of g (t) , w (t) , c (t) , or b (t) for a given voxel that were less than 0.2 were set to 0, CPC becomes much larger when these possibilities are extremely small because it indicates the ratio of the probabilities at two different time points. Here, the maximal CPC is limited to 5.0 (1.0/ 0.2) by excluding probabilities less than 0.2. This approach specifically highlights the changes in tissues that comprise a sufficiently large proportion (more than 20%) of a voxel.
Accordingly, we set the value to 0 and adjusted the remaining components to satisfy the condition that the total existence probability is 1.0. CPCs greater or less than 1.0 indicate increase and decrease of probabilities, respectively. CPC located on a diagonal element in the matrix, F ii , indicates the change in existence probability between two different time points in a tissue. However, the other CPC, F ij (i/=j), shows the degree of contribution of probability of tissue i to that of tissue j. To compare the CPC and the direct longitudinal changes in anatomical probability, we performed a simple longitudinal analysis. We calculated the averaged rates of the posterior probability changes, that is, gm(t + 1)/gm(t), wm(t + 1)/wm(t), and csf (t + 1)/csf (t), within each VOIs at the two time points after affine registration to SST.

| Automatic extraction of the VOI
We analyzed each SST image with the joint label fusion method (Hongzhi et al., 2013). This method is effective to label a VOI automatically according to the multi-atlas training set, Neuromorphometrics atlas (Neuromorphometrics Inc., 2016). This atlas features brain images of 30 participants from the J-ADNI database, which have been manually labeled into 236 regions. These data are commercially available (Neuromorphometrics). Each VOI of the atlas is defined in Neuromorphometrics General Segmentation Protocol and by the BrainCOLOR Cortical Parcellation Protocol.
After the extraction of VOIs, we calculated the average of each CPC element value within the extracted VOIs.

| Machine-learning classification
We constructed the classification models with machine learning and CPC elements according to the following two main steps: 1. We defined CPC elements to perform the classification. We focused on CPC components associated with brain atrophy.
Specifically, in case of brain atrophy, the GM region will erode and the CSF region will dilate. Therefore, we employed F cc , F gg , F gc , and F cg as the input variables for machine-learning classifications.
2. We then executed machine-learning classification using three types of classifiers to detect AD: support vector machine (SVM), random forest (RF), and gradient boosting classifier (GBC  Figure 2 shows the spatial distributions of F gg , F cc , F gc , and F cg , and the target SST T1-weighted image, which belongs to the NL and AD groups. The temporal changes in the representative indices using diagonal CPC (F cc and F gg ) have values less than 1 in most voxels, and these values in participants with AD were lower locally than in those with NL. These elements represent the ratio of tissue perseverance; hence, a deviation of F gg and F cc from a value of 1 indicates the degree of change from GM and CSF to other regions, respectively. F gc indicates the probability of a tissue changing from gray matter to CSF, whereas F cg indicates the reverse. High values of F gc indicate GM atrophy. In Figure 2, high F gc values can be observed in participants with AD within the medial temporal areas, including the hippocampus, comparatively more than that in participants with NL.

| RE SULTS
Moreover, we can see high F cg value boundary between GM and CSF in AD participant than NL.

| D ISCUSS I ON
From the ROC analysis of the NL/AD and NL/MCI group classification within anatomical regions with F cc , F gc , and F gg elements, higher AUC values than direct longitudinal changes were found within the lateral temporal regions, which have been previously associated with AD. Hence, these elements can be considered more indicative of GM atrophy.
In Figure 2, participants with AD had lower F cc values than those with NL and MCI who had moderate F cc in the whole brain. This result is possibly indicative of the regions with CSF shifting to other regions due to the structural change induced by brain atrophy. In contrast, F cg has high values around the boundary between GM and CSF regions, where F cc showed a low value. This change from CSF to GM does not have biological significance. However, our method does not evaluate at the completely same voxel between two time points. If the brain is perfectly spherical and the atrophy is toward the center, then a change from CSF to GM cannot occur. However, at the boundary of the sulcus, for example, it is possible that atrophy causes a migration of gray matter to voxel where CSF is indicated at first time point. We believe that these parameters affect atrophy, but these complex changes in brain shape due to disease progres- A longitudinal machine-learning approach consent from data obtained with fluorodeoxyglucose positron emission tomography (FDG PET) (over 12 months) was suggested by Gray et al. (2012), whereas another approach using MRI data collected at 0, 12, and 24 months since symptom onset was advanced by Farzan et al. (2015). The accuracies of FDG PET and MRI to distinguish AD from NL were 88.4% and 91.7%, respectively. A multimodal machine-learning algorithm constructed from baseline data, including MRI, FDG PET, and CSF, was proposed by Gray et al. (2013); it achieved accuracies of 89.0% and 74.6% to distinguish AD from NL and MCI from NL, respectively.
Zhang & Shen (2012)  patient-specific anatomical brain connectivity networks, which achieved accuracies of 84.2% and 70.4% to stratify NL from AD and MCI, respectively. Table 4 summarizes the performances of previously reported algorithms to distinguish NL from AD and MCI.
The multimodality classification frameworks consistently showed high performance; this result may be ascribed to the integration of complementary information contained in multimodality data, such as MRI, FDG PET, and CSF, into the machine-learning algorithm. However, multimodality approaches require additional scanning costs. Moreover, PET scans have the risk of radiation exposure, whereas a lumbar puncture is required to sample CSF, which is an invasive procedure that exacerbates treatment burden.
MRI-based approaches alone are minimally invasive. However, the performance of classifiers based on cross-sectional MRI data is relatively inefficient to distinguish AD/MCI/NL because of limited information of brain atrophy obtained through this method.
Longitudinal approaches can provide additional information on brain, such as temporal progress of atrophy and therapeutic out- our approach divided the information related to longitudinal brain changes to different tissue types such as GM, WM, and CSF, into nine elements; thus, the noise and errors generated in the process of calculating brain atrophy and/or co-registration between time points, for example, changes from GM to a background voxel that are sensitive to errors in detecting longitudinal changes, can be overcome by dividing the components using the CPC approach.
As shown in Table 4, all our results showed the best accuracy when using SVM. In general, RF and GBC are not suitable for cases where the sample size is not very large (Manuel et al., 2014). Our data had about 100 subjects in each group, which may explain why the SVM showed optimal results. Our method can detect brain preservation ratio and structural changes in the brain using only two brain MR images. The separated metrics with different characteristics, brain structural change, and preservation can be integrated by machine-learning method. We believe that this is the reason why our method can achieve high accuracy with only MR images and shortterm longitudinal analysis. Our method also calculated other elements of CPCs that do not relate to brain tissue, that is, rather related to background. We believe that this coefficient related to background normally should be zero and can be used whether or not brain extraction worked identically during a longitudinal assessment. Therefore, we suggest that mis-registration caused by unexpected head movements can be evaluated by CPC elements related to the background. We assessed averaged F bg , F bw , and F bc maps, and they were approximately zero; therefore, we think our results were likely not affected by head movements. But F bg , F bw , and F bc are insufficient for showing accurate registration of brain parenchyma (GM and WM). Consequently, we need new metrics to evaluate the registration success for a more accurate longitudinal assessment of brain.

| CON CLUS IONS
We proposed a novel metric to detect brain changes using CPC values for GM, WM, and CSF. We evaluated the efficiency of the proposed metric obtained from baseline and 6-month time points from the J-ADNI database. We performed machine-learning classification between NL versus AD and NL versus MCI. F gg , F cc , and F gc elements of CPC can reflect the previously characterized dynamics and neural manifestations of AD. Therefore, these elements can be used as surrogate biomarkers for computer-aided diagnosis.

ACK N OWLED G M ENTS
The We would like to thank Editage (www.edita ge.com) for English language editing.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest associated with this manuscript.

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1002/brb3.1869.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets generated during the current study are available from the corresponding author upon reasonable request.