Identifying brain regions contributing to Alzheimer's disease using self regulating particle swarm optimization

In this article, we developed an approach for detecting brain regions that contribute to Alzheimer's disease (AD) using support vector machine (SVM) classifiers and the recently developed self regulating particle swarm optimization (SRPSO) algorithm. SRPSO employs strategies inspired by the principles of learning in humans to achieve faster and better optimization results. The classifiers for distinguishing AD patients from cognitively normal (CN) individuals were built using grey matter (GM) and white matter (WM) volumetric features extracted from structural magnetic resonance (MR) images. The classifier built using both GM and WM features achieved an accuracy of 89.26%, which is better than the performance of classifiers built using either GM or WM features alone. Moreover, considering clinical features in addition to volumetric features improved the accuracy further to 94.63%, which is better than the performance reported by recent works in the literature. To identify the brain regions that are important for the AD vs CN classification problem, we used SRPSO to select GM and WM features that yield better classification performance. Using 50 features identified by SRPSO, an accuracy of 89.39% was obtained, which is close to the accuracy based on all features. The features identified by SRPSO were mapped back to the brain to identify regions that exhibit degeneration in AD. In addition to identifying areas known to be involved in AD, such as the cerebellum and hippocampus, this helped in finding newer areas that might contribute towards AD.


| INTRODUCTION
Alzheimer's disease (AD) is the most commonly occurring cause of dementia as per a recent study by the World Health Organization. It causes neurodegeneration which leads to cognitive and memory impairments. The number of people suffering from AD worldwide was reported to be 33.9 million in 2011 and is likely to triple by 2050. 1 Because clinical symptoms of AD become apparent only after a significant amount of brain tissue has already been damaged, early and accurate diagnosis of AD is an important problem. Magnetic resonance imaging (MRI) has strongly contributed to progress in this direction. Its high spatial resolution as well as its sensitivity to the brain's shape and volume 2 allow it to capture structural atrophy in the early stages of AD. Furthermore, MRI techniques do not employ any radioactive or radiation-emitting substances, which makes them safer for repeated use in tracking the development of the disease.
Several MRI based approaches have been developed for automatic classification of subjects into AD patients or cognitively normal (CN) subjects. Broadly, these approaches can be divided into two categories: methods that use information from specific regions of interest (RoI) and methods that employ voxelwise features obtained from whole brain MRI data. RoI based methods focus on structures in the brain that are known to be involved in AD. [3][4][5] The involvement of the hippocampus in AD is well known and has been the motivation for many computational studies that focused on this problem. 3,6,7 Similarly, many studies have developed models based on features extracted from other brain areas such as the amygdala, 8 entorhinal cortex, 3 cerebellum, 9 frontal lobe, 10 temporal lobe, 11 and so forth. A limitation of RoI based methods is that their performance depends on the experimenter's expertise in selecting appropriate RoIs. Further, they provide no information about the contribution to the disease of regions that are excluded from the analysis.
The performance of methods that utilize whole brain voxels does not rely on an expert for the selection of RoIs, as all voxels are considered in the analysis. This has encouraged many researchers to develop whole brain MRI based methods for distinguishing AD patients from CN subjects. [12][13][14] One of the first studies in this direction demonstrated that a support vector machine (SVM) trained using whole brain MRI data for the problem of AD vs CN classification performed better than a trained radiologist. 15 Further, Cuingnet et al 14 also showed that methods based on whole brain voxels have higher sensitivity and specificity than methods which employ features extracted from specific RoIs.
Most existing studies have only employed grey matter (GM) features extracted from MRI data for the problem of AD vs CN classification. However, there is ample evidence that suggests the existence of a pathological relationship between GM and white matter (WM) atrophy, [16][17][18][19][20][21] indicating that WM degeneration might be a powerful biomarker for detecting the progression of AD. Existing studies in this direction have mostly looked at data collected using Diffusion Tensor Imaging (DTI). 22 regions in AD subjects show non-ageing related degradation 24 but this has not been sufficiently exploited by computational models for automatically distinguishing AD patients and CN individuals. Furthermore, whole brain voxel-based methods have the potential to automatically identify brain areas that contribute towards AD. This is generally described as a feature selection problem where the aim is to identify a smaller distinct set of features that are sufficient to accomplish a given task. For the problem of distinguishing AD patients from CN subjects using structural MRI data, this corresponds to identifying a small number of voxels that are important for the classification problem. A feature selection approach was developed by Mahanand et al 25 using a self-adaptive resource allocation network and an integer-coded genetic algorithm applied to voxel based features extracted from GM regions. Similarly, Chyzhyk et al 26 used an evolutionary wrapper feature selection mechanism based on a genetic algorithm for training extreme learning machines on selected cluster of features. In an alternative approach, Mishra et al 27 ranked brain regions according to their impact during the progression of the disease, which helped identify those brain regions that are affected by the onset of the disease.
Motivated by results highlighting WM degeneration in AD patients, in this article, we studied the impact of various clinical attributes and features extracted from MRI data on the performance of SVM for the AD vs CN classification problem. For this purpose, three different SVM classifiers were built using features extracted from GM regions, WM regions and a combination of GM and WM regions. The performance of all classifiers was obtained using 10-fold cross validation. A performance comparison showed that classifiers built using both GM and WM features performed better than classifiers built using either GM or WM features alone. Further, considering clinical features such as mini mental state examination (MMSE) and clinical dementia rating (CDR) along with GM and WM features resulted in further improvements in the classification accuracy. In addition, using these classifiers as a basis, a method was developed for identifying the brain regions that exhibit different characteristics in AD and CN subjects. This method uses the recently developed self regulating particle swarm optimization (SRPSO) 28 to select a small subset of features that are important for obtaining good classification performance. Based on the use of these techniques, the method has been named the SRPSO-SVM classifier. SRPSO leverages the principles of self regulation 29,30 for faster optimization but was originally developed for optimization problems with continuous variables. Here, it was extended to discrete optimization problems for the selection of voxels that provide information useful for the classification problem. The features selected using the SRPSO-SVM classifier could help in identifying the brain regions that exhibit differential atrophy across AD and CN subjects.
To evaluate the performance of the SRPSO-SVM classifier, we used it to select 10, 20, 30, 40 and 50 features that are useful for higher classification performance. SRPSO-SVM was used in two different scenarios, namely selection of features from GM regions and selection of features from a combination of GM and WM regions. Based on the features selected by the SRPSO-SVM classifier, the performance of the SVM classifier was computed using 10-fold cross validation for each scenario. For every number of selected features, the SRPSO-SVM classifier constructed using GM and WM features performed better than the classifier built using GM features alone. This also establishes the importance of WM features in developing automated models for distinguishing AD and CN subjects. Following the identification of features, the automated anatomical labeling (AAL) template was used to map the selected features to the brain regions from which they were recorded. This enabled us to identify the brain regions that exhibited different properties in normal and AD subjects. It was observed that features derived from the cerebellum and hippocampus were selected most often by the SRPSO-SVM classifier, which is in line with clinical studies involving AD subjects. 9,31 Other brain regions selected by the SRPSO-SVM classifier that have also been implicated in clinical studies include the frontal lobe, 32 temporal lobe, 33 occipital lobe with calcarine fissure, 34 cingulate cortex, 35 amygdala, 36 precuneus 37 and parietal lobe, 38 supplementary motor area, 39 postcentral gyrus, 37 fusiform gyrus, 40 and parahippocampal gyrus, 25 and so forth. Besides those indicated above, several other regions were also detected by the SRPSO-SVM classifier, which might provide information about other brain regions that are involved in AD but have not yet been clinically identified.
The rest of the article is organized as follows: Section 2 describes the dataset, its division into training and testing sets, and the data preprocessing. Section 3 provides the mathematical modeling of SVM as a classifier and SRPSO as a feature selection technique. Section 4 presents the experimental setup and results, followed by concluding remarks in Section 5.

| Participants and data
The standard dataset used in this study was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (adni.loni.usc.edu). A total of 299 participants were considered, comprising 137 AD patients (67 male and 70 female) and 162 controls (76 male and 86 female) aged between 55 and 91 years. The collection of subjects used in this study is the same as that used in Reference 14. Standard 1.5T MR images from baseline or screening visits were used, and similar demographics (gender and age) were maintained in the training and testing datasets. All images were subjected to the same correction procedures post acquisition for removal of imaging artifacts. These correction procedures included 3D gradwarp correction, B1 non-uniformity correction and N3 bias field correction. The clinical scores for the subjects were also considered in the analysis. CN subjects had MMSE scores between 24 and 30 and a CDR score equal to zero. Similarly, AD subjects had MMSE scores between 20 and 26 and a CDR score of either 0.5 or 1. For the purpose of classification, MMSE and CDR scores for all subjects were normalized to the interval [0,1]. A summarized description of the data used in this study is provided in Table 1.
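The text states only that the clinical scores were normalized to [0,1]; it does not say how. The following minimal sketch assumes min-max scaling, which is one common choice. The function name `min_max_scale` and the example scores are hypothetical.

```python
def min_max_scale(values, lo=None, hi=None):
    """Linearly map values to [0, 1] (assumed scheme, not confirmed by the text).

    lo and hi default to the observed minimum and maximum; they can be fixed
    explicitly, e.g. to the theoretical range of a clinical score.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant column carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical MMSE scores for three subjects.
scaled_mmse = min_max_scale([20, 25, 30])
print(scaled_mmse)
```

Fixing `lo` and `hi` from the training set and reusing them for the testing set avoids leaking test information into the scaling.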

| Data preprocessing
All T1 weighted images were preprocessed using Statistical Parametric Mapping Toolbox 12 (SPM12) in MATLAB 9.0. Figure 1 shows a schematic description of the different steps involved in preprocessing. Each MR image was segmented into GM, WM, and cerebrospinal fluid (CSF) tissue probability maps using the unified segmentation algorithm. 41 The unified segmentation algorithm employs a mixture of Gaussians to build a probabilistic generative model of the data. The voxel-wise tissue probability maps obtained from the model are combined using a Bayesian formulation. After segmentation, all images were registered to a custom template which was iteratively generated using the DARTEL toolbox. 42 In each iteration, the toolbox computes a deformation from the template to each image and an inverse of the deformation is applied to the images. These transformed images are then averaged to compute a new template. All images registered to the final template obtained using DARTEL were normalized with respect to the Montreal Neurological Institute T1 (MNI152) template. The normalized images were smoothed using a 10-mm full-width at half-maximum isotropic Gaussian kernel. Each smoothed image was then modulated with the Jacobian determinant of its deformation field. These preprocessing steps resulted in 2122945 features for each image. To reduce the dimensionality of the feature space, the obtained images were resampled using a 6-mm wide isotropic Gaussian kernel. This resulted in 35557 features. To discard artifacts such as background and air, a brain mask was applied to each resampled image. This preprocessing procedure finally resulted in GM and WM tissue probability maps, each of which consisted of 8729 features.
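The two intermediate feature counts quoted above can be checked with simple arithmetic. The sketch below assumes a 121 × 145 × 121 voxel grid at 1.5 mm that is resampled to 6 mm voxels; neither the grid size nor the voxel size is stated in the text, so both are assumptions chosen because they reproduce the reported counts. The helper `apply_mask` illustrates how a binary brain mask reduces a flattened image to a feature vector.

```python
import math

# Assumed image grid and voxel sizes (not stated in the text).
FULL_GRID = (121, 145, 121)
FULL_VOXEL_MM = 1.5
TARGET_VOXEL_MM = 6.0


def voxel_count(shape):
    """Number of voxels in a 3D grid."""
    n = 1
    for s in shape:
        n *= s
    return n


def resampled_shape(shape, src_mm, dst_mm):
    """Grid shape after resampling to a coarser voxel size (rounded up)."""
    return tuple(math.ceil(s * src_mm / dst_mm) for s in shape)


def apply_mask(values, mask):
    """Keep only the voxels whose mask entry is non-zero."""
    return [v for v, m in zip(values, mask) if m]


coarse = resampled_shape(FULL_GRID, FULL_VOXEL_MM, TARGET_VOXEL_MM)
print(voxel_count(FULL_GRID))        # 2122945
print(coarse, voxel_count(coarse))   # (31, 37, 31) 35557
```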
For calculation of hippocampal volume (HV), a hippocampus mask was extracted from the standard AAL atlas using the WFU PickAtlas 3.0 toolbox. [43][44][45] The resolution of the generated mask was adjusted to match the preprocessed dataset. The resultant binary mask was then applied to the GM tissue density maps obtained after MNI normalization. This resulted in 1878 voxels extracted only from the hippocampus regions of the left and right hemispheres. The HV was calculated for each subject by summing the intensity of the voxels contained in this mask, as described in Reference 44.
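The HV computation described above amounts to a masked sum. A minimal sketch, assuming the GM density map and the binary hippocampus mask have already been flattened to lists of equal length (the function name and the toy values are hypothetical):

```python
def hippocampal_volume(gm_density, hippocampus_mask):
    """Sum GM density over the voxels selected by a binary mask."""
    if len(gm_density) != len(hippocampus_mask):
        raise ValueError("density map and mask must have the same length")
    return sum(v for v, m in zip(gm_density, hippocampus_mask) if m)


# Toy example with four voxels, two of which lie inside the mask.
hv = hippocampal_volume([0.2, 0.5, 0.9, 0.1], [0, 1, 1, 0])
print(hv)
```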

| SRPSO-SVM CLASSIFIER FOR FEATURE SELECTION
The problem of feature selection for AD focuses on identifying the features that contribute towards improving the classification performance. By mapping a given selected feature back to the voxel from which it was extracted, it is possible to identify the specific brain region that exhibits different structural properties between AD subjects and normal individuals. This might help in finding brain regions that show degradation in AD. In order to identify such brain areas, we describe below a feature selection approach that utilizes SRPSO and SVM and hence has been named the SRPSO-SVM classifier. To provide the necessary background, this section begins with a description of SVM and SRPSO. Using this background, the problem of feature selection utilizing the SRPSO-SVM classifier is then formulated.
FIGURE 1 Schematic description of the steps involved in preprocessing of T1-weighted MRI images

| Support vector machines
Support vector machine (SVM) is a supervised learning approach that has been used for a wide range of problems in machine learning. In this study, an SVM classifier for two classes was built using samples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^p$ represents a sample from class $y_i \in \{-1, +1\}$. A standard SVM classifier estimates the parameters $\omega$ and $b$ which describe a hyperplane that separates samples from the two classes. This is referred to as a hard margin SVM. However, hard margin SVM is limited to problems where samples from the two classes are linearly separable. For linearly inseparable problems, soft margin SVM was developed, which relaxes the constraints imposed by a hard margin SVM. A soft margin SVM estimates a hyperplane that minimizes the following objective function:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\omega^T x_i + b) \geq 1 - \xi_i, \;\; \xi_i \geq 0 \tag{1}$$

In the above equation, parameter $C$ determines the penalty of the misclassification error, which is represented by $\xi_i$. In our work, the value of parameter $C$ is fixed to 1 throughout the experiments.
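To make the role of C and the slack terms concrete, the sketch below evaluates the soft margin objective for a given hyperplane. It relies on the standard fact that, for fixed ω and b, the smallest feasible slack of a sample is max(0, 1 − y_i(ω·x_i + b)). It only scores a candidate hyperplane and does not train an SVM; the function name and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Value of 0.5 * ||w||^2 + C * sum(slack) for a given hyperplane."""
    regulariser = 0.5 * sum(wj * wj for wj in w)
    total_slack = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # Samples with margin >= 1 incur no penalty.
        total_slack += max(0.0, 1.0 - margin)
    return regulariser + C * total_slack


# Toy data: two samples are classified with margin 2, one with margin 0.5.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0))
```

With C = 1, as used in this work, each unit of slack costs as much as one unit of the regularisation term.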

3.2 | Self regulating particle swarm optimization
The original particle swarm optimization (PSO) 46 algorithm is inspired by the social behavior of animals in a group, such as a flock of birds or a school of fish. It involves simulating a swarm of particles whose positions are given by D-dimensional vectors in the search space. The position of the i-th particle in the t-th iteration is given by $X_i^t = [x_{i1}^t, x_{i2}^t, \ldots, x_{iD}^t]$, which denotes a candidate solution, and its velocity in the t-th iteration is denoted by $V_i^t = [v_{i1}^t, v_{i2}^t, \ldots, v_{iD}^t]$. To search for optimal solutions, particles move around in the search space by updating their velocity using the experience acquired by all particles in the swarm. The best position explored by the i-th particle is denoted by $P_i^t$ and the best position explored by the whole swarm is denoted by $P_g^t$. In each iteration, the velocity of the i-th particle is calculated by considering its personal best position and the global best position of the swarm using the following equation

$$V_i^{t+1} = \omega V_i^t + c_1 r_1 (P_i^t - X_i^t) + c_2 r_2 (P_g^t - X_i^t) \tag{2}$$

where $\omega$ is the inertia weight, $c_1$ and $c_2$ represent the acceleration coefficients and $r_1$ and $r_2$ are random numbers distributed uniformly in the range [0,1]. Based on the computed velocity, the position of the i-th particle in the (t + 1)-th iteration is given by

$$X_i^{t+1} = X_i^t + V_i^{t+1} \tag{3}$$

Similar to the original PSO algorithm, SRPSO is an iterative evolutionary technique that explores the search space for prospective solutions to a given problem. In addition to the experience of particles in the swarm, SRPSO incorporates in each particle the ability to adapt its velocity based on two strategies grounded in human cognitive psychology. These strategies are self-regulation of the inertia weight and self-perception of search directions. The first strategy updates the inertia weights ($\omega_i$) for the best particle and other particles using different update rules to achieve faster exploration of the search space. The second strategy emphasizes collaborative exploration among particles in SRPSO. Next, the two strategies of SRPSO are described in detail.
• Self-regulating the inertia weight: The inertia weight ($\omega_i$) in SRPSO determines the balance between exploration and exploitation performed by the particles. The original PSO employs a "common" inertia weight for all particles, which is decreased linearly, whereas in SRPSO each particle has an individual inertia weight which is updated using a self-regulating inertia weight strategy. In this strategy, the particle that achieved the current global optimum increases its inertia weight in a given iteration whereas all other particles decrease their inertia weight. This encourages the particle that achieves the current global optimum to perform exploitation by accelerating in its current direction of search. Thus, the inertia weight of the i-th particle is updated using the following equation:

$$\omega_i^{t+1} = \begin{cases} \omega_i^{t} + \eta\,\Delta\omega, & \text{for the best particle} \\ \omega_i^{t} - \Delta\omega, & \text{otherwise} \end{cases} \tag{4}$$

where $\eta$ is a constant that controls the rate of acceleration and is set to 1. $\Delta\omega$ is the change in inertia weight, computed as follows:

$$\Delta\omega = \frac{\omega_I - \omega_F}{N_{Iter}} \tag{5}$$

where $\omega_I$ and $\omega_F$ are the initial and final values of the inertia weight, respectively, and $N_{Iter}$ is the total number of iterations.
• Self-perception of search directions: Unlike PSO, in SRPSO the best particle and the other particles have different velocity update mechanisms. Each particle other than the best particle considers its perception of the search direction of other particles to adjust its own velocity. This is referred to as social cognition in SRPSO. Based on this, the general velocity update equation in SRPSO for a particle is given by:

$$V_i^{t+1} = \omega_i V_i^t + c_1 r_1 p_i^{se} (P_i^t - X_i^t) + c_2 r_2 p_i^{so} (P_g^t - X_i^t) \tag{6}$$

where $p_i^{se}$ represents the parameter that determines perception for the self-cognition and $p_i^{so}$ determines perception for the social cognition. The best particle gives priority to its current search direction over its self and social knowledge. Hence, the values of $p_i^{se}$ and $p_i^{so}$ are set to zero for the best particle. For the rest of the particles, $p_i^{se}$ is set to 1. Social cognition is realized in SRPSO in randomly selected iterations. For this purpose, in each iteration a random number, denoted by $a$, is generated in the range [0, 1]. If $a$ is greater than a threshold $\lambda$ then $p_i^{so}$ is set to 1, otherwise $p_i^{so}$ is set to zero. As stated by Tanweer et al, 28 $\lambda$ was set to 0.5 for all experiments. A higher value of $\lambda$ causes low social cognition, in which case particles do not pay attention to the global best. A lower value of $\lambda$ results in excessive social cognition, leading to premature convergence to the current global best. Based on this strategy, the velocity for the best particle is updated as

$$V_i^{t+1} = \omega_i V_i^t \tag{7}$$

and for the rest of the particles the velocity is updated as

$$V_i^{t+1} = \omega_i V_i^t + c_1 r_1 (P_i^t - X_i^t) + c_2 r_2 p_i^{so} (P_g^t - X_i^t) \tag{8}$$

Based on the updated velocity, the new position of the i-th particle at time (t + 1) is computed as

$$X_i^{t+1} = X_i^t + V_i^{t+1} \tag{9}$$

(Table 2)
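The two strategies above can be sketched for a single particle as follows. This is an illustrative reading of the description, not the authors' implementation. Assumptions: the random numbers r1 and r2 are drawn per dimension, the social perception flag is drawn once per particle per iteration, and the default values of `c1` and `c2` are placeholders, since the parameter values of Table 2 are not available here. The function names are hypothetical.

```python
import random


def regulate_inertia(w_i, is_best, w_initial, w_final, n_iter, eta=1.0):
    """Return the next inertia weight of one particle."""
    delta_w = (w_initial - w_final) / n_iter
    # The best particle accelerates; every other particle slows down.
    return w_i + eta * delta_w if is_best else w_i - delta_w


def srpso_step(x, v, p_best, g_best, w_i, is_best,
               c1=1.5, c2=1.5, lam=0.5, rng=random):
    """Return (new_position, new_velocity) for one particle."""
    if is_best:
        # p_se = p_so = 0: continue along the current search direction.
        new_v = [w_i * vd for vd in v]
    else:
        # p_se = 1; p_so = 1 only when the random draw exceeds the threshold.
        p_so = 1.0 if rng.random() > lam else 0.0
        new_v = []
        for xd, vd, pd, gd in zip(x, v, p_best, g_best):
            r1, r2 = rng.random(), rng.random()
            new_v.append(w_i * vd
                         + c1 * r1 * (pd - xd)
                         + c2 * r2 * p_so * (gd - xd))
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.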

| Features selection
Several PSO-based feature selection approaches have been developed in literature that rely on a particular classifier to get an estimate about the optimality of a solution that a given particle represents. 47 Each solution represents a collection of features selected by the particle. An estimate of fitness for a particular solution is determined using the performance of the classifier which is built only using features that are selected in that solution. The SRPSO-SVM classifier uses SRPSO to explore the search space for solutions and the performance of SVM is used to obtain an estimate for the fitness of a solution.
To select M features, the position of each particle is initialized using an M-dimensional vector and each element in the vector is randomly initialized to a value in the interval $[1, N_f]$, where $N_f$ is the total number of features. The position of a given particle represents the collection of features selected by that particle. For each particle, an SVM classifier is built using the selected features for the problem of AD vs CN classification. The fitness ($\eta_i$) of the i-th particle is estimated using the classification accuracy, given as

$$\eta_i = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. Based on the estimate of fitness, the position of the particles is updated using Equation (9). It might be noted that the updated position of a particle may contain non-integer values. To map these non-integer values to a unique feature, these values were rounded to the nearest integer. This procedure was repeated for 1000 iterations in SRPSO. Pseudo code for feature selection using the SRPSO-SVM classifier is given in Algorithm 1.
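The mapping from a continuous particle position to feature indices, and the accuracy used as fitness, can be sketched as follows. The text specifies only rounding to the nearest integer; rounding halves upwards and clipping out-of-range values to [1, N_f] are assumptions made here, and the sketch does not remove duplicate indices. The function names are hypothetical.

```python
import math


def decode_position(position, n_features):
    """Map a continuous position vector to 1-based feature indices."""
    indices = []
    for value in position:
        idx = math.floor(value + 0.5)       # round to nearest, halves go up
        idx = min(max(idx, 1), n_features)  # assumption: clip to valid range
        indices.append(idx)
    return indices


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


print(decode_position([0.2, 3.5, 8729.7, 17.49], 8729))  # [1, 4, 8729, 17]
print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))           # 0.75
```

In a full wrapper, the decoded indices would select the columns used to train the SVM, and the resulting accuracy would be returned as the particle's fitness.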

| RESULTS AND DISCUSSION
In this section, we present results of the four different studies that are used to evaluate the effectiveness of the approach presented here. An initial study is conducted to understand the usefulness of GM and WM features for AD vs CN classification problem using SVM classifiers. In addition, this study also assesses the effectiveness of HV and clinical features for this classification problem.
The clinical features consist of class-wise scores of the MMSE and CDR tests of each subject, which were normalized to the interval [0,1] prior to their use in classification. In the next study, the performance of the SVM classifier for the AD vs CN problem using different combinations of features was compared with the performance reported in recent works in the literature. Subsequently, the SRPSO-SVM classifier for feature selection was evaluated using subsets of volumetric features (GM and WM) of various sizes. Thereafter, the features selected by the SRPSO-SVM classifier were mapped to the specific brain areas from which they were extracted, to identify areas that are important for better AD vs CN classification. All results reported in this section have been obtained using the training/testing splits mentioned in Table 1. Each sample was subjected to the preprocessing steps described in Section 2. After preprocessing, 8729 features were obtained from each of the GM and WM regions, resulting in a total of 17 458 features. It is known that WM and GM regions show different patterns of degradation. 48 To avoid issues arising because of these differences, features obtained from both GM and WM regions were normalized to the interval [0,1]. The performance evaluation was conducted using sensitivity, specificity, accuracy, precision and F1 score.
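The five evaluation measures named above follow from the confusion matrix. A minimal sketch, assuming AD is treated as the positive class (label 1) and CN as the negative class (label 0); the function name and the label encoding are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy, precision and F1 from labels."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Guard against empty classes.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy example: 3 AD and 5 CN subjects, one error in each class.
m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
print(m)
```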
All the experiments described in this article were conducted on a machine with an Intel Core i5 processor (2.50 GHz) and 8 GB of RAM, running Ubuntu v18.04. The implementations for the experiments were written in Python 3.7.

| Performance comparison using volumetric and clinical features
In this section, the performance of the SVM classifier for the AD vs CN problem was evaluated using ten different combinations of features. These scenarios include performance evaluation using features extracted from GM regions, WM regions and a combination of GM and WM regions. The scenario that includes features obtained from both GM and WM regions will be referred to as (GM + WM). Other scenarios included consideration of clinical scores (MMSE and CDR) and HV in addition to the volumetric features.
The training/testing results of the classifier were obtained using the same splits in all the scenarios described above. Table 3 presents the classification accuracy, sensitivity, specificity, F1 score, and precision. It can be observed from Table 3 that the GM feature based classifier has an accuracy of 87.91%, which is about 6 percentage points higher than the accuracy of the WM based classifier. The same pattern can also be observed in the other metrics used for evaluation. This may be due to the higher discriminability exhibited by GM regions across AD and CN subjects. 49 For the (GM + WM) based classifier, the accuracy is 89.26%, which is 1.35 percentage points higher than the accuracy obtained using GM features alone. These results show that inclusion of WM features with GM features improves classification accuracy. When GM + WM features are combined with HV, accuracy further improves to 90.6%. This improvement in accuracy can be attributed to the fact that HV has been shown to differ significantly across AD and CN subjects. 3,6,7 Similarly, considering MMSE and CDR scores in addition to GM + WM features resulted in classification accuracies of 91.27% (GM + WM + MMSE) and 93.28% (GM + WM + CDR), respectively. However, considering HV together with clinical scores and GM + WM features did not result in further improvements in classification accuracy. The best classification accuracy of 94.63% was achieved by the classifier built using GM + WM features and both clinical scores.

| Performance comparison with recent works
In this section, the performance of the SVM classifier built using various combinations of features is compared with the performance reported by other recent works in literature. 14,50-52 Cuingnet et al 14 is considered for performance comparison as it employs the same training and testing data sets as employed here for developing the SRPSO-SVM classifier. The performance results of other methods used for comparison have been reproduced from the corresponding publications. Table 4 presents the results of comparison. Note that all the other works used for comparison have only employed GM features for the purpose of classification.

Algorithm 1 (recovered fragment)
... (10)) for each particle.
7 Regulate inertia weight for each particle based on its fitness.
8 Update particle's best and global best.
9 end for
It may be observed that the SVM classifier built in this work using GM features has a sensitivity of 78%, which is 3% lower than that reported by Cuingnet et al. 14 This can be attributed to differences in the versions of the software used for preprocessing, resampling and masking. However, the sensitivity of the SVM classifier improved when other features were included in the analysis. Classifiers based on GM + WM, GM + WM + VoI, and GM + WM + clinical features resulted in sensitivities that are 4%, 7%, and 10% better than the performance reported by Cuingnet et al, respectively.
In terms of classification accuracy, the performance of GM + WM features based classifier was similar to the performance reported by Sun et al 51

| Feature selection using SRPSO-SVM classifier
In this section, the SRPSO-SVM classifier was used to select 10, 20, 30, 40, and 50 features in two different settings, namely feature selection using GM and (GM + WM) regions. The effectiveness of the selected features was evaluated based on the performance of the SVM classifiers trained only using the selected features. The training/testing splits were same as those employed in the previous section and classification accuracy (Equation (10)) was used for performance evaluation. Table 5 provides the classification accuracy of the SRPSO-SVM classifier for different number of selected features. It can be clearly seen that the training accuracy of both GM and (GM + WM) based classifiers increases systematically with an increase in the number of selected features. Further, the training accuracy for both classifier types approaches 100% using 50 features alone. Testing accuracy also increases in both cases with an increase in the number of selected features. However, the performance of (GM + WM) classifiers increased more rapidly in comparison to GM classifiers due to presence of comparatively larger number of significant features. This clearly shows that features extracted from WM regions are useful for building classifiers with better performance.
It was also observed that, in all cases, (GM + WM) based classifiers performed better than GM based classifiers in terms of classification accuracy on testing samples. This can be clearly observed from Figure 2 which shows a plot of classification performance versus number of selected features for both settings. Further, (GM + WM) based classifier using 50 features performed better than the (GM + WM) classifiers using all features. This may be attributed to the removal of less discriminative features in

| Mapping selected features to brain regions
Features selected by the SRPSO-SVM classifier in the previous section were used to identify the specific voxels from which these features were extracted during preprocessing. By mapping the identified voxels back to the brain areas, it was possible to identify the areas which exhibited discriminative volumetric changes across AD subjects and CN individuals. For this purpose, the identified voxels were mapped to a brain atlas using automated anatomical labeling (AAL). Figure 3 shows the voxels corresponding to the subset of 10 selected features mapped to the corresponding brain areas in a glass brain. Similarly, all voxels from the subsets of 20, 30, 40, and 50 selected features were extracted and mapped to the AAL template. This amounted to a total of 150 voxels mapped to the template from all selected subsets. The histogram of these selected voxels across brain regions is shown in Figure 4. This resulted in identification of brain areas already reported in literature, 25,31,[37][38][39][40]53 which include areas like the cerebellum, 54 temporal lobe, 55 and cingulate cortex. 34 In addition, our model identified some brain areas whose involvement in AD is yet to be confirmed clinically, like the parietal lobe. 25 The features selected by SRPSO-SVM indicate that the parietal lobe and some specific sub-regions within it, like the precuneus, exhibit degeneration of brain tissue in AD subjects. A future direction of clinical research could be validating the extent of involvement of these brain regions in AD.
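The region histogram described above can be sketched as a lookup followed by a count. The sketch assumes a precomputed table that maps each feature (voxel) index to an atlas label; the table contents, the label strings and the function name are illustrative placeholders, not values from this study.

```python
from collections import Counter


def region_histogram(selected_voxels, voxel_to_region):
    """Count how many selected voxels fall in each atlas region."""
    return Counter(voxel_to_region.get(v, "unlabelled") for v in selected_voxels)


# Hypothetical lookup from voxel index to atlas label.
lookup = {10: "Hippocampus_L", 11: "Hippocampus_L", 42: "Cerebelum_6_R"}

# Voxel 99 has no entry and is counted as "unlabelled".
hist = region_histogram([10, 11, 42, 99], lookup)
for region, count in hist.most_common():
    print(region, count)
```

Pooling the voxels from all five subsets (10 + 20 + 30 + 40 + 50 = 150) before counting gives the kind of frequency plot referred to as Figure 4.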

| CONCLUSION
In this article, a new approach has been developed for finding brain regions which might contribute towards AD. This method utilizes SVM for the problem of AD vs CN classification and the recently proposed SRPSO. SRPSO was originally developed for continuous optimization problems and here we extend it to the discrete problem of feature selection. The classifiers used in this method employ grey and white matter features extracted from structural MRI images of AD subjects and CN individuals. Based on the performance evaluation, it is shown that classifiers built using both GM and WM features perform better than classifiers built using GM features alone. On the basis of these results, we used SRPSO to select GM and WM features that are important for the problem of AD vs CN classification. The performance of classifiers built using 50 features selected by SRPSO was similar to the performance of classifiers built using all GM and WM features. Mapping the features selected by SRPSO to specific brain regions using AAL allowed us to identify prospective brain areas that show degeneration in AD subjects. In addition to brain areas that are known to be involved in AD (like the cerebellum), this analysis also identified brain areas whose involvement in AD is still debatable (like the parietal lobe).
FIGURE 4 Frequency of occurrence of voxels in brain regions related to AD
ORCID
Shirin Dora https://orcid.org/0000-0001-6182-4124