Alzheimer’s disease diagnosis based on the visual attention model and equal-distance ring shape context features

Alzheimer’s disease (AD) is an irreversible neurodegenerative disease caused by the rapid degeneration of brain cells, and a growing number of researchers are seeking effective and accurate methods for its diagnosis. In this paper, we propose a method that identifies AD by extracting equal-distance ring shape context features from the saliency map of structural magnetic resonance imaging (sMRI). Experiments on the thin-layer MR images of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset showed that our method improves the identification of brain diseases, with classification accuracies of 94.83% for AD versus cognitively normal (CN) subjects, 98.31% for AD versus mild cognitive impairment (MCI) and 85.77% for MCI versus CN. Experiments on the Open Access Series of Imaging Studies (OASIS) dataset and on clinically collected thick-layer MR images further verify the classification performance of the method, with AD versus CN accuracies of 96.56% and 98.18%, respectively, which suggests that the method may have high value in clinical application. Compared with methods based on gray matter (GM) density, cortical thickness and hippocampal volume, our method achieved higher accuracy in AD (or MCI) versus CN classification.


INTRODUCTION
Alzheimer's disease (AD) is a common neurodegenerative disease, mainly manifested as cognitive impairment, behavioral disorders and mental abnormalities, which seriously affects the daily life of the elderly [1]. At present, there are at least 50 million patients with AD or other types of dementia worldwide, and with the aging of the global population this number is expected to double by 2050 [2,3]. In 2018, the worldwide cost of treatment and care for AD patients reached trillions of dollars, placing a heavy economic burden on patients' families and society. Because the damage to the central nervous system in AD is irreversible, there is currently no effective clinical treatment, and the only option is to delay the progression of the disease as far as possible [4][5][6]. Studies have shown that if mild cognitive impairment (MCI) is not treated as early as possible, further cognitive decline will lead it to develop into AD. Although MCI carries a high risk of progressing to AD, MCI that is detected and treated early does not necessarily progress to AD [7][8][9]. Therefore, early detection, diagnosis and treatment of MCI can delay the progression of AD, which has important clinical and social significance [10].
Typically, the accurate diagnosis of AD, MCI and cognitively normal (CN) cohorts by clinicians depends heavily on neuropsychological tests, such as the Mini-Mental State Examination (MMSE), the Functional Activities Questionnaire (FAQ) and the Clinical Dementia Rating (CDR). However, neuropsychological tests are relatively subjective and are only suitable for patients who already show some clinical symptoms. With the rapid development of medical imaging, magnetic resonance imaging (MRI), single-photon emission computed tomography (SPECT) and positron emission tomography (PET) have played an important role in the diagnosis of brain diseases [11][12][13][14][15][16][17][18][19][20]. However, diagnosing AD from medical images relies mainly on extensive clinical training and experience and involves a degree of subjectivity, which can affect diagnostic accuracy; this reliance also makes the diagnosis harder for new or inexperienced clinicians. Therefore, with the development of machine learning, computer-aided diagnosis (CAD) methods that combine medical imaging and machine learning to accurately classify AD (or MCI) and CN have become an active research topic.
Although new imaging modalities have been used in AD diagnostic research in recent years, structural MRI (sMRI) remains an effective aid for the early diagnosis of AD. The classification performance of CAD methods based on sMRI and machine learning usually depends on the features used, so feature extraction is a critical step in the classification framework. The most commonly used feature extraction methods for sMRI data are voxel-based methods and region of interest (ROI)-based methods.
Voxel-based methods extract voxel-wise imaging features from whole-brain sMRI to construct classifiers that distinguish AD (or MCI) from CN. For example, Klöppel et al. [21] used voxel-based morphometry (VBM) to generate a whole-brain gray matter (GM) density map as the input of a support vector machine (SVM) classifier for diagnosing AD. Li et al. [22] used a multivariate method to extract six cortical features for each participant and a linear SVM model to classify MCI and CN. Chu et al. [23] extracted features from GM segmentation maps of T1 MR images and excluded voxels with values below 0.2; the remaining 299,477 voxels of each subject served as input to feature selection and an SVM classifier. Salvatore et al. [24] proposed a machine learning method for AD (or MCI) versus CN classification that used principal component analysis (PCA) to reduce the dimensionality of GM and white matter (WM) density images and obtain the features.
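To make the voxel-based pipeline concrete, the following is a minimal Python sketch (NumPy only) of turning GM density maps into voxel-wise feature vectors with a 0.2 cut-off, in the spirit of Chu et al. [23]. It is not their code: the function name `voxel_features` and the rule that a voxel is kept when its mean value across subjects reaches the threshold are assumptions for illustration.

```python
import numpy as np

def voxel_features(gm_maps, threshold=0.2):
    """Turn a stack of GM maps into voxel-wise feature vectors (hypothetical helper).

    gm_maps: array of shape (n_subjects, X, Y, Z) with GM probability/density values.
    Assumption: a voxel is kept when its mean value across subjects is >= threshold.
    Returns the (n_subjects, n_kept_voxels) feature matrix and the boolean voxel mask.
    """
    gm = np.asarray(gm_maps, dtype=float)
    flat = gm.reshape(gm.shape[0], -1)      # one row per subject, one column per voxel
    keep = flat.mean(axis=0) >= threshold   # group-level voxel mask (assumed rule)
    return flat[:, keep], keep

# Toy example: 3 subjects with 2x2x2 volumes; only one voxel passes the 0.2 cut-off.
maps = np.zeros((3, 2, 2, 2))
maps[:, 0, 0, 0] = 0.5
maps[:, 1, 1, 1] = 0.1
X, mask = voxel_features(maps)
print(X.shape)  # (3, 1)
```

Even after such masking the feature vector can have hundreds of thousands of entries, which is why voxel-based methods depend on feature selection or dimensionality reduction.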
ROI-based methods use imaging features extracted from brain regions that are usually pre-determined from biological prior knowledge or anatomical brain atlases. For example, Zhang et al. [25] and Kim et al. [26] proposed methods that classify AD (or MCI) and CN using SVM. The input features of these methods are the GM volumes within 93 ROIs on MRI and PET images, together with the original values of three cerebrospinal fluid (CSF) measurements. Wee et al. [27] constructed a regional cortical thickness similarity map for each subject to describe the relative changes in cortical thickness between ROI pairs, which significantly improved AD classification performance. The authors of [28] proposed a hybrid voxel-wise feature selection method that combines a t-test with a genetic algorithm based on Fisher's criterion. Tong et al. first used sparse representation techniques to calculate grading biomarkers for each MCI subject, and then combined these biomarkers with age and cognitive measures to predict MCI-to-AD conversion more accurately [29]. Liu et al. [30] first non-linearly registered each sMRI onto multiple pre-selected atlases, and then extracted multiple sets of atlas-based features from each MR image to construct ensemble classification models for AD/MCI diagnosis. Sørensen et al. [31] extracted volume and texture features of the hippocampus from sMRI as input to an SVM classifier for AD classification. Ahmed et al. proposed two AD classification methods based on hippocampal features: in [32], they used circular harmonic functions (CHFs) to extract local features from the hippocampus and posterior cingulate cortex (PCC) and trained SVM classifiers for AD/MCI diagnosis; in [33], classifiers trained independently on hippocampal and CSF features were combined, followed by another classifier to further refine the diagnostic performance.
Zhao et al. [34] also proposed an AD classification method based on hippocampal features. They extracted 56 features of each hippocampus, including intensity, shape, texture and wavelet features.
In general, voxel-based features have high dimensionality, which may cause overfitting; consequently, the classification performance of voxel-based methods depends largely on the dimensionality reduction method. ROI-based features have low dimensionality and capture ROIs that are highly related to the disease, so they can improve classification performance and have been widely used [35]. In this paper, we propose an equal-distance ring shape context (EDRSC) feature extraction method comprising saliency map detection, equal-distance ring segmentation and a shape context algorithm based on the chessboard distance. First, the ROIs of sMRI, namely the left and right hippocampus, are extracted. Second, a visual attention model, the PFT (Phase spectrum of Fourier Transform) model, is used to detect the saliency map of the ROIs. Then, the shape contour of the saliency map is partitioned by equal-distance rings, and the EDRSC features of the saliency map are extracted. Finally, support vector classification (SVC) is used to build a disease classification model. The main contributions are as follows. (1) Equal-distance rings divide the whole shape contour uniformly, so the distribution of all points on the image contour is reflected evenly.
(2) In the rectangular coordinate system, the set of pixels in each ring is determined by the chessboard distance, so the spatial position information of the whole target shape, including distance and direction information, can be obtained. (3) Attention selection is an important mechanism of human visual perception: a conscious activity in which humans select and retain important information from the large amount of input from the outside world while ignoring useless or secondary information. Therefore, we use a visual attention model to detect the saliency map of MR images, and building EDRSC features from the salient regions that contain important information can improve the classification performance of our method. clinic showed that, for both thin-layer and thick-layer MR images, the method achieves good classification results and has high value for clinical application.

MATERIALS AND IMAGE PRE-PROCESSING
All thin-layer 3D images in this study were downloaded from the public Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (adni.loni.usc.edu), which includes MR images of subjects with AD, MCI and CN. ADNI was launched in 2003 to connect researchers with research data. It is divided into four stages, ADNI1, ADNI-GO, ADNI2 and ADNI3, and collects a large amount of MRI and PET images, genetic data, blood biochemical indicators and CSF data. The primary goal of ADNI is to verify and determine the relationships among the collected data, characterize the progression of AD, and provide a basis for the early diagnosis and treatment of AD. ADNI's research protocol was approved by the local institutional review boards. Specifically, ADNI recruited more than 800 adults aged 55-90 years, including CN, MCI and AD subjects, all of whom signed written informed consent: 200 CN subjects were observed for 3 years, and 400 MCI subjects and 200 AD subjects were followed up for 3 years and 2 years, respectively.

Subjects
In our method, the images of 448 subjects from the ADNI1/GO stages were selected for the experiments. These T1-weighted MR images were acquired using MPRAGE or equivalent protocols at different resolutions with a slice thickness of 1.2 mm, and had undergone several pre-processing steps by ADNI research groups. First, the geometric distortion caused by gradient non-linearity was corrected; then, the B1 non-uniformity of the image intensity was corrected; finally, the N3 histogram peak sharpening algorithm was applied to reduce intensity non-uniformity. The detailed statistics of all subjects, divided into three classes, are shown in Table 1. CN: normal control subjects collected by ADNI without depression, MCI or dementia; their MMSE scores range from 24 to 30, with a CDR of 0 [36,37]. MCI: subjects without significant impairment in other cognitive domains who maintain their activities of daily living; their MMSE scores range from 24 to 30, with a CDR of 0.5. AD: subjects in the ADNI collection diagnosed with AD who met the NINCDS/ADRDA criteria for probable AD [38]; their MMSE scores range from 20 to 26, with a CDR of 0.5 to 1.

MR image pre-processing
The pre-processing consists of three steps: tissue segmentation, discriminative ROI extraction and image slice generation, which are described in detail below. 1) Tissue segmentation: Many studies have shown that the main morphological and structural abnormalities of AD occur in the brain GM. Therefore, the accuracy of a CAD system depends largely on brain tissue or structural segmentation, such as GM or WM segmentation. In this study, all original 3D MR images downloaded from the ADNI dataset in NIfTI format were segmented using the CAT12 toolkit (dbm.neuro.uni-jena.de/cat/) running in MATLAB (mathworks.cn). CAT12 is a MATLAB toolkit based on SPM12 (fil.ion.ucl.ac.uk/spm/), developed by Christian Gaser and Robert Dahnke of the Departments of Psychiatry and Neurology at Jena University Hospital, Germany. Tissue segmentation is implemented via the 'Segment Data' module, which registers all 3D MR images to MNI space (MNI152 T1 1.5mm brain) by DARTEL registration to achieve spatial normalization [39,40]. Finally, the skull of each MR image is removed, yielding a GM image of 121 × 145 × 121 voxels. The results are shown in Figure 1a.
2) Discriminative ROI extraction: Compared with age-matched healthy elderly people, AD patients show apparent morphological abnormalities in brain structure, including reduced hippocampal volume and enlarged ventricles. Therefore, in this study, we performed t-tests between different categories of subjects to find the brain regions with the most severe AD-related atrophy. As shown in Figure 1b, the discriminative ROIs are the left and right hippocampus (p < 0.0001). Therefore, the left and right hippocampus are used as the discriminative ROIs to better distinguish AD (or MCI) from CN. The ROIs are extracted as follows. First, to extract the most discriminative hippocampal ROIs, we use the AAL atlas to create left and right hippocampus masks according to the brain region numbers [33,41]. Then, multiply the obtained mask
3) Image slice generation: A brain MRI scan images the whole brain layer by layer along an anatomical axis; each layer is an image slice, that is, a conventional 2D image, and all of these 2D images together form a 3D MRI. Extracting features from 3D brain MRI is complex and time-consuming, so to reduce feature extraction time we use the MRIcro software (McCauslandcenter.sc.edu/crnl/mricro/) to save each transverse (axial) slice of the left and right hippocampus ROI images as a 2D image in BMP format. The results are shown in Figure 1c.
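The t-test, masking and slicing steps above can be sketched in Python with NumPy and SciPy. This is an illustrative sketch under stated assumptions, not the CAT12/AAL/MRIcro workflow itself: volumes are assumed to be already loaded as arrays in MNI space, the atlas label values passed in are hypothetical placeholders for the AAL hippocampus codes, the hippocampal mask is assumed to be multiplied voxel-wise with each GM image, and the third array axis is assumed to be the inferior-superior axis, so slicing along it yields transverse (axial) images.

```python
import numpy as np
from scipy import stats

def group_ttest(gm_a, gm_b):
    """Voxel-wise two-sample t-test between two groups of GM maps.

    gm_a: (n_a, X, Y, Z), gm_b: (n_b, X, Y, Z). Returns t and p maps of shape (X, Y, Z).
    """
    t, p = stats.ttest_ind(gm_a, gm_b, axis=0)
    return t, p

def hippocampus_roi(gm, atlas, labels):
    """Keep only voxels whose atlas label is in `labels` (hypothetical label codes)."""
    mask = np.isin(atlas, labels)      # binary ROI mask from the atlas
    return gm * mask                   # assumed: voxel-wise multiplication with the GM image

def axial_slices(volume, axis=2):
    """Split a 3D ROI volume into 2D slices along `axis`, keeping non-empty slices only."""
    return [s for s in np.moveaxis(volume, axis, 0) if np.any(s)]
```

Saving the returned slices to BMP files (for example with an imaging library) is left out, since the file format does not affect the feature extraction that follows.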

METHOD
This study consists of the following main steps. First, all 3D MR images are pre-processed as described in Section 2.2 to obtain BMP images of the ROIs. Second, the PFT model is used to detect the saliency map of each ROI BMP image. Then, the equal-distance ring shape context method is used to extract shape features from the saliency map. Finally, SVC is used to build the disease classification model. The framework of our method is illustrated in Figure 2, and the feature extraction process is described in detail in the following sections.

Saliency map detection
The visual attention mechanism is key to the high efficiency of visual cognition. It selects visual sensory information and passes only the important information to the visual perception process while rejecting the rest, which makes visual cognition active and selective. With researchers' growing interest in visual attention and the increasing ability of computers to process information and realize complex computer vision systems, many visual attention models have been proposed, such as the NVT (Neuromorphic Vision Toolkit) model [42], the STB (Saliency ToolBox) model [43], the SR (Spectral Residual) model [44] and the PFT model [45]. In this paper, we use the PFT model for saliency map detection because it is faster and more effective than the NVT, STB and SR models. The PFT model computes the salient regions of an image in the transform domain, using the phase spectrum of the Fourier transform. Let I(x, y) denote the input image, and let $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and the inverse Fourier transform, respectively. The saliency map detected by the PFT model can then be expressed as [45]:

$$sM(x, y) = g(x, y) * \left\| \mathcal{F}^{-1}\left[ e^{i \cdot p(x, y)} \right] \right\|^2$$

where * denotes convolution, g(x, y) is a two-dimensional Gaussian filter (σ = 8), and p(x, y) is given by

$$p(x, y) = P\left( f(x, y) \right)$$

where P(·) denotes the phase spectrum of the Fourier transform of the input image, and f(x, y) is defined as

$$f(x, y) = \mathcal{F}\left( I(x, y) \right)$$
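The PFT formulas translate almost directly into NumPy and SciPy. The sketch below keeps only the phase of the 2D Fourier transform, inverts it, squares the magnitude and smooths it with a Gaussian of σ = 8. Scaling the result to [0, 1] is an added convenience that is not part of the model, and any image resizing used in practice is omitted; the function name `pft_saliency` is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(image, sigma=8.0):
    """Phase spectrum of Fourier Transform (PFT) saliency map (illustrative sketch)."""
    img = np.asarray(image, dtype=float)
    f = np.fft.fft2(img)                        # f(x, y) = F(I(x, y))
    p = np.angle(f)                             # p(x, y): phase spectrum of f
    recon = np.fft.ifft2(np.exp(1j * p))        # F^{-1}[exp(i * p(x, y))], amplitude discarded
    sal = gaussian_filter(np.abs(recon) ** 2, sigma=sigma)  # convolve ||.||^2 with g(x, y)
    peak = sal.max()
    return sal / peak if peak > 0 else sal      # optional scaling to [0, 1] (not in the model)

# Usage: a bright square on a dark background.
img = np.zeros((64, 64))
img[20:30, 20:30] = 1.0
saliency = pft_saliency(img)
```

Because only the phase is kept, regions whose spectral structure differs from the rest of the image stand out after reconstruction, which is what makes the method cheap: two FFTs and one Gaussian filter.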

Image feature extraction
The traditional shape context (SC), a feature descriptor based on the shape contour proposed by Belongie et al. [48], is mostly used for shape matching and target recognition [46,47]. The traditional SC is calculated as follows. Step 1: Contour detection. For a given shape I, the contour edge is obtained with an edge detection operator, and a set of discrete points $P = \{P_1, P_2, P_3, \dots, P_N\}$ is obtained by sampling the contour edge.
Step 2: Shape context calculation. In the log-polar coordinate system, taking any point $P_i$ as the reference point and centre, multiple concentric circles with radii $R_k$ (k = 1, 2, 3, 4, 5), measured by Euclidean distance, are established in the local area around $P_i$, and each circle is further divided into multiple sectors in the circumferential direction, as shown in Figure 3a. The relative positions of the other points with respect to $P_i$ are thus reduced to the number of points falling in each sector, and the histogram of this distribution is called the shape context of point $P_i$.
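Steps 1 and 2 of the traditional SC can be sketched in Python as follows. Two simplifications are assumptions, not part of the description above: the edge detection operator is replaced by taking the inner boundary of a thresholded image (foreground pixels removed by a one-pixel binary erosion), and distances are scale-normalized by their mean distance from $P_i$ only. The log-spaced radial edges between 0.125 and 2, the threshold, and the function names are illustrative defaults.

```python
import numpy as np
from scipy import ndimage

def contour_points(image, threshold=0.5, n_points=None):
    """Step 1 (simplified): (x, y) coordinates of contour pixels of a 2D image."""
    fg = np.asarray(image) > threshold
    boundary = fg & ~ndimage.binary_erosion(fg)   # inner one-pixel boundary of the shape
    rows, cols = np.nonzero(boundary)             # raster order, not traced along the contour
    pts = np.column_stack([cols, rows])           # x = column index, y = row index
    if n_points is not None and len(pts) > n_points:
        idx = np.linspace(0, len(pts) - 1, n_points).round().astype(int)
        pts = pts[idx]                            # evenly spaced subsample of the points
    return pts

def log_polar_shape_context(points, i, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Step 2: log-polar histogram (n_r x n_theta) of all other points relative to P_i.

    Points whose normalized distance falls outside [r_inner, r_outer] are not counted.
    """
    pts = np.asarray(points, dtype=float)
    diff = np.delete(pts, i, axis=0) - pts[i]     # vectors from P_i to every other point
    r = np.hypot(diff[:, 0], diff[:, 1])          # Euclidean distances
    r = r / r.mean()                              # simplified scale normalization
    theta = np.mod(np.arctan2(diff[:, 1], diff[:, 0]), 2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)  # log-spaced radii
    t_edges = np.linspace(0.0, 2 * np.pi, n_theta + 1)                    # equal angular sectors
    hist, _, _ = np.histogram2d(r, theta, bins=[r_edges, t_edges])
    return hist
```

The log-spaced radial bins make the descriptor more sensitive to nearby points than to distant ones, which is exactly the property the equal-distance partition below changes.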
In this study, we introduce a new feature extraction method based on the shape context of equal-distance rings. Comparing Figure 3a and Figure 3b shows that equal-distance rings divide the whole shape contour evenly and better reflect the distribution of all points on the image contour. To quantify this partition, a direction parameter and a distance parameter are introduced: the direction parameter l is the number of sectors into which each ring is divided (12 in Figure 3b), and the distance parameter k is the number of equal-distance rings (5 in Figure 3b). Figure 3b is a schematic diagram of a partition with five equal-distance rings.
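A short worked example of the difference between the two partitions: with an assumed outer radius of 50 pixels and k = 5 rings, equal-distance rings all have the same width, whereas log-spaced boundaries (as in the log-polar SC) crowd the inner rings and leave the outer ones wide. The radius value and the doubling rule for the log-spaced boundaries are illustrative assumptions.

```python
import numpy as np

r_max = 50.0   # assumed outer radius in pixels
k = 5          # distance parameter: number of rings
l = 12         # direction parameter: number of sectors per ring

equal_edges = np.linspace(0.0, r_max, k + 1)              # boundaries 0, 10, 20, 30, 40, 50
log_edges = np.geomspace(r_max / 2 ** (k - 1), r_max, k)  # about 3.125, 6.25, 12.5, 25, 50
equal_widths = np.diff(equal_edges)                       # every ring is 10 px wide
log_widths = np.diff(np.concatenate(([0.0], log_edges)))  # widths grow toward the outside
sector_width_deg = 360 / l                                # 30 degrees per sector
```

With equal widths, a contour point far from the centre is binned as finely as one near it, which is the uniform coverage of the contour that the comparison of Figure 3a and Figure 3b points to.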
Suppose an image I of size M × N, with all points on its contour denoted by $P = \{P_1, P_2, P_3, \dots, P_N\}$, where (x, y) are the horizontal and vertical coordinates of a contour point $P_i$, and $R_k$ is the radial width of the kth equal-distance ring (k = 1, 2, 3, 4, 5), as shown in Figure 3b. Because the contour image is divided into a set of equally spaced rings, $R_1 = R_2 = R_3 = R_4 = R_5 = R$. The shape context based on equal-distance rings replaces the Euclidean distance of the traditional shape context with the chessboard distance, and the ring partition is performed as follows: a contour point at (x, y) belongs to the kth ring when

$$(k - 1)R < d_{x,y} \le kR, \quad k = 1, 2, \dots, 5$$

where $d_{x,y}$ is the chessboard distance from the contour point $P_i$ at (x, y) to the centre of the equal-distance rings $(x_c, y_c)$:

$$d_{x,y} = \max\left( |x - x_c|, \; |y - y_c| \right)$$

In this study, the shape context features based on equal-distance rings are computed in the rectangular coordinate system. First, the chessboard distance between each pixel and the centre of the image is calculated; then this distance is compared with the radius of each ring to determine the set of pixels in each ring. Finally, histogram statistics over all ring sectors capture the spatial position information of all points of the entire target shape, including distance and direction information. Let $S_k$ be the set of pixels in the kth ring (k = 1, 2, 3, 4, 5), and let $H_{k,l}$ be the number of pixels of $S_k$ that fall in the lth sector of that ring (l = 1, 2, 3, …, 12); the feature vector of each ROI therefore has size 1 × 60 (5 rings × 12 sectors). The detailed rules are as follows:
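The chessboard-distance ring partition and the sector histogram can be sketched in NumPy as follows, with k = 5 and l = 12 giving a 60-dimensional vector. Several details are assumptions not fixed by the text: contour points are given as (x, y) pairs, the outer boundary defaults to the largest distance among the points (so the common ring width is R = r_max / k), angles are measured counter-clockwise from the positive x axis in array coordinates, points on a boundary go to the inner ring, and the centre pixel is placed in the first ring. The hypothetical `metric` argument also covers the Euclidean and city block variants.

```python
import numpy as np

def edrsc_features(points, center, k=5, l=12, r_max=None, metric="chessboard"):
    """Equal-distance ring shape context histogram, flattened to k * l values (sketch)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    dx = pts[:, 0] - center[0]
    dy = pts[:, 1] - center[1]
    if metric == "chessboard":
        d = np.maximum(np.abs(dx), np.abs(dy))    # max(|x - xc|, |y - yc|)
    elif metric == "euclidean":
        d = np.hypot(dx, dy)
    elif metric == "cityblock":
        d = np.abs(dx) + np.abs(dy)
    else:
        raise ValueError(f"unknown metric: {metric}")
    if r_max is None:                              # assumed default outer boundary
        r_max = d.max() if d.size and d.max() > 0 else 1.0
    ring_width = r_max / k                         # R_1 = ... = R_k = R
    # (k-1)R < d <= kR  ->  ring index k-1; centre and points beyond r_max are clipped.
    ring = np.clip(np.ceil(d / ring_width).astype(int) - 1, 0, k - 1)
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)  # direction in [0, 2*pi)
    sector = np.minimum((theta // (2 * np.pi / l)).astype(int), l - 1)
    hist = np.zeros((k, l))
    np.add.at(hist, (ring, sector), 1)             # H[k, l]: count of points per ring sector
    return hist.ravel()                            # 1 x (k * l) feature vector

# Usage: four contour points around a centre at (10, 10).
pts = np.array([[12, 11], [9, 14], [6, 9], [11, 0]])
f = edrsc_features(pts, center=(10, 10))
```

Because the chessboard distance is constant along axis-aligned squares, each "ring" here is a square annulus that matches the pixel grid, so every pixel is assigned to exactly one ring without rounding a radius.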

Classification model
In this study, we used LIBSVM to build the SVC model for AD classification [49]. LIBSVM is an open-source SVM library [50][51][52][53][54] developed by Professor Chih-Jen Lin of National Taiwan University. It is mainly used for classification (supporting both binary and multi-class classification) and regression. LIBSVM is simple, easy to use, fast and efficient, and requires relatively little tuning of the SVM parameters.
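As an illustration, an SVC can be set up in Python with scikit-learn, whose `sklearn.svm.SVC` is built on LIBSVM. The RBF kernel, C = 1 and feature standardization are assumptions, since the text does not report the kernel or parameters; after standardization, `gamma="scale"` equals LIBSVM's default of 1/num_features. The synthetic 60-dimensional data only stand in for EDRSC vectors, and `build_svc` is a hypothetical helper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_svc(C=1.0, gamma="scale"):
    """Standardize features, then fit an RBF-kernel SVC (assumed settings)."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))

# Synthetic stand-in for 60-dimensional EDRSC vectors of two classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 60)), rng.normal(3, 1, (30, 60))])
y = np.array([0] * 30 + [1] * 30)
clf = build_svc().fit(X, y)
train_acc = clf.score(X, y)
```

Training accuracy is shown only to confirm the pipeline runs; performance should be judged on held-out data, as in the cross-validation protocol below.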

Validation methods and evaluation metrics
To estimate the classification performance of this method, 10 times 10-fold cross-validation was used. In 10-fold cross-validation, the extracted image features are randomly divided into 10 groups, each containing 10% of the data; in each round, 9 groups are used to build the classification model and the remaining group is used for testing. This is repeated 10 times so that each group serves as the test set once, and the overall accuracy is computed. 10 times 10-fold cross-validation repeats this entire procedure 10 times with different random partitions. Four metrics, accuracy (ACC), sensitivity (SEN), specificity (SPE) and area under the ROC curve (AUC), are reported to illustrate the classification performance of the method. In general, an AUC of 0.5-0.7 indicates low classification performance, 0.7-0.9 indicates moderate performance, and above 0.9 indicates high performance [55]. ACC, SEN and SPE are defined as

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad SEN = \frac{TP}{TP + FN}, \quad SPE = \frac{TN}{TN + FP}$$

where TP is the number of true positives, TN true negatives, FP false positives and FN false negatives.
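The 10 times 10-fold protocol and the four metrics can be sketched with scikit-learn as follows. Stratified folds (so every test fold contains both classes and AUC is defined), the standardized RBF SVC, and averaging each metric over all 100 test folds are assumptions; the text only states that the data are divided randomly. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def binary_metrics(y_true, y_pred, scores):
    """ACC, SEN, SPE and AUC for labels in {0, 1} (1 = patient class)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "AUC": roc_auc_score(y_true, scores),
    }

def repeated_cv(X, y, n_splits=10, n_repeats=10, seed=0):
    """10 times 10-fold CV; returns each metric averaged over all test folds."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    folds = []
    for train, test in cv.split(X, y):
        # The scaler is fitted on the training folds only, to avoid leakage.
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X[train], y[train])
        folds.append(binary_metrics(y[test], clf.predict(X[test]),
                                    clf.decision_function(X[test])))
    return {name: float(np.mean([f[name] for f in folds])) for name in folds[0]}
```

For example, with true labels [1, 1, 0, 0] and predictions [1, 0, 0, 0], `binary_metrics` gives ACC = 0.75, SEN = 0.5 and SPE = 1.0, matching the formulas above.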

Classification experimental results
In the experiments, we first tested the classification performance of the method. Because the left and right hippocampus image features are extracted separately, the input features are of three types: left hippocampal features only (HL), right hippocampal features only (HR), and combined left and right hippocampal features (HC), which gives three sets of classification results. To evaluate the benefit of the equal-distance ring segmentation step, we compared the traditional SC method (SC) with our EDRSC method. In addition, to assess the influence of the distance measure, we conducted experiments with different distance measures: EDRSC based on the chessboard distance (EDRSC-CD), on the Euclidean distance (EDRSC-ED) and on the city block distance (EDRSC-BD). Table 2 summarizes all results, from which we observe the following. (1) EDRSC-ED, EDRSC-BD and EDRSC-CD all outperform SC, which shows that features extracted with the equal-distance ring segmentation step yield better classification results. (2) In all three classification tasks, the results of EDRSC-ED and EDRSC-BD differ little from those of EDRSC-CD, but EDRSC-CD achieved the highest classification accuracy (the best results are shown in bold in the table). Specifically, in classifying AD from CN, the accuracies with HC and HL are higher than with HR for all three methods. Using HL features, EDRSC-CD achieved an ACC of 97.43%, SEN of 97.82%, SPE of 97.37%, and AUC
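A quick worked example shows why the choice of distance matters for EDRSC-CD, EDRSC-ED and EDRSC-BD: for a contour point offset by (3, 4) pixels from the ring centre, the three measures give different values. The function name `distances` is hypothetical.

```python
import math

def distances(p, c):
    """Chessboard, Euclidean and city block distance between point p and centre c."""
    dx, dy = abs(p[0] - c[0]), abs(p[1] - c[1])
    return {
        "chessboard": max(dx, dy),            # used by EDRSC-CD
        "euclidean": math.hypot(dx, dy),      # used by EDRSC-ED
        "cityblock": dx + dy,                 # used by EDRSC-BD
    }

print(distances((13, 14), (10, 10)))  # {'chessboard': 4, 'euclidean': 5.0, 'cityblock': 7}
```

With an assumed ring width of 2 pixels and the rule that a point belongs to ring k when (k − 1)R < d ≤ kR, the same point would fall in ring 2 (chessboard), ring 3 (Euclidean) or ring 4 (city block), so the metric directly changes which histogram bin each contour point increments.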

Comparison with state-of-the-art methods
In Table 3, the classification accuracy of our method is compared with that of several methods that use sMRI data and an SVM classifier to classify AD versus CN and MCI versus CN, including two voxel-based methods [23,24] and five ROI-based methods [25-27, 32, 33]. Note that this comparison concerns the feature extraction methods rather than the classifier design. To present each method at its best, the results in the table are the best results obtained on each method's original dataset [25, 26, 32, 56, 57]. Although the sMRI data used by these methods are not exactly the same, all come from ADNI, were acquired with MPRAGE or equivalent protocols at different resolutions, and were uniformly processed by the pre-processing steps of the ADNI research groups (see Section 2.1 for details). Therefore, although the results in Table 3 may not be completely comparable, we can roughly compare our method (the last row of Table 3) with these state-of-the-art methods to verify the efficacy of our proposed method.
Specifically, of the two voxel-based methods, Chu et al. [

Verification on the OASIS dataset
To further verify the classification performance of this study, we conducted training and testing on another public dataset, the OASIS dataset. OASIS is a series of MRI datasets, including OASIS-1, OASIS-2 and OASIS-3, which are publicly available to researchers. OASIS-3 is a dataset used to classify and  Table 4.
We compared the results of the AD versus CN classification experiments on the OASIS and ADNI datasets, as shown in Figure 4. It can be seen from Figure 4

Verification on the thick-layer MRI dataset
To validate the classification results of this study in clinical practice, we added thick-layer sMRI collected in clinical practice to the ADNI data for training and testing. All thick-layer T1-weighted MR images were collected by the Guangxi Medical University First Affiliated Hospital with a slice thickness of 7 mm (212 subjects in total: 62 AD, 90 MCI and 60 CN subjects). The regional ethics committee approved the study, and written informed consent was obtained from all participants. We guarantee that all participants' information will be kept confidential and will not be used for commercial purposes. The detailed statistics of all subjects are shown in Table 5.
The previous experimental results show that the EDRSC method with HC features performs better in classifying AD (or MCI) and CN. We therefore used HC features to test the thick-layer sMRI. The results are shown in Figure 5. Specifically, for classifying AD from CN, when only the ADNI dataset images are available, the ACC of 94.83% (SEN = 96.10%, SPE = 94.18%,

LIMITATIONS
Our method has the following limitations. First, we only considered the imaging modality for AD, CN and MCI classification. However, ADNI also contains data from other modalities, such as genetic data, blood biochemical indicators and CSF data, which may carry complementary information about the disease and further improve classification performance. Second, studies have shown that MCI includes progressive MCI (pMCI) and stable MCI (sMCI): patients with sMCI do not convert to AD, whereas patients with pMCI do [58]. We only studied classification between AD (or CN) and MCI without a finer-grained classification, such as between pMCI and sMCI. Finally, like most studies, we only considered binary classification problems (i.e., AD vs. CN, MCI vs. CN and AD vs. MCI) and did not perform multi-class classification. In future work, we will address these limitations and further improve the classification performance.

CONCLUSION
sMRI is an effective tool for diagnosing AD. Compared with age-matched CN subjects, the sMRI of AD patients shows significant morphological changes, such as decreased hippocampal volume and enlarged ventricles. Based on these morphological and structural changes, this paper proposed a method for diagnosing AD by extracting EDRSC features from the saliency maps of the left and right hippocampus in sMRI. Experiments were carried out on the open ADNI and OASIS datasets and on clinically collected thick-layer images. The experiments showed that this method achieves higher performance than existing feature extraction methods, such as those based on GM density, cortical thickness, and hippocampal volume or shape.