Can dementia be predicted using olfactory identification test in the elderly? A Bayesian network analysis

Abstract
Background: Previous studies suggest that olfactory dysfunction is associated with cognitive decline or dementia.
Objective: To find a potential association between olfactory identification (OI) and dementia onset, and to build a prediction model for dementia screening in the older population.
Methods: Nine hundred and forty-seven participants from the Shanghai Aging Study were analyzed. The participants were dementia-free and completed the OI test using the Sniffin' Sticks Screening Test-12 at baseline. After an average follow-up of 4.9 years, 75 (8%) of the participants were diagnosed with incident dementia. Discrete Bayesian network (DBN) and multivariable logistic regression (MLR) models were used to explore the dependencies of incident dementia on the baseline demographics, lifestyles, and OI test results.
Results: In the DBN analysis, the odors of orange, cinnamon, peppermint, and pineapple, each combined with age and the Mini-Mental State Examination (MMSE), achieved a high predictive ability for incident dementia, with an area under the receiver operating characteristic curve (AUC) larger than 0.8. The odor cinnamon showed the highest AUC of 0.838 (95% CI: 0.731–0.946) and a high accuracy of 0.867. The DBN incorporating age, MMSE, and one odor test had an accuracy (0.760–0.872 vs. 0.835) comparable to that of the MLR model and revealed the dependency between the variables.
Conclusion: The DBN using the OI test may have predictive ability comparable to MLR analysis and may suggest potential causal relationships for further investigation. Identification of the odor cinnamon might be a useful indicator for dementia screening and deserves further investigation.


| INTRODUCTION
Dementia is an overall term for diseases and conditions characterized by a decline in memory, language, problem-solving, and other thinking skills that affect a person's ability to perform everyday activities. Alzheimer's disease (AD) is the most common cause of dementia. There are 47 million people with dementia worldwide. By 2050, the number of people with dementia is estimated to increase to more than 131 million (Prince, Comas-Herrera, Knapp, Guerchet, & Karagiannidou, 2016). Because effective treatment for dementia is lacking, it is imperative to explore the risk factors and provide early identification of cognitive decline and dementia. Accumulating evidence from both human studies and disease models indicates that intercellular transmission and the subsequent templated amplification of some misfolded proteins (e.g., amyloid-β and τ, α-synuclein, and TAR DNA-binding protein 43) are involved in the onset and progression of various neurodegenerative diseases (Peng, Trojanowski, & Lee, 2020). Beyond the traditionally recognized omega-3 fatty acids, recent findings indicate that nicotinamide adenine dinucleotide and related metabolites play important roles in the adaptation of neurons to a wide range of physiological stressors and in counteracting processes in neurodegenerative diseases, and that chronic gamma entrainment and tacrine-benzofuran hybrids may offer neuroprotective effects, which might provide new therapeutic opportunities (Adaikkan & Tsai, 2020; Fancellu et al., 2020; Lautrup, Sinclair, Mattson, & Fang, 2019). Olfactory dysfunction, which increases substantially with aging, represents an important clinical symptom suggesting the early stage of neurodegenerative disorders (Attems, Walker, & Jellinger, 2015).
Previous cross-sectional and longitudinal population-based studies suggest that olfactory dysfunction is associated with impairment in various cognitive domains and with incident cognitive decline and dementia, and emphasize its essential role as a predictive marker (Roalf et al., 2017; Wehling, Nordin, Espeseth, Reinvang, & Lundervold, 2011).
The Shanghai Aging Study (SAS) is a community-based cohort study for investigating the progression of cognitive decline in Chinese elderly, with study design, operational procedures, and diagnostic criteria similar to most cohort studies in developed countries and published previously (Ding et al., 2014). At the baseline of SAS, the Sniffin' Sticks Smell Test-12 (SSST-12) was used to examine the olfactory identification (OI) ability of the study participants.
Our previous study of the cross-sectional phase of SAS explored the relation between a lower total OI score and mild cognitive impairment (MCI; Liang et al., 2016). In the prospective phase, we further demonstrated that the inability to smell peppermint was associated with a higher risk of incident dementia (Liang et al., 2020). These associations, however, were examined only with a multivariable logistic regression (MLR) model. The predictive value of the OI test, and of the ability to identify particular odors, needs further validation.
A Bayesian network (BN) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG; Scutari & Denis, 2014). BN is ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor.
Experience has shown that BN and associated methods are geared to reasoning with uncertainty in a way closely resembling physicians (Kammerdiner, Gupal, & Pardalos, 2007;Lucas, 2007;Pearl, 2014).
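The definition above can be made concrete with the standard factorization that a DAG implies. This is a textbook sketch rather than a formula from the study; the symbols $X_i$ and $\mathrm{Pa}(X_i)$ are notation introduced here for illustration.

```latex
\[
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
```

Here $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the DAG, and a node without parents contributes its marginal distribution. In a discrete BN each factor is a conditional probability table.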
Physicians who set out to develop computer-assisted systems for clinical decision-making are frequently confronted by complexity and uncertainty in the models and predictions. In many cases, the situation is even worse because many of the processes in medicine are only partly known (Lucas, 2007). During the past decade, BN has become an important tool for building decision-support systems in medical sciences and is now steadily becoming mainstream in some areas (Mani, Valtorta, & McDermott, 2005). Although the BN model has been used to study gene expression levels in PD (Mestizo-Gutiérrez, Jácome-Delgado, Rosales-Morales, Cruz-Ramírez, & Aranda-Abreu, 2019) and to predict AD from clinical data (Khanna et al., 2018), our literature search found no study that has used the method to predict dementia from OI data in observational studies. In this study, using data from SAS, we examined the performance of BN analysis in predicting incident dementia and compared the BN with the MLR model. Our study also aimed to find associations between the baseline variables, including olfactory function, and dementia onset, and to build a high-performance prediction model for dementia screening in the older population. Identifying one or several odors that are sensitive for dementia prediction would benefit large-scale population screening programs for dementia prevention and intervention in older adults.

| Study participants
The participants of the current study were a subcohort of SAS.
In brief, SAS was designed to establish a prospective community-based cohort to examine the prevalence and incidence of dementia and MCI in Chinese older adults residing in central Shanghai (Ding et al., 2014). Between January 2010 and December 2011, participants aged 60 years or older were recruited and completed the clinical interview as the baseline. Among them, 1,782 were as- (e) had chronic obstructive pulmonary disease or had experienced an acute upper respiratory tract infection within 1 week; (f) used alcohol and drugs excessively; (g) had dementia or other severe neurological diseases; (h) refused to participate, were lost to follow-up, or were deceased; or (i) did not cooperate with complete data collection at the follow-up interview. Finally, 947 participants completed the follow-up interview and were included in the current study (Figure 1). Detailed recruitment and follow-up procedures of SAS were reported previously (Ding et al., 2014; Liang et al., 2016, 2020).

| Baseline data
At the baseline, data on demographics, lifestyles, and medical history of each participant were obtained in a face-to-face interview by trained research nurses and neurologists. Height and weight were measured and used to calculate the body mass index (BMI). History of chronic diseases, including hypertension, coronary artery disease (CAD), diabetes, and stroke, was asked about and confirmed from medical records maintained by participants (Ding et al., 2014). Depression was defined as present if the Center for Epidemiologic Studies Depression Scale (CESD) score was ≥16 (Eaton, Smith, Ybarra, Muntaner, & Tien, 2004).
The SSST-12 kit was produced by Burghart Medical Technology, Hamburg, Germany (Burghart Medical Technology). Participants were asked to sniff each odor stick and to choose, from a list of four answers, the one that best described the odor. The administration methods of SSST-12 were described in detail elsewhere (Liang et al., 2016).
DNA was extracted from blood or saliva collected from the participants at the baseline (Ding et al., 2015). Apolipoprotein E (APOE) genotyping was conducted using the TaqMan SNP method (Smirnov, Morley, Shin, Spielman, & Cheung, 2009). The presence of at least one ε4 allele was defined as APOE-ε4 allele positive.

| Diagnosis of cognitive function
Both at the baseline and follow-up, the cognitive function of each participant was evaluated by a battery of neuropsychological  (1) to (5) and (7), and those with less than 6 years of education were administered tests (1) to (4) and (6) and (8). The normative data and detailed description of these tests were reported elsewhere (Ding et al., 2014, 2015).
Two study neurologists, one neuropsychologist and one neuroepidemiologist reviewed the functional, medical, neurological, psychiatric, neuropsychological data, and Clinical Dementia
FIGURE 1 Flow chart of the subcohort participants in the current study

| Descriptive analysis
Data on demographics, lifestyle, medical history, and OI test results were presented as mean with standard deviation (SD) or median with interquartile range (IQR) for continuous variables, and as number and percentage for categorical variables. Differences between groups were tested by the chi-squared test for categorical variables, and by Student's t test or the Mann-Whitney U test for continuous variables. A two-sided p-value <.05 was considered statistically significant. The descriptive analyses were performed using Stata 16.0 (StataCorp LLC).

| Bayesian network analysis
Prediction for incident dementia was conducted using multinomial discrete BN (DBN). Before entering the DBN, continuous variables were discretized into ten categories based on their own deciles.
Although at the cost of losing some information, the discretization may accommodate skewness of the variables and nonlinear relationships between them, and speed up the computation substantially (Hartemink, 2001;Sachs, Perez, Pe'er, Lauffenburger, & Nolan, 2005;Scutari & Denis, 2014).
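The decile-based discretization described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used in the study; the function name `discretize_by_deciles` and the simulated ages are hypothetical, and the quantile convention (Python's default "exclusive" method) is an assumption that may differ from the one the authors used.

```python
import bisect
import random
import statistics
from collections import Counter

def discretize_by_deciles(values, n_bins=10):
    """Map each value to a bin index 0..n_bins-1 using the sample's own quantiles."""
    # Nine interior cut points for n_bins=10 (10th, 20th, ..., 90th percentiles).
    cuts = statistics.quantiles(values, n=n_bins)
    # Heavy ties can make cut points coincide; keep unique cuts only.
    cuts = sorted(set(cuts))
    # bisect_right: a value equal to a cut point falls in the upper bin.
    return [bisect.bisect_right(cuts, v) for v in values]

# Simulated ages for illustration only (not SAS data).
rng = random.Random(0)
age = [rng.gauss(70, 7) for _ in range(947)]
age_bin = discretize_by_deciles(age)
bin_sizes = Counter(age_bin)
```

With a continuous variable and no ties, each of the ten bins holds roughly a tenth of the sample, which is what makes the resulting categorical variable robust to skewness.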
The K-fold cross-validation method was used during the DBN structure learning and validation. K-fold cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit and to handle the overfitting problem when only a single dataset is available for statistical learning (James, Witten, Hastie, & Tibshirani, 2013). In the current study, we randomly split the dataset into five equal partitions, instantiated five identical DBNs, and trained each one on four partitions while validating on the remaining partition. In each iteration, the prediction was made for the one held-out partition. In the end, the validation for the whole dataset was obtained by combining the predictions for the five held-out partitions (James et al., 2013). When learning the structure of the DBNs, an initial blacklist was used to block the arcs from dementia to the baseline variables and the arcs from the other baseline variables to sex and age; no other constraints were used.
The hill-climbing (HC) algorithm was used to learn the structure of the DBNs. HC starts from a network with no arcs. At each step it evaluates adding, removing, or reversing a single arc, applies the change that increases the network's Bayesian information criterion score the most, and repeats until no single change improves the score (Scutari & Denis, 2014).
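The pooling of held-out predictions described above can be sketched as follows. This is an illustrative Python outline, not the study's R code: the helper names, the toy rows, and the stand-in model (an event rate per age bin instead of a DBN) are all hypothetical.

```python
import random
from collections import defaultdict

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pooled_cv_predictions(rows, fit, predict, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    pooled = [None] * len(rows)
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held]
        model = fit(train)                      # a fresh model per fold
        for i in held_out:
            pooled[i] = predict(model, rows[i])
    return pooled

# Hypothetical stand-in model: observed event rate within each age bin.
def fit_rate_by_bin(train):
    counts = defaultdict(lambda: [0, 0])        # bin -> [rows, events]
    for age_bin, outcome in train:
        counts[age_bin][0] += 1
        counts[age_bin][1] += outcome
    return {b: events / n for b, (n, events) in counts.items()}

def predict_rate(model, row):
    return model.get(row[0], 0.0)               # unseen bin: fall back to 0

# Toy rows of (age_bin, outcome), for illustration only.
rows = [(i % 10, int(i % 10 >= 8 and i % 3 == 0)) for i in range(100)]
pooled = pooled_cv_predictions(rows, fit_rate_by_bin, predict_rate)
```

Because every row is predicted exactly once by a model that never saw it, metrics computed on `pooled` cover the whole dataset without reusing training data.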
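The greedy search can be illustrated with a compact Python sketch. It is not the bnlearn implementation used in the study: the function names, the toy data, and the reading of the blacklist as a set of forbidden (from, to) arcs are assumptions made for this example, and there are no restarts or other refinements.

```python
import math
from collections import Counter
from itertools import permutations

def local_bic(data, node, parents, levels):
    """BIC term of one node: log-likelihood minus 0.5*log(N)*free parameters."""
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = (levels[node] - 1) * math.prod(levels[p] for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params

def has_cycle(parents):
    """Depth-first search along child -> parent links; True if a cycle exists."""
    state = {}  # 1 = on the current path, 2 = fully explored
    def visit(v):
        if state.get(v) == 1:
            return True
        if state.get(v) == 2:
            return False
        state[v] = 1
        if any(visit(p) for p in parents[v]):
            return True
        state[v] = 2
        return False
    return any(visit(v) for v in parents)

def hill_climb(data, levels, blacklist=frozenset()):
    """Greedy search from the empty graph; returns {node: set of parents}."""
    nodes = list(levels)
    parents = {v: set() for v in nodes}
    score = {v: local_bic(data, v, [], levels) for v in nodes}
    while True:
        best = None  # (gain, trial graph, new local scores)
        for a, b in permutations(nodes, 2):
            if a in parents[b]:
                moves = [("remove", a, b), ("reverse", a, b)]
            elif b in parents[a]:
                continue  # arc b -> a is handled when the pair is seen as (b, a)
            else:
                moves = [("add", a, b)]
            for op, x, y in moves:
                new_arc = {"add": (x, y), "reverse": (y, x)}.get(op)
                if new_arc in blacklist:
                    continue
                trial = {v: set(ps) for v, ps in parents.items()}
                trial[y].discard(x)  # drops x -> y; a no-op for "add"
                if new_arc:
                    trial[new_arc[1]].add(new_arc[0])
                if has_cycle(trial):
                    continue
                changed = {x, y} if op == "reverse" else {y}
                new = {v: local_bic(data, v, sorted(trial[v]), levels)
                       for v in changed}
                gain = sum(new[v] - score[v] for v in changed)
                if gain > 1e-9 and (best is None or gain > best[0]):
                    best = (gain, trial, new)
        if best is None:
            return parents  # no single-arc change improves the score
        _, parents, new = best
        score.update(new)

# Hypothetical toy data (not SAS data): dementia tracks age, smell is unrelated.
data = []
for age in (0, 1):
    for smell in (0, 1):
        n_cases = 45 if age == 1 else 5
        data += [{"age": age, "smell": smell, "dementia": int(i < n_cases)}
                 for i in range(50)]
levels = {"age": 2, "smell": 2, "dementia": 2}
blacklist = {("dementia", "age"), ("dementia", "smell"), ("smell", "age")}
dag = hill_climb(data, levels, blacklist)
```

On this toy data the only arc that pays for its BIC penalty is age to dementia, which mirrors the situation reported later in the paper where the unconstrained search links dementia to age alone.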
The performance of the DBNs was evaluated using metrics including sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve. Terminology and derivations of the metrics were given in detail elsewhere (Cao, Fang, Ottosson, Naslund, & Stenberg, 2019). A successful prediction model for incident dementia was defined as one with an area under the ROC curve (AUC) >0.7 (Mandrekar, 2010;Marzban, 2004).
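For readers who want the metrics spelled out, the following Python sketch computes them from scratch. The function names, the 0.5 threshold, and the five toy observations are illustrative assumptions; the AUC uses the rank-based (Mann-Whitney) formulation, with ties counted as one half.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a given probability threshold."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

def auc(y_true, y_score):
    """Probability that a random case scores higher than a random non-case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example for illustration only.
y_true = [1, 1, 0, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.2, 0.1]
metrics = confusion_metrics(y_true, y_score)
area = auc(y_true, y_score)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, which is why the success criterion above is stated in terms of AUC.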
We also compared the performance of the DBNs with that of the traditional stepwise MLR model based on bidirectional variable selection. The K-fold cross-validation was also used for the stepwise MLR analysis.
The DBNs were constructed using the package bnlearn in R version 3.6.2 (R Foundation for Statistical Computing) (Scutari & Denis, 2014). The stepwise MLR analysis was conducted using the package MASS in R (Ripley, 2002), and a two-sided p-value <.05 was considered statistically significant.

| Characteristics of the study participants at baseline
In general, there was no significant difference in the baseline characteristics between the 835 excluded participants and the 947 included participants, except for the percentage of APOE-ε4 carriers (18.2% vs. 16.5%, p < .001). Although the mean age and BMI of the included participants were slightly higher (70.51 vs. 70.12 years, and 24.49 vs. 24.07 kg/m², respectively), the differences were not clinically significant (Table 1).
After a mean of 4.9 (SD = 0.8) years of follow-up, 75 of the 947 included participants were diagnosed with new-onset dementia.
Compared to the 872 participants without dementia, the 75 dementia cases were about 8 years older (77.8 vs. 69.9 years) when recruited. Although there was no statistically significant difference in baseline BMI between the dementia cases and those without dementia (controls) (24.5 vs. 24.5 kg/m²), the cases were on average shorter (156.5 vs. 162.0 cm), weighed less (59.1 vs. 64.4 kg), and had less education (9 vs. 12 years). CAD, stroke, and APOE-ε4 positivity were more frequently observed in the cases (Table 2). The participants with incident dementia had a lower correct identification rate for eight of the 12 odors in the baseline OI test (leather, cinnamon, peppermint, banana, liquorice, coffee, rose, fish) (Table 2). The cases also showed a lower OI sum score (OIS) and MMSE score at baseline (Table 2).

| Structure of the Bayesian networks and their performance
When no constraints other than the initial blacklist were adopted in the DBN structure learning process using the HC algorithm, the probability of dementia incidence was found to depend only on age (Figure S1). Although dependencies were observed among the demographic variables and, separately, among the odors, no dependency was observed between the two groups of variables.
Besides, no dependency was observed for education, MMSE, and APOE-ε4.
When using the initial DBN to predict the incident dementia, that is, only age used as a predictor, the accuracy of the model is  (Table 3).
To investigate whether including simple dependencies of incident dementia on OI test and other baseline variables may improve the performance of the initial DBN, in addition to age, we included the arcs from a single odor to dementia (i.e., the dependencies of incident dementia on a single odor) into the DBN one by one first. It turned out that cinnamon performed best for prediction in validation, with an AUC of 0.779 (95% CI: 0.672-0.886).
Although the accuracy (0.630) of the DBN is relatively low because of the low specificity (0.600), its sensitivity is as high as 0.895 (Table 3).
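The way specificity drives accuracy here follows from accuracy being a prevalence-weighted average of sensitivity and specificity. As a rough check only, assuming (this is not a figure from Table 3) that the prevalence $\pi$ in the validation data is close to the cohort-wide 75/947 ≈ 0.079:

```latex
\[
\text{accuracy} \;=\; \pi \cdot \text{sensitivity} + (1 - \pi) \cdot \text{specificity}
\;\approx\; 0.079 \times 0.895 + 0.921 \times 0.600 \;\approx\; 0.62
\]
```

This is close to the reported 0.630; the small gap is expected because the validation prevalence need not equal the cohort-wide value. Since more than nine in ten participants remained dementia-free, the specificity term dominates the sum.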
TABLE 1 Characteristics of the study participants at baseline
To further improve the predictive ability of the DBN, we incorporated other statistically significant variables of the MLR analysis in the DBN one by one. It turned out that the DBNs incorporating dependencies on a single odor and MMSE performed best in validation.
The performance metrics of the DBNs including the dependencies on age, MMSE, and one single odor are shown in Table 3, and the ROCs are shown in Figures 2 and 3 for the training and validation, respectively. In general, using baseline age, MMSE and one odor among orange, cinnamon, peppermint, and pineapple may achieve a very good prediction in validation (AUC > 0.80) (Table 3 and Figure 3).
Again, the DBN including the dependency of dementia incidence on cinnamon showed the highest AUC of 0.838 (95% CI: 0.731-0.946) and a high accuracy (0.867) (Table 3). The structure of the DBN is shown in Figure 4.
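Once the arcs into dementia are fixed, prediction from a discrete network of this shape reduces to a lookup in a conditional probability table. The Python sketch below illustrates that idea under stated assumptions: the variable names, the coding of cinnamon (1 = correctly identified), the toy rows, and the fallback to the overall rate for unseen parent configurations are all hypothetical and are not taken from the study.

```python
from collections import defaultdict

def fit_cpt(rows, parents, outcome="dementia"):
    """Count outcomes per parent configuration (maximum-likelihood CPT)."""
    counts = defaultdict(lambda: [0, 0])   # config -> [non-cases, cases]
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[outcome]] += 1
    overall = sum(row[outcome] for row in rows) / len(rows)
    return dict(counts), overall

def predict_proba(cpt, overall, row, parents):
    """P(dementia = 1 | parents); unseen configurations use the overall rate."""
    key = tuple(row[p] for p in parents)
    if key not in cpt:
        return overall                     # assumption: simple fallback
    non_cases, cases = cpt[key]
    return cases / (non_cases + cases)

# Toy rows for illustration only (not SAS data).
parents = ("age_bin", "mmse_bin", "cinnamon")
rows = [
    {"age_bin": 9, "mmse_bin": 0, "cinnamon": 0, "dementia": 1},
    {"age_bin": 9, "mmse_bin": 0, "cinnamon": 0, "dementia": 1},
    {"age_bin": 9, "mmse_bin": 0, "cinnamon": 0, "dementia": 0},
    {"age_bin": 2, "mmse_bin": 8, "cinnamon": 1, "dementia": 0},
]
cpt, overall = fit_cpt(rows, parents)
p_seen = predict_proba(cpt, overall, rows[0], parents)
p_unseen = predict_proba(
    cpt, overall, {"age_bin": 5, "mmse_bin": 5, "cinnamon": 1}, parents)
```

The number of table cells grows multiplicatively with each added parent, so sparse cells become common quickly; this is one plausible reason why networks with many parents of dementia can validate poorly.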

Performance of the DBNs including the dependency of incident dementia on other baseline variables is shown in Table S2. Compared to the DBNs including the dependencies of incident dementia on age, MMSE, and one odor, DBNs including more dependencies of incident dementia showed worse predictive ability (Table 3 and Table S2).

| DISCUSSION
The DBN analysis in our study indicated that using baseline age, MMSE, and one odor among orange, cinnamon, peppermint, and pineapple may achieve a very good prediction (AUC > 0.80) for incident dementia. Cinnamon odor is an indicator with a high sensitivity of 0.895.
Although the underlying mechanism has not been ascertained, olfactory dysfunction is known as one of the early symptoms of some neurodegenerative disorders, such as AD and Parkinson's disease (PD) (Hawkes, Shephard, & Daniel, 1997; Koss, 1986; Serby, Corwin, Conrad, & Rotrosen, 1985). This may provide a perspective on the process of early anatomical change in neurodegenerative disease. Some evidence indicates that AD-related pathology first occurs in the olfactory bulbs and tracts, where amyloid-β protein (Aβ), tau, and α-synuclein are concentrated (Schofield, Finnie, & Yong, 2014). As it progresses, the lesion involves multiple levels of the olfactory system, including the surrounding olfactory bulb, olfactory epithelium, and olfactory pathways connecting cognitive regions in the brain (Attems, Walker, & Jellinger, 2014). A meta-analysis provided evidence that higher order olfactory functions appear to be more strongly affected in AD than in PD. The stronger deficits found in odor identification and recognition in AD may thus be interpreted as the sum of perceptual and cognitive deficits, whereas detection threshold deficits in PD might be less dependent on cognition (Rahayel, Frasnelli, & Joubert, 2012).

TABLE 3 Performance of the predictive models
Braak et al. indicated that AD pathology in the olfactory system occurs during the "transentorhinal stage" and "limbic stage" and involves central olfactory regions such as the entorhinal and piriform cortices more than the bulb (Braak et al., 1996). This is also possibly a reason why AD patients are more impaired in cognitively demanding tests of olfaction (such as identification) than in sensory tests (such as threshold).
Many studies have examined the use of the olfactory identification test as a predictor of the development of dementia (Roberts et al., 2016). Combining early markers such as MMSE, APOE genotype, and olfactory identification deficit has been shown to give strong predictive capability for dementia in long-term cohort studies. However, the prediction models were mainly based on logistic regression analysis, and the performance of the models was not validated using unseen data or cross-validation (Conti et al., 2013; Devanand et al., 2008, 2015; Liang et al., 2020; Stanciu et al., 2014). Sun et al. summarized the findings of two prospective longitudinal cohort studies and 30 cross-sectional studies, and concluded that although a positive association between poorer olfactory performance and dementia was demonstrated, hyposmia had only moderate predictive value (Sun, Raji, MacEachern, & Burke, 2012).
FIGURE 4 Structure of the discrete Bayesian network (DBN) including the dependency of incident dementia on baseline age, MMSE, and cinnamon
There are several advantages of using BN. First, commonly used methods in epidemiological studies such as logistic regression and related methods do not take account of conditional dependence that may exist between the covariates. Conditional dependence between some of the risk factors may be already known or may be regarded as plausible on biological grounds (Karhausen, 1987; Susser, 1991).

However, such information could be incorporated into BN models to reveal the potential relationships between the health or disease status and the associated risk factors (Li, Shi, & Satz, 2008). Second, high correlation among predictors has long been an annoyance in regression analysis. The crux of the problem is that the linear regression model assumes each predictor has an independent effect on the response that can be encapsulated in the predictor's regression coefficient. As opposed to creating problems of multicollinearity, the associations between candidate predictor variables are naturally accounted for when defining a BN's conditional probability distributions using. The HC algorithm used in the study may search for a structure starting from an empty, full, or possibly random DAG, or from an initial DAG chosen according to existing knowledge. The main loop then consists of attempting every possible single-edge addition, removal, or reversal relative to the current candidate network. The change that increases the score the most then becomes the next candidate. The process iterates until no single-edge change increases the score. By gradually taking the relationships between the variables into account, the problem of multicollinearity can therefore be reduced in a BN analysis (Nguefack-Tsague, 2011).
Third, the DAG proposed by the BN captures the dependence structure of multiple variables and, used appropriately, allows more robust conclusions about the direction of causation. BN analysis has revealed a richer structure of relationships than could be inferred using traditional multivariable regression methods such as logistic regression, and has highlighted potential pathways previously unseen for further investigation (Moffa et al., 2017).
However, there are also some limitations in our study. We collected data on as many potential confounders as possible for use in the analysis model, but there are still uncollected confounders, such as occupation and leisure time activities, that could influence cognitive function. APOE has been identified as a major genetic risk factor for AD. However, APOE frequencies vary significantly across populations of different ethnicities. The frequency of the APOE-ε4 allele in our Shanghai Aging Study is 9.3%, which is within the range for Asian populations (6.3%-9.3%) but lower than that in Caucasian and African American populations (11%-27%) (Ding et al., 2014). In our study, the APOE-ε4 allele was not linked to any of the parameters, owing to the relatively small sample size and the low frequency of the  (Scutari & Denis, 2014). We also noticed that including too many dependencies of incident dementia on the potential baseline predictors only incorporates noise rather than information in prediction, which results in a very low performance in validation (accuracy = 0.212 and AUC = 0.559) and suggests an overfitting problem in the DBN. Although we were limited by the software packages currently available and have adopted compromise methods so far, we would like to explore hybrid BNs in the future to see whether they could further improve the predictive ability. In the SSST-12 test, the order of the identification items might also contribute to the effect. However, we could not find a reference that explained whether the item order is randomized. Additionally, the result on each item might be affected by an unknown interaction between the odor and the response options, and this should be carefully considered in future studies.

| CONCLUSION
The DBN incorporating age, MMSE, and one odor test may have predictive ability comparable to MLR analysis, while DBN may also reveal the dependency between the variables in static data and suggest potential causal relationships for further investigation.
and National Project of Chronic Disease (2016YFC1306400). The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest related to the paper.

ETHICAL APPROVAL
The study is an observational study and was approved by the Medical Ethical Committee of Huashan Hospital, Fudan University, Shanghai, China (approval number: 2009-195). All participants and/or their legal guardians gave their written informed consent for participation in the study. There is no personal identification disclosed in our data.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.1822.

DATA AVAILABILITY STATEMENT
The data are not publicly available but may be available upon reasonable request and with permission of the principal investigator Dr.