High‐fat diet and oral infection induced type 2 diabetes and obesity development under different genetic backgrounds

Abstract
Background: Type 2 diabetes (T2D) is an adult‐onset form of diabetes associated with obesity and caused by an interplay between genetic, epigenetic, and environmental components. Here, we have assessed a cohort of 11 genetically different collaborative cross (CC) mouse lines comprising both sexes for T2D and obesity development in response to oral infection and high‐fat diet (HFD) challenges.
Methods: Mice were fed either the HFD or the standard chow diet (control group) for 12 weeks starting at the age of 8 weeks. At week 5 of the experiment, half of the mice of each diet group were infected with Porphyromonas gingivalis and Fusobacterium nucleatum bacterial strains. Throughout the 12‐week experimental period, body weight (BW) was recorded biweekly, and intraperitoneal glucose tolerance tests were performed at weeks 6 and 12 of the experiment to evaluate the glucose tolerance status of the mice.
Results: Statistical analysis showed significant phenotypic variation between the CC lines, which differ in genetic background, as well as significant sex effects in the different experimental groups. The heritability of the studied phenotypes was estimated and ranged between 0.45 and 0.85. We applied machine learning methods to make an early call for T2D and its prognosis. The results showed that classification with random forest reached the highest classification accuracy (ACC = 0.91) when all the attributes were used.
Conclusion: Using sex, diet, infection status, initial BW, and area under the curve (AUC) at week 6, we could classify the final phenotypes/outcomes at the end stage of the experiment (at 12 weeks).


| INTRODUCTION
Mortality due to type 2 diabetes (T2D) is higher in developing countries because of delayed diagnosis and treatment, as T2D is a "silent" disease that develops asymptomatically over years. 1 In addition, as living standards continue to rise, diabetes is becoming increasingly common. Therefore, how to diagnose and analyze diabetes rapidly and accurately is a topic worth studying. In medicine, diabetes is diagnosed based on fasting blood glucose (FBG), glucose tolerance, and random blood glucose levels. [2][3][4][5][6][7] Obesity, together with age, is the major risk factor for T2D development. Although obesity is not yet a wholly established risk factor for autoimmunity, glucose imbalance and the development of insulin resistance due to an abnormal accumulation of adipose tissue in obese patients correspond with an increased incidence of autoimmune diseases. 8 According to the World Health Organization (WHO), the global prevalence of diabetes among adults above age 18 years increased from 4.7% in 1980 to 8.5% in 2014. 9,10 Furthermore, the growing incidence of diabetes in middle- and low-income countries was reported to be a major cause of blindness, kidney failure, heart attacks, stroke, and lower-extremity amputation. 10 WHO projects that diabetes will be the seventh leading cause of death in 2030. 9,10 T2D is much more likely to develop among people with metabolic syndrome (METS), which is a collection of risk factors that increase the chance of developing nonalcoholic fatty liver disease (NAFLD), cardiovascular diseases, stroke, dyslipidemia, kidney failure, and ultimately death from multiple complications. 11,12 Reports show that overall obesity, assessed by body weight (BW) and body length (for body mass index calculation), and central obesity can be strong predictors of T2D development. [13][14][15] Unfortunately, it is estimated that up to 80% of patients with diabetes will eventually develop METS-associated diseases. 12
Furthermore, numerous studies have shown that the prevalence and severity of diabetes-related complications, including diabetic neuropathy, retinopathy, proteinuria, and cardiovascular complications, are connected. 16,17 Several studies have reported that patients with diabetes are more prone to developing oral bacterial infections. They are well known to have an impaired defense mechanism and are therefore considered immunocompromised, which causes inflammation and influences T2D development. 18 The microbiota probably plays an important role in the development of these conditions. 19 Various studies from our laboratory and collaborators have demonstrated the suitability of the novel and genetically highly diverse mouse genetic reference population, known as collaborative cross (CC), as an appropriate murine model for studying the genetics of complex trait diseases, including diet-induced T2D. [20][21][22][23] These studies have shown that genetic background plays a central role in the pathogenesis of T2D, subsequently suggesting genetic factors that may underlie this variation in T2D development. [22][23][24] In the current study, we have assessed the response of the different CC lines in terms of T2D development and BW gain under a double challenge of high-fat diet (HFD) and oral infection.
The earlier the diagnosis, the easier the disease is to control. Therefore, at this step, machine learning (ML) can help make a preliminary judgment about T2D based on daily physical examination and can serve as a reference for doctors. [25][26][27] Recently, numerous algorithms have been developed to predict diabetes. These include traditional ML methods 27 such as support vector machine, decision tree (DT), and logistic regression, which can deal with large data sets. 28 In this study, we used DT and random forest (RF) for prediction.
The reason for using ML methods in prediction studies is to make the call before the disease develops.
Based on these findings, we expanded our current study design by focusing on T2D and obesity development. These traits were observed while maintaining different CC lines on an HFD challenge, on a co-challenge with two oral bacteria under HFD, and on each challenge separately.

| Ethical statement
The Institutional Animal Care and Use Committee of Tel Aviv University (TAU) approved all the experimental procedures (No. 01-19-013). These were in line with the Israeli guidelines, which follow the animal care and use protocols of the US National Institutes of Health.

| CC lines and dietary challenge
The study cohort comprised 471 mice (222 females and 249 males; details of mice are presented in Table 1) produced from the 11 different CC lines provided by the small animal facility at TAU (details of the breeding colony are available in Iraqi et al. 29).

| Study design
The experimental period spanned 12 weeks with two environmental challenges: HFD and oral infection with mixed-oral bacteria. At the initial time point (8-week-old mice), BW was measured using an electronic balance. The mice were divided into two dietary groups: the experimental group received HFD (42% fat), and the control group received the standard chow diet (CHD; 11% fat). At week 5 of the experiment (age 13 weeks), mice from both dietary conditions were further divided into two groups for the infection challenge: the experimental groups were orally infected with mixed-oral bacteria by gavage, and the control groups were placebo-infected without bacteria. At week 12 of the experiment, glucose tolerance was assessed using the intraperitoneal glucose tolerance test (IPGTT). After overnight recovery, mice were weighed and killed.

| IP glucose tolerance test and area under the curve calculation
This test was performed to detect disturbances in glucose metabolism that can be related to diabetes or prediabetic conditions, as elaborated in our previous publications. 23
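The area under the glucose curve can be illustrated with a minimal sketch using the trapezoidal rule. The sampling times and glucose values below are hypothetical and are not taken from the study; the function name `glucose_auc` is likewise an assumption made for illustration, and the exact AUC procedure used here is the one described in the cited earlier publications.

```python
def glucose_auc(times_min, glucose_mg_dl):
    """Trapezoidal area under a glucose-time curve (mg/dL x min)."""
    if len(times_min) != len(glucose_mg_dl) or len(times_min) < 2:
        raise ValueError("need at least two matched time/glucose points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        # Area of one trapezoid: interval width times mean of the two readings.
        auc += dt * (glucose_mg_dl[i] + glucose_mg_dl[i - 1]) / 2.0
    return auc


# Hypothetical IPGTT readings (assumed time points, not study data).
times = [0, 15, 30, 60, 120]
week6 = [100, 250, 300, 200, 120]
week12 = [110, 280, 330, 240, 150]

auc_week6 = glucose_auc(times, week6)      # 23850.0
auc_week12 = glucose_auc(times, week12)
# A positive difference indicates a larger AUC at week 12 than at week 6.
delta_auc_6_12 = auc_week12 - auc_week6
```

A larger AUC corresponds to slower glucose clearance, so a positive difference between weeks 6 and 12 would indicate deteriorating glucose tolerance.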

| Oral infection challenge
The mice were treated with antibiotics to standardize the oral microbiota status before the infection. Sulfamethoxazole (10 mL/500 mL) was administered in the drinking water for 10 days, followed by 3 days of antibiotic-free recovery. The infection challenge was then started with oral infection of 400 μL of the mixed-oral bacteria (Pg and Fn) per mouse. The infection procedure was repeated three times, every other day, over 5 days of week 5. In parallel, the control groups received a placebo infection of 400 μL of 2% carboxymethyl cellulose (CMC) in distilled water and 1% PBS (CMC-PBS, 2:1).

| Data analysis
The IBM SPSS (Statistical Package for the Social Sciences) software platform (version 24) was used for data analysis. Variation between the CC lines was assessed using one-way analysis of variance (ANOVA), with significance set at p < 0.05.
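For readers unfamiliar with one-way ANOVA, the following is a minimal sketch of how the F statistic is formed from between-line and within-line variation. It is not the SPSS procedure used in the study; the function name and the toy values are assumptions, and the p value would still need to be obtained from the F distribution with a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replicate values")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values standing in for a trait measured in three lines (not study data).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the lines differ more than expected from within-line variation alone.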

| Heritability and genetic coefficient of variation analysis in the CC lines
The heritability estimates were obtained from unpublished data of a wide variety of traits presently being studied at TAU on the  37,38 In this study, we used Scikit-Learn's default implementation, with 100 trees in a forest.

| Regression models
We used two regression models, namely linear regression and k-nearest neighbors (KNN) regression, as they produced meaningful results for our data.

| Model validation
Two validation methods are commonly used to evaluate the performance of a model: the hold-out method and the k-fold cross-validation method. [39][40][41][42] The method is chosen according to the goal of the problem and the size of the data. 43 In this study, we used k-fold cross-validation with k = 4.
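The k-fold scheme with k = 4 can be sketched as follows. This is an illustrative index splitter written in plain Python, not the code used in the study; the function name `k_fold_indices` and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Each sample is used for testing exactly once across the four folds.
for train_idx, test_idx in k_fold_indices(10, k=4):
    print(len(train_idx), len(test_idx))
```

In practice, Scikit-Learn's `KFold` and `cross_val_score` provide the same splitting and scoring loop; the sketch only makes the mechanics explicit.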

| RESULTS
Based on ANOVA, a significant sex effect on BW was found among the 11 different CC lines. Males were found to be heavier and bigger than females.

| Sex effects within the CC line vary between different lines at different time points
The sex effects for BW and glucose tolerance ability, at

| Variations in glucose tolerance ability in response to dietary and infection challenges in different CC lines at different time points
The  showed a deterioration in glucose tolerance ability also in the noninfection condition, which has a positive ΔAUC 6-12 value.

| Variations in FBG in response to dietary and infection challenges in different CC lines at different time points
Our results have shown variations in FBG levels for females at week 6 ( Figure S5A Focusing on the diet effect under the infection condition, significant (p < 0.05) differences were observed among the different groups, reflected in a higher FBG in the HFD with infection than in the CHD with infection.

| Pearson's correlation between BW and glucose tolerance
Variations were observed between different CC lines and within the same line when comparing diets and infection conditions.
For females in the noninfected group ( Figure 5A), overall, a weak . Based on the color key, the correlation coefficient −1 ≤ r ≤ 1 is significant at **p < 0.01 and *p < 0.05.
CC lines in terms of the correlation between the traits, yet the most notable was observed in IL557 and IL3912, which showed opposite correlation on CHD in the infected group.
For males in the infected group ( Figure 6D), overall, a positive correlation was observed on both dietary challenges. Two CC lines, IL5001 and IL5003, showed a significant (p < 0.01) positive correlation between the two traits. Three CC lines, namely IL557, IL1912, and IL4141, showed a strongly negative correlation between the traits, whereas they showed different directions of correlation in the noninfected group within the same diet ( Figure 6C).
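As a reminder of what the reported coefficients measure, Pearson's r between two traits can be computed as in the sketch below. The function name and the toy values are assumptions and do not reproduce the study's data or its significance testing.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy example: a hypothetical trait pair that rises together, and one that diverges.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
```

Values near 1 or -1 indicate a strong linear relationship in the same or opposite direction, which is how the positive and negative correlations between lines should be read.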
For males in the infected group on HFD ( Figure 6D)  traits are generally in the range of 0.45-0.85, whereas the CV G values were in the range of 0.08-0.40, much higher than the benchmark of 0.071. Thus, the data show that the absolute magnitude of genetic variation observed among the CC lines is higher than that found within a typical outcrossing population.
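The heritability and CV G values above can be related to ANOVA output with a sketch like the one below. The formulas are an assumption: they are a common estimator for inbred line panels with a balanced design, and this section of the text does not state the exact calculation used. The function name and toy values are likewise hypothetical.

```python
import math


def heritability_and_cvg(lines):
    """Return (H2, CV_G) from a dict mapping line name -> trait values.

    Assumes a balanced design (same number of mice per line).
    """
    groups = list(lines.values())
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 lines with equal replicate counts >= 2")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # Assumed estimator: genetic variance from the between-line excess.
    v_g = max((ms_between - ms_within) / n, 0.0)
    h2 = v_g / (v_g + ms_within)
    cv_g = math.sqrt(v_g) / grand_mean
    return h2, cv_g


# Toy values for three hypothetical lines (not study data).
h2, cv_g = heritability_and_cvg(
    {"line_a": [1, 2, 3], "line_b": [2, 3, 4], "line_c": [5, 6, 7]}
)
```

Under this assumed estimator, H2 is the share of total variance attributable to differences between lines, and CV G scales the genetic standard deviation by the trait mean so that traits with different units can be compared.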

| Classification and regression results
For better comparison, we first used all features for predicting different aspects of diabetes, and the results are presented in Tables 3 and 4. Tables 3 and 4 show that different classification algorithms perform differently for the same line. In Table 3 Tables 3 and 4. In the data set, the method that used RF as the classifier performed the best. Tables 3 and 4 show that RF predicts diabetes severity better. Figure 7 shows that regression may be suitable for some CC lines but not for all the lines under study. Figure 7 shows an example of a successful regression, as the R 2 values are greater than 0.7. We also observed a difference in the performance of the two algorithms, as KNN regression produced an average R 2 of 0.837, whereas linear regression produced an average R 2 of 0.736.
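The R 2 values compared above follow the usual coefficient of determination, which can be sketched as below. The function name and toy values are assumptions for illustration only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Perfect predictions give 1; always predicting the mean gives 0.
r2_perfect = r_squared([1, 2, 3], [1, 2, 3])
r2_mean_only = r_squared([1, 2, 3], [2, 2, 2])
```

On this scale, a value above 0.7 means the model explains more than 70% of the variance in the observed outcome.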

| DISCUSSION
Obesity is a major public health problem and a risk factor for several chronic diseases, including T2D. 44 Diabetes mellitus is a chronic condition that develops over time and is preceded by the prediabetic state. How to predict and diagnose this disease accurately using ML is worth studying and is one of our objectives. A correct diagnosis allows prevention strategies to be introduced, although it must be noted that accurate prediction needs more indices. The best result for the data set is above 0.80 in most CC lines, indicating that ML can be used for the prediction of diabetes, but finding suitable attributes, classifiers, and data mining methods is very important. Based on our results, we chose the method using all features, and RF still had the best result among the four classifiers, as also observed by Zou et al. 60 Therefore, our observations provide valuable insight into the potential application of the AUC as a predictive measure for T2D and highlight the need. These ML methods have also been recently applied by Ben-Assuli et al. 61 for faster diagnosis and treatment of NAFLD. Our results provide a significant resource for further studies to determine the causal relationship and the progression of T2D; therefore, the prospect of using personalized medicine is promising. The results presented in this paper are the first step toward applying a personalized/precision medicine approach based on early prediction and prevention.
To our knowledge, this is the first study to examine the effect of the host genetic background on the development of a single disease or a combination of two or three diseases, including obesity and diabetes, using the CC mouse model. Finally, we believe that assessing more CC lines, increasing the number of mice within each line, and using both the multitrait and the multilocus analytical methods developed specifically for this genetically highly diverse reference population will enable future studies to map quantitative trait loci with unprecedented precision, allowing the direct identification of potential candidate genes and their multilocus epistatic interactions associated with susceptibility to obesity and T2D development.

AUTHOR CONTRIBUTIONS
Iqbal M. Lone was involved in data analysis and writing the manuscript; Nadav Ben Nun was involved in statistical analysis; Aya Ghnaim was involved in data collection and analysis; Arne S. Schaefer was involved in project design and fund support; Yael Houri-Haddad was involved in project design, execution, and fund support; Fuad A. Iraqi was involved in project design, data analysis, fund support, team supervision, and drafting and reviewing and approving the final version of the manuscript.