Combined analysis of whole‐exon sequencing and lncRNA sequencing in type 2 diabetes mellitus patients with obesity

Abstract This study sought to identify exonic mutation sites and lncRNA candidates associated with type 2 diabetes mellitus (T2DM) in patients with obesity (O-T2DM). We performed whole-exome sequencing of peripheral blood from O-T2DM patients and healthy individuals to detect mutations. We then studied, at the RNA level, the changes in lncRNA expression caused by the mutation sites. GO analysis and KEGG pathway analysis were performed next. We found a total of 277 377 mutation sites between O-T2DM patients and healthy individuals. A joint DNA-RNA analysis was then carried out. After screening for harmful sites, 30 mutant genes shared among O-T2DM patients were identified. At the RNA level, mutations in 106 differentially expressed genes were detected. Finally, consensus mutation sites and differentially expressed consensus genes were screened. The results revealed significant differences in exonic sites in peripheral blood between O-T2DM patients and healthy individuals. These differences may contribute to the pathogenesis of O-T2DM by affecting the expression of the corresponding lncRNAs. This study provides clues to the molecular mechanisms of metabolic disorders in O-T2DM patients at the DNA and RNA levels. It also points to candidate biomarkers for the risk of these disorders.

capture enrichment, high-throughput sequencing and bioinformatic data analysis. 3 In recent years, a number of studies have used whole-exome sequencing (WES) to screen for type 2 diabetes mellitus (T2DM) susceptibility genes. Albrechtsen 4 sequenced the exomes of 2000 Danish individuals in a three-stage study. They found that a methionine-to-valine substitution at amino acid 2290 (M2290V) of microtubule-actin crosslinking factor 1 (MACF1) increases the risk of T2DM.
T2DM is a chronic endocrine and metabolic disease characterized by disorders of carbohydrate, fat and protein metabolism. Its pathogenesis is closely related to both environmental and genetic factors. 5,6 With economic development and changes in lifestyle, the incidence of diabetes continues to rise.
T2DM most commonly develops between 35 and 40 years of age. It accounts for more than 90% of diabetes cases and affects more than 400 million people worldwide. 7 Obese type 2 diabetes mellitus (O-T2DM) usually refers to T2DM with a body mass index (BMI) that meets the criteria for overweight or obesity. Obesity is the main independent risk factor for T2DM, accounting for 80%-90% of the causes of diabetes. 8 Obesity aggravates insulin resistance and causes hyperinsulinemia in patients with T2DM, which makes O-T2DM relatively difficult to treat. Individual risk of O-T2DM is strongly influenced by genetic factors. 9,10 As a complex metabolic disease, O-T2DM shows distinct familial aggregation, so exploring its pathogenesis from a genetic perspective is of great significance.
Therefore, this study analysed the specific DNA mutation sites that differ between O-T2DM patients and healthy people. We integrated DNA and RNA analyses with functional enrichment and metabolic pathway analysis to explore the pathogenesis of O-T2DM. The aim was to inform the diagnosis and treatment of T2DM patients and to reveal the biological basis of O-T2DM. After enrolment, fasting venous blood was collected from both groups for subsequent experiments.

| DNA extraction and sequencing
DNA samples were evaluated by agarose gel electrophoresis and Qubit analysis. This study used Agilent's liquid-phase chip capture system to efficiently enrich DNA from all human exonic regions, followed by high-throughput sequencing on the Illumina platform. Library construction and capture were performed with the Agilent SureSelect Human All Exon V6 kit, according to the manufacturer's instructions and the latest optimized protocol.

| Bioinformatics analysis
After the sequencing reads were obtained, bioinformatics analysis was carried out against the GRCh37/hg19 reference genome. It generally included two parts: sequencing data quality assessment and mutation detection.

| Screening of mutation site
The single nucleotide polymorphisms (SNPs) and InDels (insertions and deletions) detected by the basic analysis were subjected to mutation site screening. First, filter the 1000 Genomes Project (1000G)


| Mutation site harmful classification
The mutations were classified into five categories according to the standards and guidelines for the interpretation of sequence variants proposed by the American College of Medical Genetics and Genomics (ACMG). 11 The categories are pathogenic, likely pathogenic, uncertain significance, likely benign and benign.
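The five-tier ACMG scheme assigns a class by combining counts of evidence codes of different strengths. The sketch below encodes one simplified reading of the published combining rules. It is an illustration only, not the pipeline used in this study. It assumes that the individual evidence codes (PVS1, PS1-PS4, PM1-PM6, PP1-PP5, BA1, BS1-BS4, BP1-BP7) have already been evaluated for each variant. As a further simplification, it treats any conflict between pathogenic and benign evidence as uncertain significance. The names `Evidence` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Counts of ACMG evidence codes met by one variant (hypothetical container)."""
    pvs: int = 0  # very strong pathogenic (PVS1)
    ps: int = 0   # strong pathogenic (PS1-PS4)
    pm: int = 0   # moderate pathogenic (PM1-PM6)
    pp: int = 0   # supporting pathogenic (PP1-PP5)
    ba: int = 0   # stand-alone benign (BA1)
    bs: int = 0   # strong benign (BS1-BS4)
    bp: int = 0   # supporting benign (BP1-BP7)


def is_pathogenic(e: Evidence) -> bool:
    return (
        (e.pvs >= 1 and (e.ps >= 1 or e.pm >= 2
                         or (e.pm >= 1 and e.pp >= 1) or e.pp >= 2))
        or e.ps >= 2
        or (e.ps >= 1 and (e.pm >= 3 or (e.pm >= 2 and e.pp >= 2)
                           or (e.pm >= 1 and e.pp >= 4)))
    )


def is_likely_pathogenic(e: Evidence) -> bool:
    # Checked only after is_pathogenic, so ">=" here never upgrades a variant.
    return (
        (e.pvs >= 1 and e.pm >= 1)
        or (e.ps >= 1 and e.pm >= 1)
        or (e.ps >= 1 and e.pp >= 2)
        or e.pm >= 3
        or (e.pm >= 2 and e.pp >= 2)
        or (e.pm >= 1 and e.pp >= 4)
    )


def is_benign(e: Evidence) -> bool:
    return e.ba >= 1 or e.bs >= 2


def is_likely_benign(e: Evidence) -> bool:
    return (e.bs >= 1 and e.bp >= 1) or e.bp >= 2


def classify(e: Evidence) -> str:
    pathogenic_side = is_pathogenic(e) or is_likely_pathogenic(e)
    benign_side = is_benign(e) or is_likely_benign(e)
    if pathogenic_side and benign_side:
        return "uncertain significance"  # contradictory evidence (simplification)
    if is_pathogenic(e):
        return "pathogenic"
    if is_likely_pathogenic(e):
        return "likely pathogenic"
    if is_benign(e):
        return "benign"
    if is_likely_benign(e):
        return "likely benign"
    return "uncertain significance"  # no combining rule satisfied
```

Tallying the returned labels over all screened sites gives per-category counts of the kind reported in Table 3.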

| Copy number variations (CNV) analysis
As with single nucleotide variants (SNVs), many CNVs are normal polymorphisms in the genome, and such benign CNVs do not cause pathological changes. However, some pathogenic CNVs have been associated with diseases such as nervous system disorders and cancer. To filter benign CNVs out of the CNV results detected by the software, we used the DGV and CNVD databases to classify the detection results.
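Database-based filtering of benign CNVs is commonly done by interval overlap. The sketch below flags a CNV call as benign if it reciprocally overlaps a known benign entry by at least 50%. Both the 50% reciprocal-overlap rule and the `Interval` record are assumptions for illustration; the text does not state the matching criterion used. The sketch also assumes the DGV/CNVD entries have already been parsed into intervals. A production filter would additionally match CNV type (gain or loss).

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (0.0 if the intervals do not overlap)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def split_benign(calls: List[Interval], benign_db: List[Interval],
                 min_overlap: float = 0.5) -> Tuple[List[Interval], List[Interval]]:
    """Partition CNV calls into (benign, retained) against a benign reference set."""
    benign, retained = [], []
    for cnv in calls:
        if any(reciprocal_overlap(cnv, ref) >= min_overlap for ref in benign_db):
            benign.append(cnv)
        else:
            retained.append(cnv)
    return benign, retained
```

A linear scan is enough for illustration. For whole-genome CNV sets, an interval tree or sorted sweep would avoid comparing every call against every database entry.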

| DNA and RNA conjoint analyses
To identify true diabetes-related mutations among the large number of detected variants, the mutation detection results were further analysed and filtered. The most significant GO terms and pathways involving the mutated genes were determined by enrichment analysis. In addition, we used an algorithm that combines the sequencing results with a variety of databases to filter and rank candidate genes and to construct a gene-DHS-diabetes correlation map. Finally, the online tool GeneMANIA was used to perform protein functional interaction network analysis of the candidate diabetes-related mutant genes, including protein-protein and protein-DNA-genetic interactions.
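GO and KEGG enrichment of a gene list is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The text does not name the test used, so the sketch below is an assumption. The test asks how likely it is that k or more of the n mutated genes fall into a term annotated to K of N background genes. Multiple-testing correction across terms, usually applied in practice, is not shown. The function names are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: annotated background genes, K: background genes annotated to the term,
    n: mutated genes tested,        k: mutated genes annotated to the term.
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed proportion of term genes among tested genes over the expected proportion."""
    return (k / n) / (K / N)
```

Exact integer arithmetic through `math.comb` avoids underflow for small gene sets. For very large backgrounds, a library routine such as SciPy's hypergeometric survival function would be faster.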

| Statistical analysis
Statistical differences were analysed with SPSS (version 20.0, IBM SPSS Statistics) using independent-samples t tests. All data are presented as mean ± SEM. P values < .05 were considered statistically significant.
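For reference, the quantities behind this analysis can be sketched with the Python standard library. This is not SPSS's implementation. The sketch assumes equal variances (the pooled-variance Student's t test), and the function names are hypothetical. The two-sided P value would then be twice the upper tail of the t distribution with `df` degrees of freedom at |t|, for example via `scipy.stats.t.sf`.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Independent-samples Student's t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2
```

With six samples per group, as in this study, the test has 10 degrees of freedom.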

| Participants information description
Our study enrolled 6 O-T2DM patients and 6 healthy controls. All  (Table 1).

| Whole-exome sequencing data summary
Twelve fasting whole-blood samples were collected: WSR001-WSR006 from DM-DHSS patients and WZC001-WZC006 from healthy controls. WES generated a total of 158.36 Gb of raw data, with an average error rate of 0.1% and a Q30 proportion above 87.91% (Table S1). The valid WES reads were aligned to the reference genome (GRCh37/hg19) with BWA. The alignments were then sorted with SAMtools, and duplicate reads were marked with Picard. Across the 12 samples, the mapping rate exceeded 99.54% and the fraction of the target region covered at 10X or more exceeded 99.2%.
The average sequencing depth of the target region across the 12 samples was 138.35X (Table S2).
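The quality and coverage metrics reported above follow standard definitions, which the sketch below illustrates.

* **Q30:** the fraction of bases with Phred quality of at least 30.
* **Error rate:** derived from the Phred relation P(error) = 10^(-Q/10).
* **Coverage:** the mean depth over the target, plus the fraction of target positions covered at 10X or more.

The sketch assumes Phred+33 quality encoding. It also assumes per-base depths in the three-column text format written by `samtools depth -a -b targets.bed sample.bam`, where `targets.bed` and `sample.bam` are hypothetical file names. It is not the QC tool used in this study.

```python
def phred_scores(quality, offset=33):
    """Decode one FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def base_quality_summary(quality_strings, offset=33):
    """Return (Q30 fraction, mean per-base error rate) over all bases.

    Error probability per base follows the Phred definition P = 10 ** (-Q / 10).
    """
    scores = [q for qual in quality_strings for q in phred_scores(qual, offset)]
    q30 = sum(q >= 30 for q in scores) / len(scores)
    error_rate = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return q30, error_rate


def coverage_summary(depth_lines, min_depth=10):
    """Return (mean depth, fraction of target positions with depth >= min_depth).

    depth_lines: "chrom<TAB>pos<TAB>depth" records, one per target position,
    including zero-depth positions (hence the -a option in the lead-in).
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    mean_depth = sum(depths) / len(depths)
    covered = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, covered
```

Including zero-depth positions matters. Without them, both the mean depth and the 10X fraction would be overestimated.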

| Variation detection results
Based on the alignment results, we used SAMtools to call and filter single nucleotide variant (SNV) sites. A total of 277 377 SNV sites were found in exonic regions. Of these, 100 were of the stoploss type, meaning that a single-base substitution converts a stop codon into a non-stop codon. We then used ANNOVAR to annotate the SNPs, covering location information, type (Tables S3 and S4).
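The stoploss definition above can be made concrete by translating the reference and alternative codons with the standard genetic code and comparing the results. The sketch below is illustrative only. It is not ANNOVAR's implementation, although its labels mirror ANNOVAR-style exonic function names. It assumes the affected codon is already known on the coding strand. The function name `snv_effect` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snv_effect(ref_codon, alt_codon):
    """Classify a single-base codon substitution by its effect on the protein."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("expected codons differing at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == "*" and alt_aa != "*":
        return "stoploss"   # stop codon becomes a sense codon
    if ref_aa != "*" and alt_aa == "*":
        return "stopgain"   # sense codon becomes a premature stop
    if ref_aa == alt_aa:
        return "synonymous SNV"
    return "nonsynonymous SNV"
```

For example, TAA to CAA turns a stop codon into glutamine and is classified as stoploss. ATG to GTG is the kind of methionine-to-valine change described for MACF1 M2290V.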

| Screening of mutation sites and classification of their harmfulness
In addition, we screened the SNP/InDel calls from the basic analysis and finally obtained 5607 mutation sites. Table 2 lists the top ten mutation sites ranked by disease priority. We classified the harmfulness of the mutation sites according to the ACMG evidence criteria, and the number of sites in each category is shown in Table 3. Finally, a structural variation harmfulness analysis identified a total of 160 sites (Appendix S1).

| Mutant gene screening shared between samples
After filtering for harmful sites, common mutation sites were screened using the criterion of being shared by at least 10% of patients and absent from at least 90% of controls. A total of 454 mutation sites were identified in O-T2DM patients compared with healthy controls. The top ten consensus mutant genes are listed in Table 4.
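The sharing criterion can be sketched as a filter over per-site carrier sets. This is one reading of the 10%/90% rule: a site is kept if at least 10% of patients carry it and at least 90% of controls do not. With six controls, the second condition means no control may carry the site. The site IDs and the function name `shared_case_sites` are hypothetical. Only the sample labels (WSR001-WSR006, WZC001-WZC006) come from this study.

```python
def shared_case_sites(carriers, cases, controls,
                      min_case_frac=0.10, min_control_absent_frac=0.90):
    """Return (site, n_case_carriers) pairs passing both thresholds,
    sorted by the number of patients carrying the site (most first).

    carriers: dict mapping a site ID to the set of sample IDs carrying it.
    """
    kept = []
    for site, samples in carriers.items():
        n_case = len(samples & cases)
        case_frac = n_case / len(cases)
        control_absent_frac = 1 - len(samples & controls) / len(controls)
        if case_frac >= min_case_frac and control_absent_frac >= min_control_absent_frac:
            kept.append((site, n_case))
    # sorted() is stable, so ties keep their input order.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Sorting by patient carrier count yields a consensus ranking of the kind used for genes such as NOP9 (all six patients) and PCDH11Y (five patients).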

| Bioinformatics analysis of mutation sites
Genes exert their biological functions in coordination with one another. This is especially relevant for a complex disease such as T2DM, whose phenotype may result from mutations in multiple genes.
Therefore, we identified the most important metabolic and signalling pathways involving the mutant genes through enrichment analysis. We performed GO enrichment analysis on the shared mutant genes in three categories: biological process

| Gene-disease phenotype correlation analysis
To determine the correlation between candidate genes and disease, we compared the sequencing results  Figure 3). In addition, we used Phenolyzer to rank the candidate genes; the higher a gene ranks, the more likely it is to be associated with O-T2DM. The top 20 significantly related genes are shown in Figure 4.

| Protein function interaction analysis
We used the online tool GeneMANIA 12 to perform protein functional interaction network analysis of the candidate genes. The analysis covered protein-protein and protein-DNA-genetic interactions, pathways, reactions, gene and protein expression data, protein domains and phenotypic screening profiles. Cytoscape was then used to construct the co-expression network (Figure 5), which includes a total of 21 genes and 54 co-expressed proteins associated with them.

| Mutant gene expression levels shared between samples
Among all the consensus mutant genes in the O-T2DM group, NOP9 was mutated in all six O-T2DM patients. It was followed by PCDH11Y, which was mutated in peripheral blood samples from five patients (Table 5). Subsequently, we analysed the

| Screening of common mutant genes and differentially expressed consensus genes in O-T2DM patients
A gene may be functionally important if two conditions hold: it is identified as a patient-associated mutant gene at the genome level, and its expression at the RNA level also differs significantly from that in the healthy population. In the current study, we identified three genes that met both conditions in O-T2DM patients: MAP7, NOD2 and ZNF429 (Table 6, Figure 7).
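The DNA-RNA consensus step amounts to intersecting the set of patient-associated mutant genes with the set of differentially expressed genes. In the sketch below, the input format and the default cut-offs are placeholder assumptions, not the criteria used in this study. The assumed cut-offs are an adjusted P value below 0.05 and an absolute log2 fold change of at least 1. The function name is hypothetical.

```python
def dna_rna_consensus(mutated_genes, de_results, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Genes mutated in patients at the DNA level that are also differentially
    expressed at the RNA level.

    de_results: dict mapping gene -> (log2 fold change, adjusted P value).
    The default cut-offs are illustrative assumptions only.
    """
    de_genes = {gene for gene, (log2fc, padj) in de_results.items()
                if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc}
    return sorted(set(mutated_genes) & de_genes)
```

For example, with hypothetical inputs `["GENE_A", "GENE_B", "GENE_C"]` and a differential-expression table in which only `GENE_A` passes both cut-offs, the function returns `["GENE_A"]`.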

| Pathway analysis of mutant genes and differentially expressed genes in O-T2DM patients
Based on the above analysis, three gene sets were subjected to enrichment analysis to identify shared pathways: the selected key genes, all mutated genes at the genomic level and the differentially expressed genes at the transcriptional level. As shown in Table 7, three pathways were associated with these differentially expressed genes and lncRNAs.

| DISCUSSION
O-T2DM is a polygenic disease. To date, more than one hundred O-T2DM susceptibility genes have been identified by GWAS in O-T2DM populations. Most functional variants are currently believed to lie in exons 13 and to consist of low-frequency and rare mutations. 14 However, GWAS is insensitive to low-frequency and rare mutations, so part of the genetic information may be missed. Whole-exome sequencing is highly sensitive for discovering disease-related low-frequency and rare mutations. 15 A low-frequency mutation has a minor allele frequency (MAF) of at least 0.5% and below 5%, whereas a rare mutation has a MAF below 0.5%. Because whole-exome sequencing can effectively identify susceptibility genes and mutation sites in complex diseases, it is widely used in research on the molecular mechanisms of human diseases and in molecular diagnosis.
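The MAF cut-offs defined above translate directly into a small classifier. In this sketch, the label "common" for a MAF of 5% or more is an assumption, since the text defines only the low-frequency and rare classes. The allele counts in the usage example are hypothetical, and the function names are illustrative.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """MAF from an allele count: the frequency of the less common allele."""
    p = alt_count / total_alleles
    return min(p, 1 - p)


def maf_class(maf):
    """Bin a MAF using the cut-offs given in the text (0.5% and 5%)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"  # >= 5%: label assumed, not defined in the text
```

For instance, an allele seen 3 times in 1000 chromosomes has a MAF of 0.3% and is classified as rare. Note that the comparison uses the minor allele, so an allele carried on 990 of 1000 chromosomes gives a MAF of 1%.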
In this study, we performed whole-exome sequencing of 12 samples (6 healthy subjects and 6 obese diabetic patients) and compared the results with existing databases (dbSNP   In the study of disease, it is important to determine the association of candidate genes with disease. Combining the sequencing results with various databases, we filtered and ranked candidate genes to construct a gene-phenotype-O-T2DM correlation map. Through this database comparison, a total of 22 472 genes in our sequencing results were found to correlate with the pathogenesis of T2DM. In addition, we used Phenolyzer to rank the candidate genes. The top three genes were IRS1 (insulin receptor substrate 1), TCF7L2 (transcription factor 7-like 2) and PPARA (peroxisome proliferator-activated receptor α).
IRS1 is a key mediator linking insulin binding at its receptor to the hormone's downstream biological effects, and it plays an important role in the control of blood glucose homeostasis. 19 A recent study suggests that insulin deficiency in obese patients may result from weakened IRS1 signalling. 20 A study of African-Americans found that IRS1 mutations and obesity-related endocrine disorders synergistically reduce insulin sensitivity. This suggests that IRS1 variation and obesity together are an important predictor of insulin resistance. 21 In this study, IRS1 carried a non-frameshift insertion in its exonic region in O-T2DM patients. IRS1 mRNA expression was also up-regulated in O-T2DM patients compared with healthy controls (log2 fold change = 0.11).
Therefore, given the close relationship of IRS1 with obesity and lipid metabolism, we believe that IRS1 may be a promising target for the clinical prediction of O-T2DM.
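The magnitude of the reported IRS1 up-regulation becomes clearer when the log2 fold change is converted back to a linear scale. The worked conversion below uses only the value reported above.

```latex
\text{fold change} = 2^{\log_2 \mathrm{FC}} = 2^{0.11} \approx 1.08
```

A log2 fold change of 0.11 therefore corresponds to roughly 8% higher mean IRS1 expression in O-T2DM patients than in healthy controls, which is a modest shift.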
The transcription factor 7-like 2 (TCF7L2) gene is closely related to T2DM and obesity. [22][23][24] Carriers of the TT or TC genotype at a TCF7L2 single nucleotide polymorphism (SNP) are approximately 1.4 to 2 times more likely to develop T2DM than CC homozygotes. 25 Studies have shown that TCF7L2 plays a significant role in regulating adipose tissue and the pancreas via the WNT signalling pathway. 26 A recent study found that overweight and TCF7L2 were also significantly associated with T2DM. 27 In our current study, we found missense In the current study, we identified three genes (NOD2, MAP7 and ZNF429) that carried common mutations at the DNA level and were differentially expressed at the RNA level in O-T2DM patients.
Nucleotide-binding oligomerization domain 2 (NOD2) is a gene whose product contains caspase activation and recruitment domains. 31 Previous studies have shown that NOD2 mediates activation of the NF-κB family of transcriptional regulators in response to different peptidoglycan fragments. 32 NOD2 also contributes to host defence by promoting the production of pro-inflammatory cytokines and antimicrobial molecules. 33 Deletion of the NOD2 gene abolished the resistance of BALB/c mice to high-fat diet (HFD)-induced obesity, 34 and the same phenomenon was observed in C57BL/6 mice. 35 In line with previous studies, in this study, we found that the NOD2 gene showed Our study links genomic mutation information to transcriptome expression regulation and provides molecular targets for clarifying the mechanism of O-T2DM development.

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.