Identification of Smad3‐related transcriptomes in type‐2 diabetic nephropathy by whole transcriptome RNA sequencing

Abstract Smad3 deficiency prevents the development of type 2 diabetic nephropathy; however, the underlying molecular mechanisms remain unknown. In this study, we aimed to identify Smad3‐related genes involved in the pathogenesis of diabetic kidney disease. High‐throughput RNA sequencing was performed to profile the whole transcriptome in the diabetic kidney of Smad3 WT‐db/db, Smad3 KO‐db/db, Smad3+/− db/db and their littermate control db/m mice at 20 weeks. The gene ontology, pathways and alternative splicing of differentially expressed protein‐coding genes and long non‐coding RNAs related to Smad3 in diabetic kidney were analysed. Compared to Smad3 WT‐db/db mice, Smad3 KO‐db/db mice exhibited an alteration of genes associated with RNA splicing and metabolism, whereas heterozygosity deletion of Smad3 (Smad3+/− db/db mice) significantly altered genes related to cell division and cell cycle. Notably, three protein‐coding genes (Upk1b, Psca and Gdf15) and two lncRNAs (NONMMUG023520.2 and NONMMUG032975.2) were identified to be Smad3‐dependent and to be associated with the development of diabetic nephropathy. By using whole transcriptome RNA sequencing, we identified novel Smad3 transcripts related to the development of diabetic nephropathy. Thus, targeting these transcripts may represent a novel and effective therapy for diabetic nephropathy.


| INTRODUC TI ON
The incidence of type 2 diabetes (T2D) has increased rapidly over recent decades and become a major public health crisis worldwide. Multiple factors, including obesity, ageing, diet, environment, genetic and epigenetic modifications, contribute to the development of diabetes. The increasing prevalence of T2D has led to the development of diabetic kidney disease (DKD), 1 the most common cause of end-stage renal disease. However, the precise mechanism of DKD in T2D remains unclear.
There are several mouse models for studying T2D and DKD, such as mouse homozygous for the obese (Lep ob ) and the diabetes (Lepr db ) mutations. Studies have revealed that in BKS-db/db (BKS.Cg-+ Lepr db /+ Lepr db /J) mice, renal pathological changes are altered in the age of 12 weeks, which is accompanied with a markedly increase in high blood glucose and the development of urinary microalbumin excretion, indicating that kidney injury. 2 Recent genetic and epigenetic studies using genome-wide association study (GWAS) and epigenome-wide association study have found that a number of genes are altered in DKD, including genomic DNA polymorphisms of collagen type IV alpha 3 chain (COL4A3, rs55703767), 3 TGF-β1( rs1800470), TNF-α(rs1800629), IL-6 (rs1800797) and IL-1β (rs16944). [4][5][6] However, functions of these genes associated with DKD remain unclear and need to be confirmed by further functional studies in experimental models of DKD.
TGF-β1 has been implicated as an important factor that involves in various types of chronic kidney disease (CKD), including DKD. 7 It regulates many aspects of cell function, including cell proliferation, accumulation of extra cellular matrix and apoptosis of renal cells in early diabetic kidneys. TGF-β1 can be induced by many mediators of DKD, such as high glucose concentration, reactive oxygen species, advanced glycation end products, activated protein kinase C, renin-angiotensin II-aldosterone system components and endothelin, [8][9][10] and are associated with the development of obesity and diabetes. 11,12 Blockade of TGF-β by anti-TGF-β neutralization antibody can protect mice from diet-induced obesity and diabetes. 11 However, in a recent randomized, double-blind, phase 2 clinical trial, the use of humanized TGF-β1 monoclonal antibody (TGF-β1 mAb) fails to show its therapeutic effect on patients with DKD. 13 Findings from this clinical trial suggest the complexity of TGF-β1 in DKD and general blockade of TGF-β signalling may not be a good therapeutic approach for DKD due to the diverse role of TGF-β in renal inflammation and fibrosis. 14 Therefore, further studies to identify the downstream mediators of TGF-β signalling that are directly and specifically regulate DKD are urgently needed. Our previous studies have found that Smad3 serves as a key mediator of TGF-β signalling in several CKD models, including DKD. 14,15 By deleting Smad3 from db/db mice, we found that Smad3 knockout (KO)-db/ db mice were protected from DKD, with significantly reduced levels of albumin to creatinine, serum creatinine and histological injury in kidney, including mesangial matrix expansion and thickness of glomerular basement membrane over 12-32 weeks. 16 Interestingly, compared to Smad3 KO-db/db, Smad3 +/− -db/db mice show no protection effect on DKD, 16 revealing a necessary role for Smad3 in DKD. However, mechanisms through which Smad3 regulates DKD remain unknown, which was investigated in this study by performing a whole transcriptome RNA sequencing. A complete set of mRNA and non-coding RNA (ncRNA) transcripts in Smad3 WT-db/db, Smad3 KO-db/db, heterozygous Smad3 +/− -db/db and their littermate controls Smad3 WT-db/m and Smad3 KO-db/m were compared.

| Animal models
Smad3 −/− (deletion of exon 8 and disruption of exon 7) on C57BL/6J (H-2b) background (both sexes, aged 6-8 weeks, 20-25 g) were kindly provided by Dr Chunxia Deng. 17 In briefly, C57BL/6J Smad3 +/− mice were intercrossed with C57BLKs/J Lepr +/− (db/m) mice on a background to produce Smad3 +/− db/m heterozygous mice as previously described. 16 Double-heterozygous Smad3 +/− db/m male and female were intercrossed to generate double mutation of Smad3 and Lepr (Smad3 KO-db/db) and other genotype mice (Smad3 WT-db/m; Smad3 KOdb/m; Smad3 WT-db/db; Smad3 +/− db/db). All animal husbandry and animal experiments were approved by the Animal Ethics Experimental Committee at the Chinese University of Hong Kong and confirmed to be in accordance with local regulations. Mice kidneys were collected at 20 weeks of age from mice with the five different genotypes.

| RNA sequencing
Total RNA of kidney tissue was extracted from mice with five different genotypes (Smad3 WT-db/m; Smad3 WT-db/db; Smad3 KOdb/m; Smad3 KO-db/db; and Smad3 +/− db/db) at 20 weeks (n = 3-5 for each genotype) with TRIzol reagent (Invitrogen) in accordance with the manufacturer's instructions. Two microgram of total RNA was used as starting material for ribosomal RNA (rRNA) depletion.

| Analysis of RNA-Seq data
Using software from Illumina (bcl2fastq), sequencing reads were assigned into individual samples with each sample having an average throughput of 5.0 Gb (Table S2) and a total throughput of 99.9 Gb. In terms of sequence quality, an average of 94% of the bases achieved a quality score of Q30 where Q30 denotes the accuracy of a base call to be 99.9%.
In order to obtain high-quality clean sequencing data, Fastp (version 0.19.4) was used to filter low-quality reads, cut adapters and quality control raw FASTQ files to obtain clean reads. 18  The abundance of the RNAs was quantified and normalized using Salmon (version 1.11.2). 25 The Pearson correlation coefficients of transcriptome expression level were calculated based on normalized reads count. Tximport (version 1.12.3) 26 was applied to import quantification of transcript expression into R. Then, differentially expressed genes were determined and visualized by edgeR and ggplot2, respectively. The threshold to determinate differential expression was FDR value <0.05 and |log 2 (fold change) |> 1(down-regulated or up-regulated over twofold).

| Pathway and ontology analysis of differentially expressed genes
The profile of GO enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis performed by R package ClusterProfiler 27 was used to predict potential functional roles for significant differentially expressed protein-coding genes and lncR-NAs; the cut-off threshold was defined as P-adjust < .05.
The biological function of the uncharacterized lncRNAs can be predicted based on the functionality of the nearby protein-coding genes (lncRNA cis-genes). We searched for the protein-coding genes closest to all the identified lncRNAs by calculating the distance between the two chromosomal position by using the closest function in bedtools software. 22 GO and KEGG enrichments with P-adjust < .05 were considered significantly enriched by using R package ClusterProfiler. 27

| Detection and quantification of alternative splicing variants
The tool rMATS was used to analyse the alternative splicing (AS) respectively. Cytoscape (v 3.7.1) was used to draw the co-expression network map of lncRNA and protein-coding gene.

| Quantitative real-time PCR (qPCR)
RNA from the kidney tissues of mice (n = 5-7 for each group, respectively) was reverse transcribed to cDNA by using PrimeScript RT Reagent Kit with gDNA Eraser (Takara). Gene expression was quantified by real-time PCR with TB green Kit (Takara).
Quantitative real-time PCR was carried out on ABI 7900 system using the following program: 95°C for 1 minute; 40 cycles of 95°C for 15 seconds, 58°C for 15 seconds, and 72°C for 30 seconds and dissociation steps. Primers for all the tested DEGs and β-actin, an endogenous control for normalization, are listed in Table S3.
The relative expression of target genes to β-actin was calculated by using 2 − ΔΔC t method. 29 Data were shown as mean ± SEM. Statistical analysis was performed by using one-way ANOVA followed by Newman-Keuls multiple comparison test from GraphPad Prism 5.0 (Graph Pad Software).

| Overview analysis of transcriptomic RNA-seq
The earliest clinical sign of kidney damage from diabetes is the presence of albuminuria. According to the guidelines of National Kidney Foundation-Kidney Disease Outcomes Quality Initiative (NKF KDOQI), DKD was divided into microalbuminuria (albumin/creatinine ratio, ACR 30-300 mg/g) and macroalbuminuria (ACR > 300 mg/g) based on the level of albuminuria. As shown in our previous study, kidney injury in the db/db was evidenced by the development of urinary microalbumin (ACR), which was increased gradually from 12 to 20 weeks, peaked over 20-32 weeks. 16 A whole transcriptomic RNA-seq analysis of kidney tissue from five different groups (Smad3 WT-db/m; Smad3 WT-db/db; Smad3 KO-db/m; Smad3 KO-db/db; and Smad3 +/− db/db) at 20 weeks was performed.
The transcripts obtained by RNA-Seq were annotated with Gencode vM18 and NONCODEv5. The TPM value was used to quantify the assembled transcriptome, including differentially expressed protein-coding and lncRNA genes ( Table S4). The length of most transcripts was concentrated within 1000 nt ( Figure 1A). The transcripts with length over 200 nt and more than two exon without coding potential which calculated by CPAT and PLEK were selected as lncRNAs for further analysis. 30,31 These lncRNAs were classified into five categories according to their genomic loci: antisense, sense-overlapping, intergenic, intronic and other. The distribution proportion of lncRNAs is schematically presented in a pie chart ( Figure 1B). The majority of the identified lncRNAs were intergenic (64.91%). Column chart in Figure 1C showed these protein-coding and lncRNAs were widely distributed in all the mouse chromosomes. The total number of novel lncRNA was 1073, which was found based on GffCompare, and the detailed information of them (including genomic location, type and closest protein-coding gene) was listed in Table S5. Pearson correlation analysis showed that all the transcripts (including protein-coding RNA and lncRNA) between these 20 samples were highly correlated (r = .90-.97, P < .01; Figure S1).

| Biological function analysis of altered proteincoding genes in db mouse kidney with or without Smad3 knockout
Compared to control mice, further analysis revealed that many protein-coding genes were altered in the diabetic kidney of Smad3 WT or Smad3 KO db/db mice at 20 weeks. The details of these proteincoding genes (including Gene ID, Gene name, logFC, P-value, FDR) in

| Biological function analysis of LncRNA in db kidney with or without Smad3 knockout
High-throughput sequencing revealed that the majority of the human or mouse transcriptome can be referred as ncRNA. LncRNAs are a class of ncRNAs mainly uncharacterized and have unknown biological functions. Compared to control, many lncRNAs were also altered in Smad3 WT-db/db or Smad3 KO-db/db mouse models.
The number of significantly changed lncRNAs was summarized in Table 1, and the details of these lncRNAs (including Gene ID, Fold Change, TPM, P-value, FDR and closest protein-coding gene) were listed in Table S10.
Although the total number of ncRNAs identified by RNA-seq was larger than protein-coding gene, significantly changed ln-cRNAs was less than protein-coding genes in each comparison.
The top five most up-or down-regulated lncRNAs and their closest protein-coding genes in each comparison were list in Table 2.
Many researchers observed that lncRNA expression is often correlated with the expression of nearby genes. 46 As shown in Table 2, compared to Smad3 WT-db/m, Lpin2, Tspan2, Cdc73, Zer1 and Abcg2 were the top five up-regulated while Nus1, Sptssa, Zbtb44, Exoc3l2 and Pxdn were the most decreased lncRNA-closest genes in Smad3 WT-db/db.
Then, we also predicted the potential functional role of differentially expressed lncRNAs by analysing their closest genes in GO and KEGG database. The number of significant GO terms of differentially expressed lncRNAs was summarized in Table S11. The altered ln-cRNAs in each comparison were enriched in certain biological functions, respectively (Table S12). As shown in Figure 3A, in comparison 1, lncRNA-closest genes in Smad3 WT-db/db were enriched in GO_ BP of endothelial cell migration and GO_MF of "SMAD binding". In Figure 3B, the most significant GO_BP of "positive regulation of neuron differentiation" was found in differentially expressed lncRNAs in Smad3 KO-db/db compared to Smad3 KO-db/m. Compared to Smad3 WT-db/db, up-regulated lncRNAs in Smad3 KO-db/db were clustered in GO_BP of small GTPase mediated signal transduction ( Figure 3C).
KEGG pathway analysis of lncRNA-closest gene was performed to predict the potential functional role of altered lncRNAs (Table S13). As shown in Figure 3E, lncRNA-closest protein-coding genes were enriched in the pathways of "Hippo signaling pathway", "Axon guidance" and "Terpenoid backbone biosynthesis" in Smad3 WT-db/db when compared to Smad3 WT-db/m. In Smad3 KO-db/db, these genes were significantly enriched in the pathways of "Transcriptional misregulation in cancer", "Cell adhesion molecules" and "Type II diabetes mellitus" when compared to Smad3 KO-db/m ( Figure 3F), whereas "Transcriptional misregulation in cancer", "Rap1 signaling pathway", "Endocytosis", "MAPK signaling pathway" and "AMPK signaling pathway" were the most significant pathways compared to Smad3 WT-db/db ( Figure 3G). In Smad3 +/− db/db, due to the small number of differential expressed lncRNAs, only one lncRNA-closest gene was significantly clustered in several pathways ( Figure 3H).

| Smad3 related DEGs in db/db mouse kidneys
To identify Smad3-regulated genes in db/db models, DEGs with differential expression direction in comparisons 1 and 2 were considered as genes related to diabetic kidney injury and regulated by Smad3. As shown in Table 3 increased in the renal of DKD. 50 However, more experimental evident was needed to prove whether these DEGs were regulated by Smad3 in the diabetic kidney.

| Alternative splicing of mRNA
It has been reported that more than 95% of pre-mRNAs are alternatively spliced in mammals, which allows the generation of multiple splice isoforms from a single pre-mRNA. AS process increases the coding capacity of a given gene, which can have distinct structures or functions. 51 From the GO and KEGG results, we found that Smad3 homozygous deficiency in db/db model altered a number of proteincoding genes related to RNA splicing. We speculated that Smad3 may affect the process of RNA splicing and then count the significant al-

| The co-expression network of lncRNAs and protein-coding genes
To explore the putative functions of lncRNAs in DKD, the coexpression network was constructed based on the correlation analysis between the differentially expressed protein-coding genes and lncRNAs. The Pearson's correlation coefficients were calculated for significantly correlated pairs of protein-coding genes and lncRNAs with pcc > 0.99 or pcc > 0.9 and P-value < .05 as indicated in each comparison in Table S14. Correlated pairs of protein-coding genes and lncRNAs were chosen to build the network. Most genes interacted with few partners, whereas a small but significant proportion of them, the "hubs", interacted with many partners.
The core domain contains four lncRNAs co-expressed with genes appear to be involved in lipid disorders and metabolic syndrome caused by obesity and type-2 diabetes (Table S14)

| Real-time quantitative PCR validation of DEGs
The expression levels of seven candidate protein-coding genes and five lncRNAs were validated by quantitative PCR (qPCR). Total RNA of kidney tissue from Smad3 WT/KO-dm and db/db mice at 20 weeks was used for quantitative analysis. Specific primers were listed in Table S3. As shown in Figure 4F-L, seven protein-coding genes (Upk1b, Eda2r, Crybg2, Slitrk6, Il11ra2, Psca andGdf15) and five lncRNAs (Lnc-Ptpn1, Lnc-Trabd2b, Lnc-NUS1, Lnc-Sestd1 and Lnc-G3bp2) were selected for qPCR validation. As shown in Table S15, although change fold was different between qPCR and RNA-seq, the expression patterns were consistent between these two techniques.

| D ISCUSS I ON
Although the most important risk factor for DKD is hyperglycaemia, additional risk factors beyond hyperglycaemia are also involved in the process of renal injury. These include a range of metabolic factors, including excess fatty acids, carbonyl and oxidative stress. 54  Previous findings showed that hyperglycaemia accompany with elevated plasma concentrations of inflammatory molecules, such as cytokines, chemokines, adhesion molecules, growth factors and fatty acid, was found in diabetic patients, and being strong predictors of the development of diabetes. 57 The critical metabolic changes in diabetes mean that genes might be compensated dysregulated in highly metabolic organs, such as the kidney.
Results from this study confirmed that the largest numbers of DEGs which have the biological function of metabolism were altered during the development of DKD, including those genes that participate in hormone, steroid and fatty acid metabolism (shown in Figure 2 and Table S8). The change of functional genes mostly clustered in the pathway of cytokine-cytokine receptor interaction and cellular signalling such as MAPK and PPAR signalling cascade in the diabetic kidney (shown in Figure 2 and Table S9), which are functionally connected kinases that regulate the key cellular process involved in cell survival, death, differentiation, proliferation and metabolism. 58,59 The most interesting finding of this study was Smad3 homozygous KO from db/db mice affected genes mainly involved in RNA splicing and metabolism, but heterozygosity deletion significantly altered genes related to cell division and cell cycle. Regulation of RNA splicing is a critical step of gene expression. Smad3 KO significantly increased the events of skipped exon, which directly impacted on translation of genetic information into protein.
The kidney is one of the principal organs for drug and xenobiotic metabolism. The most important drug metabolism enzyme family is cytochrome P-450, a superfamily of membrane-bound isoenzymes that catalyses the oxidation of many drugs. Ten genes of cytochrome P-450 superfamily (Cyp2d40, Cyp27b1, Cyp2c69, Cyp2j7, Cyp2a5, Cyp2j9, Cyp2d22, Cyp51, Cyp4a12a and Cyp7b1) were altered in Smad3 KO diabetic kidney. This implies that the protective role of Smad3 homozygous KO in DKD may relate to transcripts participating in RNA splicing and xenobiotic metabolism. The analysis of RNA-seq data were consistent with our experimental results found in animal models.
To explore core genes and predict the function of unknown ln-cRNAs in the process of DKD, the co-expression analysis of protein-coding and lncRNAs was performed. It may help us to reveal a lot of exciting and uncovered important genes in diabetic related kidney injury. Even more importantly, it may provide a novel view on their integrative functions. It is reported that the mortality rate of knocking out the gene encoding the hub protein is three times higher than that of non-hub in yeast. 60 Our data also have some shortcomings, because the co-expression network analysis of protein-coding and lncRNAs is based on the protein interaction network database, there may be some false-positives for prediction of non-coding lncRNAs, which need further experimental proven.
In this study, we profiled the whole transcriptome including mRNA and ncRNA transcripts in Smad3 WT and Smad3 KO diabetic kidney by RNA-seq. The results from our transcriptome analysis may provide novel insights into gene expression and regulatory network in diabetic kidney, which may also enable us to a better identification of the appropriate therapy targets. Because the RNA-Seq and qPCR were performed with whole kidney samples, our results were unable to identify the regulatory role of Smad3 in cell-type-dependent DEGs expression during the development of DKD. Further studies need to be performed to clarify the spatial and temporal expression patterns of these DEGs within both normal and diseased kidneys by updated technology such as single cell-RNA-seq. Furthermore, multiple "omics", such as the genome, proteome, epigenome, metabolomics pharmacogenomics and microbiome technologies, can also be applied to define relevant biomarkers for DKD. Personalizing treatment based on the multi-omics approaches will be helpful in developing appropriate pharmaceutical intervention programs that can guide clinical individualized treatment.

CO N FLI C T O F I NTE R E S T
No potential conflicts of interest relevant to this article were reported. Hui-yao Lan https://orcid.org/0000-0003-4283-9755