Circulating RNAs as predictive markers for the progression of type 2 diabetes

Abstract Type 2 Diabetes Mellitus (T2DM) is the most prevalent form of diabetes in the USA, thus, the identification of biomarkers that could be used to predict the progression from prediabetes to T2DM would be greatly beneficial. Recently, circulating RNA including microRNAs (miRNAs) present in various body fluids have emerged as potential biomarkers for various health conditions, including T2DM. Whereas studies that examine the changes of miRNA spectra between healthy controls and T2DM individuals have been reported, the goal of this study is to conduct a baseline comparison of prediabetic individuals who either progress to T2DM, or remain prediabetic. Using an advanced small RNA sequencing library construction method that improves the detection of miRNA species, we identified 57 miRNAs that showed significant concentration differences between progressors (progress from prediabetes to T2DM) and non‐progressors. Among them, 26 have been previously reported to be associated with T2DM in either body fluids or tissue samples. Some of the miRNAs identified were also affected by obesity. Furthermore, we identified miRNA panels that are able to discriminate progressors from non‐progressors. These results suggest that upon further validation these miRNAs may be useful to predict the risk of conversion to T2DM from prediabetes.

Recently, circulating extracellular RNAs, especially microRNAs (miRNAs), present in various body fluids have emerged as potential biomarkers for various diseases and health conditions, including metabolic diseases such as diabetes and obesity. 9 While many miRNAs show promise as biomarkers to distinguish patients with T2DM from healthy individuals, meta-analysis examining specific miRNA concentration differences in plasma between T2DM and healthy individuals show inconsistencies among studies. 10 Many of these discrepancies can be attributed to sample preparation differences and technical issues inherent to various miRNA measurement methods. These factors make the development of circulating miRNA-based biomarkers for T2DM challenging. While qRT-PCR is very well suited for measuring and validating the concentration changes of specific miRNAs especially in clinical setting, small-RNA sequencing (sRNAseq) provides higher dynamic range than qRT-PCR and offers better discrimination between closely related family members, making it a better method for discovery-based miRNA studies. There are several commercial sRNAseq library construction kits available; however, they are prone to introducing significant bias, most likely occurring during adapter-RNA ligation steps. 11,12 Such biases result in an inaccurate miRNA profile and can cause difficulties when validating the findings.
Additionally, aligning and mapping miRNA can be a challenge due to their small size, and requires a specialized data analysis pipeline To address issues associated with sRNAseq, we adapted a modified small RNA sequencing library construction method that incorporates four degenerate bases at the appropriate ends of adapters to facilitate the RNA-adapter ligations. This reduces ligation associated bias and provides results that correlate better with qRT-PCR. [13][14][15] We have also developed a data analysis pipeline, sRNAnalyzer, 16 which allows for more accurate mapping of small RNA sequencing results. Using these improved methods, we analyzed the spectrum of small RNA from plasma obtained from prediabetic individuals who participated in the METSIM (METabolic Syndrome In Men) study. 17 From the baseline samples analyzed, half of the individuals progressed to T2DM, and the other half remained prediabetic, determined at 5-year follow-up. We identified several circulating extracellular miRNAs that are prognostic at baseline for the transition from prediabetes to T2DM. Some of these miRNAs were also affected by body mass index (BMI), with obese progressors showing greater concentration changes than progressors with a normal BMI. This is the first reported study of this type to look at predictive biomarkers of T2DM years before onset.

| Sample and study collection
Patient plasma samples were obtained from the METSIM (METabolic Syndrome In Men) study collection. METSIM was a populationbased study conducted in Finland from 2005 to 2010 to identify risk factors that would contribute to T2DM and cardiovascular disease in men. 17 Male subjects, aged between 45 and 70 years, were randomly selected from the population register of the town of Kuopio in eastern Finland. Participants had a 1-day outpatient visit to the Clinical Research Unit at the University of Kuopio, including an interview on the history of previous diseases and current drug treatment and an evaluation of glucose tolerance and cardiovascular risk factors.
Fasting blood samples were drawn after 12 hours of fasting followed by an OGTT. Glucose tolerance was evaluated based on OGTT as follows: NGT (fasting plasma glucose [FPG] <5.6 mmol/L), isolated IFG (FPG 5.6-6.9 mmol/L), and newly diagnosed type 2 diabetic subjects (FPG ≥7.0 mmol/L). The study was approved by the ethics committee of the University of Kuopio and Kuopio University Hospital, and it was in accordance with the Helsinki Declaration. The sample set represents 290 prediabetic individuals with isolated impaired fasting glucose (IsolIFG) that either progressed to T2DM (n = 145), or remained prediabetic (n = 145) after a five year follow up. Samples were controlled for age, body mass index (BMI), and FPG (Table 1,   Supplemental Table S1) between progressors and non-progressors.
The plasma samples were processed as follows: Blood was collected into EDTA-treated tubes, and spun at 1000 g at 4°C for 10 minutes to remove blood cells, and then 2000 g at 4°C for 15 minutes to remove platelets. Before RNA isolation, the plasma samples were spun at 10 000 g at 4°C for 10 minutes to remove any remaining cellular debris and platelets.

| miRNA isolation and library construction
Circulating RNA was isolated from 75 µL of frozen plasma using the miRNeasy kit (QAIGEN, Germantown, MD) according to the manufacturer's instructions. The RNA was eluted in nuclease-free H 2 0, and the quantity and quality were assessed using a Bioanalyzer (Agilent Technologies, Santa Clara, CA). To profile miRNA in plasma, we used a modified small-RNA library construction protocol. Briefly, the method utilizes adapters containing four degenerate nucleotides at proper ends to enhance the adapter-miRNA ligation and reduce ligation associated bias (3′ adapter sequence: /5rApp/(N:25252525) (N)(N)(N)TGGAATTCTCGGGTGCCAAGG/3ddC/; 5′ adapter sequence: rGrUrUrCrArGrArGrUrUrCrUrArCrArGrUrCrCrGrArCrGrA rUrCr(N:25252525)r(N)r(N)r(N)). 13 After adapter ligation and cDNA synthesis, the library was amplified for four cycles followed by an initial size-selection of library inserts in the range between 127 bp and 156 bp on a Pippin HT automated size-selection instrument (Sage Science, Beverly, MA). The purified fragments were then amplified for an additional 16 cycles, and then size selected again. This two-step size selection significantly reduces the adapter dimer in the library. Individual sRNAseq (small RNA sequencing) library concentrations were assessed by NEBNext Library Quant Kit for Illumina (New England Biolabs, Ipswich, MA), pooled (2 nmol/L final concentration) and then run on a NextSeq500 sequencer (Illumina, San Diego, CA).

| Data analysis
Sequence files were processed with an in-house small RNA analysis pipeline-sRNAnalyzer. 16 Briefly, the adapters were trimmed from the sequence reads, and low complexity (homo-polymer and simple repeat sequences), low quality and short reads (less than 15 nucleotides) were removed from the file. The processed reads were then searched against various sequence databases. For miRNA, the reads were mapped against miRBase (www.mirbase. org). Data analysis was based on mapping results with 0 mis-match allowed. The miRNA mapping data were normalized using read count per million of processed read and log2 transformed. Based on the results, several invariant miRNAs, including miR-21-5p, were identified. The miRTar database (mirtarbase.mbc.nctu.edu. tw/) was used to identify validated miRNA targets for gene enrichment analysis to identify biological processes that may be regulated by miRNA. In this approach we required that each miRNA target must be validated by at least two different techniques. The gene enrichment analysis was performed with DAVID (Database for Annotation, Visualization and Integrated Discovery, https:// david.ncifcrf.gov/).

| Novel miRNA analysis
After mapping against the miRNA database, the remaining unmapped reads from samples were combined and run through mir-deep2 18 to identify putative miRNAs. A novel miRNA database was then built and integrated into sRNanalyzer. Unmapped reads from individual samples were then run against the novel miRNA database to determine the number of miRNA candidates in each sample.

| qRT-PCR
Quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR) validation of miRNAs was performed using TaqMan Advanced miRNA assays (Thermo Fisher, Waltham, MA). MiR-21-5p and miR-16-5p were used as a normalizers, since they were identified as an invariant miRNAs (low coefficient of variance across samples) based on the miRNA mapping results, and did not show concentration changes between progressors and non-progressors. Relative miRNA values are presented as 40 (maximum cycle number) -ΔΔCt values (Ct reference -Ct target ) between progressors and non-progressors (ΔCt progressors -ΔCt non-progressors . 14,19 Ct values greater than 35 were not considered as being expressed and were not used.

| Identification of predictive miRNA panel
To examine the discrimination power of miRNAs, a classification score was calculated for each sample based on selected miRNAs.
The geometric means of expression levels of the selected miRNAs were used as the classification score. The miRNA sets with the highest discrimination power were identified using greedy search algorithm. 20 In brief, the approach started with the miRNA with the most significant discrimination ability, and then the remaining miRNAs were added one at a time to compute the classification score. Using the score, we calculated the area under the curve (AUC) value from a receiver operating characteristic (ROC) curve. The set of miRNAs showing the highest value of AUC were chosen as the classification panel. To evaluate the classification/prediction performance, we tested the miRNA panels using either all study participants, obese participants, or participants with normal BMI.

| Statistical analysis
The Wilcoxon rank sum statistical test was used to determine the p-value for miRNAs showing concentration differences between TA B L E 1 Patient information and general mapping results

Category
All samples (mean) Non-progressor (mean) Progressor (mean) P-value progressors and non-progressors in both small-RNAseq and qRT-PCR. For multiple hypothesis correction, Benjamini-Hochberg was used to identify FDR-corrected P < 0.05, Multivariate analysis of covariance (MANCOVA) was used to identify miRNAs affected by significant differences in FPG between progressors and non-progressors.

| Sequencing results
From the 290 prediabetic individuals (Supplemental Table S1) that were selected from the METSIM study, 145 of them progressed to T2DM after five years (termed "progressors") and 145 remained at the prediabetic stage (termed "non-progressors"). The samples were matched with respect to BMI and age, but there was a significant difference in the fasting plasma glucose level between the two groups (Table 1) (Table 1). About 700 different miRNAs were observed (with at least one mapped read) in each sample, among them about 500 miRNAs had 10 or more mapped reads (Table 1).

| Novel miRNA and other types of extracellular RNA in circulation
In addition to the known miRNAs, we identified 13 novel miRNA candidates that showed significant concentration changes between progressors and non-progressors (P < 0.05), of which 3 also had a fold change >1.5 ( Figure 3A: Red circles, Supplementary Table S3) and 10 had a fold change <1.5, but>1.2 ( Figure 3A: Black circles, Supplementary Table S3). We also looked at the potential impact of BMI on the novel miRNA candidates and found that 9 were significantly elevated in the obese progressors vs obese non-progressors comparison (PO vs NPO) (Supplementary Table S3). Of the 3 novel miRNAs we originally identified that showed significant fold changes >1.5 between progressors and non-progressors, 2 were elevated in the PO vs NPO comparison ( Figure 3B, Supplementary Table S3.) Besides miRNAs, our mapping pipeline also provides reads mapped to other types of RNA (Table 1). We identified a snoRNA, U14A, that was elevated in progressors compared to non-progressors (log2 fold change = 0.61, P = 0.001493); and a piRNA, piRNA-426, that is elevated in PO compared to PN (log2 fold change = 1.08, P = 6.94088E-06) ( Figure 3C,D).

| Downstream pathways and biological processes associated with dysregulated miRNAs
Using validated miRNA targets, we explored the possible biological processes/pathways that might be affected by the dysregulated miRNAs ( Figure 5). A number of signal transduction related processes including the insulin signaling pathway, mTOR signaling pathway, TGF-β signaling pathway, and VEGF signaling pathway, were enriched in this analysis ( Figure 5A). In addition, there was enrichment for other processes including cell cycle, cell death, metabolism, and focal adhesion ( Figure 5A). Many of the direct gene targets in these pathways and processes represent a diverse set of family members, including ligands, receptors, intracellular mediators, and downstream effectors, with miRNAs targeting multiple components of the same pathway at different points ( Figure 5B, Table 3).

| D ISCUSS I ON
Currently there is no tool that can accurately predict the development of T2DM from prediabetes. By comparing the plasma circulating RNA profiles between two prediabetic groups-one that progresses to T2DM after 5 years and the other group remaining at the prediabetic stage-we identified 57 miRNAs showing significant concentration differences between the two groups at baseline (prediabetic stage). When limiting the comparison to obese progressors vs obese non-progressors most of these miRNAs showed even greater concentration differences. We validated some of those miRNAs by qRT-PCR, suggesting that the improvements in library construction and data analysis translate to better cross-platform data consistency, as we have demonstrated previously. 14,15 Our sequence analysis pipeline also provides novel miRNA candi- . Grey unfilled circles represent miRNAs with a P > 0.05, black circles represent miRNAs with a P < 0.05, and a Log2FC between ±0.28 and ±0.58 (Fold change between 1.2 and 1.5, respectively), and red circles represent miRNAs with a P < 0.05, and a log2FC>±0.58 (fold change greater than 1.5). E, qRT-PCR results for 9 selected miRNAs (normalized to the average of miR-16-5p and miR-21-5p) between obese BMI progressors vs obese BMI non-progressors (PO vs NPO). Values are represented as log2FC (Y-axis) and the miRNA identity is indicated on the X-axis. The qRT-PCR data (black bars) is derived from ΔΔCt values and the sRNAseq data (white bars) is derived from log2RPM values.Single asterisks indicate a P < 0.05, double asterisks indicate a P < 0.01 Grayed out values without an asterisks indicate an insignificant result that the study focused on global molecular comparisons, from microbiome to genome, between progressors and non-progressors, to identify predictive biomarkers for the manifestation of T2DM. Not surprisingly, the concentration changes of specific circulating RNA species that we observed at the prediabetic stage was not as high as what has been observed between T2DM and non-T2DM healthy individuals, where the difference in disease burden is more pronounced. Similarly, previous studies that have looked at prediabetes and T2DM have used miRNAs for diagnostic purpose and primarily done comparisons among NGT, prediabetes, and T2DM patient cohorts but not on the difference between progressors and non-progressors of prediabetic patients at baseline. [21][22][23] Of the 57 miRNAs showing concentration differences between the progressor and non-progressor groups, 26 of them (miR-10b-5p, miR-144-3p, miR-451a-5p, miR-192-5p, miR-125b-5p, miR-101-5p, miR-122-5p, miR-150-5p, miR-17-3p, miR-193a-5p, miR-193b-3p, miR-29c-5p, miR-320a-3p, miR-320b-3p, miR-550a-3p, miR-100-5p, miR-10a-5p, miR-181a-3p, miR-190a-5p, miR-136-5p, miR-375-3p, miR-487b-3p, miR-7-5p, miR-210-3p, miR-378a-3p, and miR-99a-5p) have been previously described as being associated with T2DM based on our recent meta-analysis of the field. 10 We selected 14 miRNAs for qRT-PCR validation and were able to confirm the concentration changes for 10 of these miRNAs (miR-122-5p, miR-127-5p, miR-192-5p, miR-210-3p, miR-3200-3p, miR-375-3p, miR-376b-3p, miR-4532-5p, and miR-660-3p, and miR-7641-3p). Some miRNAs could not be verified with qRT-PCR, probably due to the specific qPCR platform used and high-false negative rate associated with qRT-PCR as previously reported. [24][25][26] We investigated the impact of obesity on circulating miRNA and found a notable effect among progressors (Table 2). Conversely, in the non-progressors the impact of obesity was minimal (only one affected miRNA observed). As expected, obesity also affects miRNAs associated with prediabetic-to-T2DM transition (16 miRNAs showed concentration changes between obese progressors and non-progressors). Of the 16 obesity-associated miRNAs linked with the transition to T2DM, we followed up with nine selected miRNAs for qRT-PCR validation, and confirmed concentration changes of miR-122-5p, miR-127-5p, miR-136-3p, miR-192-5p, miR-210-3p, miR-4532-5p, miR-483-5p, and miR-7641-3p. As mentioned above, these miR-NAs have been previously implicated in T2DM. MiR-122-5p is a F I G U R E 3 Novel miRNAs, and other small RNAs associated with T2DM progression. A, Volcano plot of log2 fold change (log2FC) vs p-value between progressors and non-progressors. Grey unfilled circles represent miRNAs with a P > 0.05, black circles represent miRNAs with a P < 0.05, and a Log2FC between ±0.28 and ±0.58 (Fold change between 1.2 and 1.5, respectively), and red circles represent miRNAs with a P < 0.05, and a log2FC>±0.58 (fold change greater than 1.5). B, Log2FC values for the 2 novel miRNAs showing significant concentration changes between all progressors and non-progressors (white) and obese progressors and obese non-progressors (grey) (see Supplemental Table S2) been previously reported to be associated with metabolic syndrome, T2DM, insulin resistance, and obesity. [29][30][31] MiR-192 is a liver and pancreatic-enriched miRNA that has been shown to be associated with insulin resistance and is elevated in the plasma of prediabetic individuals, suggesting that elevated levels of miR-192-5p in circulation may signal progression towards T2DM. 31,32 MiR-210-3p has been previously shown to be involved in adipogenesis, is differentially expressed in visceral adipose tissues in obese T2DM individuals, and is associated with gestational diabetes in obese women. [33][34][35][36] Using validated miRNA targets, we explored the possible biological processes/pathways that might be affected by the dysregulated miRNAs. A number of signal transduction related processes that play important roles in T2DM pathophysiology, including the insulin signaling pathway, mTOR signaling pathway, TGF-β signaling pathway, and VEGF signaling pathway [37][38][39][40] were enriched in this analysis. In addition, these dysregulated miRNAs showed enrichment for other processes including cell cycle, cell death, metabolism, and focal adhesion, which are also involved in T2DM disease progression [41][42][43] . Interestingly, many of these pathways and processes showed stronger enrichment in the obese progressors vs non-progressors, compared to progressors vs non-progressors as a whole ( Figure 5A).
Furthermore, the direct gene targets represent a diverse set of family members associated with these pathways and processes (see Figure 5B, Table 3). For example in the cell cycle, miRNAs we find to F I G U R E 4 ROC analysis of miRNA biomarker panels. Receiver Operating Characteristic (ROC) analysis for predictive miRNA panels for all samples between progressors and non-progressors (A) and obese progressors and obese non-porgressors (B). Sensitivity and specificity are plotted against each other, and the average Area Under the Curve (AUC) is given for each patient group tested. All patients = black, obese patients = red, normal BMI patients = brown. C, qRT-PCR results for the five-miRNA panel (normalized to the average of miR-16-5p and miR-21-5p) between all progressors and non-progressors (black), obese BMI progressors and non-progressors (red), and normal BMI progressors and non-progressors (brown). D, qRT-PCR results for the eight-miRNA panel (normalized to the average of miR-16-5p and miR-21-5p) between all progressors and non-progressors (black), obese BMI progressors and non-progressors (red), and normal BMI progressors and non-progressors (brown). Values are represented as log2FC (ΔΔCt) on the Y-axis and the miRNA identity is indicated on the X-axis. Single asterisks indicate a P < 0.05, double asterisks indicate a P < 0.01. Grayed out values without an asterisks indicate an insignificant result. Green bars below values indicate agreement with sRNAseq (see Table 2 Figure 3B). We propose that the eight-miRNA panel will have better clinical utility when testing obese prediabetic patients. The value of using a "discovery-based" approach over a conventional candidate-based approach is that F I G U R E 5 Biological pathways and processes that might be affected by the dysregulated miRNAs. A, Enrichment of different biological processes and pathways regulated by miRNAs associated with T2DM progression in all patients (white bars) and obese patients (black bars). The miRTar database (mirtarbase.mbc.nctu.edu.tw/) was used to identify validated miRNA targets for gene enrichment (validated by at least two different techniques) and gene enrichment analysis was performed with DAVID (Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/). B, Details of the gene targets (mRNAs) of some of the pathways and processes associated with T2DM progression. MiRNAs (diamonds) associated with T2DM progression are at the top. Different target mRNAs associated with different pathways and processes are shown (Pink-Insulin/PI3K, Purple-TGF-β, blue-cell death/apoptosis, green-cell cycle, red-metabolic process, orange-focal adhesion) with black lines depicting miRNA-mRNA interactions, and light grey lines depicting protein-protein interactions between pathway components as determined by STRING database (https://string-db.org/) TA B L E 3 Selected miRNAs targets from biological processes and pathways associated with T2DM progression new miRNAs not previously reported to be associated with a given disease or health condition may be identified. While many of the miRNAs we observed have been seen in other T2DM studies, the majority have no known association with T2DM or T2DM progression. We have also identified novel miRNAs not previously characterized, that are associated with T2DM progression and affected by obesity. With further validation, these miRNAs could be considered as promising predictive biomarkers for identifying prediabetic individuals that will progress to T2DM.

ACK N OWLED G M ENTS
We thank Inyoul Lee, Kathie Walters, and Mary Brunkow for feedback on the manuscript, and Minyoung Lee for technical advice.

CO N FLI C T O F I NTE R E S T S
The authors declare no conflicts of interests.