STAT1 and its related molecules as potential biomarkers in Mycobacterium tuberculosis infection

Abstract Tuberculosis (TB) is a severe infectious disease that seriously endangers human health. The immune defence mechanism of the body against TB is still unclear. The purpose of this study was to find the key molecules involved in the immune defence response during TB infection, and provide reference for the treatment of TB and further understanding of the immune defence mechanism of the body. Data from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE83456 were downloaded from GEO data sets for analysis, and a total of 192 differentially expressed genes were screened out. Most of these genes are enriched in the interferon signalling pathway and are defence response–related. We also found that STAT1 plays an important role in the immune defence of TB infection and it is one of the key genes related to interferon signalling pathway. STAT1‐related molecules including hsa‐miR‐448, hsa‐miR‐223‐3p, SAMD8_hsa_circRNA 994 and TWF1_hsa_circRNA 9897 were therefore screened out. Furthermore, expression levels of hsa‐miR‐448 and hsa‐miR‐223‐3p were then verified by qRT‐PCR. Results showed that both hsa‐miR‐448 and hsa‐miR‐223‐3p were down‐regulated in plasma from patients with pulmonary TB. Taken together, our data indicate that an mRNA‐miRNA‐circRNA interaction chain may play an important role in the infection of MTB, and STAT1 and related molecules including hsa‐miR‐223‐3p, has‐miR‐448, SAMD8_hsa_circRNA994 and TWF1_hsa_circRNA9897 were identified as potential biomarkers in the development of active TB.

children died from TB (including children with HIV-related TB) in 2017.
Meanwhile, of the 5.58 million new rifampicin-resistant cases, 82% were MDR-TB. 3 The occurrence, development and outcome of TB are not only related to the toxicity and quantity of bacteria, but also closely related to the immune function of the body. MTB is an intracellular parasitic bacteria, and the body's anti-TB immunity is mainly cellular immunity. The synergistic effect of cytokines with various immune cell subtypes including CD4 + , CD8 + and NK cells plays a key role in the immune defence of tuberculosis, the most important of which are macrophages, effector CD4 + T lymphocytes and IFN-γ, which is secreted by Th1 cells and induces macrophage activation. 4 However, MTB can hinder oxidative stress, apoptosis and autophagy as well as inhibits the synthesis of histocompatibility complex molecules and thus affects antigen presentation. These mechanisms inhibit and resist the macrophages' natural immune killing and specific immune response, so as to help MTB escape from the body's immune killing. 5 Therefore, a full understanding of the immune response mechanism of MTB infection has important theoretical significance for the clinical diagnosis as well as the research of new TB vaccine and immunotherapy. 6 The immune response in the blood can reflect the local response of the lung to pathogens, so the changes in whole blood composition can be used as a sensitive indicator in TB infection. 7 Study has shown that transcriptional signature of active TB reflects symptom status in pulmonary TB, the researchers have compared the transcriptional signature of healthy people, patients with pulmonary TB, patients with extrapulmonary TB and patients with sarcoidosis, and the meta-signature shows that it can differentiate active TB from healthy controls but the results in distinguishing TB from extrapulmonary TB were less than ideal. 8 However, in this study, we used bioinformatic methods to compare and analyse the original genetic data of the whole blood of the patients with pulmonary TB and healthy people, hope to investigate potential miRNAs and circRNAs that may play crucial roles in TB, so as to reveal the pathogenesis of TB at the molecular level and excavate potential biomarkers of TB.

| Acquisition of RNA information
The clinical samples of TB infection were retrieved from the GEO database. A total of 106 blood samples were selected from GSE83456, including 45 pulmonary tuberculosis (PTB) samples and 61 healthy control (HC) samples. 8 Pulmonary TB patients met the operational diagnosis of TB infection. Patients who are suffering from immunosuppressed diseases such as HIV, diabetes or autoimmune diseases, medication or had comorbidities that affecting the pulmonary system were excluded. All RNA information of the selected samples was downloaded for further analysis. The sample's information and data used in this section were all downloaded from public database; therefore, no patient consent or ethics committee approval was necessary.

| Data process
The original expression matrix was normalized and processed by R.
The limma package was used to screen out differentially expressed genes. 9 The P-value of genes was calculated using t test method, and Benjamini and Hochberg's method was used to calculate the adjusted P-value. The differentially expressed genes were screened out by the following selection criteria: at least a 2.0-fold change between healthy controls and pulmonary TB patient samples and with adjusted P-value < .05.

| Enrichment analysis
Gene set enrichment analysis (GSEA) sequenced the genes according to the differential expression degree of the two samples and then detected whether the preset gene set was enriched at the top or bottom of the sequencing table. 10 The analysis tests the expression of genomic rather than individual genes and can therefore include more subtle changes in expression. All genetic information of PTB and HC samples was uploaded to GSEA for further analysis. Database for annotation, visualization and integrated discovery (DAVID) v6.8 was used to analyse the differential expression genes in PTB, including molecular function (MF), biological process (BP) and cell composition (CC). 11 In addition, differential gene pathway analysis was performed in Functional Enrichment analysis tool (Funrich). 12 Pathway analysis was conducted to find out which cell pathways might be involved in the changes in differentially expressed genes. Therefore, crucial pathway related to differentially expressed genes can be identified.
The pathway analysis was validated using IPA software.
Ingenuity Pathway Analysis (IPA) is a software which can show the activation level of biological pathways, and because the database is updated once a week, it has high credibility and accuracy, which is a great help for bioinformatic analysis. 13 Therefore, we uploaded the 192 differentially expressed genes to IPA to perform canonical pathway and molecule function analysis. The statistical cut-off of the enriched functions or pathways was determined by Fisher's exact test P-value, which assesses whether correlations between meaningful molecules and known processes come from random matches, and the z-score, which evaluate the directional effects of one molecule on another or the effects of multiple molecular changes in the data set on biological processes, thus assess the match of observed and predicted regulatory patterns with a prediction for the activation state. Both P-value < .05 and absolute value of z-score > 2 are considered significant.

| Gene cluster identification and protein-protein interaction (PPI) network analysis
The differential genes in PTB samples were uploaded to STRING to obtain the protein network interaction diagram. 14 The result of STRING analysis was imported into Cytoscape v.3.7.1, and cluster analysis of differential genes was conducted using Molecular Complex Detection (MCODE) plug-in. 15 The genes contained in the gene cluster with the highest scores were imported into the STRING to draw the protein interaction network and further analyse which biological processes this gene cluster was participated in. The screened gene cluster was then uploaded to Network Analyst for further verification. 16

| Prediction of pivotal miRNAs and construction of gene-miRNA interaction network analysis
Genes related to the crucial pathway were selected and performed with miRWalk 2.0 to predict its targeted miRNAs. 17 To verify the accuracy of the results, five databases including TargetScan, miRanda, miRDB, miRWalk and RNA22 were used to do intersection. The final result obtained from the intersection is further processed with Cytoscape v 3.7.1; therefore, miRNAs which targeted more than two genes are selected.

| Quantitative reverse transcription polymerase chain reaction (qRT-PCR)
Two candidate plasma biomarkers for TB disease were screened for further confirmation. For this, a total of 18 participants, including 11 TB patients and 7 healthy volunteers, were recruited from Weifang No. 2 People's Hospital and Weifang Medical University, and there was no significant difference in age and gender between them (age: 20-52). TB patients were diagnosed based on sputum smear or culture positive and clinical symptoms, who were excluded if he or she had the following diseases such as cancer, diabetes, HIV or HBV infection, or other lung diseases. Informed consent was obtained from all patients prior to beginning the study.
Plasma sample was collected from each participant before initial therapy. Subsequently, total RNA was extracted from per sample using TRIzol (Invitrogen), and then, its concentration and purity were assessed by K5800 Micro-spectrophotometer (Kaiao). The reverse transcription was conducted using PrimeScript™ 1st Strand cDNA Synthesis Kit (Takara) at 42℃ for 60 minutes and then at 95℃ for 5 minutes. Next, based on LightCycler ® 480 II real-time PCR system (Roche), PCR was performed with SYBR ® Premix Ex Taq™ Kit (Takara) at the temperature of 95℃ for 2 minutes, followed by 38 cycles with the temperature of 95℃ for 30 seconds, 53℃ for 30 seconds and 72℃ for 30 seconds. U6 was applied as internal controls.
The 2 −ΔΔCt method was utilized to determine the relative expression of each selected miRNA between case and controls. Sequences of primers used in the study are shown in Table 1.

| miRNAs-circRNA prediction
StarBase v2.0 tool was used to predict the upstream molecules cir-cRNAs of the selected miRNAs, and the obtained data were processed using Cytoscape software. 18 The intersection of predicted results of each miRNA was obtained by using the cross-linked graph to identify relevant circRNAs.

| Sample information processing and screening of differentially expressed genes
According to the sample information and data matrix, 192 differentially expressed genes were extracted from the PTB samples, among which 156 genes were up-regulated and 36 genes were down-regu-

| The differentially expressed genes in the PTB samples were mainly enriched in interferon (INF) signalling pathway and immune response
Gene set enrichment analysis, DAVID and Fun Rich software were used for enrichment analysis of the samples' genes. Firstly, all gene expression information in PTB and HC samples was uploaded to GSEA software, and the hallmark gene set database was used to analyse genes at the overall level of expression profile. The significantly enriched gene sets were set at a default cut-off as P-value < .05 and

| Construction of protein-protein interaction (PPI) network and further excavation of gene clusters involved in immune system-related biological pathway
In order to screen out the core genes from the differentially expressed genes in the PTB sample, 192 differentially expressed genes were uploaded to the STRING for further analysis, and 170 was then processed with Cytoscape as shown in Figure 7. MCODE was used to process the network data to identify gene clusters (Table 2), genes in the first gene cluster with the highest score were selected for BP enrichment analysis, and it was found that the genes in this gene cluster were mainly involved in defence response and immune system-related function ( Table 3). The 38 genes in gene cluster 1 were therefore analysed by STRING and Network Analyst, and results shown in STRING manifested that gene cluster 1 mainly participated in defence response to virus, interferon signalling, interferon-alpha/beta signalling, cytokine signalling in immune system and immune system-related pathway, all of which were with high statistical significance according to FDR values, and the correlated F I G U R E 3 A, Top 9 biological processes were selected and shown in bar chart according to enrichment score. B, Use ClueGO to analyse the interaction networks of enriched biological processes, and multiple colour dots indicate that it revolved in multiple biological processes genes were then marked in STRING as shown in Figure 8A. To further identify which cytokine also plays an important role in the defence of TB infection, we selected genes related to cytokine signalling in immune system in STRING and then verified the result with Network Analyst. Genes related to cytokine signalling in immune system were highlighted as shown in Figure 8B, these genes were intersected with the selected 26 genes in STRING, and a total of 23 genes including GBP2, ISG15 and STAT1 were obtained for further analysis.

| Further miRNA mining and interaction network analysis
Twenty-eight genes related to the cytokine signalling in immune system were screened out, and gene-miRNA analysis was performed with miRWalk 2.0 software. The intersection of miRNA results predicted by TargetScan, miRanda, miRDB, miRWalk and RNA22 databases was selected as the prediction result. The selection conditions were set as P < .05, the minimum seed sequence length was 7 mer, F I G U R E 4 The Funrich software drew a bar chart of 10 biological pathways based on the P-value and the percentage of genes, among which biological pathways with P-value < .05 are statistically significant. The results showed that the biological pathways with significantly enriched were immune system-related

F I G U R E 5
The canonical pathway analysis of IPA. Results show that there are total of 8 pathways were highly activated, especially the interferon signalling pathway which had the highest activation scores (P-value = 4.27E-21, z-score = 3.207). The remaining high activation pathways include the role of pattern recognition receptors in recognition of bacteria and viruses (P-value = 2.64E-09, z-score = 2.646), activation of IRF by cytosolic pattern recognition receptors (P-value = 6.3E-08, z-score = 2.121), oncostatin M signalling (P-value = 3.19E-04, z-score = 2.000), TREM1 signalling (P-value = 2.6E-03, z-score = 2.000), death receptor signalling (P-value = 5.21E-03, z-score = 2.000), neuroinflammation signalling pathway (P-value = 8.48E-03, z-score = 2.449) and T cell exhaustion signalling pathway (P-value = 4.61E-02, z-score = 2.000). The depth of the colours in the bar chart is based on the z-score, and generally, an absolute z-score greater than 2 is considered meaningful and the target gene binding region was 3′UTR. Cytoscape was used to draw the interaction network as shown in Figure 9. miRNAs with high number of gene cross-links (≥2) were selected (Table 4).

| Verification of potential biomarker expression by qRT-PCR
Nine miRNAs were verified, and it was found that miR-223-3p and miR-448 had high reliability, which all target STAT1. Then, the selected biomarkers including miR-223-3p and miR-448 were validated in TB plasma samples using qRT-PCR analysis. Consistent with the prediction, the results showed that the expression levels of miR-223-3p (P-value = .016) and miR-448 (P-value = .021) in plasma of TB patients were obviously lower than that of healthy controls ( Figure 10).

| D ISCUSS I ON
Tuberculosis is an infectious disease that seriously endangers human health. It is caused by MTB, which parasitizes in macrophages. 19 In this study, we screened out 192 differentially expressed genes including 156 up-regulated genes and 36 down-regulated genes in PTB patients. The genes including BATF2, AIM2, FCGR1B, HP, TLR5 and ANKRD22, and total of 136 genes with significant differences were consistent with the original study. 8 Then, databases including GSEA, DAVID, Funrich and IPA were used to do gene enrichment analysis, and results show that these genes are mainly involved in interferon signalling pathway and cytokine signalling in immune system. IPA activation z-score also shows that interferon signalling pathway has been highly activated. And STAT1 was found highly related to interferon signalling pathway as well as cytokine signalling . The biological function analysis shows the enrichment of differential genes in biological function classification, ranking from high to low according to the -log (P-value) value (ie ranking from small to large according to the P-value), and the heat map of biological function shows that the up-regulated expression of differential genes is related to the activation or inhibition of biological function. Orange means z-score > 0, blue means z-score < 0, and grey means no z-score; Z-score > 2 means that the function is significantly activated, and z-score < −2 means that the function is significantly  between the two pathways is unknown, but has potential research value. Research also shows that interferon (IFN)-γ release assays (IGRAs) are probably the most accurate tests for the detection of latent TB infection. 23 In order to validate our results, several bioinformatic analysis tools were used to do the enrichment analyses including gene set and modular enrichment analysis. Interestingly, we found that genes Meanwhile, we want to know which cytokine pathways also play important roles in the anti-tuberculosis process besides the interferon pathway; therefore, two databases were combined to screen out the genes with high confidence. Finally, 28 genes were selected to do next step analysis. Reactome pathway analysis shows that cytokine signalling in immune system consists of interferon signalling, signalling by interleukins, growth hormone receptor signalling and TNFR2 non-canonical NF-kB pathway. Further miRNA analysis shows that miRNA-223-3p and miRNA-448 with high reliability have a common target gene STAT1, which involved in both interferon-gamma

F I G U R E 8
A, STRING analysis shows the interaction between genes in cluster 1, molecules related to interferon signalling are coloured in red, molecules related to interferon-alpha/beta signalling are coloured in blue, molecules related to cytokine signalling in immune system are coloured in green, and molecules related to interferon-gamma signalling are coloured in yellow. B, Network analysis was used to validate the results, and molecules involved in cytokine signalling in immune system are noted in red signalling and IFN I signalling. Meanwhile, IPA core analysis also shows that in the top upstream regulators, STAT1 plays a leading role, and has been highly activated, with a P-value = 9.81E-61 and z-score = 6.392. In

F I G U R E 9
Interaction network between genes involved in cytokine signalling in immune system and its targeted miRNAs. Genes are coloured in blue, and node size is adjusted according to number of targeted miRNAs; miRNAs are coloured in red; miRNAs targeting more than two genes simultaneously are coloured in green  27 However, the expression level of miRNAs also infected by its upstream molecular circRNAs, which are non-coding RNA molecules that do not have a 5' cap end and a 3' tail and are covalently bonded to form a ring structure, and circRNAs act as miRNA sponge, which contains miRNA response elements and can bind to miRNA, preventing miRNA from inhibiting target genes, therefore up-regulating corresponding gene expression. 28 And due to its important role in cell information regulation, circRNAs have attracted much attention as new molecular markers in recent years. 29 In our study, we observed 9 miRNAs that target at least 2 genes which involved in cytokine signalling in immune system. However, can inhibit the apoptosis of macrophages and probably involved in the modulation of innate immune response due to its high enrichment of the induction of pathways in cancer together with miR-146a, miR-155, miR-423-3p and miR-21-5p. 34,35 Another research shows that expression levels of miR-223 from both peripheral blood (PBMC) and pleural fluids (PFMC) of patients decreased significantly, therefore may suppress the inflammatory response. 36 Our results shows that hsa-miR-223-3p has down-regulated and the inflammatory response pathway has been highly activated according to IPA core analysis and the molecule function analysis shows that in the inflammatory response module, although immune response and antiviral response parts are highly activated, the inflammation part is inhibited in an overall level.
However, a study has shown that hsa-miR-223 is up-regulated in patients with tuberculosis (no more than 2 weeks of chemoprophylaxis), while there is no significant change in patients with latent tuberculosis. 37 It is speculated that the differential expression level of miR-223 might be related to the differences in the infected TB strains. 38 The specific molecular mechanism remains to be further studied. Studies on the role of miR-448 in TB infection are scarce, and some reports show it is relevant to cell proliferation and migration. 39 In our study, the lower expression of miR-223-3p and miR-448 indicated that a mR-NA-miRNA-circ -RNA interaction chain may be present in patients with pulmonary TB; therefore, with STAT1 highly expressed, the circRNA would have potential value of further detection.
The potential value of circRNAs as a novel biomarker for disease diagnosis has been confirmed, and some of these cir-cRNAs were found up-regulated in active TB patients and are related to cytokine-cytokine receptor interaction and chemokine signalling pathway, and yet some circRNAs including hsa_cir-cRNA_091692, hsa_circRNA_102296, hsa_circRNA_029965 and hsa_circRNA_103571 were reported down-regulated in active TB patients. [40][41][42] As our study has confirmed that hsa-miR-223-3p and hsa-miR-448 are significantly down-regulated in patients with pulmonary TB, SAMD8_hsa_circRNA994 and TWF1_hsa_circRNA9897 have high potential value to be used as novel biomarkers. Hence, our results lay a foundation for further researches.
Based on our current study, we found that cytokine signalling in immune system especially interferon signalling pathway is extremely important through TB infectious disease. Therefore, we identified molecules that have significant relevance with interferon signalling, and STAT1 along with its related miRNAs, and circRNAs including miR-223-3p, miR-448, SAMD8_hsa_circRNA994 and TWF1_hsa_circRNA9897 were found as potential biomolecules in the host defence response to TB infection. Further qRT-PCR results show that hsa-miR-223-3p and hsa-miR-448 both down-regulated in patients with pulmonary TB compared with healthy controls. We believe that through the regulatory networks of these molecules, the interferon signalling pathway of macrophage can be affected.

ACK N OWLED G EM ENTS
The current study was funded by the Natural Science Foundation of Shandong Province of China (Nos. ZR2018MH001 and ZR2018ZC1054). The funding agencies had no role in the study design, data collection, analysis or preparation of the manuscript. We thank Heng Li for his technical support.

CO N FLI C T O F I NTE R E S T
All the authors declare that there are no conflicts of interest relevant to this article.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data sets used and analysed during the current study are available from the corresponding author on reasonable request.