Risk stratification by long non‐coding RNAs profiling in COVID‐19 patients

Abstract Coronavirus disease 2019 (COVID‐19), caused by severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), has become a global pandemic. Long non‐coding RNAs (lncRNAs) are a subclass of endogenous, non‐protein‐coding RNAs that lack an open reading frame and are more than 200 nucleotides in length. However, the functions of lncRNAs in COVID‐19 have not been unravelled. The present study aimed to identify COVID‐19‐related lncRNAs by RNA sequencing of peripheral blood mononuclear cells from patients with SARS‐CoV‐2 infection and from healthy individuals. Overall, 17 severe patients, 12 non‐severe patients and 10 healthy controls were enrolled in this study. Firstly, we report lncRNAs that were altered between severe COVID‐19 patients, non‐severe COVID‐19 patients and healthy controls. Next, using least absolute shrinkage and selection operator regression, we developed a 7‐lncRNA panel that discriminates well between severe and non‐severe COVID‐19 patients. Finally, using the iCluster algorithm on the lncRNA panel, we observed that COVID‐19 is a heterogeneous disease in which severe patients fall into two subtypes with similar risk scores and immune scores. As the roles of lncRNAs in COVID‐19 have not yet been fully identified and understood, our analysis should provide a valuable resource and information for future studies.


| INTRODUCTION
Previous transcriptomic studies of COVID-19 have focused on protein-coding transcripts, using mRNA expression profiling to characterize the patterns and potential functional roles of their translated proteins. 4 However, less than 3% of the human genome is believed to consist of coding regions. 5,6 The rest has been called junk DNA, and it includes lncRNAs, a group of non-coding RNAs more than 200 nucleotides in length. The development of next-generation sequencing technology has shed considerable light on the functions of lncRNAs in the biological processes of different diseases. Besides cancer, studies have highlighted the role of lncRNAs in other diseases, such as immune diseases. 7,8 Although some studies have investigated lncRNAs altered between COVID-19 patients and healthy controls, none has so far investigated lncRNA expression patterns according to the severity of COVID-19. 9,10 The functions of lncRNAs in COVID-19 have not been unravelled.
The present study aimed to identify the target lncRNAs in peripheral blood mononuclear cells (PBMCs) from patients with SARS-CoV-2 infection and from healthy individuals.
Particular attention was paid to the differential regulation of lncRNAs between the severe and non-severe symptom groups in terms of lncRNA expression pattern. Our findings from the screening of lncRNA biomarkers are expected to improve the understanding of COVID-19 subtypes and may facilitate further exploration of diagnostic, prognostic and even therapeutic strategies for this devastating disease.

| Patients and specimen collection
We collected blood from 29 patients enrolled at a local hospital from March to April 2020, after obtaining written informed consent from the patients. Eligibility criteria for moderate or severe patients were based on the 7th guideline. Briefly, moderate patients had fever and respiratory symptoms, and radiologic assessment showed signs of pneumonia. Severe patients met any of the following criteria: (1) shortness of breath, with a respiratory rate (RR) >=30 breaths/min; (2) oxygen saturation <=93% at rest; (3) arterial oxygen partial pressure/fraction of inspired O2 (PaO2/FiO2) <=300 mm Hg; (4) chest CT imaging showing that lung damage progressed significantly within 24-48 h. Critically severe patients met any of the following criteria: (1) respiratory failure requiring mechanical ventilation; (2) signs of septic shock; (3) multiple organ failure requiring ICU admission. For controls, blood was collected from 10 healthy adult donors after written informed consent. All donors consented to genetic research.
The use of human samples in this study was approved by the hospital ethics committee. Written informed consent was obtained from patients and healthy individuals before sample and data collection. The study was conducted according to the principles of the Declaration of Helsinki. Lymphocyte subtyping and serum inflammatory indicator assays were performed at the medical laboratory of the hospital.

| Temporal RNA transcript isolation, sequencing and processing
Human PBMCs were isolated by centrifugation. Peripheral blood was layered and centrifuged at 950 g for 30 min. After isolation on a

| Data pre-processing and screening of differentially expressed lncRNAs
In order to identify the biological significance of each probe, the comprehensive gene annotation files were obtained from GENCODE in GTF format. The GENCODE annotation was the default gene annotation displayed in the Ensembl Genome Browser.
Finally, 2511 expressed lncRNAs were annotated. The Affymetrix probe level data were obtained by reading the CEL files using the ReadAffy function of the Affy R package, and then, the raw data were pre-processed (background correction, normalization and summary expression computation). The Empirical Analysis of Digital Gene Expression Data in R (edgeR) package (http://bioconductor.org/packages/release/bioc/html/edgeR.html) was applied to explore the differentially expressed lncRNAs (DELncRNAs) between the three groups using the criteria of P-value <0.05 and |log2(fold change)| >= 1.
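The two selection criteria above can be illustrated with a minimal Python sketch. It assumes that per-lncRNA P-values and log2 fold changes have already been computed (for example by edgeR); the function name, record layout and example values are hypothetical and are not taken from the study.

```python
def select_de_lncrnas(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return IDs that satisfy P < p_cutoff and |log2 fold change| >= lfc_cutoff."""
    return [
        row["id"]
        for row in results
        if row["p_value"] < p_cutoff and abs(row["log2_fc"]) >= lfc_cutoff
    ]


# Hypothetical test results, for illustration only.
example_results = [
    {"id": "lnc_A", "p_value": 0.001, "log2_fc": 2.3},   # passes both criteria
    {"id": "lnc_B", "p_value": 0.20, "log2_fc": 3.0},    # fails the P-value criterion
    {"id": "lnc_C", "p_value": 0.01, "log2_fc": -1.0},   # passes (down-regulated)
    {"id": "lnc_D", "p_value": 0.03, "log2_fc": 0.4},    # fails the fold-change criterion
]

selected = select_de_lncrnas(example_results)
print(selected)
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated lncRNAs.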

| Hierarchical cluster analysis and Principal Component Analysis
To investigate how samples relate to one another according to disease severity, hierarchical cluster analysis and principal component analysis (PCA) were performed for all participants.
In total, 39 participants, including 12 non-severe patients, 17 severe patients and 10 healthy controls, were included in this step. Participants with similarly expressed lncRNAs tended to group together in the hierarchical cluster analysis and PCA. We carried out the hierarchical cluster analysis using the flashClust function in the WGCNA R package and PCA using the factoextra R package. 11

| Construction of the COVID-19 risk score
To further investigate the prognosis of COVID-19 patients, we developed a prognostic risk score model using LASSO (Least Absolute Shrinkage and Selection Operator) regression. We selected the non-severe and severe patients, and all lncRNAs were included. We randomly divided the patients into a training set (50% of patients) and test set 1 (50% of patients). LASSO regression was carried out on the training set and test sets ten thousand times. To assess the performance of  patients. First, we applied the WGCNA (weighted gene correlation network analysis) algorithm to all lncRNAs of the non-severe and severe patients using default parameters. We used the dynamic tree cutting method to identify the lncRNA co-expression modules.
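The random 50/50 split and the idea of a panel-based risk score can be sketched in Python as follows. This is only an illustration under stated assumptions: the risk score is taken to be a linear combination of lncRNA expression values weighted by fitted coefficients, and the coefficients, lncRNA names and patient IDs are hypothetical placeholders. The LASSO fit itself is not reproduced here.

```python
import random


def split_half(patient_ids, seed):
    """Randomly split patients into a training half and a test half."""
    rng = random.Random(seed)          # seeded so that each split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def risk_score(expression, coefficients, intercept=0.0):
    """Linear risk score: intercept plus the sum of coefficient * expression."""
    return intercept + sum(
        coef * expression[name] for name, coef in coefficients.items()
    )


patients = [f"P{i:02d}" for i in range(1, 30)]   # 29 hypothetical patient IDs
train_ids, test_ids = split_half(patients, seed=1)

# Hypothetical coefficients, standing in for those a LASSO fit would return.
hypothetical_coefs = {"lnc_A": 0.8, "lnc_B": -0.5}
score = risk_score({"lnc_A": 2.0, "lnc_B": 1.0, "lnc_C": 9.0}, hypothetical_coefs)
print(len(train_ids), len(test_ids), round(score, 3))
```

Repeating the split with different seeds corresponds to the repeated resampling described in the text; lncRNAs not in the coefficient table (here `lnc_C`) do not contribute to the score.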

| Integrative clustering using iCluster algorithm
The co-expression modules were assigned different colours for visualization. We calculated the correlation between each co-expression module and disease severity, and selected the lncRNA co-expression modules that were significantly correlated with severity. Next, we applied the iCluster algorithm to the selected lncRNA modules.
We chose the k where the curve of percent explained variation levels off. The number of the clusters is k + 1.
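One way to make the "levels off" rule concrete is sketched below in Python. The gain threshold `min_gain` and the example curve are assumptions introduced for illustration; the source does not state a numerical criterion.

```python
def choose_k(percent_explained, min_gain=1.0):
    """Pick k where the percent-explained-variation curve levels off.

    percent_explained[i] is the value for k = i + 1. The first k whose
    successor improves the curve by less than min_gain is returned.
    """
    for i in range(len(percent_explained) - 1):
        gain = percent_explained[i + 1] - percent_explained[i]
        if gain < min_gain:
            return i + 1
    return len(percent_explained)   # the curve never flattened in the tested range


# Hypothetical curve for k = 1..5, for illustration only.
curve = [40.0, 62.0, 75.0, 75.6, 75.9]
k = choose_k(curve)
n_clusters = k + 1                  # iCluster yields k + 1 clusters
print(k, n_clusters)
```

With this hypothetical curve the gain drops below one percentage point after k = 3, so four clusters would be reported.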

| Assessment of patients' immune status
To determine whether immune status differs between patients in different iClusters, we assessed the immune status of patients using the GSVA R package. 12 First, we screened for immune indices that differed significantly between non-severe and severe COVID-19 patients using Wilcoxon's test. Immune indices with a P-value less than 0.01 were selected. Next, we screened gene signatures of these immune indices using the Boruta R package. 13 Finally, we used the GSVA algorithm to assess the immune score based on the gene signatures.
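The screening step can be sketched in Python as below. The sketch assumes the unpaired form of Wilcoxon's test (the rank-sum test) and uses a normal approximation without tie or continuity correction, so its P-values will differ slightly from exact ones; the function names and the example measurements are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided rank-sum P-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def screen_indices(indices, p_cutoff=0.01):
    """Keep immune indices whose two groups differ with P < p_cutoff."""
    return [name for name, (a, b) in indices.items() if rank_sum_p(a, b) < p_cutoff]


# Hypothetical measurements (non-severe, severe), for illustration only.
immune_indices = {
    "index_1": ([1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]),
    "index_2": ([1, 4, 5, 8], [2, 3, 6, 7]),
}
kept = screen_indices(immune_indices)
print(kept)
```

For small groups an exact test is preferable to the normal approximation used in this sketch.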

| Model selection
The optimal combination of clusters was determined by minimizing a Bayesian Information Criterion (BIC). An 'elbow' point was noted at K = 3, beyond which the BIC kept increasing, and thus, the 3-class solution was chosen. Figure S3C shows that the results were highly comparable for individual unsupervised clustering versus integrative clustering, indicating that the iCluster groupings represented the combined information of all platforms and were not biased towards a particular data type. To compare the resultant iCluster groupings with the molecular subclasses developed by Hoshida, 14 we assigned each of our patients to one of the three Hoshida subclasses using their transcriptional predictors. We found strong concordance between the iClusters and the Hoshida subclasses.

| Statistical analysis
R software was used to test for differences in means between the severe and non-severe groups. The null hypothesis of no difference in means was tested using a two-tailed t test, with a P-value < 0.05 deemed significant.

| Patients information
All COVID-19 patients were randomly recruited. Among the 29 patients, 12 had mild or moderate symptoms, accounting for 41.4%, whereas 17 had severe or critically severe symptoms, accounting for 58.6%. The mean age was 74 years (range 52 to 91) for severe patients and 69.14 years (range 58 to 82) for non-severe patients. In this study, a majority of participants (74.4%, 29/39) were male and most (87.2%, 34/39) were over 60 years old, consistent with previous literature reports. 15,16 For the immune status, we observed that several immune cell populations showed drastic alteration as the disease progressed from non-

| Altered lncRNAs between severe vs. nonsevere patients
To profile the peripheral lncRNA signature response to COVID-19, we performed a transcriptional analysis of lncRNAs using RNA-Seq.
A spectrum of 2511 functionally active lncRNAs was identified by applying stringent criteria (RPKM in at least 10% of samples). The comparison between the severe and non-severe COVID-19 patients is shown in a volcano plot (Figure 2). Overall, we found 687 lncRNAs using the criteria of P-value < 0.05 and |log2(fold change)| >= 1 (Table S1). Some of the most significantly regulated lncRNAs are listed in Table 1.

| Development of weighted co-expression network and identification of key modules
Selection of the soft-thresholding power is an important step when constructing a WGCNA. We performed the analysis of network topology for thresholding powers from 1 to 20 and identified the relatively balanced scale independence and mean connectivity of the WGCNA. As shown in Figure 4A Figure 4D). We identified the turquoise module and brown module as the modules most relevant to the disease severity.
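To illustrate what the soft-thresholding power does, the Python sketch below builds an unsigned WGCNA-style adjacency, in which the absolute Pearson correlation between two lncRNA profiles is raised to the chosen power. The power of 6 and the expression profiles are hypothetical and are not the values used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, power):
    """Unsigned adjacency: |correlation| ** power for every pair of profiles."""
    names = list(profiles)
    return {
        (a, b): abs(pearson(profiles[a], profiles[b])) ** power
        for a in names
        for b in names
    }


# Hypothetical expression profiles across four samples.
profiles = {
    "lnc_A": [1.0, 2.0, 3.0, 4.0],
    "lnc_B": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with lnc_A
    "lnc_C": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with lnc_A
}
adjacency = soft_threshold_adjacency(profiles, power=6)
print(round(adjacency[("lnc_A", "lnc_B")], 3), round(adjacency[("lnc_A", "lnc_C")], 3))
```

Raising correlations to a higher power suppresses weak correlations much more strongly than strong ones, which is why the power is scanned over a range and chosen to balance scale independence against mean connectivity.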

| lncRNA subtypes categorize COVID-19 patients based on iCluster algorithm
The optimal combination of subtypes was determined by minimizing a Bayesian Information Criterion (BIC). An 'elbow' point was noted at K = 3, beyond which the BIC kept increasing, and thus the 4-class solution was chosen (Figure S4). In total, four subtypes were identified (Figure 5A). The outcome of the iCluster algorithm showed that COVID-19 is a heterogeneous disease with multiple subtypes. The signature of each iCluster is shown in Table S2 (Table 2). Based on the risk score, we observed a significant difference between severe and non-severe COVID-19 patients (P = 7e-6) (Figure 5B) low immune score when compared to iCluster 1 and iCluster 4, consistent with the severe clinical condition (Figure 5D).

| DISCUSSION
Long non-coding RNAs (lncRNAs) are a subclass of endogenous, non-protein-coding RNAs that lack an open reading frame and are more than 200 nucleotides in length. As the lncRNA expression pattern remains unclear in COVID-19 patients, our study is the first analysis to provide detailed lncRNA information as molecular biomarkers.
Here, we used RNA sequencing to characterize the lncRNA expression pattern in peripheral blood from 17 severe patients, 12 non-severe patients and 10 healthy controls. Overall, we observed that many lncRNAs were significantly altered when we compared the three groups: severe COVID-19 patients, non-severe COVID-19 patients and healthy controls. For example, we observed that lncRNA GATA5 was significantly elevated in the severe condition. In a paper published by Gennadi,  indicates more severe condition. Surprisingly, among severe patients we observed two subtypes (iCluster2 and iCluster3) with similar immune scores and risk scores. This indicates that severe patients can be divided into two subtypes by lncRNA expression pattern. It would be of great value to further analyse the lncRNAs that differ between them for predicting the severity of COVID-19.
One limitation of this study is that the data restrict our ability to further analyse the function of these lncRNAs. We plan to compare lncRNA expression-based subtypes with mRNA expression-based subtypes in a further study. Moreover, we hope that future studies will examine the risk prediction and prognostic ability of lncRNAs in a larger number of COVID-19 patients.
In summary, we have presented a lncRNA atlas of the peripheral immune response to COVID-19. These data highlight immunological features associated with the severity of the disease. These lncRNAs may serve as new surrogate in vitro biomarkers for the diagnosis and prognosis of COVID-19 patients.

| CONCLUSIONS
In conclusion, we have identified a substantial number of COVID-19-related lncRNAs in this study, and we have imputed potential immunological functions for them in the pathogenesis of COVID-19. Moreover, our results provide potential clues to the mechanisms by which the lncRNA panel relates to the severity of COVID-19 ARDS.
As the roles of lncRNAs in COVID-19 have not yet been fully identified and understood, this analysis should provide a valuable resource and information for future studies.

CONFLICT OF INTEREST STATEMENT
The authors confirm that there are no conflicts of interest.