Single‐cell RNA sequencing identify SDCBP in ACE2‐positive bronchial epithelial cells negatively correlates with COVID‐19 severity

Abstract The coronavirus disease 2019 (COVID‐19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), has resulted in many deaths throughout the world. It is vital to identify the novel prognostic biomarkers and therapeutic targets to assist with the subsequent diagnosis and treatment plan to mitigate the expansion of COVID‐19. Since angiotensin‐converting enzyme 2 (ACE2)‐positive cells are hosts for COVID‐19, we focussed on this cell type to explore the underlying mechanisms of COVID‐19. In this study, we identified that ACE2‐positive cells from the bronchoalveolar lavage fluid (BALF) of patients with COVID‐19 belong to bronchial epithelial cells. Comparing with patients of COVID‐19 showing severe symptoms, the antigen processing and presentation pathway was increased and 12 typical genes, HLA‐DRB5, HLA‐DRB1, CD74, HLA‐DRA, HLA‐DPA1, HLA‐DQA1, HSP90AA1, HSP90AB1, HLA‐DPB1, HLA‐DQB1, HLA‐DQA2, and HLA‐DMA, particularly HLA‐DPB1, were obviously up‐regulated in ACE2‐positive bronchial epithelial cells of patients with mild disease. We further discovered SDCBP was positively correlated with above 12 genes particularly with HLA‐DPB1 in ACE2‐positive bronchial epithelial cells of COVID‐19 patients. Moreover, SDCBP may increase the immune infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in different lung carcinoma. Moreover, we found the expression of SDCBP was positively correlated with the expression of antigen processing and presentation genes in post‐mortem lung biopsies tissues, which is consistent with previous discoveries. These results suggest that SDCBP has good potential to be further developed as a novel diagnostic and therapeutic target in the treatment of COVID‐19.


| INTRODUC TI ON
Since December 2019, the outbreak of a novel coronavirus, which causes coronavirus disease 2019 (COVID- 19), has brought terrible suffering to patients worldwide and has resulted in not only continuously increasing mortality rates but also considerable economic losses. 1,2 The patients who initially develop fever or respiratory symptoms can then be divided into a mild group and a severe group based on the severity of their symptoms. A large proportion of patients with the common mild disease were classified into the mild group, and approximately 15% to 20% of the patients were classified into the severe group and needed to receive oxygenation as part of their adjuvant therapy. Although all patients need to receive a good standard of care, those belonging to the severe group place the highest pressure on doctors. 3 Thus, identification of the differences in gene expression between the patients in the mild group and those in the severe group is necessary. We are committed to discover the differentially expressed genes (DEGs) between patients with mild symptoms and those with severe symptoms and promoting the development of a treatment for COVID-19 in a targeted manner. The physiological response after viral infection is usually manifested by viral replication and subsequent changes at the cellular level. 10 Once the virus enters and infects host cells, the infected cells detect the presence of viral replication via a number of pattern recognition receptors (PRRs). In the context of viral infection, the detection of viral replication is mediated to a large extent by an intracellular PRR family that identifies abnormal RNA structures that typically form during viral replication. 11 Human leucocyte antigen (HLA), which is the major pattern recognition receptor in humans, is related to antigen processing and presentation pathways. Notably, bronchial epithelial cells exhibit similar antigen presentation and processing properties. 12 The current research on COVID-19 is focussing on two major problems that remain to be resolved: the mechanism through which COVID-19 results in the development of mild and severe symptoms remains unclear, 13,14 and effective therapeutic targets for COVID-19 have not been discovered. 13 We aim to explore the internal mechanism that leads to the development of mild or severe COVID-19 symptoms through a comprehensive analysis of the gene expression profile of ACE2-positive bronchial epithelial cells and to identify new genes that can serve as therapeutic targets.
To compare the transcriptional response of patients with mild and severe COVID-19 symptoms, the transcriptional signals that might serve as the biological basis for COVID-19 were identified.
We downloaded a single-cell RNA sequencing data set (GSE14 5926) from the Gene Expression Omnibus database that contains information on cells from the bronchoalveolar lavage fluid (BALF) in six patients with severe COVID-19 and three patients with mild COVID-19. We then selected cells with ACE2 expression levels greater than 0 as ACE2-positive cells, and these cells were subsequently annotated as bronchial epithelial cells using SingleR. 15 We subsequently used these cells to identify the DEGs between patients with mild disease and those with severe disease and constructed a coexpression network through weighted gene coexpression network analysis (WGCNA) to identify the modules that are most important for antigen processing and presentation pathways. We also performed a human reference interaction (HuRI) analysis to determine the unbiased relationship between the above-mentioned pathways and modules (http://inter actom e-atlas.org). The obtained results were then verified by gene expression profiling interactive analysis (GEPIA), using the tumour immune estimation resource (TIMER) and through Pearson's correlation coefficients. These results suggest that SDCBP has good potential to be further developed as a novel diagnostic and therapeutic target in the treatment of COVID-19.

K E Y W O R D S
ACE2, antigen processing and presentation, COVID-19, HuRI (human reference interactome), SDCBP (Syndecan-Binding Protein), WGCNA (weighted gene co expression network analysis) the read count matrix of the COVID-19 patients and healthy control. H5 expression data were read using the Read10X_h5 function (Seurat package). Nine cases were selected for analysis, and these included three cases of mild disease and six cases of severe disease. All ACE2-positive cells (count>0) were selected using the subset function (dplyr package) in R software, and unique cells with molecules detected count (nCount) > 75 000 and unique cells with percent. mt (percentage of mitochondrial genes) values greater than 40 were subsequently filtered out (Seurat package).

| Dimensionality reduction and clustering
The first 2000 genes with the most significant differences in expression were obtained using the FindVariableFeatures function (Seurat package). We applied a linear transformation function, the ScaleDate function (Seurat package), to preprocess the data, which is an essential step in data standardization before the application of a dimensionality reduction technique, such as principal component analysis (PCA). We then performed a PCA of the scaled data using the RunPCA function (Seurat package) and clustered the cells using the FindNeighbors and FindClusters functions (Seurat package).
We subsequently utilized the UMAP algorithm for dimensionality reduction to visualize these data sets and make them suitable for exploration.

| Cell type annotation of single cells
We used the SingleR (Single-cell Recognition) function (SingleR package) to annotate single-cell RNA-seq clusters based on Human Primary Cell Atlas data, which is a reference data set of samples with known labels. Based on similarity to the above-mentioned reference set, the ACE2-positive cells were labelled from the test data set.

| Biomarker genes that showed differential expression between the mild and severe groups
We identified markers of the mild group compared with the severe group using the Find All Markers function (Seurat package).
These biomarkers were detected in cells with at least 25% gene expression in both populations, regardless of the level of gene expression.

| Gene enrichment analysis
Using the gseaGO and gseaKEGG functions (clusterProfiler package), we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the differentially expressed biomarker genes.

| Construction of the co-expression module
First, the usability of 5000 genes was evaluated, and the adjacency matrix A mn was defined as follows: where S mn was Pearson's correlation coefficient between gene m and gene n and Aij was the contiguity between gene m and gene n. We selected β = 5 as the soft-thresholding parameter (scale-free R2 = 0.88).
We then transformed the adjacency matrix into a topological overlap matrix (TOM), and genes with high absolute correlation values were divided into different gene modules according to dissimilarity measures based on the TOM through average linkage hierarchical clustering with a minimum of 30.

| Identification of modules
We calculated the correlations between the module eigengenes (the major component in the principal component analysis for each gene module) and antigen processing and presentation genes to measure the significance of each module. The module significance (MS) was defined as the average absolute gene significance measured for all the genes in a given module. In general, the module with the highest absolute MS was considered to exhibit the strongest link between given genes.

| Direct genetic association identification
We downloaded the HuRI unions from http://www.inter actom eatlas.org/ and then analysed the PPIs in modules with high absolute biological significance. Subsequently, the genes in the modules that are most directly related to various important biological functions were acquired.

| Extraction of ACE2-positive cells and identification of cellular information
In this study, we obtained single-cell sequencing data from nine cases in the GSE14 5926 data set (Table 1) (Table S1).

| The relationship between 222 ACE2-positive cells and bronchial epithelial cells
We created a cell-by-gene expression matrix and performed dimensionality reduction by uniform manifold approximation and projection (UMAP) for further clustering analysis/( Figure 2A). 16 Clusters were then further identified according to the different types of cell expression through graph-based clustering, and four clusters were ultimately obtained using the Seurat package ( Figure 2B). 15 It is observed that the gene expression features of bronchial epithelial cells between mild and severe patients have different distribution in above four populations, indicating certain expression differences between bronchial epithelial cells from mild and severe patients.
Interestingly, we discovered that all ACE2-positive cells belonged to bronchial epithelial cells by SingleR using the Human Primary Cell Atlas as the reference data set 17 ( Figure 2C). Furthermore, we used canonical markers (Table S2), especially MUC1 and SLC34A2, to match the ACE2-positive cells to known cell types, and gene expression profiles were visualized by heat map ( Figure 2D) and the epithelial cell markers were visualized by violin plot (Figure 2A-E), [18][19][20][21][22][23][24][25][26] which confirmed that all ACE2-positive cells belonged to bronchial epithelial cells.

| Antigen processing and presentation genes were significantly up-regulated in ACE2-positive bronchial epithelial cells of patients with mild disease
The cluster results showed that the main components of the ACE2positive cells of patients with mild and severe disease exhibited significant differences, which hinted at further exploration of DEGs.
We then identified 1925 DEGs by comparing cells from mild cases with those from severe cases ( Figure 3A) (Table S3) and HSP90AB1, are also closely related to multiple immune processes. 30,31 These DEGs between the mild and severe groups might be key to determining the severity of COVID-19.

| Correlation of SDCBP with the expression of antigen processing and presentation genes in bronchial epithelial cells from COVID-19 patients
For the exploration of prognostic markers and therapeutic targets, we built coexpression networks through a WGCNA of bronchial epithelial cell gene expression in data sets of the mild and severe patient groups described in earlier parts of the study. 32 The dendrogram obtained from the cell sample clustering using the average linkage method indicated that the microarray is feasible ( Figure 4A). A significant difference in the distribution of gene expression related to antigen processing and presentation was found, and these genes were concentrated in the samples from the mild group, which further indicates that these genes are associated with disease severity ( Figure 4B).
We selected β = 5 as the soft-thresholding power (scale-free R 2 = 0.88; Figure S2A-B), and seven modules were then screened by average linkage hierarchical clustering ( Figure S2C). Our findings showed a strong link between the genes in the yellow module and the genes in antigen processing and presentation pathways ( Figure 4B).
We further analysed the yellow module and found 83 genes ( Figure   S2D-E) (Table S4). Using the HuRI atlas as the reference, we found that SDCBP (SDCBP genes or SDCBP-protein A-genes confirmed by Y2H assay) has direct or indirect interaction with 20 of the 83 proteins and exhibited most PPIs with proteins in the yellow module ( Figure 4C). 33 These indicate that SDCBP plays a potential role in human antigen presentation in bronchial epithelial cells from COVID-19 patients.

| The relationship between SDCBP and antigen processing and presentation genes in lung tissue
Our findings were confirmed using the validation data set GSE14 that between SDCBP and HLA-DPB1 ( Figure S3, Figure 5D-F). 36 We also found that the increase in the SDCBP gene expression level was significantly associated with the increase in the canoni- Figure 5G).

cal markers of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in LUSC and LUAD (
All those cell markers are precisely provided by TIMER. 37 The analysed results suggested that SDCBP is positively correlated with antigen processing and presentation genes in lung tissues.

| The relationship between SDCBP expression and antigen processing and presentation genes in epithelial cells of post-mortem lung biopsies epithelial cells from patients die of COVID-19
For some patients with sincerely severe symptoms, inevitable death was known as the worst outcome. We downloaded the data about two post-mortem lung biopsies PB1 and PB2 from the validation data set GSE15 8127 (https://www.ncbi.nlm.nih.gov/geo/query/ acc. cgi?acc=GSE15 8127). We picked only ACE2-positive cells therefrom for further analysis. We also use canonical markers (Table S2) to match the ACE2-positive cells to known cell types (Table S5-S6), from which we discovered that SDCBP was positively correlated with the antigen processing and presentation genes (Figure 6), such as HLA-DPB1, HLA-DRB1, HLA-DRA, HLA-DQB1, HLA-DMA.
Thus, there is a suspect that SDCBP might be a new target in the treatment of COVID-19, which could provide new ideas for achieving a good prognosis.

| D ISCUSS I ON
We selected ACE2-positive bronchial epithelial cells from BALF for analysis in this study, and our results provide evidence showing that the expression of antigen processing and presentation genes, particularly HLA class Ⅱ, is significantly increased in patients with mild disease. Furthermore, based on the above-mentioned findings, we To date, there is no evidence of any effective treatment for severe COVID-19. Although many therapies have been proposed, quarantine is the only intervention that appears to be effective. 41 Thus, there is an urgent need to find new treatments.
Patients with mild disease tend to quickly recover without any Due to a lack of virus-specific immunity, a strong innate immune dysregulation known as a "cytokine storm" leads to the aggravation of COVID-19 infections. 45 Notably, the H1N1 influenza pandemic and the H5N1 outbreak of highly pathogenic avian influenza have shown that some patients infected with the virus die not from the infection itself but from overimmunity. 46  SDCBP is a PDZ domain-containing adaptor protein that can influence the trafficking of transmembrane proteins 48 but also reportedly regulates the assembly of signalling complexes, including signalling downstream of Toll-like receptors(TLRs) 49 . In this study, SDCBP-expressing cells belong to epithelial cell; we found the expression of canonical airway epithelial and immune cell markers in SDCBP-expressing cells (Table S7). WGCNA 32 can be used to find clusters (modules) of highly correlated genes. The relationship between genes has been verified by a series of related studies. HuRI provides binary protein-protein interactions 33 between 17 500 tested proteins that have been verified by Y2H assays.
The present study provides a new prognosis and therapy target for COVID-19 through the use of WGCNA and HuRI: SDCBP.
However, further in vivo and in vitro studies are necessary to clarify the molecular mechanisms associated with SDCBP or HLA Ⅱ.
Because the global COVID-19 epidemic remains at the pandemic phase, any further progression in elucidating the viral infection mechanism might be valuable to vaccine development.

CO N FLI C T O F I NTE R E S T
The authors declare that there are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All the data used in this study were downloaded from public databases as stated in the manuscript.