Single‐cell RNA sequencing analysis to characterize cells and gene expression landscapes in atrial septal defect

Abstract This study aimed to characterize the cells and gene expression landscape in atrial septal defect (ASD). We performed single‐cell RNA sequencing of cells derived from cardiac tissue of an ASD patient. Unsupervised clustering analysis was performed to identify different cell populations, followed by the investigation of the cellular crosstalk by analysing ligand‐receptor interactions across cell types. Finally, differences between ASD and normal samples for all cell types were further investigated. An expression matrix of 18,411 genes in 6487 cells was obtained and used in this analysis. Five cell types, including cardiomyocytes, endothelial cells, smooth muscle cells, fibroblasts and macrophages were identified. ASD showed a decreased proportion of cardiomyocytes and an increased proportion of fibroblasts. There was more cellular crosstalk among cardiomyocytes, fibroblasts and macrophages, especially between fibroblast and macrophage. For all cell types, the majority of the DEGs were downregulated in ASD samples. For cardiomyocytes, there were 199 DEGs (42 upregulated and 157 downregulated) between ASD and normal samples. PPI analysis showed that cardiomyocyte marker gene FABP4 interacted with FOS, while FOS showed interaction with NPPA. Cell trajectory analysis showed that FABP4, FOS, and NPPA showed different expression changes along the pseudotime trajectory. Our results showed that single‐cell RNA sequencing provides a powerful tool to study DEG profiles in the cell subpopulations of interest at the single‐cell level. These findings enhance the understanding of the underlying mechanisms of ASD at both the cellular and molecular level and highlight potential targets for the treatment of ASD.

lethality or heart defects. 2,3 ASD is a non-cyanotic CHD triggered by aberrant abnormal blood flow between the left and right atria. 4 However, the pathogenic mechanism of ASD remains largely unknown, despite that great efforts have been employed in the prevention, diagnosis, and treatment of ASD. [5][6][7] Studies have investigated the pathogenesis and crucial molecular markers of ASD. For example, multiple transcription factors, including GATA4, NKX2-5, dHAND, TFAP2 and TBX5, are required for early heart development. 8,9 Yang et al. suggested that selective expression of NEXN, an F-actin binding protein, could lead to ASD by inhibiting GATA4. 10 Duong et al. showed that Nr2f1a is expressed in differentiated atrial cardiomyocytes and that it mediates the size of the atrial and atrial-atrioventricular canal by regulating the differentiation of atrial cardiomyocytes. 11 Single-cell RNA sequencing (scRNA-seq) analysis allows the characterization of gene expression landscapes at the single-cell level, which can help us to understand the potential regulatory and driving mechanisms of biological disorders. 12 Utilizing scRNA-seq, studies have investigated the spatial and temporal programming of heart development in animal models, which has revealed the gene expression patterns in the process of organ development. 13,14 On the basis of scRNA-seq analyses of healthy and diseased heart, Gladka et al. suggested that there were disease-specific cell subpopulations, and found that CKAP4 could modulate the activation of fibroblasts, showing positive correlations with known myofibroblast markers. 15 However, to our knowledge, no studies have yet investigated the development of ASD using scRNA-seq analysis. In addition, scRNAseq studies of heart development have mostly been based on animal models, and less so on human cardiac tissue. Therefore, we aimed to characterize gene expression in cells derived from ASD and normal control tissues using scRNA-seq analysis.

| Patient and tissue samples
Normal ventricular muscle tissue and tissues adjacent to ASD were collected from the cadaver of a 3-month-old male ASD patient. The study protocol was reviewed and approved by the Medical Institutional Ethics Committee of Qilu Hospital, Shandong University, China. (Prot. KYLL-2018-080). The study was carried out in accordance with the approved guidelines. Written informed consent was provided by the parents. All procedures in this study were performed in compliance with the Helsinki Declaration.

| Single-cell sequencing and data preprocessing
Samples were prepared into a single-cell suspension and examined for cell count and cell viability using a Countess ® II Automated Cell Counter. Single-cell suspensions with cell activity above 80% and a cell concentration of 1000 cells/μl were mixed with 10× Barcode Gel Beads and enzyme to construct a 10× Genomics labelled singlecell library in accordance with the manufacturer's instructions. The Illumina HiSeq platform was used for sequencing of the library. Raw reads were aligned to the reference genome using STAR cell ranger, and unique alignment sequences were selected for subsequent analysis. The Unique Molecular Identifier (UMI) was calibrated based on the unique RNA sequence alignment results. After removal of duplicates, UMI counting was carried out for the different genes for each Barcode to determine the effective cells.

| Unsupervised clustering and celltype annotation
Expression data were normalized based on UMI, followed by the analysis of reduced dimension using principal component analysis.
A graph-based clustering algorithm 16 and K-means 17 were used for cell clustering analysis. The t-distributed stochastic neighbour embedding (tSNE) 18 was used to visualize the clustering results.
Based on the results of cell clustering, exact tests of the negative binomial of sSeq were used to perform differential analysis and identify the significantly differentially expressed genes of each cell cluster. These genes were considered feature genes. Cell clusters with more than 1% proportion of cells were selected for subsequent analysis. On the basis of edger analysis 19 (Version: 3.4, http://www.bioco nduct or.org/packa ges/relea se/bioc/html/ edgeR.html), the count matrices of gene expression were converted into logCPM for subsequent analysis. A total of 24 cell markers were obtained based on the feature genes combined with the cell markers recorded in the CellMarker 20 (http://bio-bigda ta.hrbmu.edu.cn/CellM arker/) and PanglaoDB 21 (https://pangl aodb.se/) databases, and the cell markers reported in the study of Cui et al. 22 FABP4, CD36, TNNT3 and AQP1 were markers for cardiomyocytes; SELE, ACKR1, PLVAP, DNASE1L3 and CCL14 were   markers for endothelial cells; RGS5, GJA4, TAGLN, ACTA2, MYL9   and SOD3 were markers for smooth muscle cells; DCN, COL1A2,   LUM, COL1A1, FBLN1 and TCF21 were markers for fibroblasts; and AIF1, CD163 and CD68 were markers for macrophages. The R package ComplexHeatmap was used to visualize heatmaps for cell marker expression, and cell clusters were annotated for significantly highly expressed markers.

| Cell-cell crosstalk between cell types
In order to explore the cell-cell crosstalk among different cell types, the R package iTALK 23 (https://github.com/Coolg enome/ iTALK) was used to analyse ligand-receptor interactions. In brief, the upregulated genes of each cell type were matched to the 2,648 nonredundant ligand-receptor interactions (including growth factors, cytokines, checkpoints and another four types) recorded in the iTALK package.

| Cell trajectory analysis
The single-cell trajectory analysis method allows the ordering of cells along with a pseudotime axis, which helps to characterize transitional processes such as lineage development. 24 The R package Monocle 25 (version: 2.18.0, http://bioco nduct or.org/ packa ges/relea se/bioc/html/monoc le.html) was used to perform cell pseudotime trajectory analysis. Genes that were expressed in at least ten cells with mean expression values >0.5 and differentially expressed with q values <0.01 were used in cell trajectory analysis.

| Differential expression analysis between ASD and normal tissue for each cell type
Differential expression analysis between ASD and normal tissue for each cell type was performed using the R package edgeR. The Benjamini and Hochberg method was used to perform multiple tests correction. Differentially expressed genes (DEGs) were selected that had |logFC| >0.263 (1.2 fold change) and adjusted p-values <0.05.

| Functional enrichment analysis
Gene Ontology (GO_BP) terms and KEGG pathways were analysed for DEGs identified in each cell type using the online Metascape

| Protein-protein interaction (PPI) network and modules
The DEGs identified for each cell type were uploaded to the STRING database 29 to investigate their interactions, using the following parameter: Homo sapiens and highest confidence (PPI score = 0.9). The PPI network was then visualized using Cytoscape 30 (version 3.4.0).
The CytoNCA plugin 31 (Version 2.1.6) was used to analyse the degree of centrality for nodes in the PPI network without weighting. The MCODE plugin 32 in the Metascape software was used to screen the key modules of the PPI networks using default parameters (Degree Cut-off = 2, Node Score Cut-off = 0.2, K-core = 2, Max. Depth = 100). The modules with scores >5 were identified as key modules. ClusterProfiler 33 (version:3.8.1) was used to investigate the KEGG pathways identified for the genes in key modules. A Benjamini and Hochberg adjusted p-value <0.05 was used to identify significantly enriched pathways.

| Immunohistochemical staining
Tissue sections were deparaffinized, rehydrated, and treated with citrate buffer (pH 6.0) to retrieve antigens. Then, 3% H 2 O 2 was added to sections for 20 min to block endogenous peroxidase activity, and 3% bovine serum albumin was added to block nonspecific binding sites. The sections were incubated with primary antibody anti-FABP4 antibody (Proteintech) and anti-DCN antibody (Proteintech) at 4°C overnight, then incubated with secondary antibody at room temperature for 50 min. Diaminobenzidine was added as a chromogen and the sections were incubated for 2 h at room temperature.
Sections were then counterstained with haematoxylin, rinsed, and air-dried. Finally, the sections were sealed with neutral resin and examined under a fluorescent microscope (XSP-C204; CIC). Three random fields were photographed under ×400 magnification and analysed using Image-Pro Plus 6.0.

| Cell culture and treatment
H9C2 cell lines were purchased from ATCC and cultured at 37°C with 5% CO 2 . The cells were seeded at a uniform density (10,000/ cm 2 ) and grown to 80% confluence in DMEM containing 10% foetal bovine serum and antibiotics. The medium was then replaced by DMEM without foetal bovine serum supplemented with ANP (MCE) for 12 h at different concentrations.

| Small interfering RNA (siRNA) transfection
siRNAs were transfected into vascular smooth muscle cells (VSMCs) using Lipofectamine ® RNAiMAX (Invitrogen) according to the manufacturer's protocol. The siRNA for hsa_circ_0000280 was designed according to sequences of the junction point. All siRNAs were developed and synthesized by Shanghai GenePharma Co., Ltd, and their sequences are shown in Table S1.

| RNA isolation and quantitative PCR (qPCR)
Total RNA from the tissue specimens and cells was isolated using TRIzol reagent (Life Technologies). To measure the levels of circRNA and mRNA, cDNAs were prepared using the Primescript RT Master Mix (Takara) and quantitative PCRs were carried out using TB Green Premix EX Taq (Takara). circRNA and mRNA expression were normalized to β-actin levels using the 2 −ΔΔCT method. The primer sequences are shown in Table S2. The average cycle threshold for genes was calculated from a minimum of three separate measurements.

| Statistical analysis
GraphPad PRISM 5 (Graphpad Software) was used for statistical analyses. ASD and normal samples were compared using the unpaired t test. p-values <0.05 were considered statistically significant.

| Identification of different cell types
An expression matrix of 18,411 genes in 6487 cells was identified and used in this analysis. Unsupervised clustering showed that the 6,487 cells were clustered into 13 cell clusters ( Figure 1A). After filtering the cell clusters with cell proportions less than 1%, there were nine cell clusters (clusters 0-8) retained and used in the following analysis ( Table 1). As seen in Figure 1B To strengthen our results, we also downloaded the public bulk data of GSE13 2176, GSE23959 and GSE35776 from GEO and extracted the samples we needed. Among them, samples of children with ASD were selected from GSE13 2176; Samples of neonates with right and left ventricles were selected from GSE23959 as normal controls. Control samples also were established in the right ventricle of infants extracted from GSE35776. Totally, 10 ASD cases and 18 controls were obtained. Based on markers we used for celltype identification, the ssGSEA was applied to explore the different infiltration degrees of cardiomyocytes and fibroblasts using the R package 'GSVA'. As shown in Figure S1, it can be seen that in the ASD group, the infiltration level of cardiomyocytes has a significant decrease. In contrast, the infiltration level of fibroblasts has an obvious increase, which is basically consistent with our conclusion.

| Ligand-receptor interactions in cell crosstalk
Ligand-receptor interactions among the cell types were investigated using iTALK. Most ligand-receptor interactions were identified among cardiomyocytes, fibroblast and macrophage, suggesting that there was more cell crosstalk among these three cell types, especially more We also analysed with a typical method CellPhoneDB. Ligandreceptor interactions were identified in five cell types. Among them, there was more cell crosstalk among cardiomyocytes, fibroblast cells and macrophages in ASD sample ( Figure S2). To show specific communication between them, we screen out 14 significant ligand and receptor gene pairs, displayed in a bubble graph ( Figure S2). It was revealed that SPP1-CD44 pairs were significantly activated in ASD samples, especially in macrophages. This point was consistent with the result in iTALK analysis. The bubble diagram proved that these ligands and receptors play essential roles in the crosstalk between cardiomyocytes, fibroblasts and macrophages.

| Differences between ASD and normal samples for different cell types
We firstly examined the DEGs between ASD and normal samples in cardiomyocytes, endothelial cells, smooth muscle cells, fibroblasts and macrophages (Figure 2A and Figure S3). Endothelial cells showed most DEGs between ASD and normal samples. For all cell types, the majority of the DEGs were downregulated in ASD samples (Table 2).  Figure 2B).

| PPI network and module analysis for each cell type
We also investigated the interactions among these DEGs (Table 3).  Figure S5. Similarly, DEGs in different modules were significantly enriched for different pathways ( Figure S5).

| Cell pseudotime trajectory analysis
We also performed cell trajectory analysis using Monocle to order individual cells in pseudotime for cardiomyocytes, endothelial cells and smooth muscle cells, respectively. As shown in Figure 5, cells from normal tissues and ASD tissues were distributed in different trajectory states for all three cell types, suggesting there were significant difference between normal and ASD samples. Cardiomyocytes transition from normal to disease tended to decrease by pseudotime, while colorectal cells and Smooth muscle cells transition from normal to disease tended to increase by pseudotime. Figure 5A showed the pattern number Per cent (%) with an increase at state 5 and 6 (cells from normal tissues), while expression of NPPA was decreased along the pseudotime, especially at state 5 and 6 (cells from normal tissues). Figure 5B showed the pat-

| Immunohistochemical staining
A proportion of cardiomyocytes were decreased in ASD samples compared with normal samples, while ASD samples showed a higher proportion of fibroblasts than normal samples. Therefore, cardiomyocytes and fibroblasts were considered important cell types.
We determined marker gene expression by immunohistochemical staining for these two cell types. As shown in Figure 6A, FITC assay showed that ANP stimulation also aggravated the apoptosis of cardiomyocytes ( Figure 6D). Further, we designed siRNA for ANP and remarkably reduced the expression of ANP in H9C2 cells ( Figure 6C). The knockdown of ANP decreased the percentage of cells in apoptosis ( Figure 6E).

| DISCUSS ION
The heart is the central organ of the circulatory system, and its normal development plays an important role in sustaining life. ASD is a F I G U R E 3 Differential biological processes in gene set variation analysis (GSVA). Heatmaps showing the differential biological processes in the GSVA analysis for the five cell types between ASD and normal samples For cluster 9-12 with cell proportions less than 1%, we also use the R package 'Seurat' to analyse marker genes. Then, Gene

TA B L E 3 Statistics of nodes and interactions in PPI network for different cell types
Ontology (GO_BP) terms and KEGG pathways were analysed for DEGs identified in each cell type using the online Metascape tool.
After the term that meets the above parameters is obtained, further clustering is carried out according to the genetic similarity GO analysis enhanced the inference that Cluster 12 was related to muscle system regulation.
Cardiomyocyte marker gene FABP4 showed interaction with FOS, while FOS showed an interaction with NPPA, which has been reported to play an important role in heart development. 34 There are some previously reported ASD related biomarkers, such as ACTC1, Alk3 and Whsc1. Alpha-cardiac actin (ACTC1), which is essential for cardiac contraction, has been reported that reduced ACTC1 levels may lead to ASD. 48,49 We found that ACTC1 was downregulated in cardiomyocytes, endothelial cells, fibroblast cells and smooth muscle cells of ASD samples in our sequencing data ( Figure   S7). BMP receptor Alk3 plays an essential role in BMP signalling, which may contribute to human congenital heart diseases. In our results, Alk3 (BMPR1A) was downregulated in endothelial cells ( Figure S7). It has been reported that conditional endothelial depletion of Alk3 severely impairs cushion morphogenesis during mammalian cardiogenesis. 50 Alk3-mediated BMP signalling is required for endocardial formation and survival of AV cushion mesenchymal cells. 51 These pieces of evidence indicated that Alk3 induced endothelial dysfunction might be a key reason for ASD formation. Wolf-Hirschhorn Syndrome Candidate 1 (Whsc1) deletion of mice showed various atrial and ventricular septal defects. 52 It is consistent with our result that Whsc1 was downregulated in cardiomyocytes, endothelial cells, and smooth muscle cells ( Figure S7). Whsc1 can interact with Nkx2.5 to repress transcription of NKX2-5 target genes such as Nppa. Nppa is aberrantly expressed in Whsc1 deleted hearts. Our results also confirmed the change of Nppa in ASD. Therefore, our study proved these reported conclusions and provided a new viewpoint of single cell.
Our study had some limitations. The major limitation is the small sample size, which could have led to large batch effects, and the reliability of the results may have been reduced to some extent. However, collecting human cardiac tissue poses great challenges in China, as infants' and children's bodies are rarely donated for scientific research.
The lack of human tissue that contains disease characteristics limits scRNA-seq investigation of human disease, including ASD. This was not the only challenge during our study. In addition, studies of heart development using scRNA-seq have mostly been based on animal models and less on human cardiac tissue. Furthermore, normal atrial septal tissue should ideally be used as a normal control. However, the patient had a serious atrial septal defect, and ventricular muscle tissue was collected as the normal control, considering that the lesion might affect atrial muscle. Only one marker gene for cardiomyocytes and fibroblasts was investigated by immunohistochemical staining.
We preliminarily analysed several genes in cardiomyocytes. Further experiments should be carried out on different cell types.
In conclusion, we characterized cell subsets in ASD and normal samples, and five major cell types, including cardiomyocytes, endothelial cells, fibroblasts, macrophages and smooth muscle cells, were identified. ASD samples showed a decreased proportion of cardiomyocytes and an increased proportion of fibroblasts, and there was more cellular crosstalk between cardiomyocytes and fibroblasts.
There were similarities and differences in DEGs and their functions between ASD and normal samples for these cell types. These findings increase the understanding of the underlying mechanisms of ASD at both the cellular and molecular level, and highlight potential targets for the treatment of ASD.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available from the corresponding author upon reasonable request.