Identification of differentially expressed genes by single‐cell transcriptional profiling of umbilical cord and synovial fluid mesenchymal stem cells

Abstract The purpose of this study was to measure the heterogeneity in human umbilical cord–derived mesenchymal stem cells (hUC‐MSCs) and human synovial fluid–derived mesenchymal stem cells (hSF‐MSCs) by single‐cell RNA‐sequencing (scRNA‐seq). Using Chromium™ technology, scRNA‐seq was performed on hUC‐MSCs and hSF‐MSCs from samples that passed our quality control checks. To identify subgroups and activated pathways, several bioinformatics tools were used to analyse the transcriptomic profiles, including clustering, principal component analysis (PCA), t‐Distributed Stochastic Neighbor Embedding (t‐SNE), gene set enrichment analysis, as well as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. scRNA‐seq was performed on the two sample sets. In total, there were 104 761 163 reads for the hUC‐MSCs and 6 577 715 for the hSF‐MSCs, with >60% mapping rate. Based on PCA and t‐SNE analyses, we identified 11 subsets within hUC‐MSCs and seven subsets within hSF‐MSCs. Gene set enrichment analysis determined that there were 533, 57, 32, 44, 10, 319, 731, 1037, 90, 25 and 230 differentially expressed genes (DEGs) in the 11 subsets of hUC‐MSCs and 204, 577, 30, 577, 16, 57 and 35 DEGs in the seven subsets of hSF‐MSCs. scRNA‐seq was not only able to identify subpopulations of hUC‐MSCs and hSF‐MSCs within the sample sets, but also provided a digital transcript count of hUC‐MSCs and hSF‐MSCs within a single patient. scRNA‐seq analysis may elucidate some of the biological characteristics of MSCs and allow for a better understanding of the multi‐directional differentiation, immunomodulatory properties and tissue repair capabilities of MSCs.


| INTRODUCTION
Mesenchymal stem cells (MSCs) can differentiate into bone, cartilage and fat cells, which play important roles in development, homeostasis, post-natal growth, repair and regeneration. 1,2 Because of their ability to self-renew with a high proliferation rate, MSCs are a common source of stem cells in clinical applications to regenerate damaged organs and tissues. 3,4 Numerous studies indicate that the major sources for MSCs in the clinical setting are adipose tissue and bone marrow; however, these resources are limited because there are strict donor requirements. [5][6][7] Therefore, alternative sources obtained from neonatal or primitive tissues, such as the amnion, placenta, synovial fluid and umbilical cord, have been explored. 2,[8][9][10] The umbilical cord (UC) is an attractive source of MSCs as it can be obtained by non-invasive methods without harm to mothers or their children. 11 The UC possesses immunosuppressive activity and produces an abundance of MSCs. 12,13 UC-derived MSCs (UC-MSCs) are one type of multipotent adult stem cell, which has the potential to differentiate into various cell types, thereby making these cells a possible resource for cell-based therapies. Human umbilical cord-derived mesenchymal stem cells have some characteristics in common with MSCs obtained from adipose tissue and bone marrow, including a fibroblastoid morphology and a similar set of surface proteins, as well as the ability to differentiate into different cell types. 14,15 Previous studies have shown that MSCs also exist in synovial fluid (SF). 16,17 In the presence of an injury or osteoarthritis, the number of MSCs from SF increases significantly in order to help recruit mesenchymal progenitor cells to promote spontaneous healing and restore homeostasis. 17 SF-derived MSCs (SF-MSCs) are a viable option for syngeneic transplantation for cartilage regeneration. 18,19 SF-MSCs are ideal for clinical applications because SF can be obtained arthroscopically without the donor undergoing invasive surgery.
In recent years, single-cell genomics has become a powerful tool to help uncover the genetic structure and population dynamics of unicellular organisms, [20][21][22][23] as well as cancer cells, 24 and has provided insight into the developmental lineages 25 in multicellular organisms. Single-cell RNA-sequencing (scRNA-seq) can be used to analyse differences in the transcriptome of various cells, 26,27 discover novel cell types and provide insights into the regulatory networks that function in ontogenetic development. 28 scRNA-seq is an efficient method for analysing changes in gene expression, and it has been performed successfully in many different tissue types. [29][30][31] To uncover information about the subpopulations that exist in MSCs and analyse the differentially expressed genes (DEGs) of these subgroups, we used scRNA-seq to perform transcriptomic profiling in hUC-MSCs and hSF-MSCs. Furthermore, using clustering, principal component analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), gene set enrichment analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, we were able to identify subgroups and activated pathways within these populations of MSCs.

| Ethics statement
This study was conducted using protocols approved by the Ethical Committee of the Shenzhen People's Hospital. Informed consent was obtained from all participants.

| Isolation and culture of hUC-MSCs
Under sterile conditions, UC units (8-10 cm) were collected from the puerpera of full-term deliveries and were immediately stored in cold saline (0°C). The blood vessels and outer membrane were removed with surgical blades, and the Wharton's jelly (WJ) tissue was cut using eye scissors. The minced tissues were then placed in a 10-cm culture dish at 1-cm intervals and were maintained in culture medium (MesenGro Medium supplemented with 10% FBS, 1% penicillin-streptomycin and 10% MesenGro Supplement) at 37°C with 5% CO₂ and 90% RH. After 48 hours, the medium was removed to eliminate non-adherent cells and replaced with fresh medium. The complete culture medium was changed every 3 days. We selected distinct cell subpopulations, and we assumed that these subpopulations were efficient and sustainable.
Colonies smaller than 2 mm in diameter were ignored. The clustered hUC-MSCs were digested with trypsin and resuspended with complete culture medium at a density of 2.0 × 10⁴ cells/cm² into a 25-mm² vented culture bottle. The CD90⁺ hUC-MSCs were collected by immunomagnetic beads and were identified under more stringent measures.
After approximately three generations, the hUC-MSCs were harvested under sterile conditions to prepare a single-cell suspension of more than 6.0 × 10⁵ cells, with a survival rate >90% and cell diameter <30 μm. The hUC-MSC samples were then sent to GENERGY BIO (Shanghai, China) for scRNA-seq analysis.

| Isolation and culture of hSF-MSCs
The samples were collected during arthroscopic procedures from patients suffering from an intra-articular ligament injury of the knee joint. Isotonic saline solution was injected into the joint, the knee was moved several times, and then, SF (50-100 mL) mixed with saline solution was collected in γ-sterilized centrifuge tubes. Within 1-4 hours, the fluid was filtered with a cell strainer (40 μm nylon) to remove debris. The filtered fluid was gathered in γ-sterilized centrifuge tubes and centrifuged at 405 g for 10 minutes at room temperature. The cell pellet was resuspended in culture medium (MesenGro Medium supplemented with 10% FBS, 1% penicillin-streptomycin and 10% MesenGro Supplement) and plated in 100-mm dishes after centrifugation. After 48 hours, the medium was withdrawn to remove non-adherent cells and replaced with fresh medium. The complete culture medium was changed every 3 days. We selected distinct cell subpopulations and assumed that these units were efficient and sustainable. The colonies smaller than 2 mm in diameter were discarded using a cell scraper (Corning Inc). Then, the distinct cell subpopulation was digested in cloning cylinders (Sigma-Aldrich) and used to inoculate a new dish as passage 1. Passage 3 (P3) cells were used for the scRNA-seq analysis. The gene-cell-barcode matrix was concatenated. Only genes with at least one UMI count detected in at least one cell were used.

| The scRNA-seq analysis
Unique molecular identifier (UMI) normalization was performed by first dividing the UMI counts by the total UMI counts in each cell, followed by multiplication by the median total UMI counts across all cells. Each gene was then normalized such that its mean signal was 0 and its standard deviation was 1. Principal component analysis was run on the normalized gene-barcode matrix. The normalized UMI counts of each gene were used to show expression of a marker in a t-SNE plot.
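The normalization steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: it assumes that each cell's counts are divided by that cell's total UMI count, that every cell has at least one UMI, and that the population standard deviation is used for scaling. The function names and the toy count matrix are hypothetical.

```python
from statistics import mean, median, pstdev

def normalize_umi(counts):
    """Library-size normalization of a cells x genes list of UMI counts."""
    totals = [sum(cell) for cell in counts]   # total UMIs per cell (assumed > 0)
    med = median(totals)                      # median total across all cells
    # Divide each count by its cell's total, then rescale by the median total.
    return [[c / t * med for c in cell] for cell, t in zip(counts, totals)]

def scale_genes(matrix):
    """Centre each gene (column) to mean 0 and scale it to standard deviation 1."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled_cols = []
    for g in range(n_genes):
        col = [row[g] for row in matrix]
        mu, sd = mean(col), pstdev(col)
        # A gene with zero variance carries no signal; map it to all zeros.
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    # Transpose back to cells x genes.
    return [[scaled_cols[g][i] for g in range(n_genes)] for i in range(n_cells)]

# Toy matrix: 3 cells (rows) x 3 genes (columns).
counts = [[4, 0, 6], [10, 5, 5], [1, 1, 3]]
normalized = normalize_umi(counts)
scaled = scale_genes(normalized)
```

After the first step every cell has the same total (the median library size), so differences in sequencing depth between cells no longer dominate the comparison. The second step puts all genes on a common scale before PCA.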
To identify genes that were enriched in a specific cluster, the mean expression of each gene was calculated across all cells in the cluster. Each gene from the cluster was then compared to the median expression of the same gene in all other cell clusters. Genes were ranked based on their expression difference, and the top 10 enriched genes from each cluster were selected. For hierarchical clustering, pair-wise correlation between each cluster was calculated, and centred expression of each gene was used to generate a heat map. Gene Ontology and KEGG term information was downloaded from the UniProtKB database. Both GO and KEGG terms with a P-value < .05 were considered to be significantly enriched.
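The cluster-enrichment ranking described above can be illustrated with a short sketch. It reflects one reading of the text, namely that the mean expression of a gene in the cluster of interest is compared with the median of that gene's mean expression in the other clusters. The function name, gene names and toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, median

def top_enriched_genes(expr, labels, genes, cluster, n_top=10):
    """Rank genes by (mean in `cluster`) minus (median of means in other clusters).

    expr   : cells x genes list of normalized expression values
    labels : cluster label for each cell
    genes  : gene name for each column of `expr`
    """
    clusters = sorted(set(labels))
    # Mean expression of every gene within every cluster.
    cluster_means = {
        c: [mean(expr[i][g] for i in range(len(expr)) if labels[i] == c)
            for g in range(len(genes))]
        for c in clusters
    }
    diffs = []
    for g, name in enumerate(genes):
        others = [cluster_means[c][g] for c in clusters if c != cluster]
        diffs.append((cluster_means[cluster][g] - median(others), name))
    diffs.sort(reverse=True)                  # largest difference first
    return [name for _, name in diffs[:n_top]]

# Toy example: 6 cells in 3 clusters, 3 hypothetical genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
expr = [[5, 1, 0], [7, 1, 0], [1, 4, 0], [1, 6, 0], [0, 1, 9], [2, 1, 7]]
labels = [0, 0, 1, 1, 2, 2]
enriched_in_cluster_0 = top_enriched_genes(expr, labels, genes, cluster=0, n_top=1)
enriched_in_cluster_2 = top_enriched_genes(expr, labels, genes, cluster=2, n_top=2)
```

Using the median of the other clusters as the reference makes the ranking less sensitive to a single outlying cluster than a comparison against the overall mean would be.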
FIGURE 1 10× Genomics single-cell technology enables the profiling of RNAs from thousands of single cells simultaneously. Cells were combined with reagents in one channel of a chip. Reverse transcription took place inside each GEM, after which cDNAs were pooled to perform amplification and library construction in bulk. Gel beads loaded with primers and barcoded oligonucleotides were first mixed with cells and reagents, and subsequently mixed with oil-surfactant solution at a microfluidic junction. Single-cell GEMs were collected in the GEM outlet. Finished library molecules consisted of Illumina adapters and sample indices, which allowed for pooling and sequencing of multiple libraries on a next-generation short read sequencer.

| The scRNA-seq profiles of hUC-MSCs and hSF-MSCs by 10× Genomics
For our scRNA analysis, we obtained 1597 cells and 1259 cells from hUC-MSCs and hSF-MSCs samples, respectively (

| Subpopulation discovery in hUC-MSCs and hSF-MSCs samples
The Chromium™ single-cell technology can also be used for scRNA-seq of primary cells. We isolated more than 1000 cells from hUC-MSCs and hSF-MSCs. Gene-cell matrices from hUC-MSCs and hSF-MSCs were concatenated, and PCA was performed to reduce dimensionality before performing clustering and t-SNE analysis.
Based on our PCA and t-SNE results, there were 11 clusters present in hUC-MSCs and 7 in hSF-MSCs (Figures 2 and 3).
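To make the dimensionality-reduction step concrete, the sketch below extracts only the leading principal component of a small matrix by power iteration on the sample covariance matrix. It is an illustration under stated assumptions, not the procedure used for these data: an actual analysis would retain several components and would rely on an established numerical library. The function name, the starting vector and the toy data are hypothetical.

```python
import math

def first_principal_component(data, n_iter=200):
    """Leading principal component of `data` (rows = cells, columns = features)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Deterministic starting vector; assumed not orthogonal to the leading axis.
    vec = [float(j + 1) for j in range(d)]
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]         # renormalize to unit length
    # Project each centred cell onto the component to get its score.
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centred]
    return vec, scores

# Toy data lying close to the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
pc1, pc1_scores = first_principal_component(data)
```

The scores along the first few components, rather than the full gene-level matrix, are what downstream clustering and t-SNE would typically take as input.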

| GO function analysis of hUC-MSCs and hSF-MSCs samples
The three main categories for GO function analysis are biological process, cellular component and molecular function. As shown in Figure 4, the DEGs found in hUC-MSCs were significantly enriched in … In hSF-MSCs, the cluster 1 DEGs were primarily associated with mitotic cell cycle control and processes, protein binding and RNA binding, and were enriched in membrane-enclosed lumen and organelle lumen (Figure 5A). The DEGs of cluster 2 were primarily related to SRP-dependent cotranslational protein targeting, protein targeting to ER and structural molecule activity, and a significant proportion were found in cytosolic ribosomes and ribosomal structures (Figure 5B).
The DEGs of cluster 3 function in wound healing, vasculature development, platelet-derived growth factor binding, formation of collagen trimers and extracellular matrix structure (Figure 5C). The DEGs of

| KEGG analysis of hUC-MSCs and hSF-MSCs samples
According to KEGG analysis, the DEGs in hUC-MSCs were mainly enriched in Alzheimer's disease, amoebiasis, antigen processing and presentation, bladder cancer, cell cycle, chemical carcinogenesis, DNA

| DISCUSSION
In general, MSCs are increasingly being used as a resource for cell-based therapies in cartilage repair and regenerative medicine. 32,33 The

ACKNOWLEDGEMENTS
This study was supported financially by the Natural Science

CONFLICT OF INTEREST
The authors confirm that there are no conflicts of interest.

AUTHOR CONTRIBUTIONS
The following people designed, performed research and analysed data: Zhaofeng Jia, Shijin Wang, Qisong Liu; Zhaofeng Jia wrote the paper.

DATA AVAILABILITY STATEMENT
The raw data used for the analyses in this paper have been deposited in the Genome Sequence Archive in BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession numbers CRA002294, CRA002294 that are publicly accessible at https://bigd.big.ac.cn/gsa.