Transcriptional and open chromatin analysis of bovine skeletal muscle development by single‐cell sequencing

Abstract Skeletal muscle is a complex heterogeneous tissue and characterizing its cellular heterogeneity and transcriptional and epigenetic signatures are important for understanding the details of its ontogeny. In our study, we applied scRNA‐seq and scATAC‐seq to investigate the cell types, molecular features, transcriptional and epigenetic regulation, and patterns of developing bovine skeletal muscle from gestational, lactational and adult stages. Detailed molecular analyses were used to dissect cellular heterogeneity, and we deduced the differentiation trajectory of myogenic cells and uncovered their dynamic gene expression profiles. SCENIC analysis was performed to demonstrate key regulons during cell fate decisions. We explored the future expression states of these heterogeneous cells by RNA velocity analysis and found extensive networks of intercellular communication using the toolkit CellChat. Moreover, the transcriptomic and chromatin accessibility modalities were confirmed to be highly concordant, and integrative analysis of chromatin accessibility and gene expression revealed key transcriptional regulators acting during myogenesis. In bovine skeletal muscle, by scRNA‐seq and scATAC‐seq analysis, different cell types such as adipocytes, endothelial cells, fibroblasts, lymphocytes, monocytes, pericyte cells and eight skeletal myogenic subpopulations were identified at the three developmental stages. The pseudotime trajectory exhibited a distinct sequential ordering for these myogenic subpopulations and eight distinct gene clusters were observed according to their expression pattern. Moreover, specifically expressed TFs (such as MSC, MYF5, MYOD1, FOXP3, ESRRA, BACH1, SIX2 and ATF4) associated with muscle development were predicted, and likely future transcriptional states of individual cells and the developmental dynamics of differentiation among neighbouring cells were predicted. CellChat analysis on the scRNA‐seq data set then classified many ligand–receptor pairs among these cell clusters, which were further categorized into significant signalling pathways, including BMP, IGF, WNT, MSTN, ANGPTL, TGFB, TNF, VEGF and FGF. Finally, scRNA‐seq and scATAC‐seq results were successfully integrated to reveal a series of specifically expressed TFs that are likely to be candidates for the promotion of cell fate transition during bovine skeletal muscle development. Overall, our results outline a single‐cell dynamic chromatin/transcriptional landscape for normal bovine skeletal muscle development; these provide an important resource for understanding the structure and function of mammalian skeletal muscle, which will promote research into its biology.

This makes it possible to assess distinct cell types and states, their dynamic trajectories and molecular programmes governing sequential cell fates in skeletal muscle development. 7 Moreover, recent advances in scATAC-seq offer a single-cell method for measuring chromatin accessibility to generate additional information about gene regulatory processes. Combined with scRNA-seq, these approaches have the potential to unravel how dynamic changes in cisregulatory elements driven by changes in TF binding regulate gene expression programmes during development. 8,9 A recent study estab-

| Animals
Longissimus muscle samples of Qinchuan cattle were collected from foetal (60 days), postnatal (4 months) and adult (24 months) stages at Ningxia Yikang Animal Husbandry. The cow was artificially inseminated on natural oestrus. All the cattles were not fed the night before they were euthanized. The care and use of experimental animals were in full compliance with local animal welfare laws, guidelines and policies.

| scRNA-seq data processing
The 10Â Genomics scRNA-seq raw data were processed and aligned to the reference genome downloaded from ENSEMBL (release version 104) and then quantified using the package Cell Ranger (release 6.0.2). Then, we utilized the R package Seurat (release 4.0.5) to perform downstream analysis. 9 First, cells with low detected gene numbers, low total read numbers and high ratios of mitochondrial to somatic genes were detected automatically using the R function 'isOutlier' and then filtered out. Then, the count matrix was normalized and scaled using the 'NormalizeData' and 'ScaleData' functions. We performed principal component analysis (PCA) based on the top 2000 highly variable genes using the 'RunPCA' function. To reduce background noise, the significant PCA results were selected according to the p values generated by 'ScoreJackStraw'. We clustered cells using the 'FindClusters' function and then projected them onto a twodimensional space using the 'RunTSNE' or 'RunUMAP' functions.

| Integrative analysis of scRNA-seq data
Before integrating multiple scRNA-seq data from the three different developmental stages, we first examined whether there were any batch effects. We found that the strongest remaining factor was sample-tosample differences, because cells apparently tended to cluster by sample. Therefore, we applied the R package 'Harmony' (release 0.1.0) to correct batch effects. 11 After that, we annotated cell clusters by first identifying the marker genes of each cluster with the 'FindMarkers' function from the R Seurat toolkit and then manually annotating the cell clusters according to known cell markers for each cell type. Clusters that expressed the same marker genes were merged.

| Trajectory analysis of scRNA-seq and scATAC-seq data
To infer the developmental trajectory of particular cell lineages, we used the R package 'Monocle2' (release 2.20.0) to perform trajectory analysis based on scRNA-seq and scATAC-seq data. 12 Specifically, the differentially expressed genes between different cell types were selected for following analyses. Next, the 'DDRTree' method was applied to perform dimensionality reduction and order the cells along pseudotime. Because Monocle2 cannot know a priori which of the trajectory of the tree to call the 'beginning', the 'root_state' argument was used to specify the zygote stage as the beginning and we then reapplied 'orderCells'.

| Single-cell regulatory network inference
To infer the regulatory activity of TFs in subpopulations, we used the pyS-CENIC package (release 0.11.2) to perform gene regulatory network analysis. 13 Briefly, regulatory modules were identified by inferring coexpression between TFs and genes containing TF-binding motifs in their promoter regions. First, we generated two gene-motif ranking files including 10 kb around the TSS and 500 bp upstream. Then the 'GRNBoost' function implemented in pySCENIC was employed to identify coexpression modules and quantify the weight between TFs and target genes. Target genes with a low positive correlation (<0.03) in each TF module were removed from further analysis. The regulatory activity of TFs was quantified by area under the recovery curve value from the enrichment of each regulon.
Subpopulation-specific regulons were identified according to the average regulon activity scores in the subpopulation.

| RNA velocity analysis
To infer the developing trajectory of individual cells or cell populations, we employed the 'velocyto' (package release 0.17.17) to estimate RNA velocities of single cells by quantifying unspliced and spliced mRNAs based on output files from the package 'Cell Ranger'. 14 The spliced/ unspliced expression matrices of genes were processed using the R package 'velocyto.R' (release 0.6) to estimate velocity. Then, we visualized RNA velocity by projecting gene unspliced/spliced reads abundance on the clustering embedding space generated by integrative analysis.

| Cell communication analysis
To quantify intercellular communication networks, we used the 'Cell-Chat' package (release 1.1.3) to investigate potential cell communications. 15 Because there is incomplete annotation of receptor-ligand pair information for the bovine geneome, we only selected homologous human genes for further analyses. Briefly, the CellChat objects were first created using the 'createCellChat' function. Then, the communication probability was inferred using 'computeCommunProb' and 'compute-CommunProbPathway' functions. The network centrality scores and the contribution of each ligand-receptor pair to the signalling pathway were calculated using 'netAnalysis_computeCentrality' and 'netAnalysis_contribution', respectively. All graphs of visualized cell communications were generated using the functions implemented in the CellChat package.

| Pre-processing
The fastq files of scATAC-seq data were first processed by the function 'cellranger-atac count' of 10Â Genomics Cell Ranger ATAC (version 2.0.0) pipeline, which generated fragment files and per-barcode fragment metrics after filtering the reads, alignment, barcode counting, identification of transposase cut sites and detection of accessible chromatin peaks. In the alignment step, we used the reference genome bosTau9.

| Creating cell-by-bin matrix
Downstream analyses were performed using the 'SnapATAC' (release 1.0.0) pipeline. 16 Taking fragment files as input, we generated snap files by 'snaptools' (release 1.4.8), only considering uniquely aligned and properly paired fragments with a mapping quality ≥30. To create a cell-by-bin matrix, we segmented the genome into 5-kb bins and scored each cell with the number of aligned fragments in a given bin using the fragment files. Only cells with high-quality libraries were retained according to the total number of detected high-quality fragments (1000-5000) and the ratio of high-quality fragments overlapping with annotated transcription start sites (0.125-0.8). After binarization of the matrix, we calculated and normalized the coverage of each bin and finally eliminated those overlapping with invariant features and mitochondrial DNA to avoid potential contamination. Any cells with <500 bin coverages were discarded in the following analyses.

| Dimension reduction and cell clustering
Based on the cell-by-bin matrix, the Nyström landmark diffusion maps algorithm was applied for nonlinear dimensionality reduction after combining the samples of the three groups, which included sampling, embedding and extension. In short, we sampled 10,000 cells from the total as 'landmarks' in a density-based manner developed in SCTransform and computed a diffusion map embedding. The remaining query cells were then projected onto the resulting low-dimension embedding, finally generating a joint embedding space. Batch effects of three groups were corrected by R package Harmony (release 0.1.0). 11 After that, using the top 20 significant components determined by an ad hoc method, a k-nearest neighbour graph was constructed for cells clustering and grouped cells that were of the same type through a community finding algorithm.

| Calculation of gene activity scores
In preparing the integrating scATAC-seq data and corresponding scRNA-seq data for cell type annotation, we created a cell-by-gene matrix by calculating the number of fragments intersecting with the regions of genes body and 2 kb upstream from the TSS, which was performed using the function 'createGmatFromMat' in SnapATAC. In this matrix, we only selected the top 2000 highly variable genes of scRNA-seq data for downstream analyses.

| scRNA-seq-based annotation
For cell type annotation of chromatin accessibility data, we followed the scRNA-seq-based annotation pipeline provided by SnapATAC. 16 First, we converted the snap object into a Seurat object (release 4.0.5) which was integrated with the paired scRNA-seq data based on the linkage between chromatin accessibility and gene expression in single cells. 9 Cell types were predicted by calculating 'prediction scores' with the function 'Transfer-Data'. After that, 'pseudo multi-omics' cells containing information of gene expression as well as chromatin accessibility were created by imputation of gene expression profile. At last, we re-clustered cells with max prediction score larger than 0.5 using the first 20 dimensions and constructed TSNE graphs for visualization of the high-dimensionality dataset.

| Identification of peaks and differentially accessible regions
We clustered fragments from cells of the same cell type and identified peaks using MACS2 (release 2.2.7.1) with the parameters '--nomodel --shift 100 --ext 200 --qval 5 e-2 -B -SPMR'. 17 Peaks from all clusters were merged to create and add a cell-by-peak matrix to the snap object. Differentially accessible regions (DARs) for each cluster were recognized using function 'findDAR'. In brief, for a given cluster of cells, snapATAC took their neighbouring cells or the remaining cells as background to perform differential analysis in line with the size of that cluster. 16 Peaks with log-transformed fold change >0 and FDR <0.05 were considered as significant DARs.

| GREAT analysis
Taking DARs as input, we used GREAT to infer candidate and active biological pathways in each cell cluster. 18

| Motif enrichment analysis
TF motif enrichment analysis of DARs was performed using the 'find-MotifsGenome' function of package HOMER (release 4.11) for identification of known and de novo motifs. 19 2.8.9 | Integration with scRNA-seq To integrate the scATAC-seq and scRNA-seq data, we followed the multiway alignment using all pairs method tutorial embedded in package scAlign (release 1.8.0), which was achieved by associating the gene expression and gene activity scores. 20
immunostaining, the paraffinized 5 μm sections of skeletal muscle were deparaffinized and submitted to antigen retrieval in Tris-citrate buffer (Servicebio) followed by incubated with 3% hydrogen peroxide in phosphate-buffered saline (PBS; Servicebio) to inhibit endogenous peroxidases. After closed with 10% bovine serum albumin (BSA; Servicebio), sections were incubated at 4 C overnight with primary antibodies followed by incubated 1 h with an HRP-labelled secondary antibody (Cell Signaling Technology) at room temperature.
Peroxidase activity was revealed with 3,3 0 -diaminobenzidine (DAB; Servicebio), which produces a brown staining, and sections were then stained for 30 s with haematoxylin (Servicebio), dehydrated, and mounted. Finally, the slides were scanned under the E100 microscope (Nikon), and data were analysed using the Aipathwell software (Servicebio).  Figure 1D,E). Adult skeletal muscles are composed mainly of slow and fast myofibers, which are classified as Type I fibres, Type IIA fibres and Type IIX fibres. In addition, Type IIB fibres were undetectable in our scRNA-seq data, in line with the consensus that these genes are not expressed in skeletal muscles of large mammals such as humans and bovines. 22,23 Interestingly, hybrid myofibers expressing more than one MYH isoform were identified, in which Type I fibres 2 can express MYH7 and MYH2 while Type I fibres 1 only express MYH7 ( Figure 1E). For Type IIA fibres, one more MYH7 copy is expressed in Type IIA fibres 2 than in Type IIA fibres 1. Analogously, using snRNA-seq, mixed nuclei coexpressing MYH7 with and ZBTB18 ( Figure S1B). This reflects the similarity in transcriptional regulation between myogenic progenitors and myoblasts, and specific transcriptional events might occur during myocyte formation. Notably, these specifically expressed TFs among neighbouring cell subpopulations are likely candidates for the promotion of cell fate transition, whose specific role can be the focus of future research.

| RNA velocity analysis of transcriptional plasticity
As pseudotime analysis does not fully reveal the developmental dynamics of differentiation, we applied RNA velocity analysis to our scRNA-Seq data set, which can be used to predict likely future transcriptional states of individual cells by distinguishing spliced and unspliced mRNAs. Here, RNA velocity based on UMAP plots was visualized; arrows each represent the local average velocity on a vector field with their length indicating the speed of the differentiation events ( Figure 3A). As expected, the myogenic progenitors and myoblasts have RNA velocity vectors pointing towards the myocytes, suggesting strong myocyte lineage commitment. Intriguingly, most RNA velocity vectors within the myoblasts point backwards in pseudotime, indicating that myoblasts maintain the undifferentiated cell phenotype ( Figure 3B). Significantly, Schwann cells and myocytes share similar trajectories with long RNA velocity vectors ( Figure 3B).
Schwann cells and myocytes are the core components of neuromuscular junctions, allowing the transmission of neuronal impulses to skeletal muscles for contraction. 26 Moreover, we found that pericytes had the longest RNA velocity vectors pointing towards many cell clusters, such as myogenic progenitors, adipocytes, fibroblasts, myofibres and Schwann cells, suggesting that they have the capacity of multidirectional differentiation to maintain normal skeletal muscle development ( Figure 3C).
The bovine longissimus dorsi is a fast-twitch muscle mainly consisting of Type II fibres, and has a higher glycolytic capacity than slowtwitch (Type I fibres). 27 Previous reports have confirmed that fibre type switching such as I $ IIA $ IIX can be induced by exercise, by changes in neural activity, or under the influence of hormones during development. 28 Figure 4B,C). Notably, The WNT ligand-receptor pairs WNT7A-FZD3/LRP6 were the major contributors to this communication network ( Figure 4D), which is consistent with a report that WNT7A-dependent expression is mediated by the canonical WNT cascade during embryonic myogenesis. 29 Several studies have shown that WNT signalling plays a role in fibre type determination, so we have speculated the presence of cell-cell interactive mechanisms driving myofibre diversification during development. 30  In another example at the three time points, monocytes or lymphocytes were the source of TNF ligands ( Figure S3C), consistent with its multifunctional proinflammatory cytokine role secreted predominantly by these cell types. 32 Network centrality analysis of the VEGF signalling pathway network confirmed adipocytes and fibroblasts were the dominant source of VEGF ligands acting on endothelial cells, with minor contributions from skeletal muscle cell populations. However, skeletal muscle fibres gained VEGF responsiveness as the body developed ( Figure S3D). Moreover, CellChat analysis predicted that the FGF signalling pathway network is complex and highly redundant with multiple ligand sources, and adipocytes and fibroblasts demonstrated significant autocrine signalling ( Figure S3E).

| Ligand-receptor interaction prediction during bovine skeletal muscle development
To explore how multiple cell groups and signalling pathways coordinate to function during myogenesis, CellChat pattern recognition modules were applied and showed that certain cell types simultaneously activate multiple signalling pathways and rely on largely overlapping incoming and outgoing signalling networks at different developmental nodes. For example, the application of this analysis revealed two, three, and four patterns for outgoing signalling and incoming signalling at the FB, CB and AB stages, respectively ( Figure 4E,F and Figure S3F). The patterns here represent multiple pathways that change over time, notably, adipocytes share the same pattern 1 with fibroblasts for both incoming and outgoing signalling at different stages. Thus, multiple cell types in skeletal muscle contribute to support complex signal transduction, in which adipocytes and fibroblasts seem to play key roles. Moreover, outgoing and incoming signalling patterns also provide specific autocrine and paracrine signalling pathways for a given cell type. For instance, at the FB stage, the major autocrine pathways between adipocytes are PTN, BMP, IGF and FGF, while the major paracrine signalling pathways for adipocytes are SPP1, PDGF and MSTN ( Figure 4E,F).
These analyses can potentially help in the deeper understanding of regulatory mechanisms driving differentiation of the skeletal muscle lineage.

| scATAC-seq reveals chromatin accessibility landscapes of the bovine skeletal muscle
Our scRNA-seq results focus on RNA expression, ignoring the epigenetic changes which are the primary determinants of cellular potential.
To gain more insights into the development of bovine skeletal muscle at the chromatin level, we obtained chromatin accessibility profiles of 24,333 cells from these successive developmental stages (7274, 7433 and 9626 single cells) by scATAC-seq ( Figure 1A). The library quality was evaluated based on the insertion lengths and peak signal distributions, which passed the assessment ( Figure S4A performed pseudotime analysis of scATAC-seq data using the same method as our previous scRNA-seq, described above. The sequential ordering for these myogenic subpopulations in the scATAC-seq results was highly compatible with those from the scRNA-seq data in general ( Figure 5D). Along the inferred trajectory, we observed dynamic changes in the accessibility of lineage-specific genes, forming eight clusters ( Figure S4D). Gene clusters 1-8 consist of 555, 316, 825, 526, 488, 227, 539 and 200 genes, respectively, with monotonic and non-monotonic gene activity patterns, consistent with the gene expression pattern obtained by scRNA-seq (Table S2, Figure S4D).
Next, we invoked and visualized cell type-specific peaks to identify potential drivers of the skeletal muscle transcriptome, and genes that potentially regulated by cell type-specific open regions were annotated functionally to determine biological processes associated with these elements ( Figure 5E, Figure S4E). From these GO-BP results, we found terms, such as 'skeletal muscle satellite cell commitment', 'response to muscle activity involved in regulation of muscle adaptation', 'regulation of slow-twitch skeletal muscle fibre contraction', 'positive regulation of fast-twitch skeletal muscle fibre contraction' and 'skeletal muscle fibre differentiation', were enriched in peaks from myoblasts, myocytes, Type I fibres, Type IIA fibres and Type IIX fibres, respectively, which are generally consistent with the future direction of gene transcription in myogenesis ( Figure 5E).
Because cell type-specific peaks likely harboured motifs allowing for the binding of TFs, using HOMER, we further performed motif enrichment analysis for cell type-specific peaks to predict TF sets controlling of gene expression in skeletal muscle ( Figure 5F, Figure S4F). Finally, large sets of TFs, both known and novel, were predicted for each cell type.
To further explore the relationship between regulation of the expression of some critical genes and chromatin accessibility, we successfully integrated our scRNA-seq and scATAC-seq results ( Figure 6A,B). Overall, these two modalities were validated corre-   and MUSTN1, known to be important regulatory factors acting during skeletal muscle differentiation. 40,41 It would be interesting to explore the function of these genes in future studies.
As already mentioned, single-cell molecular profiling has problems associated with static measurements; however, temporal information is essential to decipher the complex developmental processes of skeletal muscle. For inferring future cellular states, we applied an RNA velocity framework to a bovine skeletal myogenesis atlas, based on a well-defined model of the transcription process. Overall, the majority of inferred differentiation paths agreed with our current understanding of the developmental biology of skeletal muscle. In particular, the differentiation trend of myogenic subpopulations (myogenic progenitors ! myoblasts ! myocytes, and Types I ! IIA ! IIX) were shown from the inferred pseudotime trajectory and from the calculated RNA velocity. Of note, when applying the RNA velocity framework, intron retention events associated with splicing heterogeneity were not considered in our modelling.
In multicellular organisms, the formation of highly differentiated tissues is determined by an intricate network of cells with particular biological functions to a great extent, and fine-tuned communication among these different cells is critical for maintaining the homeostasis and function of tissues. Specifically, cell-to-cell interactions can be realized in autocrine or paracrine manners. 42 44 Additionally, a dense ligand-receptor network facilitating extensive communication was observed between diverse mouse heart cell types, which revealed prevalent sexual dimorphism in gene expression levels. 45  Generally, gene expression is regulated by complex interplays between TFs and cis-regulatory DNA elements, and assessing genome-wide chromatin accessibility can facilitate the identification of these functional regions. 46 Single-cell technologies offer new perspectives to study the mechanisms underlying cell properties, in which scRNA-seq provides transcriptional landscapes measuring gene expression in each cell, while scATAC-seq serves as a metric of chromatin accessibility to contribute insight into cellular transcriptional heterogeneity. 47 Here, we investigated the changes in the epigenetic landscape by performing scATAC-seq in developing bovine skeletal muscle, and integration of the scRNA-seq and scATAC-seq results further provided a comprehensive view of the cellular regulatory state.
As expected, we revealed specific TFs, such as E2F7, JUND, ZBTB18, HLF, BACH2, FOSL2 and MAFA, that might regulate cell fate transition during myogenesis. In addition, with the exception of some cellspecific TFs, many were also coexpressed among cell clusters, further supporting the previous consensus that genes in different cell subsets are expressed in a partially overlapping dynamic manner to form a complex network of interconnections during skeletal myogenesis. 48 Therefore, these findings are potentially valuable in advancing our current understanding of critical genetic regulatory networks associated with bovine skeletal muscle development, and our future work will focus on deciphering the functional importance of these candidate TFs and their target genes.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.