Transcriptional states and chromatin accessibility during bovine myoblasts proliferation and myogenic differentiation

Abstract Objectives Although major advances have been made in bovine epigenome study, the epigenetic basis for fetal skeletal muscle development still remains poorly understood. The aim is to recapitulated the time course of fetal skeletal muscle development in vitro, and explore the dynamic changes of chromatin accessibility and gene expression during bovine myoblasts proliferation and differentiation. Methods PDGFR‐ cells were isolated from bovine fetal skeletal muscle, then cultured and induced myogenic differentiation in vitro in a time‐course study (P, D0, D2,and D4). The assay for transposase‐accessible chromatin sequencing (ATAC‐seq) and RNA sequencing (RNA‐seq) were performed. Results Among the enriched transcriptional factors with high variability, we determined the effects of MAFF, ZNF384, and KLF6 in myogenesis using RNA interference (RNAi). In addition, we identified both stage‐specific genes and chromatin accessibility regions to reveal the sequential order of gene expression, transcriptional regulatory, and signal pathways involved in bovine skeletal muscle development. Further investigation integrating chromatin accessibility and transcriptome data was conducted to explore cis‐regulatory regions in line with gene expression. Moreover, we combined bovine GWAS results of growth traits with regulatory regions defined by chromatin accessibility, providing a suggestive means for a more precise annotation of genetic variants of bovine growth traits. Conclusion Overall, these findings provide valuable information for understanding the stepwise regulatory mechanisms in skeletal muscle development and conducting beef cattle genetic improvement programs.

physical and metabolic function in human, while in animal production, affecting growth efficiency and meat quality. 2 The growth of skeletal muscle fibres is initiated during embryonic stage, 3 then is fixed during the foetal stage, and remains almost unchanged in number after birth. 4 Notably, the foetal stage is the critical period of myogenesis, when the majority of skeletal muscle fibres form. 5 It has become established that many specific signalling pathways and transcription factors (TFs) are involved in myogenesis. [5][6][7][8] Wingless and Int (Wnt), paired box 3 (Pax3), and Pax7 are necessary myogenic regulatory factors (MRFs) that regulate the myogenic lineage differentiation of mesenchymal stem cells during embryo stage. 5,[9][10][11][12] Although previous studies provide strong evidence of the contribution by MRFs to skeletal muscle development, 13 15 In addition, a comprehensive functional annotation of bovine rumen epithelial cells has been well established and provided important information to explore butyrate-induced biological effects in bovine. 16 As Functional Annotation of Animal Genomes (FAANG) Consortium proposed, additional efforts have been made to investigate chromatin accessibility related to skeletal muscle development. [17][18][19] However, particularly in bovine models, previous studies in the context of skeletal muscle chromatin accessibility have mainly focused on comparative analysis across distant species to identify tissue-specific regulatory elements. [20][21][22] Recently, Cao et al. described comparative analysis of enhancers between adult and embryo bovine muscle using chromatin accessibility. 23 Despite the foetal stage is critical for skeletal muscle development, only limited evidence of foetal muscle growth and development was available from prior studies, and these findings may ignore that skeletal muscle development is a dynamic process comprised of multiple developmental stages. In our study, we reconstructed the course of myogenic differentiation in vitro, providing an ideal model to define when and how these regulatory changes occur during skeletal muscle development.

| Primary cells isolation and culture
The myoblasts were enzymatically isolated from longissimus dorsi tissues of bovine foetuses at 90 days, and cultured in low-glucose DMEM with 10% foetal bovine serum (growth medium) as described previously. 24 At 90% confluence, the cells were trypsinized with 0.25% trypsin-EDTA (Gibco, Grand Island, NY) and passaged to culture plates. At 100% confluence, the growth medium was exchanged by low-glucose DMEM with 5% horse serum (differentiation medium). The differentiation medium was then changed every 2 days.
2.2 | RNA extraction, reverse transcription and quantitative real-time PCR Total RNA was isolated using the TRIzol reagent (Invitrogen Life Technologies). RNA concentration and quality were determined by NanoPhotometer N50 (Implen, Munich, Germany). Next, the qRT-PCR was performed using ABI QuantStudio 7 Flex system (Life, Carlsbad, CA) in accordance with the instructions of KAPA SYBR ® FAST qPCR Kit (KAPABiosystems, Wilmington, MA). The primer sequences involved in qRT-PCR were listed in Table S1.

| siRNA transfection
Small interfering RNA (siRNA) were supplied by RiboBio (Guangzhou, China), and the sequences are listed in 2.5 | mRNA-seq library construction, sequencing and data processing mRNAs were purified from total RNAs using oligo-dT beads and then fragmented with Mg 2+ . RNA-seq libraries were constructed by following steps: reverse transcription, the end repair of cDNA, poly(A) tail and index addition. All libraries were sequenced on the HiSeq2500 platform following a PE150 strategy. RNA-seq datasets from each sequencing library were trimmed with Trim_Galore 0.6.3 to remove Illumina adapter sequences and low-quality sequences. Trimmed reads were then aligned to bovine genome (ARS-UCD1.2) using STAR 2.7.3a. 25 Gene expression counts were calculated by the -quantMode GeneCounts functionality in STAR. For downstream analysis, RNAseq raw counts were normalized by computing counts per million and then formed to Z scores. Differential gene expression analysis was performed by DESeq2 (adjusted p-value <0.01). 26 29 To establish a common peak set, peak summits were extended ±250 bps to obtain 500 bp fixed-width peaks. 30 Overlapping peaks in each sample were handled using an iterative removal procedure with the strongest signal (Àlog 10 FDR) until all peaks were accounted. FeatureCounts 31 was used to quantify the raw read counts from the common peak set including all replicates. For downstream analysis, ATAC-seq raw counts were normalized by computing counts per million and then formed to Z scores. Differential chromatin accessibility analysis was performed by DESeq2 (adjusted p-value < 0.01 and jlog2FCj > 2). Peak annotation was performed by R package ChIPseeker. 32  open chromatin peaks (ranked by adjusted p-value) were used for principal component analysis (PCA) and Spearman correlation analysis using prcomp_irlba and Complexheatmap. 33 The coverage tracks were generated by deeptools 3.3.0 bamCoverage. 34 Visualization of ATAC-seq and RNA-seq coverage track was performed by IGV 2.6.2. 35 Each read was extended by 250 bp and the genome-wide signal was generated by the normalized read counts (CPM) per bin. TF deviation and variability analysis were performed via the motifmatchr package, and CisBP transcription factors from the 'human_pwms_v2' dataset. 36 Transcription factor activity scoring was performed using the 501-bp fixed-width peak set, which combined replicates for each differentiated time point.

| Identification of stage-specific genes and stage-specific peaks
We used a Shannon-entropy-based to identify differentiation stagespecific genes/peaks as previously described. 37,38 We first selected those with entropy scores less than a predefined threshold (stagespecific genes: 1.6; stage-specific peaks: 1.8) as candidates. Then the ATAC-seq peak was highest active in this stage (normalized CPM > 1), and its high activity (normalized CPM > 0) could not be observed at more than two additional stages. The enrichment of known motifs for stage-specific peaks was detected by findMotifsGenome.pl function of the HOMER package. 39 After filtering out motifs of TFs which were not expressed in all stages (MAX (CPM) < 1), the top enriched known motifs were reported.

| Functional annotation enrichment analysis
Gene Ontology (GO) enrichment analysis for the tested genes was performed by DAVID V 6.8. 40,41 KEGG pathway enrichment was conducted by KOBAS 3.0. 42

| Integration of ATAC-seq and RNA-seq
Based on genomic region, consensus peaks were classified into six groups: Promoter (TSS ±2.5 kb), 5 0 UTR, 3 0 UTR, downstream, exon, intergenic and intron. Spearman's correlation coefficients were calculated between gene expression levels (normalized CPM) and chromatin accessibility levels (normalized CPM) of each genomic region group. We further defined genes/peaks based on the previous definition 43 : Genes defined as HA/HE that maximum value of ATACseq/RNA-seq reads was higher than the 70th percentile; Genes/ peaks were defined as MA/ME that maximum value of ATAC-seq/ RNA-seq reads was below the 50th percentile. Each gene associated with promoter-region accessibility was assigned to four groups: HA-HE, high accessibility/high expression; MA-ME, medium-low accessibility/medium-low expression; HA-ME, high accessibility/mediumlow expression; and MA-HE, medium-low accessibility/high expression.

| Enrichment analysis based on empirical sampling
Overall, 3804 human housekeeping genes were converted successfully to 3256 bovine orthologous housekeeping genes using the BioMart tool. 44  The annotation of GWAS Summary Statistics was lifted over from  Figure 1C).

| Genome-wide association analyses and GWAS enrichment analysis
To predict genomic features of functional regulatory elements, annotation of accessible regions was performed ( Figure 1D). As expected, the majority of consensus peaks identified in the peak set were assigned to distal intergenic (63.56%), intronic (21.21%) and promoter (10.30%) regions throughout the genome, consistent with previously reported profiles of chromatin accessibility in cattle. 20 We

| Stage-specific open chromatin reveals distinct TF binding events during bovine myoblasts proliferation and myogenic differentiation
As lineage commitment and myogenic differentiation relies on the activity of specific transcriptional programs, we asked whether the stage-specific ATAC-seq peaks harbour specific TFs regulating skeletal muscle development. A total of 3424, 731, 343 and 3022 stagespecific peaks were identified at P, D0, D2 and D4 time points, respectively ( Figure 5A). Interestingly, we observed a progressive increase in the number of stage-specific peaks at P and D4 time points and the majority of stage-specific peaks fell into enhancer regions (distal intergenic and intronic regions; Figure S4). The enriched GO terms were involved in a variety of biological functions, while only a small part of GO terms was directly related with skeletal muscle growth and development ( Figure S5). The enriched pathways were Rap1, cAMP, Wnt, Hippo and MAPK signalling pathways, which were associated with muscle growth and development ( Figure 5B). Next, we investigated the characters of TF binding in stage-specific peaks.
Notably, we revealed a series of TFs that showed specific enrichment at different time points ( Figure 5C). We observed specific enrichment motif of the ATF3, AP-1 and its family members (FRA-1, FRA-2, JUNB) at P time point, which is in accordance with the observation that AP-1 complex is correlated with cell proliferation and transformation. 61,62 We also found that high motif enrichment of MyoG, Myf5, HLH-1 (Spearman's R <0.14; Figure 6A). To enhance our understanding of the positive relationship, we further divided genes into groups according to their promoter accessibility and expression levels ( Figure 6B). As in previous reports, genes with both high expression level and accessibility at promoter region were enriched for housekeeping-like biological functions. 43,70 Therefore, we verified whether this regulatory pattern was also present during bovine skeletal muscle development. In general, we found a significantly higher proportion (proportion = 34.42%, p-value <10 À4 ) of HA-HE genes were bovine orthologous housekeeping genes ( Figure 6C,D). In summary, our results indicated a potential relationship between chromatin accessibility and gene expression, which is paramount to understanding the complexities of gene regulatory during bovine skeletal muscle development.

| Cattle growth-related genetic variants in the context of chromatin dynamics
To dissect genetic aspects of skeletal muscle development, we further investigated whether the open chromatin regions dynamics were associated with muscle-related traits. In this study, we first investigated the association between our data and 163 stature-associated lead SNPs identified by Bouwman et al. 71 ( Figure 7A). Although very few of them were located in accessible chromatin peaks, the majority of these lead variants were located within 3 kb of OCRs ( Figure 7B).
There were 11.5% of lead variants located in OCRs, while 70.55% of lead variants were located within 3 kb. Since there is a reasonable chance that the lead SNP is not the causal variant, we also examined the fractions of OCRs that overlapped with QTL confidence regions and found that a great number (90.37%) of QTL confidence regions were overlapped with at least one OCRs ( Figure 7C). A similar pattern was observed in context from cattle QTLdb. About 56.92% and 54.09% of known QTL regions and associated SNP located within 3 kb of OCRs for anatomy and growth traits, respectively ( Figure 7D,  Figure 7E, Table S6). Notably, we found the joint SNP effects in OCRs were lower than those in other groups (Table S6).
Given that the majority of genetic variants fell into non-coding regions, our data therefore may be suitable for the functional interpretation of GWAS results. myotubes. [89][90][91][92][93][94] Alternatively, these stage-specific genes may contribute to bovine myoblasts proliferation and myogenic differentiation by mediating these biological processes. Overall, these results further supported our prediction on the functions of stage-specific genes, suggesting the reliability of our data analysis. As for nearby genes of stage-specific peaks, we found significant decreases in the number of GO terms related to myogenesis, although muscle growth and development such as Rap1, cAMP, Wnt, Hippo and MAPK signalling pathways were identified. Indeed, the investigation on gene regulatory mechanisms regulated by stage-specific peaks in our study may be limited on the assumption that genes are regulated by proximal regulatory elements. It must be noted that this assumption is not always correct. Additionally, a series of TF binding events occurred as expected during bovine myoblasts proliferation and myogenic differentiation, including ATF3, AP-1 family members, MyoG, Myf5, HLH-1, ZBTB18, TCF4 and TEAD family. We propose that these TFs may facilitate our understanding of the sequential order of gene regulatory during skeletal muscle growth and development.
Previous studies have pointed to a strong relationship between gene expression and chromatin accessibility at promoter region. 43,95 However, given the genetic basis of complex cis-regulatory mechanisms, the relationship between chromatin accessibility and gene expression remains functionally uncharacterized at present. In this work, chromatin landscape was classified into six groups based on their genomic regions, and relationships were calculated between chromatin accessibility and gene expression in each group. We only observed a strong positive relationship between chromatin accessibility and their nearby gene expression in promoter group, which were consistent with the previous study. 43 Transcriptional regulation is governed not only by proximal cis-regulatory elements (promoter), but also by distal cis-regulatory elements (enhancer) that often located far away from their target gene. 96 While such a strong positive relationship was absent in intergenic group, this is partly due to a remarkably large amount of intergenic regions widely distributed across the genome and the inaccurate method of nearby genes annotation. Furthermore, we provided evidence showing that at least one-third of the genes (high expression and high accessibility level at promoter region) were correlated with housekeeping genes. Considering the potential significance of the residual HA/HE genes, we suppose that this regulatory model is an essential step towards completing our understanding of the regulatory mechanism of skeletal muscle development.
Genome-wide association studies (GWAS) have proved to be an effective and promising approach for the identification of genetic variants associated with complex traits. 97 However, most of these genetic variants are located in non-coding regions and a push for annotation of cis-regulatory elements by epigenomic information have contributed to elucidate the functional relevance of non-coding genetic variants to some extent. 98 In this study, integration of chromatin landscape profiles with GWAS signals demonstrated that the GWAS signals of bovine growth traits were significantly enriched in chromatin accessibility of skeletal muscle development, which elucidated the activity of genetic variants in skeletal muscle development-related cis-regulatory elements.
Further studies are warranted to prioritize variant SNPs and explain how these variant SNPs change TFs regulatory and genes expression.

| CONCLUSION
Overall, our study describes the dynamic changes of chromatin accessibility and gene expression during bovine myoblasts proliferation and differentiation in vitro, which is a good model to reconstruct the process of skeletal muscle development. In addition, we identified a series of stage-specific genes and TFs to enhance our understanding of the sequential regulation of skeletal muscle development. Integrations of chromatin accessibility with transcriptional expression profiles and GWAS signals provide an opportunity to explore the regulatory role of these cis-regulatory elements defined by chromatin accessibility.
Taken together, our study demonstrates a step-wise dissection of the transcriptional regulation network for skeletal muscle development and provide a systematic understanding of the molecular circuits governing skeletal muscle development.

CONFLICT OF INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Gao contributed to data analysis and manuscript preparation.

ETHICS STATEMENT
All experiments were conducted in concordance with the guidelines established by the Regulations for the Administration of Affairs Con-