Epigenetic regulation of gene expression in Chinese Hamster Ovary cells in response to the changing environment of a batch culture

Abstract The existence of dynamic cellular phenotypes in changing environmental conditions is of major interest for cell biologists who aim to understand the mechanism and sequence of regulation of gene expression. In the context of therapeutic protein production by Chinese Hamster Ovary (CHO) cells, a detailed temporal understanding of cell‐line behavior and control is necessary to achieve a more predictable and reliable process performance. Of particular interest are data on dynamic, temporally resolved transcriptional regulation of genes in response to altered substrate availability and culture conditions. In this study, the gene transcription dynamics throughout a 9‐day batch culture of CHO cells was examined by analyzing histone modifications and gene expression profiles in regular 12‐ and 24‐hr intervals, respectively. Three levels of regulation were observed: (a) the presence or absence of DNA methylation in the promoter region provides an ON/OFF switch; (b) a temporally resolved correlation is observed between the presence of active transcription‐ and promoter‐specific histone marks and the expression level of the respective genes; and (c) a major mechanism of gene regulation is identified by interaction of coding genes with long non‐coding RNA (lncRNA), as observed in the regulation of the expression level of both neighboring coding/lnc gene pairs and of gene pairs where the lncRNA is able to form RNA–DNA–DNA triplexes. Such triplex‐forming regions were predominantly found in the promoter or enhancer region of the targeted coding gene. Significantly, the coding genes with the highest degree of variation in expression during the batch culture are characterized by a larger number of possible triplex‐forming interactions with differentially expressed lncRNAs. This indicates a specific role of lncRNA‐triplexes in enabling rapid and large changes in transcription. A more comprehensive understanding of these regulatory mechanisms will provide an opportunity for new tools to control cellular behavior and to engineer enhanced phenotypes.


| INTRODUCTION
Epigenetic modifications and regulations have attracted significant interest as researchers aim to identify the mechanisms that control gene expression. So far, most studies have focused on understanding ON/OFF mechanisms such as the complete silencing of one X-chromosome (Brockdorff, 2011), the turning of developmental switches in embryogenesis (Shi & Wu, 2009), or cell differentiation along specific tissue lineages (Du et al., 2015). Less attention has been given to short-term regulation of gene expression in response to environmental conditions where the majority of genes are not turned on or off, but regulated differentially to altered expression levels that enable cells to handle new conditions (Kuang et al., 2014;López-Maury, Marguerat, & Bähler, 2008). Such mechanisms are likely to be different from ON/OFF mechanisms, but still interesting, specifically in the context of biopharmaceutical production processes where the physiological state of cells and their precise gene expression pattern have implications on bioprocess performance and product quality of biopharmaceutical proteins (Hsu et al., 2017;Stolfa et al., 2017;Yusufi et al., 2017).
To align gene expression with developmental or physiological needs, epigenetic regulators play a central role through short-term (histone modifications) or long-term (DNA methylation) modifications that bring about conformational changes in chromatin, thus activating or repressing transcription (G. Li & Reinberg, 2011;Saksouk, Simboeck, & Déjardin, 2015). DNA methylation, based on the conversion of cytosine to 5-methylcytosine, tends to inhibit the binding of transcription factors or recruit repressor proteins at the methylated promoter region (Moore, Le, & Fan, 2013). Similarly, histone modifications, including acetylation, methylation, phosphorylation, citrullination, ubiquitination, SUMOylation, and ADP ribosylation of the histone tails at specific sites, contol gene transcription by modifying chromatin accessibility.  (Dykes & Emanueli, 2017;Peschansky & Wahlestedt, 2014;Xu et al., 2017), and act as signals, decoys, guides, and scaffolds for chromatin modifiers (Marchese & Huarte, 2014;Wang & Chang, 2011). Together, these complex transcriptional dynamics result in defined patterns of gene expression and proteomic and metabolite profiles that determine the phenotype and cell survival. For instance, Kuang et al. (2014) highlight the precise temporal control of ribosome biogenesis ensuring the best utility of resources for an energetically demanding process by just-in-time supply within different phases of the yeast metabolic cycle.
Chinese Hamster Ovary (CHO) cells have been known as workhorses for the industrial production of recombinant therapeutic proteins since 1987 (Dorner, Bole, & Kaufman, 1987). Variations in cellular environment and phenotypes can bring about significant changes in cell behavior and productivity of producer cell lines (Pilbrough, Munro, & Gray, 2009). However, very little is known about the control mechanisms that enable rapid changes in response to environmental conditions and most transcriptome studies so far have been comparative, looking at the difference between two states or defined cell samples, such as high versus low producing cell lines.
In our previous report on genomic and epigenetic variation in CHO cells (Feichtinger et al., 2016), the overall DNA methylation pattern of CHO cells was shown to change upon adaptation to different culture conditions, whereas it remained remarkably constant over months when the cells were maintained in the same medium. Shortterm changes in DNA methylation, as observed between exponential and stationary phase in CHO, were primarily found in regulatory regions such as enhancers. In addition, the first report of chromatin states, as defined by combinations of histone marks, was presented including a temporal pattern and its changes during a batch culture, however, without association to gene expression patterns. However, to achieve exquisite control over gene expression in bioprocessing, an in-depth understanding of the mechanisms that regulate gene expression over time is indispensable. Therefore, we here follow up the previous report with the missing data on the transcriptome and its changes during the batch culture, with a particular focus on the correlation between gene expression and regulatory mechanisms.
The resulting resource opens up possibilities both for enhanced control of cellular phenotypes during bioprocessing as well as the development of new engineering tools to manipulate cell behavior.

| Sample preparation and sequencing
CHO-K1 cells were thawed Feichtinger et al., 2016) and, after 2 weeks of recovery, seeded into eight parallel shaker flasks at 2 × 10 5 cells/ml, in working volumes of 250 ml. The total and viable cell count was analyzed twice daily with a ViCell (Beckmann Coulter). Samples for ChIP-seq were taken every 12 hr (1 × 10 7 viable cells for cell fixation and cell lysis; 1 × 10 6 cells for magnetic immunoprecipitation), for RNA-seq every 24 hr (1 × 10 6 cells into 1 ml Trizol), and for whole-genome bisulfite analysis at mid-exponential and mid-stationary phase (5 × 10 6 cells for DNA isolation; Supporting Information Figure 1). For RNA extraction, the cells were centrifuged and lysed using TRI reagent (T9424-200 ml; Sigma-Aldrich) following the manufacturer's instructions: phase separation was done by addition of chloroform and the aqueous phase collected. After precipitation with 2-propanol and washing with 70% Ethanol, pellets were air-dried and resuspended in nuclease-free water. Libraries for RNA-seq were prepared using NEBNext Ultra Directional RNA library prep Kits (E7420L), starting from total RNA according to the instructions and analyzed by Illumina HiSeq 2000 PE100 (pair-end; 100 bp read length). Libraries for ChIP-seq and bisulfite sequencing were prepared as described (Feichtinger et al., 2016).

| RNA-seq mapping and normalization
RNA-seq reads were aligned with the GEMTools RNA-seq pipeline v1.7 (Marco-Sola, Sammeth, Guigó, & Ribeca, 2012) in three phases: mapping against the Chinese hamster genome published by Brinkrolf et al. (2013), against a reference transcriptome and a de novo transcriptome, generated from the input data to detect new junction sites. After mapping, all alignments were filtered for a minimum intron length of 20 bp, a maximum exon overlap of 5 bp, and a check against a reference annotation for consistent pairs and junctions, where both sites align to the same annotated gene. Mapping statistics and expression quantification were calculated by GEMTools "gtfcount," and expressed genes were identified based on fragments/counts per million mapped fragments (FPM/CPM), filtering for rowSums >1 using the DESeq.2 R package bioconductor-3.4.1 (Love, Huber, & Anders, 2014

| Differential gene expression
The design formula for the samples was created based on the PC analysis that separated samples from different TP corresponding to the gene expression values (Supporting Information Figure 2) considering TP1, TP3, TP5, and TP7 (17-90 hr) as the exponential phase; TP9 and TP11 (114-138 hr) as the stationary phase; and TP13, TP15, and TP17 (162-210 hr) as the decline phase. Phase-wise comparison was done between exponential and stationary (ES) and exponential and decline (ED) phases. Differentially expressed genes were extracted based on DESeq.2 normalized read counts by the Benjamini-Hochberg method to adjust p values with a threshold of 0.01 and an absolute value of log2 fold change >1.
Gene set enrichment analysis (GSEA) allows computation of statistical significance of predefined gene sets that share common biological function, based on a ranked list of differentially expressed genes observed while comparing two distinct states or phenotypes.
This was performed with the GSEA software (v2.2.4) (Subramanian et al., 2005) based on DESeq.2 stat (Wald statistic) prerank gene-list (Mootha et al., 2003) for phase-wise comparison analysis. Negative phenotype corresponds to exponential phase, positive phenotype to the stationary phase in the ES comparison and the decline phase in the ED comparison. The differential expression (DE) analysis was done separately for coding and noncoding transcribed regions. All coding genes reported as DE from both phase-wise comparisons were selected for Fuzzy clustering and Gene Ontology (GO) enrichment analysis. Gene expression matrix was normalized with the variance stabilizing transformation (VST) followed by standardization by a gene with z-score normalization (x−mean/standard deviation) using the R clusterSim package v0.45-2 (M. E. Futschik & Kumar, 2017;Lemay et al., 2013). Clustering was performed with the Mfuzz Bioconductor package v2.36.0 (Matthias Futschik, 2017) to report four clusters (Supporting Information Figure 3). GO enrichment analysis was done with topGO R package v2.24.0 (Adrian Alexa, 2017) for genes with membership higher than 0.5 in any of the clusters.

| DNA methylation around transcription start site
The whole genome bisulfite analysis data published in our previous report were utilized for this analysis (Feichtinger et al., 2016). The mean of methylation percentage per CpG was plotted against its distance from the transcription start site (TSS), filtering CpGs with less than 10 reads. DNA methylation upstream and downstream of TSS were assessed for expressed and non-expressed genes, and for expressed genes containing active promoter states (states 9 and 10).
For coding genes, 3 kb upstream and downstream of TSS was analyzed.
For noncoding RNAs, considering their shorter length, this was reduced to 1.5 kb to avoid noise (Supporting Information Figure 4).
The average CpG methylation per position was calculated and fitted by LOESS smoothing using the default span value of 0.75 with the stats package from R(v3.3.1).

| Chromatin state enrichment
As published in Feichtinger et al. (2016), the presence and combination of 6 histone modification marks (H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K36me3, and H3K27ac) can be used to define a chromatin state model (Ernst & Kellis, 2012)  on segmentation with the published 11 states model, the enrichment of each state was computed for a set of external coordinates: differentially methylated 1 Kb regions between exponential and stationary phase within batch culture, the genomic region between TSS and transcription end site (TES) for expressed, and non-expressed coding genes as well as non-coding RNAs.

| Temporal association of gene expression with chromatin marks
Genes with active transcription mark (State 4) within their gene body were identified, and the corresponding coordinates were intersected for the presence of H3K36me3 peak coordinates as identified from the MACS2 peak caller (Zhang et al., 2008). Genes containing State 4 and an H3K36me3 peak within the gene body are annotated as H3K36me-E4. Similarly, the combination of H3K4me3 peaks with states 9 or 10 (H3K4me3-E9,10) and H3K27ac peaks (H3K27c-E9,10) was identified around the TSS+/− 500 bp. Considering many genes of small length, a 500 bp flanking region was used to ensure capturing a pattern without interference from the neighboring genes.
Changes in expression levels (z-score of VST normalized values) with all active transcription marks (z-score of CPM values) calculated with the DiffBind R package (Stark & Brown, 2017) were plotted in a heatmap using ggplot2 (Wickham, 2009

| Interaction analysis for expressed lncRNA and coding genes
Sequences for all expressed lncRNA and DNA sequences of all coding genes plus 1.5 kb upstream and downstream of the gene body were extracted from the genome using samtools version-1.3.1 (H. Li et al., 2009). Triplex-forming oligos in lncRNA transcripts and the corresponding triplex target sites (TTSs) within and around coding genes were identified using triplexator 1.3.2 (Buske, Bauer, Mattick, & Bailey, 2012). Interactions were filtered for the presence of purine motifs with a minimum triplex length of 20 and minimum G proportion of 50%. An error rate of up to 20% and only two consecutive errors without low complexity filtering were allowed (Buske et al., 2012). The output was parsed to extract TTSs coordinates and unique pairs of interacting lncRNA and coding genes.

| Transcriptome response of CHO cells during batch culture
Batch culture is a perfect example for changing conditions, with significant environmental variation and altered media composition during different growth phases (Young, 2013). To address the phenotype-relevant changes in gene expression, a high-resolution temporal profile of global gene transcription for CHO-K1 cells was analyzed by RNA-seq every 24 hr over 9 days (Supporting Information Figure 1) and related to previously published changes in the chromatin state and the global DNA methylation pattern (Feichtinger et al., 2016).

| DE during growth phases
To  between exponential and stationary phase (ES) and 1,381 between exponential and decline (ED) phase (Supporting Information Table 5).
In total, 1,397 unique coding genes show DE between growth phases, comparable with a previous study (Bort et al., 2012). Gene expression during exponential growth remains surprisingly stable, despite the already changing environment. During transition into stationary and decline phase, however, the pattern begins to change.
Using GSEA with default parameters (FDR < 0.25) (GSEA/MSigDB Team, 2018), 224 gene sets were identified to be enriched in exponential phase relative to stationary phase (ES; Supporting Information gene sets showed enrichment in exponential phase and 95 gene sets in the decline phase (Supporting Information Table 6b). Several gene sets enriched in exponential phase are related to DNA damage and genomic instability, which are known to be highly prevalent in rapidly growing cells, such as CHO. The results suggest that diverse mechanisms for DNA repair and stress response decrease in the decline phase, possibly leading to a higher rate of genome damage (Bort et al., 2012). For instance, pathways related to cell cycle checkpoints appear in the top 20 significant pathways enriched in ED.
Interestingly, lipid metabolism is a major response factor during the decline phase, indicating the cells' need to activate energy resources.   Table 7).

| Gene expression clusters
The biological role of each cluster of genes was determined by GO enrichment (Supporting Information Table 8). Cluster 1 is gradually decreasing in expression from exponential to decline phase, with 706 coding genes. As expected, the majority of these are related to mitotic cell cycle, chromatin organization, DNA damage/repair, and RNA biogenesis, all major prerequisites for growth and proliferation.
Clusters 2 (188 genes), 3 (242 genes), and 4 (261 genes) increase in expression levels from exponential to decline phase, in different patterns. GO annotation confirms the result from GSEA as the majority of these upregulated pathways were related to lipid metabolism, cell homeostasis, cell motility, and extracellular matrix organization (Supporting Information Table 9). Each cluster was validated for their temporal expression profile by qRT-PCR of a selected number of genes (Supporting Information Figure 5, Supporting Information Table 10).

| Mechanism of regulation of coding genes in response to culture conditions
To understand the mechanism underlying this response in the gene expression pattern, changes in the major epigenetic regulators including DNA methylation and histone modification (Kundaje et al., 2015) were interrogated.  Figure 6b).

| DNA methylation
Importantly, modifications of the global DNA methylation pattern between exponential and stationary phase were observed mostly in genomic regions with regulatory chromatin states, rather than promoter marks (Feichtinger et al., 2016), indicating that while DNA methylation in promoters marks gene expression as "ON" or "OFF," it is not the rapid response mechanism for fine-tuned control of expression level such as is required for short-term response during a batch culture.

| Chromatin modifications
Many studies confirm that alterations in histone modifications lead to changes in chromatin conformation that control gene expression as and when required (Kuang et al., 2014;López-Maury et al., 2008).
The PC analysis of histone modifications revealed their continuous adaptation (Feichtinger et al., 2016), even during exponential phase where gene expression patterns are very uniform (Figure 1). The 11 chromatin states computed from 6 histone marks (Feichtinger et al., 2016) allow us to identify the function of genomic regions of concern.
Various categories of genomic regions with different features were

| Long noncoding transcripts and their potential function in rapid response
As per the deduced annotation from the transcript assembly using the RNA-Seq data, 74% of the annotated Chinese Hamster genes encode for noncoding RNAs, 62% of these are lncRNAs or processed transcripts. Around three times more noncoding transcribed genes (42,177) in comparison to protein-coding ones were found to be expressed (rowSums >1). Such a high number suggests an important role in regulation. However, overall expression levels of noncoding RNAs were found to be much lower than that of protein-coding genes (Supporting Information Figure 9). All expressed lncRNA genes were checked for homology against the 279 lncRNAs reported with known function in lncRNAdb, using default parameters from BLAST. We   (Figure 3a; Supporting Information Table 12). As with the coding genes, the genomic region around TSS was found to be demethylated for expressed and methylated for silenced genes (Supporting Information Figure 6)

| Interacting pairs of DE lncRNA and distant coding genes
While many studies report co-expression and localization in promoter regions as the mechanism for cis-acting lncRNAs, a transmode of action has long been known, but its mechanism is not yet  Table 15). As expected, it was observed that interaction sites were mostly localized within regulatory regions, especially promoter regions, rather than the highly prevalent quiescent regions or actively transcribed regions marking the gene body ( Figure 6). The wide range shown by enrichment within repressor marks at different TP could be due to the fact that genome-wide localization of these states was found in very short spans of just 200 nucleotides in most cases, which may have led to noisy results. On the other hand, the detection of triplex-target sites within such short spans is indicative for the bias of enrichment within regulatory regions and confirms the report from Mondal et al. (2015) that describes the targeting mechanism of repressive chromatin associated lncRNAs.

| Evaluation of lncRNA gene targets
The well-characterized lncRNAs-MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) and NEAT1 (nuclear enriched abundant transcript 1) were selected to evaluate the gene targets reported in our interaction list reporting triplex forming lncRNAcoding gene pairs. MALAT1 has been reported to regulate gene sets associated with cellular proliferation, localization, apoptosis, and metabolic processes and thereby plays an important role in tumorigenesis . It is localized to serine and argininerich splicing factors (nuclear speckles). NEAT1 forms paraspeckles with its loci adjacent to MALAT1. Similar to MALAT1, NEAT1 is also reported to be a transcriptional regulator of various genes involved in cancer progression.
Homologues of MALAT1 (cgriseus1ncB038456) and NEAT1 (cgriseus1ncB038466) were highly upregulated in the later culture phases as compared with the early exponential phase. The coding gene pairs were extracted for these lncRNAs from our interaction list, and KEGG pathway enrichment was performed individually for coding genes corresponding to MALAT1 and NEAT1 homologues (Supporting Information Table 16 Further studies to obtain a more detailed understanding of how these regulatory mechanisms determine process behavior of cells and their ability to adapt to a variety of culture conditions will also increase our ability to control and manipulate gene expression towards more reliable process performance and outcome. As an example, the pathway analysis over time of genes that are DE during stationary and decline phase indicates the struggle of cells to maintain homeostasis. These results might be used to understand the changes in product quality or productivity during late production process stages and indicate alleviatory feeding strategies, to ensure proper processing of the product. For instance, in the context of the cells, it makes sense to mobilize lipids under nutrient limitation and to initiate degradation of dispensable cellular components (as observed in the upregulation of lysosome and lipid metabolism), however, for protein production, lipids are essential as they are required for the generation of organelles, such as the endoplasmic reticulum and the Golgi, which are known to be bottlenecks of secretion. Likewise, the fact that "galactose metabolism" pops up as DE in the decline phase is critical as galactose is an important sugar required for proper glycoprocessing and thus the quality of the product.
For cell engineering approaches, the observed rapid response mechanism and control over gene expression levels exerted by lncRNAs also opens up the opportunity for completely new, so far unused tools for manipulating gene expression. Similar to microRNA (miRNA) engineering approaches, where the aim was to target the translation of multiple genes without burdening the overall protein production capacity of a cell, lncRNAs are no burden on the translational machinery and could be used to control gene transcription of target genes rather than their translation, thus intervening at an even earlier stage. While miRNAs can reduce translation of their target, engineering by lncRNAs could also be used to enhance transcription of individual genes, such as the product gene, an approach that has already been shown to work at the level of mRNA translation using lncRNAs (Takahashi et al., 2018).
Speculatively, and excitingly, one could reach a level of understanding that allows control of entire phenotypes-which are determined by transcriptome patterns-by targeting multiple genes in a pathway without interfering with their genomic sequence or context by introducing lncRNAs designed to manipulate their expression patterns.
This would be of interest particularly for controlling the precise expression level of a gene rather than turning it off or overexpressing it to a high level.
In conclusion, our report presents plausible mechanisms of regulation of gene expression in cells, from an ON/OFF switch, to mechanisms controlling constitutive gene expression, to such that determine changes in the level of gene expression in response to altered nutrient supply and waste accumulation as observed during a batch culture. A more detailed understanding of these mechanisms and cellular response to culture conditions will enable enhanced process control for bioproduction and innovative approaches for cell engineering and optimization.