Subcloning induces changes in the DNA‐methylation pattern of outgrowing Chinese hamster ovary cell colonies

Chinese hamster ovary (CHO) cells are the most extensively used mammalian production system for biologics intended for use in humans. A critical step in the establishment of production cell lines is single cell cloning, with the objective of achieving high productivity and product quality. Despite general use, knowledge of the effects of this process is limited. Importantly, single cell cloned cells display a wide array of observed phenotypes, which so far was attributed to the instability and variability of the CHO genome. In this study we present data indicating that the emergence of diverse phenotypes during single cell cloning is associated with changes in DNA methylation patterns and transcriptomes that occur during the subcloning process. The DNA methylation pattern of each analyzed subclone, randomly picked from all outgrowing clones of the experiment, had unique changes preferentially found in regulatory regions of the genome such as enhancers, and de‐enriched in actively transcribed sequences (not including the respective promoters), indicating that these changes resulted in adaptations of the relative gene expression pattern. The transcriptome of each subclone also had a significant number of individual changes. These results indicate that epigenetic regulation is a hidden, but important player in cell line development with a major role in the establishment of high performing clones with improved characteristics for bioprocessing.

quality; (II) grow reasonably fast and to high cell densities; and (III) were generated according to the guidelines of the regulatory agencies. A prerequisite there is the subcloning of cells with subsequent screening for further cell line development. This is done to remove low producers that may have a growth advantage over high producers and thus lead to loss in culture productivity over time, and to ensure consistent product quality, a result of the known variability and genomic instability of in-vitro cultivated cells. [13,42,54,60,63] While most steps in cell line development have been optimized, streamlined, and automated by the industry, leading to significant reduction in timelines, the time required for subcloning is an unsurmountable limiting factor due to the time single cells need to grow into a colony of sufficient size for further expansion. [13] Subcloning can be achieved via limited dilution, which is essentially based on the statistical likelihood that due to the dilution used, there will be one cell per well. Other tools are available, such as fluorescent activated cell sorting, to enrich for high producers [35,51] and further assure clonality. With the advancement in automation and microfluidic systems, various instruments for subcloning, such as cell printing and light-induced sorting, as well as instruments for additional image verification of clonality have been described. [15,30,39,43,46,65] Additionally, media additives to promote colony outgrowth were developed. [31,64] These tools aim to ensure either the isolation of a verified single cell or the enrichment of rare high producers. However, insufficient scientific attention has been paid to the underlying reason to perform subcloning at all, which is the inherently high variation in subclone phenotypes even when derived from a parent population that had already been subcloned. [7,42] In the pre-omics era, a common dictum in the field was that the high genomic variation in CHO cells (or any fast growing cell line, for that matter) is the source of this variation and that during subcloning one simply selects for individual cells with specific mutations that, thus, provide individual phenotypes to each subclone. [5,9,12] However, a recent study that investigated the oxidative phosphorylation and energy metabolism in a variety of sequenced CHO cells showed that cells with specific phenotypes are characterized by a mixture of mutations in relevant genes and changes in their expression patterns. [10] While it is likely that changes in expression levels are also caused by sequence mutations in the relevant regulatory regions for a given gene, an additional and as likely explanation is that many differences in phenotype across different cell lines, lineages and subclones are also controlled by changes in the epigenetic code of cells. This was indeed shown to be the case after adaptation of cell lines to new culture conditions and altered medium composition, as well as for a cell line selected specifically for higher transient productivity. [11] The epigenome is a network of various mechanisms that regulate gene expression without changing the underlying nucleotide sequence. [14] The most prominent mechanisms are chromatin accessibility as controlled by the methylation of cytosines in the DNA (also known as methylome) and diverse modifications of the N-terminal tails of histones. The presence and/or combination of such modifications are able to control whether or not a gene is expressed and to what level of transcription. [4] As a specific phenotype is not likely to be determined by a single mutation or the change in expression of a single gene, the combination of genetic and epigenetic variation actually provides cells with a wide space of possible transcriptomes and resulting phenotypes across the 10-15,000 expressed coding and the even more non-coding genes. [19,38,56,57] This wide space of variation in transcript expression levels would explain the known adaptability of CHO cells and the success of process and cell line development that the biopharmaceutical industry has seen over the last 30 years.
The two mechanisms of epigenetic modifications addressed above have different kinetics and thus serve different functionalities in cells.
While histone modifications enable rapid responses to culture environment and the maintenance of stable as well as differentially regulated gene expression, [19] DNA-methylation is a more long term mechanism where the information "learned" during adaptation can be passed on to daughter cells. [1,11,33] In the present study, we therefore focused on DNA-methylation as the "inheritable" epigenetic mark and aimed to understand its role in relation to the behavior and the emergence of specific phenotypes of individual subclones and their resulting diversity. Therefore, an already established subclone producing an Epo-Fc fusion protein was single cell sorted, the first 36 outgrowing subclones expanded and their batch behavior tested twice over 8 weeks. Six subclones that represented the full range of phenotypes and stability behavior were chosen for detailed characterization of their transcriptome and methylome in comparison to the parental cell line.

Stability study
The design of the study is presented in Figure 1a. In short, cells were kept in culture, as described above, for 53 days. The first batch evaluation was done one passage after the transfer to TPP TubeSpin bioreactors. A second batch was performed at the end of the 53 days.
During that time, 12 cultures were lost, resulting in 36 clones that could be used for further investigation. In parallel, three replicates of freshly thawed CHO DUKX B11, passage 6 post thawing, were used to perform the same stability study, to determine parental phenotype changes.

Batch cultures
Cell lines were seeded at 0.2 . 10 6 cells mL -1 in 20 mL TPP TubeSpin bioreactors and incubated as described above. Viable cell concentra-

DNA methylation analysis
The gDNA of six selected clones (first batch, D2) and three replicates of  Table 2.
Raw reads were processed using Trim-galore 0.6.0, [34] with a quality cut-off of 28 and trimming at the 5′ end of 5 bp of read 1 and 15 bp of read 2. Additionally, 5′ bp were removed at the 3′ end on both reads. Processed reads were aligned paired-end mode to the Chinese hamster genome [48] using the Bismark v0.22.1 pipeline (non-default parameters: N = 1, score_min = L,0,-0.6). [26] Moreover, Bismark was used to remove duplicate reads and to generate methylation profiles with default settings. Qualimap v.2.2.2 was used to check raw read quality [40] and sequencing depth ( Table 2). Due to the lack of replicates differential analysis was performed using DSS-single. [62] DSS setting comprised a smoothing span of 500 bp with a minimum DMR length of 50 bp with ≥4 CpGs and p < 0.05 (Wald test). DMR intersection upset plot was generated using the R-package ComplexHeatmap. [16] Raw methylation data was acquired using the R-package bsseq. [18] .
Loci were filtered for PCA and methylation plotting for a coverage of >0 in all samples.
ac.at/ using the line "TP_3(42hr)". DMRs were assigned to the chromatin state using the intersect command of the R package GenomicRanges. [29] Enrichment was determined using the following ratio: where l chromDMR is the length of intersections of each reported chromatin states with identified DMRs, l DMR is the length of all DMRs identified, l chromGenome the length of each reported chromatin state identified in the reference genome and l Genome the length of the whole reference genome.
For Euclidian distance and uniform loci analysis, parental replicate reads were summed up and filtered together with subclone reads for loci with coverage >0. Circular diagrams were generated using the R package circlize. [17] Sequencing data has the accession number PRJEB38542 and a browser is available at www.BorthLabCHOresources.boku.ac.at.

Impact of aging
To investigate the impact of methylation during cell line aging, data from a previous CHO methylome study [11] was processed using the above pipeline. Four datasets were used: 1. data acquired from a master cell bank (MCB) sample of the CHO-K1 cell pool, 2. data from the same cell line after maintenance in culture for 6 months, 3. data of a high productivity (qP) subclone, isolated from the MCB pool, and 4. data of the same clone after 3 months in cultivation. DMR analysis was performed between the MCB samples and the respective subclone samples to evaluate the impact of aging. Additionally, DMR analysis was performed between the starting MCB sample and the starting subclone sample, to investigate the impact of phenotypic selection. The only difference to above was during trimming, where a quality cut off of 28, and trimming at the 5′ end of 10 bp and 3′ end of 1 bp were chosen to ensure read quality for the different data set.

RNA sequencing
Total RNA was isolated from the D5 samples using Direct-zol RNA mini prep kit (Zymo Research) according to the manufacturers instruction. Singular samples of the six selected clones were submitted for sequencing as two technical replicates, whereas the parental cell line was analyzed in two biological replicates.
Low quality reads and adapters of the raw sequences were trimmed using Trimmomatic 0.36. [6] Processed reads were mapped to the Chinese hamster genome [48] using HiSat2 2.1.0. [22] Mapped reads were counted using the HTSeq python package. [2] DESeq2 R package 1.24.0 [32] was used to analyze read counts. Differential expression analysis was performed using the DESeq function of the package. Differentially expressed genes between samples were analyzed using the log2 fold change threshold 0 and BH adjusted p-value 0.05.
Genes with a foldchange difference of ≥1.5 and BH p-value < 0.05 were deemed significantly differentially expressed. For further analysis, counts were normalized using the DESeq2's variance stabilizing transformation (vst-normalization).

Downstream computational analysis
Downstream analysis was done in the statistical software R version 3.6.2. [44] Plotting was done using ggplot2. [61] Upset plots were generated using the R-package UpSetR. [8] 3 RESULTS

Stability study and subclone phenotypic diversity
To investigate the effects of subcloning on cellular phenotypes, a previously subcloned EPO-Fc producer cell line [28] was re-subcloned, in the expectation that an already subcloned cell line has a more uniform genetic landscape than a cell pool and thus should yield more homogenous subclones than a pool. After outgrowth of randomly selected colonies which showed reasonable growth and expansion after 3 weeks, 36 subclones were maintained in culture for 53 days ( Figure 1a). Batch experiments at the beginning and the end of this stability study were used to evaluate the phenotype of clones. Unexpectedly, the clones performed better during the second batch ( Figure   S1-S5), whereas the parental cell line showed the anticipated productivity loss. [41] This lower performance of the recently subcloned cultures may have been due to the cells not yet being fully recovered after subcloning, a stress that the parental pool was not exposed to.
Based on the maximum titer measured and the deviation between the two batch cultivations (Figure 1b and c), four clones were selected along the diagonal to cover the entire range of stable productivity, as well as two subclones that deviated from the diagonal due to either increased or decreased productivity (Table 1).

Transcriptome characterization
For transcriptional characterization of both the parental and the second generation subclones, RNA-Seq samples were taken during the exponential phase of the first batch. Using DESeq2, we measured the Euclidian distance between gene expression patterns which revealed two main clusters. One contained the parental clone and SL, while the other five subclones are closer to each other than to the parent or the low-productivity clone SL (Figure 2a). Performing a principal component analysis (PCA) shows that all replicates match closely and that each subclone can be distinguished (Figure 2b). including the ones from the parent, were still mid-exponential ( Figure   S3). Nevertheless, closely spaced transcriptome data from a previous study showed that there are little changes in transcriptome throughout the exponential phase of a batch culture [19] and no distinct clustering of these two samples was observed in the PCA or the distance matrix ( Figure 2a and b). This supports that there is no systematic difference between the samples due to variation in culture stage or growth and strengthens the conclusion of individual, subclone specific changes.
To understand the underlying function of these transcriptional changes, we performed pathway analysis via GSEA (Figure 2d and Table   S2). This revealed that the subclones share many upregulated pathways, even though the specific genes in each pathway may be different.

Full genome DNA-methylation pattern
Next, the full genome DNA-methylation pattern was sequenced and differentially methylated regions (DMRs) called. The coverage for the   To obtain an estimation of the impact of time in culture during the subcloning step on these results, we evaluated data from a previous study where the cellt's DNA-methylation pattern had been reanalyzed after 4 and 6 months in culture in Figure S13. [11] Using our same analysis pipeline we found that few changes occur during the time in culture. This is in marked contrast to the number of DMRs found for increased productivity where a much larger number of DMRs was detected, even exceeding those found in the present study after a single subcloning step without selection pressure. Also in line with a recent publication, aging seems to result in more hypo-methylation, while subcloning increases the number of hyper-methylated CpGs. [36] Differentially methylated Genes (DMG) are defined as genes where at least one DMR is found in their promoter region (+1500 to -200 to the TSS). Interestingly, only a small overlap between DEG and DMG can be observed ( Figure S9). However, the overlaps of DEG and DMG show the same trend: the more hypomethylated the DMR is, the higher is the observed log2Foldchange and vice versa ( Figure S10). Seeing Approx. 50% of Cs in the parent were uniformly methylated, and approx. half of these sites had a different methylation state in one or the other of the subclones, with the specific percentage of such differentially methylated sites in each subclone ranging from 5.2%-6.7% ( Figure 4a). In each subclone, approx. 40% of such differentially methylated sites that had been uniformly methylated in the parent were unique to this specific subclone (Figure 4a and b). The pattern of how these uniformly methylated sites change in each subclone is presented in Figure 5 and in Figure S12. In each subclone, between 3.8% and 8.6% of these loci with originally uniform, but now different methylation were again either fully methylated or demethylated, while 8%-15% were methylated at 50%, which could indicate uniform methylation at one allele and demethylation at the other. Most importantly, the presence of uniformly methylated Cs that had the inverse methylation state in the parent indicates that these new methylation states in the subclone had not been present in the original population, but were modified during the process of subcloning.

DISCUSSION
The process of subcloning and the ensuing wait for single cells to expand sufficiently to be transferred into larger culture vessels is one of the rate and time limiting steps in cell line development that is difficult to speed up. Subcloning is stressful, as documented by the low number of outgrowing colonies and in many cases by the necessity to supplement supportive proteins, hormones, and growth factors.
A recent study revealed that a major contributor to this stress is the absence of cell-to-cell communication and, possibly, the lack of proteins and other cellular signals in the culture environment. [ Despite this stressfulness and the long timelines required for subcloning, it has been a standard part of all cell line development platforms in the past 30 years. Invented for hybridoma technology, [25] where it was an essential part of generating cell lines that produce antibodies with unique binding sites, it later was used also for recombinant cell line technology, where the uniqueness of the product was given in any case by the used plasmid that was transfected. Nevertheless, F I G U R E 5 Changes in methylation of CpGs that were uniformly methylated in the parent (confirmed by ∼35x coverage) and that were differentially methylated in each subclone. Fully methylated in parent are shown in red and fully demethylated in blue. Resulting methylation in the subclones is shown in grey (Coverage in subclones ranges from ∼8x to ∼12x) the technique had its advantage, as during transfection and selection, a wide range of productivities was generated in the cells within a population, where the high producers may have a significant growth disadvantage and thus could be outgrown by non-or low-producers. [67] In later years, the main focus of ensuring monoclonality was to maintain stability and reproducible product quality. This was based on the assumption that a subclone is genomically more homogenous than a cell pool, [60] even though several studies have shown that subclones of a subclone are as genomically diverse as subclones of the parent. [3,9,11,55] With respect to phenotypes, it was already well established that there is high diversity with respect to growth rate and specific productivity in subclones generated from an already subcloned cell line. [24,42,49,53] Based on the here presented results it seems that, unbeknownst, but empirically successful, the process of subcloning itself made a significant contribution to our ability to isolate rare cells with outstanding properties throughout the thousands of cell screening programs would have had to be present in less than 3% of the population. As this pattern was found across a high number of such sites, many of these unique for individual subclones, it is highly unlikely that precisely these unique combinations were present already in the parent population as rare variants. If only such cells had been able to grow out of the main population, then the subcloning efficiency would also have had to be less than 3% where in reality it is typically in the range of 30%-70%.
Therefore, we suggest that our results support the hypothesis that these new methylation patterns were actively generated by the cells in response to the stress caused by isolation during the subcloning process itself. [59] An important question in this context is at what stage such a pro-  Figure 5).
A closer look at the differential transcriptome of the 6 clones provides indications of what cells need to enable them to outgrow from single cell colonies. While most of the DE genes and also enriched pathways are unique (as one would expect if the above mentioned changes in epigenetics are random), there are also some that overlap between all subclones. These include cell growth, cell cycle progression, and cell-cell interactions, and thus are plausible pathways that enable outgrowth into new clones. They also align well with the pathways identified in a recent study to support more efficient clonal outgrowth in cell lines selected for this property. [59] Of interest, in view of the know effect of subcloning to stabilize and "rejuvenate" cells, is the differential regulation of pathways and genes related to DNA repair.
A detailed look at where in the genome these changes took place reveals a strong de-enrichment for regions that are being actively tran- and its local context in particular with respect to enhancer regions remain mostly unelucidated. [1,47] The caveat on these results is that the chromatin states used were with only few genes expressed uniquely in some lineages or states, and these typically found at low read counts (unpublished results). Thus, we expect that the majority of the used chromatin states is correctly assigned also for the cell line used in this study.
In conclusion, we here presented data that show that during the process of subcloning the DNA methylation pattern of outgrowing cells is altered in a random way to generate new epigenomes and transcriptomes in the resulting subclones. This process appears to be caused by the cells activating a yet unknown mechanism to actively and randomly change their DNA methylation pattern, resulting in new and unique patterns that were not present in the parental population.
These changes contribute to the generation of new and diverse phenotypes in the subclones obtained and thus also support the screening for production clones with improved performance in bioprocessing. Due to the fact that DNA methylation patterns are copied during the division and therefore passed on to daughter cells, the thus obtained new patterns and associated phenotypes are then stable within the new subclone population, at least for a time, possibly until they are again modified due to some other stress situation arising that requires again a modification of the transcriptome.