A Mass Spectrometry Survey of Chromatin-Associated Proteins in Pluripotency and Early Lineage Commitment

Pluripotency can be captured in vitro in the form of Embryonic Stem Cells (ESCs). These ESCs can be either maintained in the unrestricted "naïve" state of pluripotency, adapted to developmentally more constrained "primed" pluripotency, or differentiated towards each of the three germ layers. Epigenetic protein complexes and transcription factors have been shown to specify and instruct transitions from ESCs to distinct cell states. In this study, proteomic profiling of the chromatin landscape by chromatin enrichment for proteomics (ChEP) is used in mouse naïve pluripotent ESCs, primed pluripotent Epiblast stem cells (EpiSCs), and cells in early stages of differentiation. A comprehensive overview of epigenetic protein complexes associated with the chromatin is provided and proteins associated with the maintenance and loss of pluripotency are identified. The data reveal major compositional alterations of epigenetic complexes during priming and differentiation of naïve pluripotent ESCs. These results contribute to the understanding of ESC differentiation and provide a framework for future studies of lineage commitment of ESCs.


Embryonic stem cells (ESCs) are receptive to differentiation-inducing cues from the environment. [3,4] Several distinct populations of pluripotent cells can be maintained in vitro, each reflecting a particular time point in embryonic development. [2] Naïve ESCs of mice can be derived from pre-implantation embryos and can be maintained in fetal calf serum supplemented with Leukemia Inhibitory Factor (LIF), [5,6] which mediate activation of SMAD and JAK-STAT signaling, respectively. [2] Upon injection into pre-implantation embryos, these cells can contribute to all germ layers and the germline. [3] The in vitro cultured equivalents of post-implantation pluripotent cells are Epiblast stem cells (EpiSCs), whose maintenance relies on stimulation of SMAD and ERK signaling by Activin A and FGF2, respectively. [7,8] EpiSCs are more developmentally constrained and represent the "primed" state of pluripotency, as EpiSCs are not germline-competent and do not contribute efficiently to embryos in chimaera assays. [9,10] However, EpiSCs retain the ability to form teratomas, demonstrating their pluripotency, [8] although EpiSCs cannot revert to the naïve state of pluripotency. When primed ESCs progress with dissolution of the pluripotent state, they initiate lineage specification programs during differentiation and as such are no longer considered pluripotent. [11] The transition from naïve to primed pluripotency and subsequently to differentiation requires extensive rewiring of the cellular state, exemplified by major changes in cell morphology and distinct metabolic, transcriptional, and epigenetic states. [12] Some of these changes, such as the activation of lineage-priming genes and the increase of DNA methylation, are already initiated in primed pluripotency. [13,14] Another notable example comprises bivalent chromatin domains, which are decorated with both repressive and activating histone marks [15] and are largely resolved when the pluripotent state is lost, starting upon priming of ESCs. [16]

Epigenetic processes play a substantial role in the regulation of cell fate decisions. The effector proteins on the chromatin, Transcription Factors (TFs), often act as "master regulators" by binding to and regulating many genes, thereby driving cell state transitions. [17] Despite their low abundance, transcription factors make up a significant portion of the variation in the mammalian proteome, [18] and exert critical roles in mammalian development. [17] Several regulatory TFs have been discovered that drive early priming or differentiation in the embryo, such as Otx2 and Zeb1/2, respectively. [19] In addition, recent reports have highlighted dramatic rewiring of epigenetic complexes between ESCs and early differentiated cell types such as neural progenitor cells (NPCs). [20,21] However, a comprehensive overview of the chromatin environment during priming and differentiation is currently lacking. Such an overview would provide valuable information on TFs and other epigenetic factors in the process of differentiation. Here, we set out to provide a comprehensive overview of the chromatin proteome during the onset of differentiation.

Figure 1. Generation of distinct cell types using specific culture conditions. A) Schematic overview of the culture conditions and workflow used. B) Colonies of the various cell types as used in this study. C) Gene expression for selected markers to validate the distinct molecular signature. Error bars represent SEM, n = 2.
As epigenetic factors and TFs are generally of low abundance, [18] their detection is challenging when using proteomics approaches that profile total cell lysates. [22] However, these factors can be brought into the dynamic range of the mass spectrometer through enrichment of the chromatin fraction. [23-26] Simultaneously, this provides information on the levels of these factors in their relevant context. [27,28] To gain insight into the dynamics of epigenetic regulators and transcription factors during pluripotency and differentiation, we collected a range of cell types representing naïve pluripotency ("ESCs"), primed pluripotency ("EpiSCs"), and early neuronal differentiation ("END"), which is a widely adopted differentiation system [29] (Figure 1A). First, we confirmed that the various cell types were morphologically distinct. In particular, ESCs formed small colonies, EpiSCs formed large, flatter colonies, and END cells were hallmarked by a more stretched morphology [30] (Figure 1B). In terms of expression of known marker genes, the naïve pluripotency marker Rex1 was highly expressed in ESCs, but not in other cell types. Cells grown in EpiSC culture conditions showed high expression of the EpiSC markers Otx2, Fgf5, and Zic2. [31] The ectodermal differentiation markers Pax6 and Sox1 were upregulated in END cells (Figure 1C). [31] These results validated our in vitro differentiation protocol.
Next, we aimed to isolate the chromatin of these cells using Chromatin Enrichment for Proteomics (ChEP) [23] (Figure 2A). To confirm that ChEP enriches for chromatin, we validated the enrichment for histones after ChEP of ESCs as compared to whole cell extracts of ESCs using Coomassie staining (Figure S1A, Supporting Information). Next, we generated chromatin proteomes of ESCs and compared these to whole cell proteomes of ESCs. We observed that ChEP strongly enriches for chromatin factors such as histones and DNA-binding zinc finger proteins, while it depletes non-chromatin factors such as mitochondrial proteins and translation initiation factors (Figure S1B-D, Supporting Information), validating the ChEP procedure. Next, we performed ChEP for ESCs, EpiSCs, and END cells and performed mass spectrometry analyses of the chromatin fraction. A total of 4174 proteins were reproducibly quantified (Figure S1E, Supporting Information) and the replicates showed a high correlation (Spearman correlation >0.95) (Figure 2B,C). Next, as chromatin enrichment procedures can be prone to contamination from organelles such as the mitochondria, [32] we assessed the purity of the ChEP proteomes on the peptide level. This revealed that 70% (ESCs), 60% (EpiSCs), and 63% (END cells) of the unique and razor peptides originated from proteins with an expected chromatin function. [33] For further downstream analysis, only the proteins with an expected chromatin function (Experimental Section; Table S2, Supporting Information) were included.
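The replicate comparison above relies on the Spearman rank correlation. As a minimal sketch of how such a coefficient can be computed, the following standard-library Python example ranks two replicate intensity vectors (ties receive the mean rank) and then takes the Pearson correlation of the ranks. The function names and the log2 LFQ values are hypothetical and for illustration only; they are not taken from the dataset or from the analysis scripts used in this study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical log2 LFQ intensities of five proteins in two replicates.
replicate_1 = [25.1, 27.3, 22.8, 30.2, 26.0]
replicate_2 = [25.4, 27.0, 23.1, 29.8, 26.5]
rho = spearman(replicate_1, replicate_2)
print(f"Spearman rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, it is insensitive to monotonic intensity shifts between runs, which is one reason it is a common choice for comparing label-free replicates.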
For validation of the cell types on the chromatin level, we plotted the Label Free Quantification (LFQ) values of known markers, which included naïve markers (TBX3 and KLF4), priming markers (DNMT3B and GRHL2), and neuronal markers (FOXP1 and HMGN3) (Figure 2D). Of all proteins included in the analysis, ANOVA statistics revealed 659 proteins to be significantly differentially abundant between the three cell types. We clustered these proteins using k-means clustering. This approach revealed distinct clusters with GO terms associated with their particular cell state. Notable examples include the term "telomere maintenance" being enriched on the chromatin in ESCs, whereas END chromatin was enriched with factors associated with neuro-ectodermal processes such as "neurogenesis" and "regulation of epithelial cell differentiation" (Figure 2E).
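To illustrate the k-means step, the sketch below clusters a handful of abundance profiles across the three cell types. It is an assumption-laden toy example, not the procedure used in this study: the profiles are invented, each profile is z-scored per protein before clustering (the text does not state whether this was done), and the initial centroids are fixed to three chosen profiles so that the result is deterministic.

```python
def zscore(profile):
    """Scale one protein's profile to mean 0 and unit (population) SD."""
    mean = sum(profile) / len(profile)
    sd = (sum((v - mean) ** 2 for v in profile) / len(profile)) ** 0.5
    return [(v - mean) / sd for v in profile]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, max_iter=50):
    """Plain Lloyd iteration starting from the supplied centroids."""
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        new_labels = [nearest(p, updated) for p in points]
        centroids = updated
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical log2 LFQ profiles in the order (ESC, EpiSC, END).
profiles = [
    [28.0, 25.0, 24.0],  # high in ESCs
    [27.5, 25.0, 24.5],  # high in ESCs
    [22.0, 23.0, 27.0],  # high in END cells
    [21.0, 22.5, 26.0],  # high in END cells
    [23.0, 27.0, 23.5],  # high in EpiSCs
]
scaled = [zscore(p) for p in profiles]
# Assumption: seed the three clusters with profiles 0, 2, and 4.
labels, centroids = kmeans(scaled, [scaled[0], scaled[2], scaled[4]])
print(labels)
```

In practice the number of clusters and the initialization would need to be chosen and justified for the real data; a library implementation with multiple random restarts is the usual choice.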
After validation of the samples, we aimed to provide a comprehensive overview of the abundance of major epigenetic complexes on the chromatin in these cells. Notably, we previously showed that the abundance of chromatin-associated proteins corresponds well to the levels of the respective proteins observed in protein complexes using protein complex pulldowns. [34] We focused on major epigenetic protein complexes including the Polycomb Repressive Complex 2 (PRC2), the Nucleosome Remodeling and Deacetylase (NuRD) complex, and the Brg/Brahma-associated factors (BAF) complex. We observed large changes within these epigenetic complexes (Figure 3A). For example, we observed a downregulation of many PRC2 components including the core complex consisting of EED, EZH2, and SUZ12, which we also validated using western blot for EED (Figure S1F, Supporting Information). This observation is in line with previous reports highlighting a major rewiring of the PRC2 complex during differentiation resulting in a strong reduction of many components in neural progenitor cells. [20,35] The BAF complex is known to change composition during differentiation. [36] In line with this, we observed an increase of subunits that characterize neuronal differentiation (npBAF) such as SMARCC2 in EpiSCs, and these levels increase to a larger extent in END cells. In contrast, subunits that define the ESC version of the BAF complex (esBAF; SMARCC1, SMARCD2, and ACTL6A) remain at similar levels while exiting the naïve state, [36] indicating a balance shift toward npBAF, although the esBAF complex remains present as well. In addition, several other BAF complex subunits display strong dynamics upon differentiation. For example, SMARCA4 (also known as BRG1) is strongly upregulated in END cells, which is likely related to the requirement of SMARCA4 for induction of enhancers during lineage commitment. [37] On the other hand, SS18 seems to be mainly present in pluripotency, which could indicate that the embryonic lethality observed in mice lacking SS18 is already caused by defects in very early embryonic development. [38] Finally, we focused on the NuRD complex as a very recent report suggested altered NuRD complex composition in ESCs and NPCs. [21] In line with this previous report, we observe more ZFP296 on chromatin in ESCs and more ZMYND8 and ZNF687 on chromatin during early differentiation. Interestingly, some changes such as the gain of ZMYND8 are already present in EpiSCs, indicating this switch occurs during pluripotency (Figure 3A; Figure S1G, Supporting Information), whereas the increase of CHD3 and MBD2 mainly occurs after the pluripotent state is lost. As such, these results further indicate that NuRD changes composition upon ESC differentiation.
Next, we focused on DNA methylation as this is a major driver of differentiation. [39] We observed higher levels of DNMT3B in EpiSCs, in line with previous reports, [40] but an increase in DNMT3A in END cells (Figure 3A), which could indicate a switch in DNMT3 proteins during initiation of differentiation. We also observed a drastic downregulation of TET1/2 in both EpiSCs and END cells, which fits previous reports showing the downregulation of TET1/2 during embryoid body differentiation. [41] In addition, this further indicates that the altered DNA methylation landscape in differentiating cells may be the result of a shift in the balance between TET1/2 and DNMT3A/B, rather than a unilateral increase in depositing enzymes. [39,42]

The last category of proteins we focused on is pluripotency factors, as these comprise dynamic regulators of cell fate and differentiation. We observed that both EpiSCs and END cells displayed an overall downregulation of pluripotency markers, fitting the loss of the naïve state (Figure S1H, Supporting Information). In addition, EpiSCs were characterized by increased levels of LIN28A/B, which is characteristic for primed pluripotency [43] (Figure S1H, Supporting Information). Collectively, our analysis reveals strong rewiring of chromatin-associated epigenetic complexes upon induction of differentiation.
Finally, we aimed to use our chromatin proteomes to identify candidate TFs that regulate differentiation. We filtered the proteins identified in the chromatin proteomes for TF activity using a combination of published TF databases [44,45] and identified differential TFs (p < 0.05 and more than twofold difference) relative to ESCs. These analyses identified known naïve pluripotency factors such as ESRRB and KLF2 to be more abundant in ESCs (Figure 3B). Conversely, factors associated with neural development such as FOXC1/2 and ZEB1 are enriched in END cells, [46,47] and priming factors such as GRHL2 and LIN28B in EpiSCs. [43,48] In addition to known regulators, we identify several TFs such as SMAD2, ZFHX3, HIC1, and ZHX2 that could be candidate regulators of priming or differentiation.
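The selection rule described above (p < 0.05 and more than twofold difference relative to ESCs) can be sketched as a simple filter. The example below assumes that abundances are available as mean log2 LFQ values per cell type, so that a twofold difference corresponds to an absolute log2 difference greater than 1. The field names, the helper function, and all numeric values are hypothetical; only the gene symbols are taken from the text, and the placeholder entry TF_X is invented.

```python
from math import log2


def differential_tfs(rows, p_cutoff=0.05, min_fold=2.0):
    """Return (gene, direction, log2 difference) for TFs passing both cutoffs.

    Each row is a dict with the mean log2 LFQ in ESCs ("log2_esc"), the mean
    log2 LFQ in the compared cell type ("log2_other"), and a p-value ("p").
    """
    log2_cutoff = log2(min_fold)
    hits = []
    for row in rows:
        diff = row["log2_other"] - row["log2_esc"]
        if row["p"] < p_cutoff and abs(diff) > log2_cutoff:
            direction = "up" if diff > 0 else "down"
            hits.append((row["gene"], direction, round(diff, 2)))
    return hits


# Hypothetical values for an END versus ESC comparison.
example_rows = [
    {"gene": "ESRRB", "log2_esc": 28.0, "log2_other": 25.5, "p": 0.001},
    {"gene": "ZEB1", "log2_esc": 22.0, "log2_other": 25.0, "p": 0.004},
    {"gene": "SMARCC1", "log2_esc": 27.0, "log2_other": 27.25, "p": 0.40},
    {"gene": "TF_X", "log2_esc": 21.0, "log2_other": 24.0, "p": 0.20},
]
hits = differential_tfs(example_rows)
print(hits)
```

Requiring both criteria removes proteins with a large but statistically unsupported difference (TF_X in this toy input) as well as proteins with a small difference.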
Here, we provide a comprehensive overview of the chromatin during the transition from the naïve pluripotent state toward primed and differentiating cell states. Focusing on the chromatin proteome enabled detection of low-abundance transcription factors, which allowed us to effectively explore these regulatory factors during maintenance and exit of pluripotency. Several of the changes in the chromatin protein landscape during differentiation as observed in the current study are complementary to previous work [34] in which we compared two different states of naïve pluripotency, represented by 2i ESCs and serum ESCs. An example comprises PRC2, which is moderately downregulated in the transition from 2i to serum ESCs, and more drastically upon differentiation. This suggests that epigenetic changes that are linked to differentiation are already initiated upon dissolution of the pluripotent ground state.
In conclusion, the dataset as generated in the current study recapitulates known dynamics in epigenetic protein complexes during differentiation and can be used to identify novel candidate proteins for future studies. To facilitate this, we have included an extensive table with the proteins identified in this study, their abundance, and whether they are significantly different between ESCs, EpiSCs, and END cells. In addition, we have highlighted whether a protein is considered a chromatin factor (Table S2, Supporting Information). Overall, these data provide a useful resource for future studies on the chromatin environment during maintenance and exit of pluripotency.
Chromatin Enrichment for Proteomics: Chromatin was harvested and enriched as described in Kustatscher et al. [23] Cells were cross-linked on plates using 1% formaldehyde and incubated at 37°C for 10 min. The crosslinking reaction was stopped by adding glycine to a concentration of 0.25 M for 5 min. Plates were rinsed with Phosphate Buffered Saline (PBS) and scraped in 5-10 mL of PBS into tubes. Tubes were centrifuged at 1000 g for 3 min, supernatant was aspirated and cells were resuspended and homogenized in 1 mL cold lysis buffer (25 mM TRIS pH 7.4, 0.1% Triton X-100, 85 mM KCl, 1X Roche protease inhibitor). Suspensions were centrifuged at 2300 g for 5 min at 4°C, supernatant was aspirated and pellets were resuspended in 500 µL of lysis buffer and incubated at 37°C for 15 min. Suspensions were centrifuged at 2300 g for 10 min at 4°C, supernatant was aspirated, pellets were resuspended in 500 µL of SDS buffer (10 mM TRIS pH 7.4, 10 mM EDTA, 4% SDS, 1X Roche protease inhibitor) and incubated at room temperature for 10 min. 1.5 mL of urea buffer (10 mM TRIS pH 7.4, 1 mM EDTA, 8 M urea) was mixed with samples and they were centrifuged at 16 100 g for 30 min at room temperature. Supernatant was aspirated and pellets were resuspended in 500 µL SDS buffer, after which 1.5 mL urea buffer was added and suspensions were centrifuged at 16 100 g for 25 min at room temperature; this step was performed twice. The supernatant was aspirated and pellets were resuspended in 500 µL SDS buffer, after which 1.5 mL SDS buffer was added and suspensions were centrifuged at 16 100 g for 25 min at room temperature. Supernatant was discarded and pellets were carefully resuspended in 100-200 µL of storage buffer (10 mM TRIS pH 7.4, 1 mM EDTA, 25 mM NaCl, 10% glycerol, 1X Roche protease inhibitor) and sonicated for 6 min at high intensity (30 s on/off alternation) on a NGS bioruptor (Diagenode). Protein concentration was determined using a Qubit assay (Invitrogen). Samples were subjected to mass spectrometry sample preparation or western blot.
RT-qPCR: Cell pellets were generated by taking a small volume of cells in suspension and centrifuging samples for 3 min at 1000 g. RNA was extracted using the RNeasy Mini Kit (Qiagen). cDNA was synthesized by reverse transcription as previously described. [50] Quantitative PCR on the cDNA was performed using SYBR Green (Bio-Rad, cat. no 1708886). Primers are listed in Table S1, Supporting Information.
Mass Spectrometry Sample Preparation: Sample preparation for mass spectrometry was adapted from published methods. [51] All centrifugation steps were performed at 20°C unless specified otherwise. Cross-linked samples were incubated at 95°C for 30 min in decrosslink buffer (10.5 µM TRIS pH 8.8, 1.95% SDS, 60 µM β-mercaptoethanol). After decrosslinking, 20 µg of protein in 30 µL volume was loaded onto Centrifugal Filters (Microcon, cat. no MRCF0R030) and 200 µL of UA (8 M urea, 0.1 M HEPES pH 8.5) was added. This was centrifuged for 15 min at 14 000 g. Another 200 µL of UA was added, and the same centrifugation step was applied. 100 µL fresh IAA (0.05 M iodoacetamide (IAA) in UA) was added to samples, and this was mixed in a thermo-mixer for 1 min, after which it was incubated for 20 min in the dark. Samples were then centrifuged for 10 min at 14 000 g. To wash, 100 µL of UA was added and samples were centrifuged for 15 min at 14 000 g. The wash step was repeated twice. Filters were washed with 100 µL ABC (0.05 M ammonium bicarbonate) and spun for 10 min at 14 000 g. The wash step with ABC was performed three times. Filters were then transferred to a new collecting tube, and 40 µL of ABC with trypsin (1:100 enzyme to protein ratio) was loaded onto the filter. This was mixed at 600 rpm in a thermo-mixer for 1 min. Filters were sealed with parafilm to prevent evaporation and incubated at 37°C overnight. After trypsin digestion, filters were centrifuged for 10 min at 14 000 g. 50 µL of 0.5 M NaCl was added and filters were centrifuged for 10 min at 14 000 g. 4 µL of trifluoroacetic acid (TFA) was added to acidify samples. Samples were then subjected to stage tip preparation. Stage tips were generated by stacking 200 µL pipet tips with three layers of C18. Tips were washed with 100 µL MeOH and spun for 2 min at 2500 g, washed with 100 µL Buffer B (80% acetonitrile, 0.1% formic acid in H2O) and spun for 2 min at 2500 g, and washed with 200 µL Buffer A (0.1% formic acid in H2O) and spun for 4 min at 2500 g. Samples were loaded onto stage tips and centrifuged for 4 min at 1500 g. Tips were washed with 100 µL Buffer A and spun for 2 min at 2500 g, which was repeated once. Samples were then eluted in 40 µL Buffer B, speedvacced to 5 µL, and filled up to 12 µL with Buffer A.
Mass Spectrometry Analysis: 5 µL of digested peptides was injected into an Easy-nLC1000 (Thermo) connected online to an LTQ-Orbitrap-Fusion mass spectrometer (Thermo) and separated by developing a gradient from 7 to 30% Buffer B for 214 min before washes at 60% then 95% Buffer B, for 240 min of total data collection time. The flow rate was 250 nL/min. Full MS scans were collected from 400 to 1500 m/z with an Orbitrap resolution of 120 000 and an AGC target of 3e5. MS/MS spectra were recorded in the ion trap using higher-energy collisional dissociation (HCD) fragmentation. The ion trap scan rate was set at Rapid. An AGC target of 2e4 was used with HCD collision energy at 30% and an intensity threshold of 1e4. Scans were recorded in data-dependent top-speed mode with a 3-s cycle and dynamic exclusion set at 60 s with a mass tolerance of 5 ppm. Ions of charge state 2-7+ were considered. Thermo RAW files were searched against the curated UniProt mouse proteome database (release December 2015) with MaxQuant [52] (version 1.5.1.0) and its integrated search engine Andromeda. Cysteine carbamidomethylation was used as a fixed modification, and N-terminal acetylation and methionine oxidation were used as variable modifications. The mass tolerance for precursor ions was set to 20 ppm and the mass tolerance for fragment ions to 0.5 Da. The match between runs feature was enabled and LFQ and iBAQ values were calculated for each protein. The output proteinGroups file containing all detected proteins was loaded into Perseus. [53] Proteins were first filtered against a reverse and contaminant database. Next, the conditions were grouped in Perseus and proteins were retained only if they were detected in all replicates of at least one condition. Missing values were imputed from a normal distribution with default parameters (width = 0.3, down shift = 1.8). The proteins in the resulting list were annotated as chromatin-associated or not chromatin-associated (Table S2, Supporting Information). This was done by comparing to a list of factors that were experimentally and in silico determined to be chromatin-associated. [33] As this list was generated in non-pluripotent human cells, we converted the names to mouse names and we manually included known mouse pluripotency factors. In addition, we called all zinc finger proteins chromatin-associated as these are known to possess nucleic acid binding domains. To specifically identify transcription factors, our detected proteins were matched with two published lists of murine transcription factors. [44,45] Correlation between replicates was assessed using Spearman correlation. Proteins that were significantly different between the conditions were identified using ANOVA statistics with Benjamini-Hochberg correction for multiple testing. Proteins were called significant with FDR < 0.05. To calculate p-values in pairwise comparisons in Figure 3B, we used Welch's t-test. A list of all detected proteins and whether these are significantly different between the conditions can be found in Table S2, Supporting Information.
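As an illustration of the multiple-testing step, the following standard-library Python sketch computes Benjamini-Hochberg adjusted p-values and applies the FDR < 0.05 cutoff. It is a generic implementation of the published procedure and is not the Perseus code used for the analysis; the function name and the five input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotonic: each is the minimum of p * m / rank over all
    # ranks at or above the current one.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical ANOVA p-values for five proteins.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in fdr]
print(fdr)
print(significant)
```

In this toy input the third and fourth proteins have raw p-values below 0.05 but adjusted values just above it, which shows why the number of proteins called significant after correction can be smaller than the number with nominal p < 0.05.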
For comparison of ChEP with whole cell proteomes, we used whole cell proteomes that were generated previously in our lab. [54] Downstream analysis was done with R, Python3, and Jupyter Notebook. GO analysis was performed using DAVID. [55] The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD011782. [56]

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.