Comprehensive proteogenomic analysis of human embryonic and induced pluripotent stem cells

Abstract Although the concepts of somatic cell reprogramming and the generation of human‐induced pluripotent stem cells (hiPSCs) have undergone several analyses to validate the usefulness of these cells in research and in the clinic, it remains controversial whether hiPSCs are equivalent to human embryonic stem cells (hESCs), pointing to the need for further characterization and a more comprehensive understanding of pluripotency. Most of the experimental evidence comes from transcriptome analysis, while little protein data is available, and even less is known about post‐translational modifications. Here, we report a combined strategy of mass spectrometry and gene expression profiling for proteogenomic analysis of reprogrammed and embryonic stem cells. The data obtained through this integrated, multi‐“omics” approach indicate that a small, but still significant, number of distinct pathways is enriched in reprogrammed versus embryonic stem cells, supporting the view that pluripotency is an extremely complex, multifaceted phenomenon, with peculiarities that are characteristic of each cell type.

human-induced pluripotent stem cells (hiPSCs) still remains controversial. Different studies have shown quite a high similarity between embryonic and reprogrammed pluripotent stem cells, 9 while others have instead found significant differences. 10,11 At the mRNA level, for example, early passage iPSCs show specific patterns of expression that are not detected in ESCs; this discrepancy seems to become less evident in late passage iPSCs. 12 While the epigenomic and transcriptomic profiles of hiPSCs and hESCs have been widely discussed, 13-15 less is known about proteomic patterns 16 and post-translational modifications, 17 such as phosphorylation. 18 Here, we have applied and integrated a multi-'omics' strategy to dissect, at the transcriptional, translational and post-translational level, two distinct human PSC lines: the H9 embryonic stem cell line (hESCs) and an induced pluripotent stem cell line (hiPSC-1) obtained by reprogramming peripheral blood T-lymphocytes from a healthy volunteer. Two additional hiPSC lines, one generated from peripheral blood T-lymphocytes and one generated from skin fibroblasts (hiPSC-2 and hiPSC-3, respectively), were included in this study to validate the proteogenomic data.

| Assessment of pluripotency of generated iPSCs
Prior to pluripotency assessment, all generated hiPSC lines were tested for loss of the Sendai virus (SeV) transgenes by reverse transcription polymerase chain reaction (RT-PCR). Detection of SeV transgenes was performed in infected parental cells (to confirm presence), uninfected parental cells (to confirm absence), and the generated hiPSCs (to confirm loss of the viral transgenes).

| RNA extraction, RT-PCR and qRT-PCR
Total RNA was extracted using TRIzol reagent (Thermo Fisher Scientific), and 1 µg of RNA was reverse transcribed using the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific). For SeV detection, 0.5 µg of cDNA was used in a standard PCR reaction. For gene expression quantification, qRT-PCR was performed using 1 µL of the RT reaction and the Power SYBR Green Master Mix (Applied Biosystems) on a StepOnePlus™ Real-Time PCR System (Applied Biosystems). Gene expression levels were normalized to the housekeeping gene glyceraldehyde 3-phosphate dehydrogenase (GAPDH). A list of primers is provided in Table S1.
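As an illustration of normalization to GAPDH, the following minimal sketch computes relative expression with the widely used 2^-ΔΔCt approach. The text above states only that expression was normalized to GAPDH; the choice of the ΔΔCt method, the function name and all Ct values are assumptions made for this example.

```python
def relative_expression(ct_target, ct_gapdh, ct_target_cal, ct_gapdh_cal):
    """Fold change of a target gene relative to a calibrator sample (2^-ddCt).

    All arguments are threshold cycle (Ct) values; the calibrator is the
    sample used as the baseline (assumed here, e.g. hESCs).
    """
    # Normalize the target gene to GAPDH within each sample.
    delta_ct_sample = ct_target - ct_gapdh
    delta_ct_calibrator = ct_target_cal - ct_gapdh_cal
    # Compare the sample to the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, not taken from the study.
fold = relative_expression(
    ct_target=24.0, ct_gapdh=18.0,          # sample of interest
    ct_target_cal=26.0, ct_gapdh_cal=18.0,  # calibrator sample
)
print(fold)
```

With these hypothetical values the target gene crosses the threshold two cycles earlier than in the calibrator after normalization, which corresponds to a four-fold higher relative expression under the assumption of perfect amplification efficiency.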

| Immunohistological analysis and alkaline phosphatase staining
Cells were fixed with 4% (vol/vol) paraformaldehyde (PFA) and subjected to immunostaining using the following primary anti- Thermo Fisher Scientific) for 1 hour at 37°C. Nuclei were counterstained using 1 µg/mL Hoechst 33528 (Thermo Fisher Scientific).
Microscopy was performed using imaging systems (DMi8), filter cubes and software from Leica Microsystems. AP staining was performed using the 1-Step NBT/BCIP (Thermo Fisher Scientific).

| Treatment of hESCs and hiPSC lines with retinoic acid and BMS493
For RAR pathway modulation, cells were treated for 24 hours with 0.5 µM RA, with 5 µM BMS493 (a pan-retinoic acid receptor inverse agonist), or with a combination of RA and BMS493, each diluted directly in the culture medium. For the embryoid body (EB) formation assay, cells were dissociated into single cells using StemPro Accutase (Thermo

| Proteomic and phosphoproteomic analysis: Strategy overview
An integrated strategy was adopted that combines enzymatic digestion, isobaric mass tag labelling, selective phosphopeptide enrichment on metal oxide affinity (MOA) material, peptide fractionation by strong cation exchange, and nanoLC coupled with high-resolution tandem mass spectrometry.
A total of eight samples (200 µg each), four biological replicates for each of the two cell lines (hESCs and hiPSC-1), were prepared for proteomic and phosphoproteomic analyses. The core of the strategy is the use of isobaric tags, which allow relative quantification and the consequent identification of differentially expressed proteins (DEPs) between hESC and hiPSC-1 lysates. The total area under the chromatograms, the number of proteins and the number of sequenced peptides obtained from single injections were evaluated to estimate the protein amount per sample (Table S2): about 200 µg per sample was handled. These results were then used to ensure that the subsequent labelling reaction would be performed on comparable amounts of starting material. Moreover, nLC-MS/MS analyses were also carried out after TMT labelling 21 (Table S3).

| Microarray procedure
Total RNA was extracted using the Stratagene Absolutely RNA kit and resuspended in RNase-free water. Spectrophotometric determination of purified RNA yield was performed using the NanoDrop (Thermo Scientific), while total RNA quality was measured using the BioAnalyzer 2100 (Agilent Technologies). Antisense RNA (aRNA) was synthesized, amplified and purified using the Illumina TotalPrep RNA Amplification Kit (Ambion) following the manufacturer's instructions. For microarray analysis, purified aRNA was hybridized to the Human HT-12v4 Expression BeadChip Kit (Illumina). Samples were scanned on the iSCAN system (Illumina). The output file was then analysed statistically.

| Statistical analysis
For proteomic data, statistical analysis was carried out by using both were compared by Student's t-test corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure (q-value <0.05). 22 Protein fold-changes were determined by dividing the median protein fold-changes (n = 4 replicates) of the two data sets. For microarray analysis, the raw intensity data produced by the Illumina iSCAN were imported into the R statistical environment using the limma package 23 for background subtraction, quantile normalization and log2 transformation of the signal values. This procedure also removes the control probes, leaving only the regular ones. A moderated t-test with Benjamini-Hochberg (BH) multiple testing correction was used to identify differentially expressed genes (DEGs) between hESCs and hiPSC-1.
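The Benjamini-Hochberg correction and the median-based fold-change mentioned above can be sketched as follows. This is a minimal, standard-library illustration rather than the pipeline actually used in the study; the function names, the direction of the ratio and all numeric values are assumptions.

```python
from statistics import median


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def median_fold_change(group_a, group_b):
    """Ratio of the replicate medians of two groups (a over b)."""
    return median(group_a) / median(group_b)


# Hypothetical p-values for five proteins.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]

# Hypothetical reporter intensities, four replicates per cell line.
fc = median_fold_change([2.0, 2.2, 1.8, 2.0], [1.0, 1.1, 0.9, 1.0])
print(q, significant, fc)
```

In this toy example the third and fourth proteins have raw p-values below 0.05 but q-values above it, which shows why the q-value cut-off is stricter than an uncorrected threshold.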
DEGs were selected using a fold-change threshold of ≥1.5 and a P value cut-off of ≤0.05. The identified DEGs were annotated by Gene Ontology (GO) and pathway analysis. Ingenuity Pathway Analysis (IPA; Ingenuity Systems, http://www.ingenuity.com) was used for gene set enrichment and gene network analysis.
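The DEG selection criteria can be illustrated with a short filter. This sketch assumes that the fold-change threshold is applied symmetrically to up- and down-regulated genes (that is, on the absolute log2 fold-change) and leaves open whether the P value is raw or BH-adjusted; the record layout, gene names and values are hypothetical.

```python
import math


def select_degs(records, fc_cutoff=1.5, p_cutoff=0.05):
    """Return the genes passing both the fold-change and P value cut-offs.

    Each record is a dict with keys 'gene', 'log2_fc' and 'p_value'.
    """
    # A fold-change of 1.5 in either direction equals |log2 FC| >= log2(1.5).
    log2_cutoff = math.log2(fc_cutoff)
    return [
        r["gene"]
        for r in records
        if abs(r["log2_fc"]) >= log2_cutoff and r["p_value"] <= p_cutoff
    ]


# Hypothetical results table, not taken from the study.
records = [
    {"gene": "GENE_A", "log2_fc": 1.0, "p_value": 0.01},    # 2-fold up
    {"gene": "GENE_B", "log2_fc": -0.8, "p_value": 0.03},   # about 1.7-fold down
    {"gene": "GENE_C", "log2_fc": 0.3, "p_value": 0.001},   # change too small
    {"gene": "GENE_D", "log2_fc": 2.0, "p_value": 0.20},    # not significant
]
degs = select_degs(records)
print(degs)
```

Working on the log2 scale keeps the threshold symmetric, so a 1.5-fold decrease is treated the same way as a 1.5-fold increase.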

| Whole proteome identification and classification of DEPs
Proteomics data from nano-liquid chromatography tandem mass spectrometry (nLC-MS/MS) allowed the identification and quantification of 3807 proteins across the hiPSC-1 and hESC samples (Table S4). The same statistical cut-off was adopted for proteomic and phosphoproteomic data, and 230 statistically significant proteins (q < 0.05) were selected. Interestingly, by com-  (Table 1). Two proteins, RAB17 and SQSTM1, respectively up- and down-regulated in hiPSC-1 vs hESCs, were selected for biological validation by Western blot analysis (Figure 1A). Although the trend of expression was confirmed for both proteins, only the difference in SQSTM1 expression was statistically significant (P = 0.03). RAB17 is a member of the small GTPase superfamily and has been linked to the down-regulation of cell growth and proliferation. 24 The protein was up-regulated in hiPSCs vs hESCs in our proteogenomic comparison and, in a previous study, we demonstrated by cell cycle analysis that hESCs indeed have a higher proliferation rate than hiPSCs. 11 SQSTM1 is a hub molecule involved in several biological pathways, including autophagy, a highly conserved cellular process that in ES cells supports self-renewal and regulates differentiation. Moreover, autophagy is activated during reprogramming of somatic cells to iPSCs. 25 Whole proteomic data analysis identified key regulators of pluripotency, including Sox15, 26 (Table S6) and LBR (Lamin B receptor) (Table S4), respectively down- and up-regulated in hiPSCs.

| Identification and classification of differentially expressed phosphoproteins
Phosphoproteome analysis allowed us to identify and quantify 5958 phosphopeptides and 2623 phosphoproteins (Tables S5 and S6). Of these, 69 phosphopeptides and 73 phosphoproteins were found to be statistically significant according to Student's t-test with the  Differentially abundant phosphopeptides and phosphoproteins, plotted in Figure 2A  with the phosphoproteomic analysis (Figure 2D).

| Transcriptome analysis for the identification and classification of DEGs
Comparative transcriptome analysis of hiPSC-1 vs hESCs, followed by a functional annotation analysis, highlighted 433 DEGs (Table S7).    of embryonic stem cell self-renewal and pluripotency such as NRF, and ERK/MAPK signalling were commonly enriched (Figure 4A).
Although the majority of pathways identified by this cross-analy-   Table S9.

| Role of the RAR activation pathway and its modulation on the phenotype of human ESCs and human PSCs
Among the pathways identified as differentially regulated in hESCs and hiPSC-1, we selected RAR activation signalling, as retinoids, including vitamin A and its derivatives, have been widely associated with embryonic development and differentiation. 27,28 Here, we evaluated the effects on hESCs and hiPSC-1, -2, and -3 -3) showed an enhanced expression of SOX17 when they were simultaneously exposed to RA and BMS493 (RA+/BMS493+) as shown by immunofluorescence (Figure S3) and its quantification (Figure 5C).
qRT-PCR analysis of NESTIN, MESP1 and SOX17 expression showed a reduction of these markers in hESCs under all three conditions tested (RA+; BMS493+; RA+/BMS493+); conversely, although the hiPSC lines showed a similar trend when exposed to either RA or BMS493, they completely recovered their endodermal and mesodermal differentiation potential when simultaneously treated with

| DISCUSSION
Recent breakthroughs in stem cell research have made it possible to envision a scenario in which stem cell-based therapy might become a reality.
In particular, the findings that differentiated somatic cells can be reprogrammed back to a pluripotent state and subsequently differen-  complete re-establishment of hiPSC differentiation potential is instead observed when cells are simultaneously treated with RA and BMS493, supporting the view that a synergistic action of both molecules is sufficient to restore the activity of the pathway.
In conclusion, our data indicate that the molecular correspondence between hiPSCs and hESCs is not exactly linear, as demonstrated by the presence of specific pathways representative for each cell line. The discovery of pathways enriched in hiPSC-1 vs hESCs and the identification of pathways exclusively present in our dataset are in agreement with the view that pluripotency is an extremely complex and multifaceted phenomenon, with peculiarities that are characteristic of each pluripotent cell type, and that hESCs and hiPSCs cannot be considered equivalent from a functional point of view. Our data provide evidence that reprogrammed cells possess a unique molecular signature that can have functional and phenotypic consequences when a given pathway is modulated.
Future investigations of the identified pathways and their relative components will provide new insights into the complex mechanisms of pluripotency and self-renewal of reprogrammed cells and will give the opportunity to understand how molecular variations can impact the phenotypic and functional behaviors of reprogrammed stem cells.

ACKNOWLEDGMENTS
This work was supported in part by the MIUR grant PON03PE_00009_2 (iCARE) to G.C.

CONFLICTS OF INTEREST
The authors declare no conflict of interest.