Formation of Albumin- and AFP-Secreting Hepatocytes from hESCs
In the mouse, FGF signals from the cardiac mesoderm are essential for specification of the ventral foregut endoderm into liver (reviewed ). A previous study by Lavon et al.  showed that acidic FGF increased the number of albumin-positive cells arising in a 30-day hESC differentiation protocol by approximately 50-fold. Using a modification of their protocol, we conducted the two-step differentiation regime illustrated in Figure 1. First, we plated clumps of undifferentiated hESCs in suspension on ultralow-adhesion dishes in DMEM containing 20% FBS. This promoted the formation of embryoid bodies (EBs), which contain cells of the endoderm, mesoderm, and ectoderm lineages. After 8 days, the EBs were plated onto gelatin-coated standard tissue culture dishes in DMEM containing 20% FBS supplemented with 100 ng/ml acidic FGF (FGF-1).
Figure 1. Enzyme-linked immunosorbent assay (ELISA) determination of liver proteins secreted by differentiated human embryonic stem cells (hESCs). Albumin and α-fetoprotein (AFP) concentrations in the growth media were determined by ELISA. Peak levels of AFP were detected approximately 2 weeks after plating embryoid bodies onto gelatin-coated dishes. Albumin secretion reached its highest levels approximately 3 weeks after plating. Abbreviation: EBs, embryoid bodies.
To examine whether this differentiation regime gave rise to hepatic cells, conditioned media were collected for 6 weeks after plating the EBs and analyzed by ELISA for the hepatic proteins AFP and albumin. Both proteins were secreted into the medium at high levels. As shown in Figure 1, AFP levels peaked approximately 2 weeks after plating the EBs onto gelatin-coated dishes, and albumin secretion peaked approximately 3 weeks after plating. Thus, the differentiated cells are capable of secreting liver-specific proteins.
Although the cellular population secreted high levels of liver-specific proteins, we wanted to isolate and examine the individual cells within the population that were expressing AFP and albumin. To this end, we used lentivirus reporter vectors containing the AFP promoter driving eGFP and the albumin promoter upstream of either eGFP or the RFP gene. These vectors were recently used to mark and isolate liver progenitors from primary human fetal liver tissue . Differentiating hESCs were transduced 5 days after plating the EBs.
We observed AFP promoter activity in two contexts. AFP promoter-driven eGFP was most commonly observed within densely packed multicellular layers of small cells resembling hepatoblasts (Fig. 2A, 2B). AFP promoter activity was also detected in stratified clusters of cells (Fig. 2C, 2D). These clusters closely resemble the bile duct units formed by bipotential mouse embryonic liver cells grown on Matrigel . Albumin promoter activity was also detected in multiple cell types. As with AFP, albumin promoter activity was detected in the periphery of multicellular layers of cells and in stratified cell clusters (Fig. 2E, 2F). In addition, albumin promoter activity was detected in monolayers of cells resembling hepatocytes (Fig. 2G, 2H). In conclusion, this differentiation protocol gives rise to three morphologically distinct populations of cells that express the liver markers AFP and albumin.
Figure 2. Morphology of human embryonic stem cells (hESCs) differentiated toward hepatic lineages. H9 hESCs were differentiated and transduced with α-fetoprotein promoter driving enhanced green fluorescent protein expression (AFP:eGFP) and albumin:dsRed lentiviral vectors. AFP:eGFP+ and albumin:dsRed+ cells most commonly occurred in dense, multilayered regions of small cells, as shown in (A) and (B), phase contrast and fluorescence, respectively (× 100 magnification). AFP and albumin promoter activity was also detected in stratified clusters of cells (C, D and E, F, respectively; × 200 magnification). Note that the morphology of the cell clusters closely resembles the bile duct units formed by bipotential mouse embryonic liver cells. Albumin promoter activity was also detected in monolayers of cells resembling hepatocytes (G and H; × 200 magnification).
Isolation and Expression Profiling of hESC-Derived Hepatic Cells
Once we had established conditions for fluorescently tagging cells expressing hepatic markers, we used fluorescence-activated cell sorting (FACS) to separate the AFP:eGFP+ cells from the AFP:eGFP− cells. We chose the AFP:eGFP lentivirus construct over the albumin construct for two reasons. First, we sought to examine the earliest events in hepatic differentiation in the hope of identifying early hepatic stem cell populations. Because albumin is considered a marker of more mature liver cells, we reasoned that the albumin promoter would likely mark a more mature cell population. Second, the albumin construct typically labeled fewer cells than the AFP construct, perhaps because of differences in endogenous gene expression or in the strength of the transgenic promoter constructs.
Twenty-one days after cells were differentiated and transduced with the AFP:eGFP lentivirus construct, they were dissociated and sorted by flow cytometry based on eGFP expression. Cells with eGFP expression detectable by flow cytometry were designated the AFP:eGFP+ fraction, and the remaining cells the AFP:eGFP− fraction. The AFP:eGFP+ fraction typically comprised 1%–10% of the total cell population and was more than 85% pure after FACS sorting. Immunohistochemistry on the AFP:eGFP+ and AFP:eGFP− populations with an antibody against human AFP confirmed that AFP protein was detectable only in the AFP:eGFP+ population (supplemental online data 1). mRNA from triplicate biological replicates of each fraction was isolated, used for probe synthesis, and hybridized to the Affymetrix Human Exon 1.0 ST Array. Gene expression indexes were calculated as described in Xing et al.  from the 400,000 background-adjusted full probe sets, yielding 18,495 gene indexes.
Comparison of AFP-Positive Cells to Tissue Shows Strong Similarity to Liver
One of the primary goals of this study was to isolate human hepatic cells differentiated from hESCs. Although AFP is an early hepatocyte marker, it is also expressed in other fetal tissues, including the yolk sac and kidney. We therefore tested whether liver-specific transcripts were significantly enriched in the transcriptome of the AFP:eGFP+ cells. To identify biological differences between the AFP:eGFP+ and AFP:eGFP− transcriptomes, we first compared the gene indexes by Gene Set Enrichment Analysis (GSEA) . GSEA is thought to be an improvement over traditional gene ontology (GO) term enrichment analysis because it examines the entire data set, whereas GO analysis typically requires a preselected list of differentially regulated genes defined by an arbitrary cutoff. GSEA with the C2 collection (522 gene sets representing specific metabolic and signaling pathways from manually curated databases) and the C4 collection (427 gene sets defined by expression neighborhoods centered on cancer-related genes) identified 20 gene sets as significantly upregulated in the AFP:eGFP+ fraction (supplemental online data 2). When GSEA identifies enrichment of multiple gene sets, additional biological insight can be gained by leading-edge analysis. The leading-edge subset is the core of a gene set that contributes most to its GSEA enrichment score . Biologically important subsets of genes can be identified by examining the genes shared among leading-edge subsets . Across the 20 gene sets upregulated in the AFP:eGFP+ fraction, 193 genes contributed to the leading-edge subsets, and 69 of these were shared by two or more gene sets. Examination of the expression levels of these 69 genes in a panel of 11 human tissues revealed that 66 have their highest expression in the adult liver (Fig. 4).
The intensity of the liver signal in the heatmap reflects the much higher expression of these genes in liver than in any other sample and strongly suggests that the AFP:eGFP+ fraction is enriched for liver gene expression.
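The leading-edge overlap step reduces to a count: each enriched gene set contributes its leading-edge subset, and genes appearing in two or more subsets are kept. The sketch below assumes the leading-edge subsets have already been exported from GSEA as lists of gene symbols; the gene-set names and memberships are hypothetical placeholders, not the study's data.

```python
from collections import Counter


def all_leading_edge_genes(leading_edges):
    """Union of leading-edge genes across all enriched gene sets."""
    union = set()
    for genes in leading_edges.values():
        union.update(genes)
    return sorted(union)


def shared_leading_edge_genes(leading_edges, min_sets=2):
    """Genes that occur in at least `min_sets` leading-edge subsets.

    `leading_edges` maps a gene-set name to the genes in its leading-edge
    subset. Each gene is counted at most once per gene set.
    """
    counts = Counter()
    for genes in leading_edges.values():
        counts.update(set(genes))  # de-duplicate within a single set
    return sorted(g for g, n in counts.items() if n >= min_sets)


# Hypothetical example: three enriched gene sets with toy leading edges.
example = {
    "SET_A": ["ALB", "APOA1", "TTR", "FGB"],
    "SET_B": ["APOA1", "TTR", "SERPINA1"],
    "SET_C": ["FGB", "HP"],
}
print(len(all_leading_edge_genes(example)))  # 6
print(shared_leading_edge_genes(example))    # ['APOA1', 'FGB', 'TTR']
```

In the study's terms, the union corresponds to the 193 leading-edge genes and the shared list to the 69 genes carried forward to the tissue heatmap.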
Figure 4. The α-fetoprotein promoter driving enhanced green fluorescent protein expression-positive (AFP:eGFP+) fraction is enriched for liver gene expression as determined by gene set enrichment analysis. Biological differences between the AFP:eGFP+ and AFP:eGFP− transcriptomes were investigated by comparing the gene indexes by Gene Set Enrichment Analysis. Shown is a heatmap, generated with the dChip program, of the 69 common leading-edge genes. High expression is depicted in red and low expression in blue. The columns show triplicate samples of undifferentiated human embryonic stem cells (hESCs), 8-day-old embryoid bodies, the AFP:eGFP− and AFP:eGFP+ sorted fractions, and 11 adult tissues from the Affymetrix public Human Exon 1.0 ST Array tissue panel data set.
A large proportion of the genes expressed in the fetal liver, including AFP, are also expressed in the yolk sac. This may reflect their shared germ layer lineage (the gut tube forms from the continuous sheet of embryonic endoderm lining the yolk sac) or similar biological functions in early development. However, neither the GSEA gene sets nor the Exon Array data sets include human yolk sac samples. Therefore, to compare the transcriptional profile with genes expressed in the yolk sac, we examined the Gene Expression Database (GXD) of the Jackson Laboratory Mouse Genome Informatics database . The GXD provides gene expression information throughout mouse development, including the ages analyzed and the assays used to determine expression. The data are curated from the published literature, and the assays include immunohistochemistry, Western blots, Northern blots, RNA in situ analysis, RNase protection assays, and reverse transcription-PCR. A query for genes expressed in both the embryonic liver and the yolk sac identified 126 genes. In contrast, only five genes were identified as expressed in the mouse fetal liver but not in the yolk sac, and all five were also expressed in the AFP:eGFP+ cells (Table 1). For this analysis, a gene was considered expressed in the AFP:eGFP+ fraction if its average gene index across the three biological replicates was greater than 100. The GXD query also identified six genes expressed in the yolk sac but not in the fetal liver. Only one of these, TEK (tyrosine kinase, endothelial), was expressed in the AFP:eGFP+ cells. Thus, the AFP:eGFP+ cells matched the fetal liver pattern for 10 of the 11 genes differentially expressed between the mouse embryonic liver and yolk sac.
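The expression call and the liver-versus-yolk-sac tally can be written out explicitly. This is a minimal sketch assuming per-gene replicate indexes are available as lists; only the >100 threshold and the concordance rule come from the text, and the gene names and values in the example are hypothetical. Applied to Table 1, the same rule gives 10 of 11: all five liver-specific genes expressed and five of the six yolk sac-specific genes not expressed.

```python
from statistics import mean

# Threshold from the text: a gene is "expressed" if its mean gene index > 100.
EXPRESSION_THRESHOLD = 100


def is_expressed(replicates, threshold=EXPRESSION_THRESHOLD):
    """Return True if the mean of the replicate gene indexes exceeds the threshold."""
    return mean(replicates) > threshold


def liver_concordance(liver_only, yolk_sac_only, indexes):
    """Count genes whose call in the sorted fraction matches the fetal-liver pattern.

    A liver-only gene is concordant if it is expressed; a yolk-sac-only gene
    is concordant if it is not expressed. Returns (concordant, total).
    """
    concordant = sum(is_expressed(indexes[g]) for g in liver_only)
    concordant += sum(not is_expressed(indexes[g]) for g in yolk_sac_only)
    return concordant, len(liver_only) + len(yolk_sac_only)


# Hypothetical gene names and index values, for illustration only.
indexes = {
    "LIVER_GENE_1": [150, 120, 130],  # mean ~133 -> expressed (concordant)
    "LIVER_GENE_2": [90, 110, 95],    # mean ~98  -> not expressed (discordant)
    "YS_GENE_1": [20, 35, 15],        # mean ~23  -> not expressed (concordant)
    "YS_GENE_2": [300, 250, 280],     # mean ~277 -> expressed (discordant)
}
print(liver_concordance(["LIVER_GENE_1", "LIVER_GENE_2"],
                        ["YS_GENE_1", "YS_GENE_2"], indexes))  # (2, 4)
```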
A similar analysis was performed using UniGene's EST Profile Viewer (http://www.ncbi.nlm.nih.gov/sites/entrez). Expressed sequence tags for four of the liver-specific genes from the mouse analysis (Table 1; GRB2, HELLS, LGALS3, SLC20A2) showed expression in the adult human liver. Conversely, the yolk sac-specific genes CITED1, ERG, MKX, and TEK are absent from adult human liver (HAMP is present in adult liver, and there are no data for S100G).
Table 1. Comparison of fetal liver- and yolk sac-specific genes
Finally, we compared our data with the gene expression analysis of definitive and visceral endoderm described by Sherwood et al. . They detected a large number of transcription factors with >3-fold enrichment in visceral endoderm compared with definitive endoderm isolated from embryonic day 8.25 mouse embryos. Of 30 of the visceral endoderm-enriched transcription factors identified by Sherwood et al., 22 are expressed in the adult human liver in our array data, again highlighting the similarity between these tissues. Of the remaining eight genes, which were not expressed in the adult liver, three (Hoxb8, Nfatc1, Twist1) were present in both the AFP:eGFP+ and AFP:eGFP− samples but were not enriched in either. The other five (Cited1, Vdr, Lhx1, Sox7, Tfec) were absent from the AFP:eGFP+ cells. Overall, this analysis supports the hypothesis that the AFP:eGFP+ cells are hepatic, not yolk sac, in origin.
Genes Enriched in AFP:eGFP+ Cells Compared with AFP:eGFP− Cells
To gain deeper insight into the transcriptional differences between the AFP:eGFP+ and AFP:eGFP− cells, we compared gene expression in the two fractions using Significance Analysis of Microarrays (SAM) . Analysis was performed both with and without log transformation of the unfiltered gene expression indexes. As expected, the log-transformed analysis identified more genes with low expression but large fold differences between the two samples, whereas the analysis of raw gene indexes tended to identify genes with high expression values and smaller fold differences. The log-transformed analysis identified 472 genes whose expression was enriched in the AFP:eGFP+ fraction (supplemental online data 3), and the untransformed analysis identified 209. Seventy-two genes were identified by both methods, giving a combined total of 609 genes (472 + 209 − 72) identified by SAM as enriched in the AFP:eGFP+ fraction, with a false discovery rate of 0.14.
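The combined SAM list is the union of the two runs, so its size follows inclusion-exclusion. The sketch below does not reimplement SAM; it assumes each run's output is already a list of gene identifiers and uses hypothetical placeholder identifiers sized to match the reported counts.

```python
def combine_sam_lists(log_hits, raw_hits):
    """Union of genes called enriched by the log-transformed and raw-index SAM runs."""
    return set(log_hits) | set(raw_hits)


# Hypothetical identifiers sized to match the reported counts:
# 472 log-transformed hits and 209 raw-index hits, 72 of them shared.
shared = [f"SHARED_{i}" for i in range(72)]
log_hits = shared + [f"LOG_ONLY_{i}" for i in range(472 - 72)]
raw_hits = shared + [f"RAW_ONLY_{i}" for i in range(209 - 72)]

combined = combine_sam_lists(log_hits, raw_hits)
overlap = set(log_hits) & set(raw_hits)
# Inclusion-exclusion: |A u B| = |A| + |B| - |A n B|
assert len(combined) == len(log_hits) + len(raw_hits) - len(overlap)
print(len(log_hits), len(raw_hits), len(overlap), len(combined))  # 472 209 72 609
```

Taking the union rather than the intersection keeps both the low-expression, high-fold-change genes favored by the log-transformed run and the high-expression genes favored by the raw-index run.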
Unsupervised hierarchical clustering of the 609 genes enriched in the AFP:eGFP+ cells was performed with the dChip program  using the expression indexes of undifferentiated hESCs, hESC-derived embryoid bodies, the AFP:eGFP+ and AFP:eGFP− cells, and a panel of 11 human tissues (supplemental online data 4). The largest cluster contains 163 genes whose expression is highest in the adult liver, further supporting the hypothesis that the AFP:eGFP+ cells are hepatic. This cluster includes albumin, transferrin, thrombin, transthyretin, vitronectin, hepatic lipase, the fibrinogen alpha, beta, and gamma chains, and eight members of the apolipoprotein family. In addition, three cell surface receptors for the hepatitis C virus, claudin 1 , CD81 [28, 29], and LDLR [30, 31], were highly expressed in the AFP:eGFP+ cells, raising the possibility that these cells could be used to study hepatitis C infection in vitro.
Manual inspection of the 609 genes identified by SAM as enriched in the AFP:eGFP+ fraction revealed numerous transcription factors and signaling molecules known to play significant roles in liver development (supplemental online data 5). Among the transcription factors, FOXA1, FOXA2, FOXA3, HNF4A, TCF2, PROX1, and CEBPA, all of which play important roles in liver development, were enriched in the AFP:eGFP+ cells. Several genes involved in Wnt/β-catenin signaling, including FZD5, DACT2, and DKK3, were also enriched in the AFP:eGFP+ cells, consistent with recent studies linking Wnt signaling to liver development. Other signaling molecules implicated in endoderm and liver development that were enriched in the AFP:eGFP+ cells include BMP2, FGFR4, KITL, and HABP2. Signaling genes not previously associated with liver development include CER1, COBL, and GMCL1. Finally, although the HGF receptor MET was enriched in the AFP:eGFP+ cells, HGF was not; instead, HGF was expressed at higher levels in the AFP:eGFP− cells, suggesting that it acts in an endocrine or paracrine manner in this system.
Although comparison across tissue types clearly yields a liver-like gene signature, we also wanted to determine whether the AFP:eGFP reporter enriched for a specific hepatic cell lineage. We reviewed the literature and compiled a list of genes used to distinguish hepatic stem cells, hepatoblasts, cholangiocytes, and mature hepatocytes. Although such a review is confounded by differences among reports in the embryonic stages examined, the methods of analysis, and the experimental systems used, a small number of markers appeared useful for distinguishing the different lineages. A summary of this analysis is shown in Table 2. For this analysis, a gene was considered expressed in the AFP:eGFP+ fraction (denoted “+” in Table 2) if its average gene index across the three biological replicates was greater than 100. Genes identified by SAM as enriched in the AFP:eGFP+ fraction compared with the AFP:eGFP− fraction are listed as “enriched.”
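The Table 2 labeling rule can be sketched as a small function. The >100 mean-index threshold and the SAM "enriched" call come from the text; the precedence (a SAM-enriched gene is labeled "enriched" rather than "+") and the "-" label for genes below threshold are assumptions made for this illustration, and the marker names and values are hypothetical.

```python
from statistics import mean


def annotate_marker(gene, replicate_indexes, sam_enriched, threshold=100):
    """Return a Table 2-style label for one marker gene in the AFP:eGFP+ fraction.

    Assumed precedence: "enriched" (called by SAM) takes priority over "+"
    (mean gene index above threshold); anything else is labeled "-".
    """
    if gene in sam_enriched:
        return "enriched"
    if mean(replicate_indexes) > threshold:
        return "+"
    return "-"


# Hypothetical marker names and values, for illustration only.
sam_hits = {"MARKER_A"}
markers = {
    "MARKER_A": [5000, 4800, 5200],
    "MARKER_B": [150, 140, 160],
    "MARKER_C": [30, 20, 40],
}
for name, values in markers.items():
    print(name, annotate_marker(name, values, sam_hits))
# MARKER_A enriched
# MARKER_B +
# MARKER_C -
```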
Table 2. Hepatic cell lineage markers
The recently identified hepatic stem cells are distinguished from hepatoblasts by their expression of NCAM and claudin 3 and their lack of AFP expression . Mature hepatocytes can be distinguished from hepatic stem cells and hepatoblasts by their expression of dipeptidyl peptidase 4, CYP3A4, and SERPINA1 . Cholangiocytes are distinguished from hepatic stem cells, hepatoblasts, and hepatocytes by their expression of cytokeratin 7 (CK7, KRT7) and their high expression of cytokeratin 19 (CK19, KRT19). As shown in Table 2, the AFP:eGFP+ cells were enriched for the hepatoblast marker AFP, the mature hepatocyte marker dipeptidyl peptidase 4, and the cholangiocyte marker KRT7. The AFP:eGFP+ cells also expressed the hepatic stem cell markers NCAM and claudin 3. Therefore, following our differentiation protocol, purified AFP:eGFP+ cells express genes characteristic of hepatic stem cells, hepatoblasts, cholangiocytes, and hepatocytes.