A functional genomics approach to investigate the differentiation of iPSCs into lung epithelium at air‐liquid interface

Abstract The availability of robust protocols to differentiate induced pluripotent stem cells (iPSCs) into many human cell lineages has transformed research into the origins of human disease. The efficacy of differentiating iPSCs into specific cellular models is influenced by many factors including both intrinsic and extrinsic features. Among the most challenging models is the generation of human bronchial epithelium at air‐liquid interface (HBE‐ALI), which is the gold standard for many studies of respiratory diseases including cystic fibrosis. Here, we perform open chromatin mapping by ATAC‐seq and transcriptomics by RNA‐seq in parallel, to define the functional genomics of key stages of the iPSC to HBE‐ALI differentiation. Within open chromatin peaks, the overrepresented motifs include the architectural protein CTCF at all stages, while motifs for the FOXA pioneer and GATA factor families are seen more often at early stages, and those regulating key airway epithelial functions, such as EHF, are limited to later stages. The RNA‐seq data illustrate dynamic pathways during the iPSC to HBE‐ALI differentiation, and also the marked functional divergence of different iPSC lines at the ALI stages of differentiation. Moreover, a comparison of iPSC‐derived and lung donor‐derived HBE‐ALI cultures reveals substantial differences between these models.

in 3). Here, we focus on the epithelium of the human airway that is intimately involved in the aetiology of many common respiratory diseases including chronic obstructive pulmonary disease (COPD), asthma and cystic fibrosis. Among the best resources for investigating human bronchial (HBE) and tracheal (HTE) epithelial cells are primary cultures derived from lung tissue and grown on plastic or differentiated on permeable supports. [4][5][6] However, these are always in limited supply as they are dependent on the availability of suitable donor organs and tissue. More recently, robust protocols were developed to differentiate iPSCs into lung epithelial cells at air-liquid interface (ALI), thus modelling the HBE cells on permeable supports. [7][8][9][10][11][12][13][14][15][16] Extensive characterization of the iPSC-to-ALI cell cultures has largely focused on their relevance to disease states (particularly cystic fibrosis 10,17 ) and also generated elegant data on aspects of stem cell biology and the detailed lineages of the cell types. [18][19][20][21][22][23][24] However, complete maturation of iPSC-derived ALI cultures into functional HBE-ALI cells is often challenging. This obstacle may depend not only on the differentiation capacity of individual iPSC lines but also on the absence of appropriate biochemical and/or physical cues. [25][26][27] Our goal was to develop a robust and unbiased functional genomics pipeline to benchmark the differentiation of iPSC lines from different sources cultured according to diverse protocols in individual laboratories. Here, we utilize genome-wide methods to examine the gene regulatory networks that drive the pathways of differentiation from iPSCs to ALI cultures and compare the model ALI cultures with HBE-ALI cells. By intersecting open chromatin data (generated by Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) 28 ) with the transcriptome (generated by RNA-seq) of defined stages of differentiation of the iPSC to ALI pathway, we reveal the dynamic changes in the transcriptional networks that drive these alterations. We also show the divergence in the molecular signatures of iPSC-derived ALI and lung donor-derived HBE-ALI cells, which may provide insights into the factors that are required to overcome the hurdles of complete maturation of the model ALI cells into mature HBE cells. of Genetics and Genome Sciences, Case Western Reserve University). All three lines were from male donors. Differentiation of iPSC to ALI culture was based on the method by Wong et al, 10 with minor modifications in coating of the membrane inserts (Corning 346012 mm Transwell inserts) for the differentiation protocol. After testing different coating methods described by others, 10,11 we found Matrigel (Corning 35623, 1/60 dilution) with collagen IV (60 µg/mL) supported the most robust and consistent ALI cultures. The stages of differentiation were confirmed by 1) RT-qPCR assays specific for NKX2.1, FOXA2, SOX2, SOX9, SOX17, MUC5AC, MUC16, FOXJ1, TP63 and KRT5 transcripts, using primers shown in Table S1; and specific antibodies to CXCR4

| RNA preparation, reverse transcription quantitative PCR (RT-qPCR) and RNA-seq
Total RNA was extracted from iPSCs at different stages of differentiation using TRIzol (Invitrogen) using their protocol. RT-qPCR was done using our standard protocols 29 with primers shown in Table S1.
Six to seven libraries were pooled, per lane, and sequenced on a HiSeq4000 (Illumina) using 100 bp paired-end sequencing.

| RNA-seq analysis
Raw reads were aligned with STAR 2.6 (https://github.com/alexd obin/STAR). 32 Aligned reads were then assigned to genomic features with featureCounts version 1.6.3 in the Subread package (http:// subre ad.sourc eforge.net/) 33 and differential gene expression was analysed using DEseq2 version 1.22.1 (https://www.bioco nduct or.org/packa ges/relea se/bioc/html/DESeq2.html). 34 Raw reads of six untreated healthy donor-derived HBE-ALI samples were acquired from the GEO database Series GSE97036. 35 Data sets were processed using the same pipeline. The cells used in this series were as described above for donor-derived HBE cells.

| Gene ontology analysis
Differentially expressed genes were filtered to enrich for those with a fold change ≥1.5 and Benjamini-Hochberg adjusted P-value ≤ 0.01. Gene lists were read into the PANTHER gene ontology database (http://panth erdb.org/) 36 using the biological processes term. Results were further validated using gProfiler. Statistically significant results were filtered for categories passing a P-value of 0.001 with the Bonferroni correction for multiple testing.

| ATAC-seq data processing
Raw reads from the HiSeq4000 were processed using the ENCODE ATAC-seq pipeline (Dec 2018)(https://github.com/kunda jelab/ atac-seq-pipeline). Replicates were used to generate peaks with an IDR threshold of 0.5. Analysis of enriched motifs within the IDR peaks was performed with HOMER (4.7.2q) (http://homer.ucsd. edu/homer/ index.html). 37 39 Cuffdiff output for pairwise comparisons was processed against ATAC-seq peak files for each stage in the pair independently using the BETA software version 1.0.7 (http://cistr ome. org/BETA/) 40 to determine peak distribution at differentially expressed genes (DEGs). Peaks were chosen within 20 kb from the gene body, and function prediction was determined by the rank product of the estimated regulatory potential of factor binding and change in expression of the gene target.

| Immunofluorescence
iPSC ALI cultures (week 5) on membrane filters were harvested and fixed in 4% paraformaldehyde (in PBS), embedded in paraffin and cut into 5-μm sections. After deparaffinization and rehydration, antigen retrieval was performed by heating with citric acid (10 mmol/L,

| RE SULTS
Our experimental and analysis pipelines are summarized in Figure 1.
In initial experiments, we differentiated three iPSC lines, ND2.0, ND1.4 and CWRU205, which came from 2 different sources. We followed the standard protocols as described in the methods section to culture the iPSCs and differentiate them into airway epithelial cells at ALI ( Figure 1A). Cells were collected for ATAC-seq and RNA-seq at 6 time-points: definitive endoderm (DE), anterior foregut endoderm (AFE) and lung progenitor cells 3a, 3b (LP3a, LP3b), at air-liquid interface after weeks 3 and weeks 5 (ALIw3 and ALIw5). The ND1.4 iPSC line repeatedly failed to differentiate fully to the ALI stage (either through not adhering to membrane supports at the transition to AFE cells, or due to reduced cell proliferation starting at LP3a), so this line was not analysed further. Open chromatin and RNA-seq data were derived from cells in multiple differentiation processes of the ND2.0 and CWRU205 lines. However, data presented here are from one differentiation process where these 2 lines were grown in parallel using the same batches of culture media, supplements and culture inserts, to minimize variation that was not inherent to the cells. The RNA-seq and ATAC-seq pipeline are summarized in Figure 1B CTCF is the most overrepresented motif at DE, AFE and ALIw3 stages ( Figure 2C i, ii, iv). Consistent with its pivotal role in organizing higher order chromatin structure genome-wide, the CTCF motif is among the top 7 motifs at all stages of differentiation ( Figure 2C, Figure S1). Among TFs with pioneer activity (opening chromatin to increase accessibility for other TFs) and also with

| The transcriptome through differentiation from iPSC-derived definitive endoderm through to air-liquid interface lung epithelial cell cultures
To compare temporal changes in gene expression, RNA-seq was performed at each stage of differentiation in the ND2.0 and CWRU205 iPSC lines. Gene expression values were quantified using the featureCounts software in the Subread package (r3.1) and normalized for sequencing depth and RNA composition with DESeq2. The PCA plot in Figure 3A shows a good correlation between the samples from the two lines at each stage until the ALI cultures. Normalized count distributions for the two iPSC lines consistently clustered together across all principal components.
Once the cells were exposed to ALI conditions, each line remained more similar to itself between 3 and 5 weeks at ALI than to the other line at the same time-point. To accommodate the variance between the two lines at the ALIw3 and ALIw5 stages, results for each time-point were used as biological replicates. Following adjustment for sequencing depth between samples, we identified 8940 differentially expressed genes (P < 0 .005) at one or more stage in the full developmental pathway from DE to ALIw5.
A heatmap of the expression profile matrix for the top 1000 differentially expressed genes (DEGs) revealed several of these trajectories including genes most up-regulated in the early stages or those that are ALI-specific ( Figure 3B). Among the most significant To confirm the differentiation protocol was as expected based on previous publications, 9,10 we examined the expression of a number of known markers by both RT-qPCR and RNA-seq, including the key TFs

| Expression of genes involved in key pathways of lung development shows substantial changes between stages of differentiation
To further delineate the biological functions that are significantly altered across individual stages of differentiation, we filtered the DEG data sets for each linear pairwise RNA-seq comparison. Here, we focused on transitions between DE to AFE ( Figure 5A) and LP3b to ALIw3 ( Figure 5B) due to their relevance to early lung tissue development and airway epithelial cell identity, respectively.
DEGs passing a 0.01 adjusted P-value threshold were stratified by fold change of ±2 and then analysed using the PANTHER database (r5.1) and gProfiler (r5.2) to identify biological process enrichment gene ontology terms.
In the transition from DE to AFE, 633 genes were down-regulated ( Figure 5Ai) and 1107 genes were up-regulated (Figure 5Aii), illustrating the occurrence of both substantial repression and activation of genes during this key developmental modification. A   Table S5). These include defence responses, including innate immunity, inflammatory response and several metabolic pathways. Among the ALIw3-specific genes contributing to these processes are multiple cytokines such as C-C motif Chemokines

| Intersecting open chromatin with differential gene expression through differentiation from DE to ALI cultures
Next, we investigated the correlation between open chromatin within 20 kb of gene bodies and differential gene expression from

| Comparison of iPSC-derived ALI open chromatin and transcriptome with donor-derived HBE-ALI cultures
A critical feature of the iPSC ALI cultures is how well they mirror the transcriptome and functions of ALI cultures generated from adult human bronchial epithelial cells derived from donor lung tissue. We observed significant differences in both open chromatin ( Figure 7A) and gene expression profiles ( Figure 7B) between the ALI cultures of different origin. The PCA plots of ATAC-seq peaks show that while biological replicas of iPSC and donor-derived ALI cultures cluster well by origin, they are far from each other, and there is more similarity between the two iPSC lines than the two donors ( Figure 7A).
With respect to gene expression profiles ( Figure 7B), the six repli-  Figure 7F). CD47 provided a negative control for this analysis, as neither the normalized counts from DESeq2 nor the ATAC-seq profiles ( Figure 7E and Figure S5) vary between the cultures.  (formerly NRF2) 71 and ELF5. 72 The fact that these TFs are present in the cells but are apparently not being selectively recruited to cis-regulatory elements (open chromatin peaks) suggests that critical co-factors required for DNA-binding might be absent or mis-expressed. Alternatively, these TFs may not be appropriately activated by external stimuli in the iPSC-derived ALI cultures, including from other cell types, and so cannot bind DNA. It is also likely that some of the cell types in the donor-derived HBE-ALI cultures that contribute to the motif enrichment signal are either absent or at lower abundance in the iPSC-derived ALI cells.

| D ISCUSS I ON
In conclusion, we provide a detailed molecular characterization of the gene expression repertoire and the activating and repressive TFs that direct the differentiation of iPSCs into lung epithelial cells at ALI. Our data show that though this model to study lung epithelial F I G U R E 7 Comparison of ATAC-seq and RNA-seq profiles of donor-derived ALI-HBE and iPSC-derived ALI-HBE cultures. A, PCA plot for ATAC-seq peak distribution comparing primary HBE-ALI cultures from 2 donors (DD076O and DD078O) and 2 samples each from iPSCderived HBE-ALI cultures CWRU205 and ND2.0. B, PCA plot for RNA-seq gene expression data comparing HBE-ALI cultures from 6 donors and 2 replicas each from ALI-HBE ND2.0 and CWRU205. C, HOMER analysis showing the top 24 known TF binding motifs overrepresented in ATAC-seq peaks from donor-derived HBE-ALI cultures. * and blue font denotes motifs also enriched in top 24 known motifs in iPSCderived ALIw3 cells (Figure 2Civ). D, Heatmap of the top 8645 DEGs comparing donor-derived primary HBE-ALI to iPSC-derived ALI-HBE samples passing a 0.01 adjusted P-value threshold. Each donor for primary cells or iPSC-derived cultures was used as a replicate for hierarchal clustering. E, Normalized sequence counts comparing donor-derived and iPSC-derived cells for CD47 control and FBLIM1 and SCNN1B DEGs. F, UCSC genome browser graphic of ATAC-seq tracks (IDR) for donor-derived and iPSC-derived HBE-ALI cultures at the FBLM1 and SCNN1B loci. Arrows show sites of increased chromatin accessibility in cells with higher gene expression function in health and disease has many advantages, according to the protocols we used it currently falls short of producing a fully differentiated functional airway epithelium. This may reflect contamination with non-lung fated cells together with a lack of appropriate environmental cues, which may be biochemical (such as appropriate growth factors) or physical (lack of suitable matrix or substrate and stretch forces 73 ), which could result in failure to produce rare but critical cell types. Further higher resolution molecular analysis by single cell RNA-seq and single cell ATAC-seq may reveal the absent components. The application of our analysis pipeline to other protocols for the differentiation of iPSC into lung epithelial cells 13,14,19,[21][22][23]74 would likely be highly informative. (NIH Grant S10-RR021228).

CO N FLI C T S O F I NTE R E S T
The authors confirm that there are no conflicts of interest. The funders had no role in the design of the study, in the collection, analyses or interpretation of the data, in the writing of the manuscript, or in the decision to publish the results.