Role of HRTPT in kidney proximal epithelial cell regeneration: Integrative differential expression and pathway analyses using microarray and scRNA‐seq

Abstract Damage to proximal tubules due to exposure to toxicants can lead to conditions such as acute kidney injury (AKI), chronic kidney disease (CKD) and ultimately end‐stage renal failure (ESRF). Studies have shown that kidney proximal epithelial cells can regenerate particularly after acute injury. In the previous study, we utilized an immortalized in vitro model of human renal proximal tubule epithelial cells, RPTEC/TERT1, to isolate HRTPT cell line that co‐expresses stem cell markers CD133 and CD24, and HREC24T cell line that expresses only CD24. HRTPT cells showed most of the key characteristics of stem/progenitor cells; however, HREC24T cells did not show any of these characteristics. The goal of this study was to further characterize and understand the global gene expression differences, upregulated pathways and gene interaction using scRNA‐seq in HRTPT cells. Affymetrix microarray analysis identified common gene sets and pathways specific to HRTPT and HREC24T cells analysed using DAVID, Reactome and Ingenuity software. Gene sets of HRTPT cells, in comparison with publicly available data set for CD133+ infant kidney, urine‐derived renal progenitor cells and human kidney‐derived epithelial proximal tubule cells showed substantial similarity in organization and interactions of the apical membrane. Single‐cell analysis of HRTPT cells identified unique gene clusters associated with CD133 and the 92 common gene sets from three data sets. In conclusion, the gene expression analysis identified a unique gene set for HRTPT cells and narrowed the co‐expressed gene set compared with other human renal–derived cell lines expressing CD133, which may provide deeper understanding in their role as progenitor/stem cells that participate in renal repair.


| INTRODUC TI ON
Acute kidney injury (AKI), or synonymously acute tubular necrosis (ATN), is associated with an abrupt decline in renal function and is reported to have a mortality rate ranging from 18% to 80% depending on the severity of the insult. 1,2 The progression of AKI to chronic (CKD) or even end-stage renal disease (ESRD) correlates with the duration and severity of the acute injury. 3 It is well documented that the human kidney has the ability to regenerate damaged nephrons following AKI. 4 AKI primarily damages the cells comprising the proximal tubules, and it is these tubules where regeneration and repair are most evident. The ability of renal regeneration following AKI offers the potential opportunity for enhanced repair with slowing or eliminating progression to CKD. There is strong evidence that the cells that participate in tubule cell renewal are generated from within the kidney itself and are not recruited from non-renal sources. 5,6 Whether these cells are always present within the kidney or result from dedifferentiation of existing cells is presently not known with absolute certainty. A number of studies have shown that the progenitor cells capable of regenerating renal tubules, regardless of origin, are characterized by the co-expression of surface markers CD133 and CD24. [5][6][7][8][9][10] The CD133+/CD24+ cells can proliferate in primary cell culture with retainment of phenotype and having the capacity to differentiate both in vivo and in vitro. In the developing human kidney, the CD133+ cells are a subset of CD24+ cells, which constitute the metanephric mesenchyme-derived primordial nephron. 11 The study of human CD133+/CD24+ renal progenitor cells is difficult due to the acquisition of human tissue, the small population of such cells within the kidney, their limited duration of viability following direct isolation and their limited lifespan in primary culture. Using flow cytometry, this laboratory demonstrated that a human renal proximal epithelial cell line (RPTEC/TERT), which was isolated from renal cortical tissue and subsequently immortalized by transduction with human telomerase reverse transcriptase (hTERT), that consists of approximately 70% of the cell population expressed both CD133 and CD24, while the remainder expressed CD24 but not CD133. [12][13][14] The laboratory then employed cell sorting to establish a cell line that is composed of over 95% of cells expressing both CD133 and CD24. The HRTPT cells were shown to maintain the coexpression of CD133 and CD24 and cell phenotype following extended subculture. The HRTPT cells formed multicellular spheroids (nephrospheres), a characteristic feature of stem/progenitor cells, and formed branched tubule-like structures when grown on the surface of Matrigel and were able to grow and undergo neurogenic, adipogenic, osteogenic and tubulogenic differentiation. An identical protocol was used to generate a cell line from the RPTEC/TERT cell line that expressed CD24 in the absence of CD133. 12,13 This cell line, HREC24T, maintained its phenotype and the expression of CD24 and not CD133 over extended serial culture, but did not undergo spheroid formation and differentiation on the surface of Matrigel, nor neurogenic, adipogenic or osteogenic differentiation.
The objectives of the present study were to further characterize the gene expression of the HRTPT cells that might further define their role in renal epithelial cell regeneration. The first objective was to identify the differences in global gene expression when the HRTPT progenitor cells were compared to the HREC24T cells that do not display progenitor cell properties. The second objective was to use the differences in global gene expression to define cellular functions and pathways specific to each cell line. A third objective was to compare the CD133/24 co-expressing gene derived from the above comparison of HRTPT and HREC24T cell lines with publically available databases of renal CD133 expressing cells. The final objective was to determine whether the HRTPT cell line harbours a single or multiple CD133 expressing progenitor cell populations.

| Cell culture
Stock cultures of RPTEC/TERT1 cells were obtained from American Type Culture Collection and were grown using serum-free conditions as previously described by this laboratory. 15,16 The composition of the growth formulation was also as described previously. 12,13,15 Confluent cultures of the immortalized RPTEC/TERT1 cell line were sorted into two different cell populations, namely HRTPT (CD133+/CD24+) cells and HREC24T (CD133-CD24+) cells, using BD FACSAria (BD Biosciences) as detailed previously. 12 Both cell lines were subcultured at 1:3 ratio, allowed to reach confluence and then used in the described experimental protocols. For treatment with FGFR inhibitor, SU5402 (Selleck Chemicals), HRTPT cells were grown to confluency and subcultured into growth media containing 5uM SU5402 inhibitor or DMSO for control, grown to confluency and harvested to obtain pellets for RNA and protein analyses.

Significant statement
Cultures of human renal epithelial cells and their tertimmortalized counterparts routinely express the surface markers of renal progenitor cells, CD24 and CD133. All cells express CD24, but a large minority fraction does not express CD133. Only those cells that express both markers have full differentiation potential. Here, we compare the global gene expression of CD24+/CD133+ (HRTPT) with cells that express only CD24+ (HREC24T). The CD133specific gene set was used to compare other human renal progenitor cell global gene expression data sets: infant progenitor cells and renal progenitor cells isolated from urine. A common gene set consisting of 92 genes was identified that defines the organization and interactions of the apical membrane. This gene set will provide a deeper understanding of the regulation of the renal progenitor cell population and the process of renal repair.

| Sample preparation for microarray and singlecell analysis
HRTPT and HREC24T cells were harvested, and their respective RNAs were isolated using RNeasy Mini Kit using the manufacturer's protocol (Qiagen). Triplicate samples for each cell line were diluted to 100 ng/ul, packed in dry ice and transported to Genome Explorations, TN, for microarray analysis. For single-cell analysis, the sample was prepared using Genewiz 10x genomic chromium single-cell RNA-seq protocol. Following confluency, HRTPT cells grown in three separate T-25 flasks were trypsinized, centrifuged and resuspended in the growth media in separate tubes. To obtain a single-cell suspension, the cells were gently pipetted up and down for several minutes. The viable cells were counted by mixing with 4x trypan blue using a haemocytometer, diluted to at least 1x 106 cells/ml, centrifuged and resuspended in 500ul of ice-cold DMEM containing 20% serum and 10% DMSO and then transferred to cryotubes sandwiched between two styrofoam tube holders and stored in the −80 ℃ freezer for at least 4 h. The samples were packed in dry ice and shipped to Genewiz for singlecell RNA sequencing.
For microarray analysis, the samples were processed and hybridized onto HTA2.0 arrays with a quality control (QC) report provided by the company, which showed all samples passed the QC test. The oligo (https://bioco nduct or.org/packa ges/relea se/bioc/html/oligo. html), frozen RMA and Barcode (https://www.bioco nduct or.org/ packa ges/relea se/bioc/html/frma.html) packages of Bioconductor were used to preprocess microarray data using R. The data used for the analysis were log-transformed and pareto-scaled. After preprocess and transformation, the density distribution of data shows the uniform distribution across all samples ( Figure S5). The robustness of the correlation across samples was tested using the multiscale bootstrap resampling method to avoid any technical/biological bias of data. The single-cell sequencing data were analysed using the Loupe Browser provided by 10x Genomics (https://www.10xge nomics. com/produ cts/loupe -browser).

| RNA isolation and digital droplet polymerase chain reaction (ddPCR)
Total RNA was isolated using Tri Reagent (Molecular Research Center, Inc.) as described previously. 17 The measurement of mRNA expression of selected genes was assessed using ddPCR and commercially available primers (Bio-Rad). The list of primers is provided in Table S1. For analysis, 100 ng of total RNA was subjected to complimentary DNA (cDNA) synthesis using the iScript cDNA Synthesis Kit (Bio-Rad) in a total volume of 20 μl. Digital droplet PCR was performed utilizing the QX200 Droplet Digital PCR System (Bio-Rad) that measures absolute copy number per µl of reaction mix. In brief, 12.5ul QX200 ddPCR EvaGreen Supermix, 2.5 ul of cDNA template and 10 ul of PrimePCR SYBR Green assay primers were mixed in the 96-well plate and placed in the QX200 Droplet Generator. After this step, the plate was sealed and transferred to a 96-well plate thermal cycler for PCR amplification at an annealing temperature of 60 0 C, and then placed in a QX200 Droplet Reader (Bio-Rad) instrument, which generates the absolute quantification for each droplet per well. All samples were tested with ddPCR system in triplicates and represented as ±S.D.

| Western blot analysis
The cell pellets were lysed using ice-cold 1x RIPA buffer (Santa Cruz Biotechnology) followed by sonication, and the content was agitated for 30 mins on ice. Then, the tubes were centrifuged at 4 0 C and the supernatant was collected in fresh tubes. Protein concentration was determined using Pierce BCA Protein Assay Kit (Thermo Fisher). The protein levels were assessed using automated protein separation and immunodetection, Jess Simple Western System, and all of the steps were performed using the instructions and reagents provided by the manufacturer (ProteinSimple). In brief, each sample was diluted to 0.5 ug/ul in 0.1x sample buffer, combined with 4x fluorescent master mix in 4:1 ratio and denatured at 95 0 C for 5 mins accompanied by biotin ladder. The tubes were quickly vortexed and centrifuged for a min, and then, the recommended volume for samples, appropriate dilution of primary and secondary antibodies and luminol-peroxide mix were loaded into a 12-230 kDa Jess 13-well capillary plate for separation, centrifuged and then ran in the Jess instrument. The list of the primary and secondary antibodies is provided in Table S2.
The relative protein expression was determined by the ratio of area under the peak to that of βeta-actin generated by the Compass for SW software 5.0.1 (ProteinSimple).

| Data
To overcome the lack of reference human kidney biopsy-derived renal progenitors, we performed a meta-analysis comparing our data with other data sets collected from NCBI GEO. The three data sets have been collected to validate the findings, (1) GSE128281, the microarray data of human urine-derived renal progenitor cells

| Statistics
The machine-learning approaches and classifiers were used to understand the sample characteristics; principal component analysis (PCA) 18 was used to find the sample distribution between the groups. The goal here is to determine the distribution of samples and visualize how the global gene expression profile scattered in different groups. The correlation between the samples was calculated using Pearson's coefficient, 19 and the heatmap method 20 was used to plot the correlation coefficient value to find the most correlated samples. The Volcano plot 21 was used to demonstrate the fold change (log2 ratio) difference against the absolute confidence (-log10 adjusted P-value) measured between the two groups. Each dot on the plot represents one gene (coloured, significant genes; black, non-significant). The statistical significance of each gene within each data set was calculated by running t tests 22 between the categories for conditions. The significance level of P-value ≤0.05 was employed as a standard to filter genes. Further, the pathway and function enrichment analyses of the significant genes were carried out using Ingenuity Pathway Analysis (IPA, QIAGEN Inc.), David and Reactome. [23][24][25][26] Statistical analysis for ddPCR and protein-level data consisted of one-way ANOVA with Tukey's or Sidak's multiple comparisons testing performed by GraphPad Prism 8. All experiments were performed in triplicates, and the data are plotted as the mean ±SE of triplicate determinations.  (Table S3). The gene functional classification using DAVID software identified 16 gene groupings with enrichment scores from 0.15 to 3.61. Six of the gene groups display enrichment scores greater than 2.5 ( Figure 2A). A pathway analysis using Reactome produced the 25 most significant pathways with 9 pathways having P ≤0.0001 ( Figure 2E). An analysis by Ingenuity identified the top 5 canonical pathways out of which remodelling of adherens junctions and junctional signalling was the most prominent, from the unique set of HRTPT genes ( Figure 2D).

| Significantly different gene expressions between HRTPT and HREC24T cell lines
The tubulin genes were identified as a prominent gene grouping using David (Figure 2A), and also prominent entities in the initial 8 pathways were identified using Reactome ( Figure 2E). Taken together, the majority of the results indicate that the major difference between the co-expressing CD133 and CD24 cells when compared to those not expressing CD133 is centred around the organization of the apical membrane and its interaction with the extracellular environment.

| Validation of significant genes in HRTPT cells analysed by DAVID, reactome and ingenuity pathway analysis
The expression of many of the genes identified in the above groupings and pathways was further examined using qPCR to validate the array and determine expression levels of the genes between the HRTPT cells and the HREC24T cells ( The tubulin genes were identified frequently, both as a gene group when analysed by David and as a major participant in the first 8 of the 9 pathways identified using Reactome (Figure 3Bi-v, Figure 2A and 2E). RHO-GTPases were also identified as a gene group and in one of the pathways identified by Reactome ( Figure 2E). The annexins and integrins were identified as a gene group, but not associated with a specific pathway. However, they would likely par-

| FGFR, FGF expression and treatment with SU5402 in HRTPT cells
The HRTPT expression was further analysed using only the upregulated 1117 probes (653 unique genes) from global gene expression analysis ( Figure 1C, Table S3

| Analysis of HRTPT downregulated genes
Similar to the above section, the HRTPT-downregulated 220 genes (366 probes) were further examined to find the functional associations ( Figure 1C, Table S3). The pathway analysis by Reactome,  family members were identified as specific to the HREC24T cells (Table S4). All three analysis programmes identified transport as a feature of the HREC24T cells. Related to these transport processes were interactions between the cell surface and the ex-

| Association between HRTPT cells and infant kidney expressing CD133 progenitor cells
The gene set identified for the HRTPT cells as described above was compared with a publicly available database (GSE90628) that compared the expression of CD133-and CD133+ cells isolated from the human infant kidney biopsies. 27 The infant kidney-derived cells were isolated, and sorted by FACS, and the CD133+ and CD133-fractions were placed into cell culture for 2 passages before microarray analysis. The microarray analysis identified 3,948 genes with P < 0.05 that were uniquely expressed by the CD133+ cell fraction extracted from data sets for GSE90628 (Table S5). To identify the functional association, Reactome and Ingenuity profiles were compared between the 3948 CD133+ infant kidney cell gene set (Table S6B-E) and the 873 HRTPT cell gene set ( Figure 2B-E). Reactome analysis demonstrated that there were no common pathways between the infant kidney cells expressing CD133 and the HRTPT cells ( Figure 2E and Table S6B). Ingenuity Pathway Analysis showed no similarity for the top canonical pathways ( Figure 2D and Table S6C), but did show high symmetry within the molecular and cellular function, which included cellular movement, cellular development, and cell death and survival being common between both sets of genes ( Figure 2B and Table S6D). Ingenuity Pathway Analysis identified dexamethasone, ESR1 and beta-oestradiol as upstream regulations, as well as TNF and TGFB1 (Table S6E). An analysis of gene groups employing David software could not be performed due to the large size of the set of the infant kidney genes (Table S6A).

F I G U R E 3 Expression of gene groups (David) in the HRTPT and HREC24T cells. qPCR analysis of
An analysis to determine the common genes between the 873 HRTPT gene set and the 3,948 CD133+ infant gene set identified 332 co-expressed genes ( Figure S3). To validate the functional association with previous identified pathways between two sets of genes, we identify the pathways associated with 332 co-expressed gene set using David, Reactome and Ingenuity software (Table S7A-E). When compared to previously identified pathways (ie Figure 2 and Table S6), Ingenuity software provided the strongest comparison among the pathways with agreement in the area of molecular and cellular function, identifying cellular movement, cellular development, and cell death and survival as top 5 elements (Table S7D). In addition, both cellular growth and proliferation, and cell-cell signalling were identified in both 873 HRTPT gene sets and 332 co-expressed gene sets ( Figure 2B and  (Tables S6E and S7E). There was no similarity of pathways identified by Reactome and the top canonical pathways determined by Ingenuity software among the 332 co-expressed genes and the other 2 gene sets ( Figure 2D and 2E, and Tables S7B and S7C, Tables S6B and S6C. There was one gene group identified by David that had identity with the 873 HRTPT gene set, identifying integrins A1, A2, B3 and B8 (Figure 2A and Table S7A). The expression of FGF9 was identified in the 332 com-

| Association between diverse urine-derived progenitor cells, infant kidney CD133+ Cells and HRTPT cells
Urine from healthy human donors has been reported to be a readily available, non-invasive source for the isolation and culture of human renal progenitor cells. 28 The study isolated 9 putative renal    (Table S8). However, the previous analysis showed that CD133  (Table S9A and E).
Overall, as the number of genes in the data set narrowed, ESR1 and beta-oestradiol were identified as upstream regulators and cellular movement and cell-cell signalling as top canonical pathways (Table S9D,E). In most data sets, the integrins, cell junctions, cell surface interactions and receptors were identified frequently in gene groupings and Reactome pathways (Table S9A,B). The ESR1 was validated for expression in the HRTPT cell line and found to be near the background for detection ( Figure S1H).

| Single-cell analysis of HRTPT cells
The laboratory employed scRNA-seq technology to find the clusters of similar cells by examining the higher expression of single gene CD133 ( Figure 7A) and the set of genes named as common 92 ( Figure 7B) and common 332 ( Figure 7C). The entire cell population was divided into six clusters and examined the log2 feature maximum count to find the similarities between PROM1 and gene sets at cellular level to better understand the function of an individual cell in the context of its microenvironment. The outcome shows that the PROM1 and common 92 gene set are very close in terms of cellular pattern of log2 expression average of genes in that cell ( Figure 7A and 7B) and partially enriched in the cluster 1, cluster 2, cluster 4 and cluster 6 only ( Figure 7C).

| DISCUSS ION
The majority of laboratories studying renal tubule progenitor cells identify this population based on the co-expression of CD133 and CD24 within the kidney and in renal-derived cell cultures. 7,8,13,28,29 This laboratory was able to sort the HRTPT cells isolated from a renal following ischaemia/reperfusion. 31 The only gene grouping that did not appear to be linked to microtubules and cytoskeleton was the neuroblastoma breakpoint genes that have not been highly studied, but do have a role in kidney development. 35 Overall, the analysis of global gene expression allowed the definition of genes and pathways associated with a CD133 and CD24 co-expressing cell line, known to have features associated with renal progenitor cells.
The association of renal CD133 with the progenitor phenotype appears to be largely accepted for renal tubular progenitor cells. [5][6][7][8][9][10]28,29 Therefore, with the goal to identify the genes essen-  There were two preliminary findings in the study that will require further investigation. The first is the relationship between PROM1 (CD133) and PROM2 expression, the major question being whether PROM2 can substitute wholly or partially for the function of PROM1 as the comparison of two publically available dataset representing renal progenitors showed expression of PROM2 but no co-expression of CD133 when compared to the HRTPT data set. PROM2 has seen only limited study. The tissue distribution of PROM2 is very similar to PROM1, being highly expressed in the adult kidney and other epithelial tissues. 45 PROM2 was shown to be co-localized with PROM1 in its association with plasma membrane protrusions. Several other studies support at least a partial role for PROM2 in its ability to substitute for PROM1 in some situations. [46][47][48] In the PROM1 KO mouse, some stem

| CON CLUS ION
Unique gene sets specific to HRTPT cell line, having co-expression of CD133 and CD24 and that displayed progenitor-like characteristics, and HRECT24 cell line, expressing only CD24 that did not display progenitor characteristics, were identified providing the differences between the two cell population at the gene level.