Fetal neural stem cells (NSCs) have received great attention not only for their roles in normal development but also for their potential use in the treatment of neurodegenerative disorders. To develop a robust method of assessing the state of stem cells, we have designed, tested, and validated a rodent NSC array. This array consists of 260 genes, including cell type-specific markers for embryonic stem (ES) cells and neural progenitor cells as well as growth factors, cell cycle-related genes, and extracellular matrix molecules known to regulate NSC biology. Validated 500-bp polymerase chain reaction (PCR) products amplified with gene-specific primers were arrayed along with positive controls. Blanks were included for quality control, and some genes were arrayed in duplicate. No cross-hybridization was detected. The quality and sensitivity of the arrays were also examined using probes prepared either by conventional reverse transcription or by amplification with linear polymerase replication (LPR). Both methods showed good reproducibility, and probes prepared by LPR labeling appeared to detect a larger proportion of expressed genes. Expression detected by either method could be verified by reverse transcriptase (RT)-PCR with high reproducibility. Using these stem cell chips, we profiled liver, ES, and neural cells, and the cell types could be readily distinguished from each other. Nine markers specific to mouse ES cells and 17 markers found in neural cells were verified as robust markers of the stem cell state. Thus, this focused neural stem cell array provides a convenient and useful tool for detecting and assessing NSCs and progenitor cells and can reliably distinguish them from other cell populations.
Neural stem cells (NSCs) and progenitor cells are present throughout development and persist in the adult [1–3]. Multiple classes of NSCs and progenitors have been identified, and these cells differ from each other in their differentiation ability, cytokine response, and expressed antigens [3, 4]. Moreover, as cells have been evaluated for therapy, it has become clear that non-neural cells may also transdifferentiate or dedifferentiate into neural derivatives upon exposure to appropriate differentiating agents [4, 5]. Furthermore, embryonic stem (ES) cells may be used to obtain NSCs and progenitor cells, and these differentiated cells must be distinguished from ES cells to prevent transplantation of potentially teratocarcinoma-forming cells [5, 6]. Finally, NSCs differ from many other stem cell populations in that investigators have developed techniques to propagate them in continuous culture. However, cultures may spontaneously differentiate or change their characteristics enough that their stem cell character is altered or lost. These observations highlight the importance of carefully monitoring the properties of harvested cells and the quality of cells maintained in culture, using a reliable, reproducible, and low-cost method that is readily available to most laboratories involved in stem cell research.
What is also clear is that few unique markers for NSCs exist. Many markers previously thought to be specific for stem cells have been shown to be expressed by differentiated cells (nestin, nucleostemin, musashi, etc.) or shared with other stem cell populations (e.g., ABCG2, telomerase reverse transcriptase [TERT], telomerase activity), and no single positive marker specific for NSCs is currently available. Rather, laboratories have successfully isolated stem cell populations from mixed cultures using a combination of markers. The presence of candidate stem cell markers, such as AC133, Hoechst dye labeling, Sox-1, Sox-2, and TERT, has been combined with negative selection based on the absence of a battery of markers to detect and enrich stem cell populations.
Evaluating the purity and state of NSC populations therefore relies on using a battery of markers. Different laboratories utilize some subset, often nonoverlapping, of known markers. Given the amount of tissue or cells required for the variety of tests and the difficulty in obtaining, maintaining, and validating antibody, Western blot, polymerase chain reaction (PCR), or Northern blot markers, consistent evaluation of neural populations across laboratories and even between one batch of cells and another has been difficult.
Traditional assay techniques such as Northern blotting and ribonuclease protection assays, or more recent techniques such as quantitative real-time reverse transcriptase-PCR (RT-PCR), are suitable for studying one or a few genes in a single assay. The ideal tool would instead measure the expression of sets of related genes simultaneously. With a sufficiently reliable tool of this kind, not only could cell type-specific markers be studied, but additional candidate genes known to be expressed at the appropriate stage of development could be rapidly evaluated as well. Such genes could be growth factors, extracellular matrix (ECM) molecules, chemokines, or key regulators of cell proliferation or apoptosis. Thus, developing a reliable gene expression analysis tool for stem cell-specific gene sets will not only enhance the ability to characterize stem cells but also help in understanding the mechanisms that regulate stem cell differentiation.
cDNA microarray technology, a relatively new technology that allows simultaneous assessment of the expression of thousands of genes, is a potential candidate assay method. This method has been successfully used to identify gene expression patterns associated with specific biological functions [5, 8] and has proven reliable and reproducible [5, 8]. However, most commercially available microarrays have a number of technical limitations that restrict their use for assays such as the one we propose.
The first is limited gene coverage. Although high-density microarrays contain thousands of genes on a single glass slide, they usually fail to provide full coverage of the specific gene groups needed for a particular application. This incomplete coverage arises because A) the collection of genes in high-density cDNA arrays is usually assembled without a specific application in mind, and B) the majority of genes involved in regulating cellular activities and cell differentiation are often expressed only for a very short period of development. It is therefore not unusual for many genes important in cell development and differentiation to be missing from high-density microarrays. For example, only 49 of the 70 known interleukin and receptor-related genes are present on one commercial array. Likewise, few of the 23 known fibroblast growth factor (FGF) genes are present on any of the large microarrays that are available.
The second problem with large-scale arrays is cDNA fragment selection and quality. The cDNA fragments used in microarrays are usually 3′-end biased because they are generated by oligo(dT)-primed cDNA synthesis. The 3′-end region is not necessarily the most gene-specific region and, for some genes, is a poor choice for a gene-specific probe. The third problem is the complexity of data collection and analysis. In fact, the cost of acquiring specialized equipment and the complexity of data analysis are two barriers that prevent most research laboratories from using microarray technology as a routine research tool. Thus, a general high-density cDNA array may not be the best tool for gene expression profiling in stem cell research.
In this paper, we report a mouse stem cell array containing a specific set of genes related to stem cell proliferation and differentiation. We collected, cloned, and validated 260 genes for this array. The chip comprises known molecular markers for ES cells and neural progenitor cells, growth factors, cell cycle-related genes, and ECM molecules. The genes were printed as a validated set of approximately 500-bp PCR products. Tests of these chips using either conventional Moloney murine leukemia virus (MMLV) RT or the linear polymerase replication (LPR) method showed that the arrays were of high quality and provided reliable, reproducible detection. Using these stem cell chips, we identified distinct gene expression patterns in tissues composed of different cell types, demonstrating the chip's potential utility. Overall, we conclude that this chip provides a convenient and useful tool for research in the stem cell field.
Materials and Methods
Primer Design, Clone Preparation, and Verification
Primers to amplify genes were designed using a computer program developed by SuperArray Bioscience Corporation (Frederick, MD; http://www.superarray.com). Briefly, a list of candidate genes was prepared. The National Center for Biotechnology Information (NCBI) UniGene number for each gene was obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene). The longest cDNA sequence in each UniGene cluster was then used for cDNA fragment selection. From this sequence, PCR primer pairs yielding 300-600 base products were designed using Primer3 software. The resulting set of candidate cDNA fragments was searched against all GenBank sequences using the Basic Local Alignment Search Tool (BLAST), and the fragment with the least sequence similarity to other genes was chosen as the cDNA probe. The corresponding primers for synthesizing these gene-specific cDNA fragments were used for cloning, RT-PCR, and probe labeling with the AmpoLabeling kit (SuperArray) for array hybridization. The reverse gene-specific primers were also used to prepare labeled probes by the conventional method for array hybridization. An internal primer was used for quality control purposes.
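The fragment-selection rule above (keep the candidate fragment least similar to other genes) can be sketched in Python. This is a minimal illustration under stated assumptions. Shared k-mer content stands in for the BLAST search actually used, and the function names, k = 12, and the toy sequences are all hypothetical. It is not the SuperArray design program.

```python
# Hypothetical sketch: shared k-mer fraction is a crude stand-in for BLAST similarity.

def kmers(seq: str, k: int = 12) -> set:
    """All length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def max_offtarget_similarity(fragment: str, offtargets: list, k: int = 12) -> float:
    """Largest fraction of the fragment's k-mers shared with any one off-target sequence."""
    frag = kmers(fragment, k)
    if not frag:
        return 0.0
    return max((len(frag & kmers(o, k)) / len(frag) for o in offtargets), default=0.0)


def choose_probe(candidates: dict, offtargets: list, k: int = 12) -> str:
    """Name of the candidate fragment whose worst-case off-target similarity is lowest."""
    return min(candidates,
               key=lambda name: max_offtarget_similarity(candidates[name], offtargets, k))


if __name__ == "__main__":
    related_gene = "AAAAAAAAAAAACCCCCCCCCCCC"        # toy off-target sequence
    candidates = {
        "fragment_A": "AAAAAAAAAAAACCCCCCCCCCCC",    # identical to the off-target
        "fragment_B": "GTGTGTGTGTGTGTGTGTGT",        # shares no 12-mers with it
    }
    print(choose_probe(candidates, [related_gene]))  # fragment_B
```

A real pipeline would score BLAST alignments (identity and coverage) rather than exact k-mer matches. The part that mirrors the text is the min-over-worst-case selection rule.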
Clones of verified cDNA fragments of candidate genes were prepared following the procedures described in the TA Cloning Dual Promoter Kit (Cat# K2050-01, Invitrogen; Carlsbad, CA; http://www.invitrogen.com). cDNA was synthesized from RNA from various tissues, including mouse D3 ES cells, following the procedures described in the RT-PCR section. PCR reactions were performed, and PCR fragments of the expected lengths were ligated into the vectors using T4 DNA ligase. The completed ligation reactions were transformed into INVαF′ competent cells, plated on X-Gal plates, and grown overnight at 37°C. White colonies were picked to verify cDNA fragment cloning.
Clones were validated as follows (as illustrated in the accompanying schematic). Quality control (QC) primers were designed within the region of each cDNA fragment amplified by the forward and reverse primers, in the same orientation as the reverse primer. PCR reactions were performed using the clones as templates with the forward and QC primers. If PCR products of the expected length were obtained, the cloning was deemed successful. When the identity of a cDNA fragment was in doubt, it was verified by DNA sequencing.
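The clone-validation logic can be sketched in a few lines: predict the size of the forward + QC product, then accept a clone when the observed band matches it. The coordinates, function names, and the 10% gel-sizing tolerance are illustrative assumptions. None of them are values from the original protocol.

```python
def expected_qc_product(fwd_start: int, qc_site_end: int) -> int:
    """Expected forward + QC amplicon length in bp.

    fwd_start: 0-based position where the forward primer starts on the insert.
    qc_site_end: exclusive end of the QC primer binding site, read on the same (top) strand.
    """
    if qc_site_end <= fwd_start:
        raise ValueError("QC primer site must lie downstream of the forward primer")
    return qc_site_end - fwd_start


def clone_passes(observed_bp: int, expected_bp: int, tolerance: float = 0.10) -> bool:
    """Accept the clone if the gel-sized band is within `tolerance` (fractional) of the expected size."""
    return abs(observed_bp - expected_bp) <= tolerance * expected_bp


if __name__ == "__main__":
    expected = expected_qc_product(fwd_start=0, qc_site_end=250)  # hypothetical coordinates
    print(expected, clone_passes(260, expected), clone_passes(400, expected))  # 250 True False
```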
Preparation of Arrays
Arrays were made from fragments PCR-amplified from plasmids containing the cloned inserts, using the forward and QC primers. The resulting PCR products were concentrated and adjusted to 150 μg/ml in freshly prepared 0.08 N NaOH. Bromophenol blue (0.001%) was added to the source plate as a tracking dye to monitor printing quality. A Cartesian SynQuad Prosys dispenser (Genomic Solutions; Ann Arbor, MI; http://www.genomicsolutions.com) was used to dispense 10-15 nl of cDNA solution onto nylon membrane (Nytran, Amersham Bioscience; Buckinghamshire, UK; www.amershambioscience.com). All array spots were arranged in a rectangular area (23 × 35 mm). Spot diameter was 0.7-0.9 mm, and the spot-to-spot distance was 1.25 mm. The printed membrane was air dried at room temperature overnight and then subjected to 1200 J ultraviolet cross-linking. The array was stored at −20°C until used.
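As a back-of-envelope check on the printing parameters, the stated concentration and dispensed volume imply roughly 1.5-2.25 ng of DNA per spot. This assumes the entire dispensed volume is retained on the membrane.

```latex
\begin{align*}
150~\mu\mathrm{g/ml} &= 150~\mathrm{ng}/\mu\mathrm{l} = 0.15~\mathrm{ng/nl} \\
0.15~\mathrm{ng/nl} \times 10~\mathrm{nl} &= 1.5~\mathrm{ng} \\
0.15~\mathrm{ng/nl} \times 15~\mathrm{nl} &= 2.25~\mathrm{ng}
\end{align*}
```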
cDNA Microarray and Data Analysis
Total RNA from the indicated tissues or cells was isolated using TRIzol (Invitrogen). Biotin-deoxyuridine triphosphate (dUTP)-labeled cDNA probes were generated following the protocols of either the AmpoLabeling kit (LPR kit; Cat#: L-03N) or the conventional MMLV RT method (Cat#: MM-601N, SuperArray Bioscience Corp.). For LPR amplification, RNAs were first annealed with a random primer at 70°C for 3 minutes and then reverse transcribed to cDNA at 37°C for 25 minutes. These cDNAs were amplified by PCR with gene-specific primers in the presence of biotin-16-dUTP. The PCR program was 85°C for 5 minutes; 30 cycles of 85°C for 1 minute, 50°C for 1 minute, and 72°C for 1 minute; and 72°C for 5 minutes. For MMLV RT probe preparation, the RNAs were simply reverse transcribed to cDNA in the presence of gene-specific primers and biotin-16-dUTP at 42°C for 90 minutes. The mouse stem array filters were hybridized with the biotin-labeled probes at 60°C for 17 hours. The filters were then washed twice with 2× standard saline citrate (SSC)/1% SDS and twice with 0.1× SSC/1% SDS at 60°C for 15 minutes each. Chemiluminescent detection was performed by sequentially incubating the filters with alkaline phosphatase-conjugated streptavidin and CDP-Star substrate (Applied Biosystems; Salt Lake City, UT; http://www.appliedbiosystems.com), followed by exposure to electrochemiluminescence film.
For data analysis, positive and negative spots were identified and verified independently by at least two people. Only positive and negative results that matched between two experiments are presented. For quantification, spot intensities were first measured with ImageQuant 5.2 software (Amersham Biosciences), and the average intensity of the 10 blank spots was subtracted. The background-subtracted intensities were then divided by the average intensity of the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) spots (three per array) to obtain a relative intensity for each spot. These relative intensities were used to generate the scatter plots in Figures 1 and 2 using Excel software.
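The normalization described above can be written as a short Python function: subtract the mean of the blank spots, then divide by the mean of the GAPDH spots. This is a sketch and not the ImageQuant/Excel workflow used in the study. The spot identifiers are hypothetical. We also assume the GAPDH spots are background-subtracted before averaging, which the text implies but does not state explicitly.

```python
from statistics import mean


def relative_intensities(raw: dict, blank_ids: list, gapdh_ids: list) -> dict:
    """Return {spot_id: relative intensity} for every spot in `raw`.

    1. background = mean raw intensity of the blank (negative-control) spots
    2. subtract that background from every spot
    3. divide by the mean background-subtracted GAPDH intensity
    """
    background = mean(raw[s] for s in blank_ids)
    corrected = {s: v - background for s, v in raw.items()}
    gapdh = mean(corrected[s] for s in gapdh_ids)
    if gapdh <= 0:
        raise ValueError("GAPDH signal does not exceed background; cannot normalize")
    return {s: v / gapdh for s, v in corrected.items()}


if __name__ == "__main__":
    raw = {"blank1": 10, "blank2": 12, "gapdh1": 111, "gapdh2": 111, "gapdh3": 111, "geneX": 61}
    rel = relative_intensities(raw, ["blank1", "blank2"], ["gapdh1", "gapdh2", "gapdh3"])
    print(rel["geneX"])  # 0.5
```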
Seventeen genes were randomly selected (fgf4, Gcm2, Mtap1b, Mtap2, Tubb3, Prox1, Neurofilament [Nfl], neural cell adhesion molecule [NCAM]2, Cst3, Ccng2, CDKn1b, DNMT1, Cdh4, Itga6, Rpl13a, Actb, and GAPDH). Biotin-labeled probes were generated by PCR. Each 10-μl reaction contained 1 μl 10× PCR buffer, 150 μmol MgCl2, 0.8 μl buffer BN (SuperArray Bioscience Corp.), 20 pmol primer, 1 μl of a 1:100 dilution of plasmid containing the gene-specific insert, 1 U RedTaq DNA polymerase (Sigma; St. Louis, MO; http://www.sigmaaldrich.com), and 0.2 pmol biotin-dUTP. The reaction was run for 35 cycles of 94°C for 30 seconds, 55°C for 30 seconds, and 72°C for 30 seconds, followed by a final extension at 72°C for 10 minutes. Three microliters of each labeled cDNA fragment were run on a 1.2% agarose gel to check probe quality. The labeled probes for the individual genes were then pooled and hybridized to an array following the procedures described in the section on cDNA microarray and data analysis.
Isolation of Embryonic Neural Tubes, Hippocampus, and Liver
Timed pregnant Sprague-Dawley rats at E14.5 and young male Sprague-Dawley rats (6 months) were purchased from Harlan Sprague-Dawley (Indianapolis, IN; http://www.harlan.com) and housed individually in standard cages at the National Institute on Aging (NIA) National Institutes of Health (NIH). They were maintained at 22°C on a 12/12h light/dark cycle with free access to food (NIH-07) and water. The rats were allowed to acclimate to the vivarium until the day of the experiment. Experimental handling and experimental procedures were approved by the NIA Animal Care and Use Committee.
E14.5 Sprague-Dawley rat embryos were used to isolate neural progenitor cells as previously described. Briefly, the rat embryos were removed and placed in a Petri dish containing ice-cold phosphate-buffered saline (PBS; Invitrogen). The trunk segments of the embryos (last 10 somites) were dissected, rinsed, and transferred to fresh cold PBS. The hippocampus and liver were dissected from 6-month-old male Sprague-Dawley rats. All isolated embryonic neural tubes and adult tissues were stored in RNAlater solution (Ambion; Austin, TX; http://www.ambion.com) at 4°C for later RNA isolation.
Mouse D3 ES Cell Line
Undifferentiated D3 ES cells (American Type Culture Collection [ATCC]) were first expanded on STO-1 feeder cells (ATCC) that had been treated with 1 mg/ml mitomycin C (Sigma) for 3 hours to arrest cell division. Subconfluent D3 cultures were then trypsinized and replated onto gelatin-coated tissue culture plates in the presence of 1,400 U/ml leukemia inhibitory factor ([LIF] Chemicon; Temecula, CA; http://www.chemicon.com) in ES cell medium consisting of KnockOut Dulbecco's modified Eagle's medium ([DMEM] Invitrogen/GIBCO) supplemented with 15% ES-qualified fetal bovine serum (Invitrogen/GIBCO), 100 mM MEM nonessential amino acids, 0.55 mM β-mercaptoethanol, 2 mM L-glutamine, and antibiotics (all from Invitrogen/GIBCO). When confluent, the D3 cells were harvested and stored at −80°C for later RNA isolation. A separate analysis of this D3 cell line grown under feeder-free conditions by RT-PCR and immunocytochemistry showed expression of OCT-3/4, Sox-2, Rex1, and Tert without detectable differentiated cell markers such as NCAM, Nfl, and glial fibrillary acidic protein (GFAP) (I. Ginis, personal communication).
cDNA was synthesized from 1 μg total RNA using SuperScript II and oligo(dT)12-18 (both from Invitrogen). PCR was performed in a 20-μl reaction containing 2 μl 10× PCR buffer, 150 μmol MgCl2, 10 nmol deoxyribonucleoside triphosphates (dNTPs), 20 pmol primer, 1 μl of 1:10 diluted cDNA, and 1 U RedTaq DNA polymerase (Sigma). The PCR conditions were 35 cycles of 94°C for 30 seconds, 55°C for 30 seconds, and 72°C for 30 seconds, followed by a final extension at 72°C for 10 minutes. Primer sequences are available upon request.
Developing a Rodent Neural Stem Gene Array
A total of 260 known genes was chosen (Table 1). Genes were selected on the basis of their known distribution and expression and our internal tests examining the expression of candidate genes on NSCs and progenitor cells [7, 9]. These genes included potential stem cell markers (87), growth factors and cytokines known to be involved in early developmental events (100), ECM-related molecules (37), cell cycle genes (5), components of the telomerase pathway (3), differentially expressed transcription factors (14), positive controls and housekeeping genes (4), and others (10). A detailed gene list including Unigene AA# and description can also be accessed on the web (http://www.superarray.com).
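The category counts listed above account for all 260 arrayed genes. A trivial tally makes the bookkeeping explicit. The category labels are abbreviated from the text.

```python
# Gene categories and counts as listed in the text.
categories = {
    "potential stem cell markers": 87,
    "growth factors and cytokines": 100,
    "ECM-related molecules": 37,
    "cell cycle genes": 5,
    "telomerase pathway components": 3,
    "differentially expressed transcription factors": 14,
    "positive controls and housekeeping genes": 4,
    "others": 10,
}

total_genes = sum(categories.values())
print(total_genes)  # 260
```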
Table 1. Gene list for mouse stem cell chip
The 260 genes were arrayed in an 18 × 16 format for a total of 288 spots. The 288 spots included 266 spots (including duplicates) of specific gene DNA fragments prepared as described in Materials and Methods, 12 spots for positive controls, and 10 spots of blanks serving as negative controls. The distribution of the gene fragments and blank and positive controls is shown in Figure 3.
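The spot counts can be reconciled with the 18 × 16 grid. This sketch assumes the four positive-control genes counted among the 260 are printed only as the 12 triplicate control spots. It also assumes the 10 extra gene spots come from the ten genes described as spotted twice.

```python
ROWS, COLS = 18, 16

n_genes = 260        # total arrayed genes, including the 4 positive controls
n_controls = 4       # cyclophilin A, Rpl13a, beta-actin, GAPDH (each spotted 3 times)
n_duplicated = 10    # genes spotted twice in different locations
n_blanks = 10        # negative-control spots

gene_spots = (n_genes - n_controls) + n_duplicated   # one spot per gene plus the extra copies
control_spots = 3 * n_controls
total_spots = gene_spots + control_spots + n_blanks

print(gene_spots, control_spots, total_spots, ROWS * COLS)  # 266 12 288 288
```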
Four genes (cyclophilin A, ribosomal protein L13a, β-actin, and GAPDH) were chosen as positive controls and spotted in triplicate at the bottom of the chip for visual monitoring of hybridization quality. Ten blank spots (negative controls) were randomly distributed on the chip. Note that candidate marker genes are grouped together for easy evaluation.
Potential molecular markers specific for ES cells were spotted in the first row, whereas neural markers were spotted in row 5. Ten genes (fgf4, fgfr4, epidermal growth factor receptor, Fzd9, NCAM, NCAM2, nerve growth factor receptor, platelet-derived growth factor receptor-α, vascular endothelial growth factor receptor 2, and Wnt11) were deliberately spotted twice in different locations to monitor the quality of array preparation and the uniformity of hybridization. For example, FGFR4 was located at spots 44 and 139, and FGF4 at spots 2 and 116. This design thus provides an easy and convenient means of monitoring chip quality and array hybridization.
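A hypothetical helper can relate spot numbers to grid positions. The article does not state its numbering convention. We assume spots are numbered 1-288 row by row from the top-left corner, with 16 spots per row across 18 rows. That assumption is consistent with FGF4 at spot 2 lying in the first (ES marker) row and with rows numbered up to 17.

```python
ROWS, COLS = 18, 16  # assumed layout: 18 rows of 16 spots, numbered row-major from 1


def spot_to_row_col(spot: int, rows: int = ROWS, cols: int = COLS) -> tuple:
    """1-based (row, column) of a 1-based spot number under the assumed numbering."""
    if not 1 <= spot <= rows * cols:
        raise ValueError(f"spot must be between 1 and {rows * cols}")
    row, col = divmod(spot - 1, cols)
    return row + 1, col + 1


def spots_in_row(row: int, cols: int = COLS) -> range:
    """Spot numbers occupying a given 1-based row."""
    first = (row - 1) * cols + 1
    return range(first, first + cols)


if __name__ == "__main__":
    for spot in (2, 116, 44, 139):  # FGF4 and FGFR4 duplicate positions from the text
        print(spot, spot_to_row_col(spot))
    print(list(spots_in_row(5))[:3])  # start of the neural-marker row: [65, 66, 67]
```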
Evaluation of Rodent NSC Arrays and Assessment of Detection Sensitivity Using MMLV-Based RT and LPR Amplification
Several strategies were used to monitor the quality of the arrays. As a first step, probes were prepared from the cloned fragments and hybridized to the array to confirm that arrayed genes could be detected specifically, without cross-hybridization. We randomly selected 17 genes to test the specificity of hybridization; the location of each gene in the array is shown in bold in Figure 3. As shown in Figure 4, the biotin-labeled probe for each gene identified only its corresponding arrayed gene fragment, indicating limited or no cross-hybridization. This is clearly illustrated with FGF4. A total of 22 FGF family members, sharing varying degrees of homology, were spotted on the chip. The FGF4 probe gave a strong signal on the FGF4 spot and did not cross-hybridize with any other FGF, indicating high hybridization specificity under our conditions. The printing quality of the chip was also high: all duplicated (FGF4 and NCAM2) and triplicate genes (Rpl13a, Actb, and GAPDH, underlined in Fig. 4B) were evenly detected. The intensities of these duplicate and triplicate spots were measured and are presented in Figure 4C. We also tested probes prepared with a mix of all 260 primers to label a plasmid mix of 20 selected genes. These showed no cross-reactivity (data not shown), suggesting little cross-hybridization due to mishybridization of primers. Thus, these initial quality assessments showed that the arraying process worked, that the fragments chosen for arraying were appropriate, and that hybridization under standard conditions produced little or no nonspecific cross-hybridization, even though several related gene families were arrayed.
As a next step, we compared the sensitivity of array detection with that of RT-PCR. RNA from mouse D3 ES cells and adult rat hippocampus was isolated, and cDNA was prepared using oligo(dT) primers. Primers specific to each arrayed gene fragment were used to amplify all 260 genes, to assess how many of the arrayed genes could be detected by a sensitive amplification method. Based on our preliminary experiments, we expected around 90% of the genes to be expressed in these tissues. PCR readily amplified 234 of the 260 genes as a single band of the expected size, indicating that the gene-specific primers worked appropriately and that at least 90% of the arrayed genes were present in ES cells and hippocampus at levels sufficient for ready detection by RT-PCR.
We then harvested RNA from mouse D3 ES cells (0.6 μg/μl) and adult rat hippocampus (0.2 μg/μl) and prepared probes to determine how many of the 260 arrayed genes were detectable using either standard MMLV RT or an amplification protocol (the LPR kit, which generates amplified labeled cDNA). The imaging profiles from one such experiment are shown in Figure 1A. Quantification showed 80 and 184 positive spots with probes labeled by the conventional and LPR methods, respectively (Fig. 1B). This indicates low sensitivity for conventional RT probe preparation and a substantial increase in sensitivity with LPR (from 31% [80/260] with MMLV RT to 71% [184/260] with LPR). However, neither method was as efficient as RT-PCR, even when higher probe concentrations were used (data not shown).
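The detection percentages quoted above follow directly from the spot counts. The short calculation below reproduces them and includes the 234/260 RT-PCR figure from the previous paragraph for comparison. Function and variable names are illustrative.

```python
ARRAYED_GENES = 260


def detection_rate(n_detected: int, n_arrayed: int = ARRAYED_GENES) -> float:
    """Fraction of arrayed genes called positive."""
    return n_detected / n_arrayed


if __name__ == "__main__":
    counts = {"MMLV RT probe": 80, "LPR probe": 184, "RT-PCR": 234}
    for method, n in counts.items():
        print(f"{method}: {n}/{ARRAYED_GENES} = {detection_rate(n):.0%}")
```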
We also noted that the majority of expressed genes detected by the arrays were also detected by RT-PCR (Fig. 1B). Agreement between the array and RT-PCR was about 98% (78/80) with MMLV RT and 96% (176/184) with LPR. Conversely, the fraction of RT-PCR-positive genes that the array failed to detect was greater with MMLV RT (67%, 156/234) than with LPR (25%, 58/234).
Two genes (MMLV RT) and eight genes (LPR) were detected by hybridization but could not be amplified by RT-PCR. Both genes detected by MMLV RT were also detected by the LPR method, suggesting that this was neither experimental error nor unique to a particular methodology. Using the same PCR conditions with plasmids containing all eight genes as templates, we obtained PCR products (data not shown), indicating that primer design and PCR conditions were not the cause of this failure. The distribution on the array of genes detected by hybridization but not by RT-PCR did not suggest a pin defect or carryover contamination. While this represents a small number of genes, such unexplained and unexpected discrepancies should be kept in mind when interpreting results.
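The agreement figures in the two preceding paragraphs can be reproduced from three counts per method:

- genes positive on the array
- genes positive by both the array and RT-PCR
- genes positive by RT-PCR

The helper below is an illustrative sketch with hypothetical names. "Missed" counts RT-PCR-positive genes that the array did not detect.

```python
from typing import NamedTuple


class Concordance(NamedTuple):
    agreement: float        # fraction of array positives also detected by RT-PCR
    array_only: int         # array positives with no RT-PCR product
    missed: int             # RT-PCR positives not detected on the array
    missed_fraction: float  # missed / RT-PCR positives


def concordance(array_pos: int, both_pos: int, rtpcr_pos: int) -> Concordance:
    missed = rtpcr_pos - both_pos
    return Concordance(
        agreement=both_pos / array_pos,
        array_only=array_pos - both_pos,
        missed=missed,
        missed_fraction=missed / rtpcr_pos,
    )


if __name__ == "__main__":
    for method, counts in {"MMLV RT": (80, 78, 234), "LPR": (184, 176, 234)}.items():
        c = concordance(*counts)
        print(f"{method}: agreement {c.agreement:.0%}, array-only {c.array_only}, "
              f"missed {c.missed}/{counts[2]} ({c.missed_fraction:.0%})")
```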
Array calls for expressed and absent genes were highly reproducible irrespective of the probe preparation technique used (Fig. 1C), even when different batches of RNA were used. Arrays prepared in different runs were not tested in this experiment, but previous experiments have shown high reproducibility. All 10 blank spots were consistently negative, duplicates showed consistent levels of expression, and virtually all positives remained positive.
Scatter plot analysis of relative intensities for positive spots in experiments 1 and 2 indicated a higher squared correlation coefficient with MMLV RT (r² = 0.99) than with LPR amplification and hybridization (r² = 0.73; Fig. 1D). This may reflect variability introduced by the amplification steps in probe preparation.
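For reference, the r² values above can be computed as the squared Pearson correlation between the relative intensities of spots called positive in both experiments. For a linear fit, this equals the R² shown on an Excel linear trendline. The function below is a minimal, dependency-free sketch. The example intensities are invented.

```python
def r_squared(x: list, y: list) -> float:
    """Squared Pearson correlation between two equally long lists of intensities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when one replicate has no variance")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    rep1 = [0.10, 0.40, 0.85, 1.00]   # hypothetical relative intensities, experiment 1
    rep2 = [0.12, 0.38, 0.90, 0.97]   # same spots, experiment 2
    print(round(r_squared(rep1, rep2), 3))
```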
It is important to note that while LPR increased sensitivity and was quite reproducible from experiment to experiment irrespective of the probe concentration used, not all messages were amplified equally. Comparison of hybridization with amplified versus unamplified probes showed slightly different expression profiles (Fig. 1A). For example, expression of integrin β5 (short arrow) was robust with LPR but undetectable with the MMLV RT method. The four positive controls (underlined in Fig. 1A) also showed different expression patterns when the two methods were compared. Semiquantitative RT-PCR analysis of these four genes (Fig. 1E) showed a pattern similar to that obtained with the nonamplified probe (MMLV method), suggesting an amplification bias in LPR. These results prompted us to use MMLV RT to analyze expression patterns in all initial experiments and then to switch to the more sensitive LPR method.
Array Analysis for Cell-Specific Tissues
Having assessed the quality and reliability of the focused microarray, we next determined if the arrayed genes were selective and sufficient in number to enable us to differentiate between neural and non-neural cell populations. We chose to examine ES cells, which are developmentally closely related to neural cells and share many antigens [11, 12], and fetal neural cells, which predominantly consist of NSCs and neural progenitors [13, 14]. We also included liver cells as an endodermal derivative that is distinct from either of the two populations.
As shown in Figure 2A, the gene expression profiles readily distinguished these cell types. To quantify the results, we normalized the intensity data by first subtracting the average background (mean blank intensity) and then dividing by the average hybridization signal of the housekeeping gene GAPDH. These relative expression intensities were used to generate scatter plots for cross-array comparisons. As shown in Figure 2B, r² values were low when mouse D3 cells were compared with rat E14.5 cells or adult liver cells, although the r² value between E14.5 cells and liver was high. Nine ES markers were detected in mouse D3 ES cells that were absent or expressed at low or undetectable levels in neural cells and liver (Fig. 2C). Six of these ES markers are easily identified in the first row (short arrow, first six positive spots). Dnmt1 and Itga6, located in rows 15 and 17, respectively, appeared to be specific to ES cells, although they were not initially included on the array as cell type-specific markers. Their relatively specific expression in D3 cells was confirmed by RT-PCR. As shown in Figure 2C and D, Sox-2 was present in D3 ES cells as well as in E14.5 neural tubes (NSCs and progenitor cells), consistent with published data on Sox-2 expression in ES and NSC populations. In addition to the ES cell markers, 17 neural markers were expressed by neural cells with little or no expression in ES cells or liver (Fig. 2D). Some of these are easily identified in row 5 (long arrow). RT-PCR analysis of neural marker expression agreed with the array results (Fig. 2D). Interestingly, glial markers on the array did not hybridize to probes prepared from E14.5 neural cells. Glial progenitor cells have been shown to appear later than neuronal progenitors in the neural tube.
Consistent with this, the array did not detect glial markers such as S100β and GFAP at this stage. Thus, this stem cell array can distinguish stem cells and progenitor cells at different developmental stages.
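The marker-selection criterion used here can be expressed as a simple rule over relative intensities: a marker must be expressed in one population and absent or very low in the others. The sketch below is illustrative only. The `present` and `absent` thresholds are hypothetical parameters that the authors do not specify, and the toy profile values are invented. Note how a gene shared by two populations, such as Sox2 in the example, is excluded by design.

```python
def specific_markers(profiles: dict, target: str, present: float, absent: float) -> list:
    """Genes at or above `present` in `target` and below `absent` in every other sample.

    profiles maps sample name -> {gene: relative intensity}; genes missing from a
    sample are treated as undetected (0.0).
    """
    others = [s for s in profiles if s != target]
    return sorted(
        gene for gene, value in profiles[target].items()
        if value >= present and all(profiles[o].get(gene, 0.0) < absent for o in others)
    )


if __name__ == "__main__":
    toy = {  # invented values for illustration only
        "D3 ES":        {"Pou5f1": 0.80, "Sox2": 0.60, "Nfl": 0.00},
        "E14.5 neural": {"Pou5f1": 0.00, "Sox2": 0.50, "Nfl": 0.70},
        "adult liver":  {"Pou5f1": 0.00, "Sox2": 0.00, "Nfl": 0.00},
    }
    print(specific_markers(toy, "D3 ES", present=0.2, absent=0.05))         # ['Pou5f1']
    print(specific_markers(toy, "E14.5 neural", present=0.2, absent=0.05))  # ['Nfl']
```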
While numerous markers that readily distinguish neural cells from ES or liver cells were identified, we noted that several ES markers known to be expressed by D3 cells could not be detected by this stem array. As shown by the RT-PCR results in Figure 2D, the genes for LIF and the telomerase-related complex (tert, tebp, terf1) are expressed in mouse D3 cells. However, we could not detect expression of these genes in the array analysis. This is likely due to the low sensitivity of the MMLV RT method, since LIF expression was detected using LPR amplification (Fig. 1A). Thus, it will be important not only to identify cell type-specific markers but also to select those that are relatively abundant.
We have taken advantage of the current understanding of molecular markers for rodent ES cells, NSCs, and progenitor cell populations to construct a focused stem cell array. This chip contains 260 genes: 87 are stem cell markers, 96 are growth factors, and the remaining genes are cell cycle, ECM, transcription factor, and control genes. Growth factor families such as the bone morphogenetic proteins, FGFs, and Wnts are known to play critical roles in stem cell development. All spots are printed as validated, gene-specific PCR-amplified fragments and show little or no cross-hybridization. The chip is of high quality, showing reproducible detection with probes prepared by either standard MMLV RT or LPR. Most importantly, the chip can specifically identify gene expression patterns in tissues at different developmental stages. This array can be produced at an affordable cost and rapidly modified to add genes as better markers for a particular application become available. Thus, this chip provides a useful tool for routine use in most stem cell research laboratories.
Assessment of the arrays showed that several quality control issues must be addressed in microarray construction. Of paramount importance was ensuring reproducibility both at the arraying facility and under differing hybridization conditions, and our results show that such reproducibility can be achieved. We found it useful to include blanks and positive controls to ensure that hybridization conditions were similar, and duplicate spots to ensure that the arraying procedure and the hybridization itself were uniform across the membrane.
We emphasize that analysis of data from a focused microarray is also relatively straightforward, given the small number of genes, the fact that they are known genes with solid expression data, and the availability of antibodies and often transgenic models. The signal-to-noise ratio is relatively high, because genes were selected on the basis of published expression levels and because gene-specific primers were used in the reverse transcription of probes (see Results). This high signal-to-noise ratio allows the use of all-or-none cut-offs, especially when unamplified cDNA is used as a probe. For most purposes for which such a focused array would be used, a detailed statistical analysis is therefore largely unnecessary. For example, expression of at least eight neural markers is undetectable in unamplified cDNA prepared from ES cells maintained in an undifferentiated state, irrespective of the initial RNA input, while ES cell markers are readily detected (Fig. 2). A detectable signal for more than one neural marker would therefore indicate that the ES cells are undergoing differentiation. Such all-or-none results are very useful for comparing the quality of cells maintained in different laboratories under different conditions, or as quality control when large batches of stem cells are being prepared. It is important to emphasize that while detailed statistical information is not critical for some applications, the data can always be analyzed using the tools developed for array expression analysis. Relative expression levels, correlations, and clusters can be readily generated using standard bioinformatics tools. We are currently working with the NIA array facility to automate data collection and analysis. Selecting markers unique to different stages of stem cell differentiation and to different types of stem cells allows us to readily obtain a cell type signature.
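A minimal sketch of such an all-or-none call is shown below, assuming background-subtracted spot intensities and a single detection threshold. The function name, the placeholder marker lists, the threshold, and the handling of a single detected neural marker are hypothetical; only the core rule (more than one detectable neural marker indicates differentiation) follows the text above.

```python
def es_state_call(signals, neural_markers, es_markers, threshold):
    """All-or-none call for an ES cell sample (hypothetical helper).

    signals: mapping gene -> background-subtracted intensity
    Returns (call, detected_neural_markers).
    """
    detected = {gene for gene, value in signals.items() if value > threshold}
    neural_hits = sorted(detected & set(neural_markers))
    es_hits = detected & set(es_markers)
    if len(neural_hits) > 1:
        return "differentiating", neural_hits
    if len(neural_hits) == 1:
        return "ambiguous: verify by RT-PCR", neural_hits
    if not es_hits:
        # Neither neural nor ES markers seen: suspect the sample or hybridization
        return "no ES markers: check sample", neural_hits
    return "undifferentiated", neural_hits


neural = ["neural_1", "neural_2", "neural_3"]  # placeholder marker names
es = ["es_1", "es_2"]
print(es_state_call({"es_1": 800, "es_2": 650, "neural_1": 2}, neural, es, threshold=10))
# ('undifferentiated', [])
print(es_state_call({"es_1": 300, "neural_1": 120, "neural_2": 90}, neural, es, threshold=10))
# ('differentiating', ['neural_1', 'neural_2'])
```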
Theoretical calculations have suggested that as few as five or six markers are sufficient to unambiguously profile a cell type, and such strategies are being used for cancer diagnostics [16–18].
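To illustrate why a handful of markers can suffice, assume each marker is scored simply as present or absent: n markers then give 2^n possible patterns, so six markers already yield 64. The sketch below matches a sample's binary pattern to the nearest reference signature by Hamming distance. This is a generic nearest-signature approach chosen for illustration, not the theoretical calculation cited above, and the function names, marker panel, and reference signatures are hypothetical.

```python
def signature(signals, panel, threshold):
    """Binary present (1) / absent (0) pattern over an ordered marker panel."""
    return tuple(int(signals.get(marker, 0) > threshold) for marker in panel)


def hamming(a, b):
    """Number of markers whose present/absent call differs."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cell_type(signals, panel, references, threshold):
    """Return the reference cell type with the closest signature and its distance."""
    sig = signature(signals, panel, threshold)
    best = min(references, key=lambda name: hamming(sig, references[name]))
    return best, hamming(sig, references[best])


panel = ["m1", "m2", "m3", "m4", "m5", "m6"]  # hypothetical six-marker panel
references = {"type_A": (1, 1, 1, 0, 0, 0), "type_B": (0, 0, 0, 1, 1, 1)}
print(2 ** len(panel))  # 64 possible present/absent patterns
print(nearest_cell_type({"m1": 50, "m2": 40}, panel, references, threshold=10))
# ('type_A', 1)
```

A nonzero distance, as in the usage above, flags a sample that only approximately matches a known signature and may merit verification.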
An important control that we routinely perform when using focused microarrays is verification by PCR and in situ hybridization. We recommend that a PCR primer set be made available to all microarray users for verification of novel or unexpected results. We have prepared such a primer set for the stem cell array described here and have verified the PCR amplification conditions. In addition, we have cloned each fragment arrayed on the membrane into a plasmid vector for making probes for in situ hybridization. Our results show very good agreement between genes detected by array analysis and those verified by PCR or in situ hybridization; that is, a gene detected on the array is very likely to be expressed. The arrays, however, are much less expensive than performing 260 PCRs, and we suggest that verification in routine use be limited to novel or unexpected results.
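In routine use, verifying only novel or unexpected results amounts to comparing the array calls with the genes expected for that cell population. The sketch below, with a hypothetical function name and arguments, returns genes detected but not expected (candidates for PCR or in situ confirmation) and genes expected but not detected. Given the limited sensitivity of unamplified probes, the latter may simply be below the array's detection limit rather than truly absent.

```python
def verification_queue(array_detected, expected_present, already_verified=()):
    """Split array results into the 'novel or unexpected' calls worth confirming.

    Returns (unexpected_present, unexpected_absent), each sorted.
    Expected-but-absent genes may only be below the array's detection limit.
    """
    detected = set(array_detected)
    expected = set(expected_present)
    skip = set(already_verified)  # genes confirmed in earlier runs
    unexpected_present = sorted(detected - expected - skip)
    unexpected_absent = sorted(expected - detected - skip)
    return unexpected_present, unexpected_absent


print(verification_queue({"a", "b", "x"}, {"a", "b", "c"}))
# (['x'], ['c'])
```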
The value of microarrays designed for routine practical application on a widespread basis would be significantly enhanced if a database of results could be generated and maintained. We are currently working with the NIA array facility to set up standards for stem array data and to collect stem array data from public sources and various research institutes. Developing such a stem array database will facilitate comparative analysis of gene expression patterns in stem cells and gene discovery in stem cell development.
While there was high concordance between hybridization and RT-PCR results, we noted two possible difficulties in comparing the data. First, RT-PCR is by its nature far more sensitive than hybridization, and in all tissues more genes were detected by RT-PCR than by hybridization. Second, and somewhat surprisingly, a small subset of eight genes was detected by hybridization but not by RT-PCR. The failure to detect Olig2 by RT-PCR could readily be explained by suboptimal RT-PCR conditions. The failure to detect the other genes was more difficult to explain, as we were unable to optimize PCR conditions or to find published expression data for that particular stage. We ruled out contamination, accidental mislabeling, and pin-transfer errors. We also confirmed that the primers amplified the correct fragment from the cloned plasmids under identical PCR conditions. We will continue to investigate potential causes of this discrepancy, and if we are unable to reconcile the results, we will eliminate this set of genes from subsequent generations of the stem cell chip.
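Concordance of this kind can be summarized as a 2 × 2 table of array versus RT-PCR calls. The sketch below (function name hypothetical) counts genes detected by both methods, by one only, or by neither, and reports the fraction of array-detected genes confirmed by RT-PCR, taking RT-PCR as the reference, together with the array-only genes that would need follow-up. The gene sets in the usage are placeholders, not data from this study.

```python
import math


def concordance(panel, array_detected, pcr_detected):
    """2 x 2 comparison of array and RT-PCR calls over the arrayed gene panel.

    Returns (counts, fraction of array calls confirmed by RT-PCR, array-only genes).
    """
    genes = set(panel)
    arr = set(array_detected) & genes
    pcr = set(pcr_detected) & genes
    counts = {
        "both": len(arr & pcr),
        "array_only": len(arr - pcr),
        "pcr_only": len(pcr - arr),
        "neither": len(genes - arr - pcr),
    }
    array_calls = counts["both"] + counts["array_only"]
    # NaN when the array detected nothing, since the fraction is undefined
    confirmed = counts["both"] / array_calls if array_calls else math.nan
    return counts, confirmed, sorted(arr - pcr)


counts, confirmed, array_only = concordance(
    panel=[f"g{i}" for i in range(1, 7)],
    array_detected={"g1", "g2", "g3"},
    pcr_detected={"g1", "g2", "g4", "g5"},
)
print(counts)      # {'both': 2, 'array_only': 1, 'pcr_only': 2, 'neither': 1}
print(array_only)  # ['g3']
```

A large `pcr_only` count is expected given the higher sensitivity of RT-PCR; it is the `array_only` list that identifies discrepancies of the kind described above.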
We noted that probes prepared by LPR substantially increased the number of genes that could be detected. The amplification is specific, in that almost all genes detected with probe prepared after amplification could be verified by RT-PCR, and the number of genes detected remained constant over different amounts of input RNA. However, while sensitivity increased, amplification was not uniform in our hands (Fig. 1A). We therefore suggest that comparisons be limited to hybridization results obtained with the same method of probe preparation. As noted earlier, PCR detects far more expressed genes than the array, even when radioactive labeling or amplification methods are used for probe preparation. Since we have largely verified that the expression detected by PCR represents true expression, we infer that hybridization, even with gene-specific primers and probe amplification, is inherently less sensitive than PCR.
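One way to enforce this recommendation in an analysis pipeline is to record the probe-preparation method with every hybridization and refuse to compare results across methods. The record type and function below are hypothetical illustrations, and the method labels are free-form strings rather than any established vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybridization:
    sample: str
    probe_method: str   # free-form label, e.g. "MMLV RT" or "LPR"
    detected: frozenset


def compare_detection(a, b):
    """Genes detected only in a and only in b; refuses cross-method comparisons."""
    if a.probe_method != b.probe_method:
        # Non-uniform LPR amplification makes cross-method differences unreliable
        raise ValueError(
            f"cannot compare {a.probe_method!r} with {b.probe_method!r} probes"
        )
    return sorted(a.detected - b.detected), sorted(b.detected - a.detected)


es_run = Hybridization("ES", "LPR", frozenset({"x", "y"}))
nsc_run = Hybridization("NSC", "LPR", frozenset({"y", "z"}))
print(compare_detection(es_run, nsc_run))  # (['x'], ['z'])
```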
While the current analysis focuses on validation and reproducibility of the array, and the discussion has been limited to the 87 or so cell type-specific markers, it is important to note that the remaining genes on the array, although not cell type-specific, were selected for their known effects during early development. Even a limited examination makes clear that examining the expression of cytokines, ECM molecules, and transcription factors in a well-defined population yields interesting insights into the biology of the cells. For example, the FGF expression data suggest that FGF4 may be important for, and specific to, maintaining ES cells in an undifferentiated state; FGF4 is downregulated when the cells differentiate into neural cells. The induced expression of FGF5 and FGF12 points to an unexpected role for these factors at this early stage of development. The expression and function of these molecules can easily be monitored under different culture conditions or perturbations to reveal downstream effects and/or compensatory upregulation of related family members. The ease and rapidity of such assessment highlight the importance of preparing such an array, rigorously selecting candidate genes, and validating the hybridization process.
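Monitoring such changes across culture conditions can be as simple as a fold-change call on normalized intensities. The sketch below assumes intensities already normalized across membranes (for example, to the positive controls), a hypothetical twofold cut-off, and a floor value so that genes undetected in one sample do not cause division by zero. The function name, gene names, and values in the usage are placeholders, not data from this study.

```python
import math


def regulation_calls(reference, test, fold=2.0, floor=1.0):
    """Label each gene 'induced', 'downregulated', or 'unchanged' in test vs reference.

    Intensities are assumed normalized across membranes; values below `floor`
    (including undetected genes) are raised to `floor` to avoid division by zero.
    """
    cutoff = math.log2(fold)
    calls = {}
    for gene in sorted(set(reference) | set(test)):
        ref = max(reference.get(gene, 0.0), floor)
        tst = max(test.get(gene, 0.0), floor)
        log2_ratio = math.log2(tst / ref)
        if log2_ratio >= cutoff:
            calls[gene] = "induced"
        elif log2_ratio <= -cutoff:
            calls[gene] = "downregulated"
        else:
            calls[gene] = "unchanged"
    return calls


es_profile = {"gene_a": 400, "gene_b": 5, "gene_c": 100}      # placeholder values
neural_profile = {"gene_a": 20, "gene_b": 300, "gene_c": 120}
print(regulation_calls(es_profile, neural_profile))
# {'gene_a': 'downregulated', 'gene_b': 'induced', 'gene_c': 'unchanged'}
```

Working on a log2 scale makes the cut-off symmetric, so a twofold increase and a twofold decrease are treated alike.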
Our results suggest that while the membrane-based focused stem cell array worked as designed, several improvements could be made. Perhaps the most significant would be to add further cell type-specific markers as they become available. For example, two potential ES cell markers have become available since the array was designed. Embryonal stem cell specific gene 1 [19, 20] has been shown to be highly abundant in and specific for ES cells. Likewise, FoxD3 has been suggested as an ES cell marker, while the LIF receptor has been shown to be absent in human ES cells. A second potential improvement would be to enhance sensitivity. As discussed above, LPR was one effective strategy for increasing sensitivity, and the correlation between genes detected with LPR-amplified probe and by PCR was far higher than that between unamplified-probe hybridization and PCR. Despite these limitations, the array in its current format is very useful. We proposed nitrocellulose-based membrane arrays as the initial format simply to allow flexibility for small laboratories that may not have specialized readers. Likewise, we assumed that dual labeling, which offers advantages in comparing samples, may not be readily available to smaller laboratories. However, we foresee no difficulty in replicating the array in a slide-based format or in designing an oligonucleotide-based array using a similar set of genes. Having multiple formats with the same basic subset of genes may allow a definitive comparison of the optimal method, in terms of sensitivity and reproducibility, for a particular application.
In summary, we have constructed a focused stem array based on current knowledge of molecular markers for rodent ES cells and their progenitors. The chip focuses on stem cell markers and on growth factors required for stem cell development. Its high quality is indicated by the printing of validated, gene-specific PCR-amplified fragments and by reproducible detection using either standard MMLV RT or LPR probes. The chip can distinguish gene expression patterns from tissues at different developmental stages. The array is affordable and its data analysis is simple. Although it is designed to be modified in the future, the current version already provides a convenient research tool for routine use.
Y.L. and I.G. were supported by the NIA. J.C. was supported by the NIA and the University of Utah. M.S.R. was supported by the NIA, the CNS foundation, and the ALS Center. A.H. was supported by the NINDS. Y.S., S.L., and S.Y. are employed by Superarray, which assisted in preparing the test microarrays. M.S.R. acknowledges the contributions of S. Rao that made undertaking this project possible.