Transcriptome profiling offers a powerful approach to investigating developmental processes. Long serial analysis of gene expression (LongSAGE) is particularly attractive for this purpose because of its inherent quantitative features and independence of both hybridization variables and prior knowledge of transcript identity. Here, we describe the validation and initial application of a modified protocol for amplifying cDNA preparations from <10 ng of RNA (<10³ cells) to allow representative LongSAGE libraries to be constructed from rare stem cell-enriched populations. Quantitative real-time polymerase chain reaction (Q-RT-PCR) analyses and comparison of tag frequencies in replicate LongSAGE libraries produced from amplified and nonamplified cDNA preparations demonstrated preservation of the relative levels of different transcripts originally present at widely varying levels. This PCR-LongSAGE protocol was then used to obtain a 200,000-tag library from the CD34+ subset of normal adult human bone marrow cells. Analysis of this library revealed many anticipated transcripts, as well as transcripts not previously known to be present in CD34+ hematopoietic cells. The latter included numerous novel tags that mapped to unique and conserved sites in the human genome but had not previously been identified as transcribed elements in human cells. Q-RT-PCR was used to demonstrate that 10 of these novel tags were expressed in cDNA pools and present in extracts of other sources of normal human CD34+ hematopoietic cells. These findings illustrate the power of LongSAGE to identify new transcripts in stem cell-enriched populations and indicate the potential of this approach to be extended to other sources of rare cells.
Disclosure of potential conflicts of interest is found at the end of this article.
Genome-wide expression profiling has become an important tool for analyzing cell behavior and has been particularly useful for identifying molecular events associated with early developmental decisions and disease pathogenesis. Two technologies are now commonly used for comparing or characterizing the complete transcriptome of specific cell populations: hybridization-based arrays and serial analysis of gene expression (SAGE). In the first of these procedures, known transcript sequences or expressed sequence tags (ESTs) present on a solid phase surface were originally used to capture reverse-transcribed DNA copies of extracted cellular mRNA. The extent of hybridization achieved by competing cDNAs prepared from two cell sources was then determined to allow a comprehensive survey of differences in the gene expression profiles of the two cell populations being compared. The subsequent substitution of annotated oligonucleotides as capture probes has further improved consistency and signal detection.
SAGE involves the construction of large libraries of tags (typically 10 or 17 nucleotides long) that have been reverse-transcribed from the 3′ end of mRNAs present in the sample. The tags are then sequenced, and bioinformatics methods are used to derive transcript identities. Transcript levels can then be inferred directly from tag frequencies, bypassing any need for comparison to a reference cDNA preparation. As a result, each SAGE library becomes a permanent digital data resource accessible for repeated interrogation. The fact that SAGE does not require prior knowledge of the transcripts being surveyed also makes it useful for gene discovery. SAGE has thus become a particularly attractive technology for studies of cellular transcriptomes from organisms for which comprehensive genomic sequence information is available. Nevertheless, a major limitation of the original SAGE methodology has been the need for relatively large quantities of starting RNA (originally 5 μg, the amount typically obtained from approximately 10⁶ cells). Subsequent modifications to decrease the amount of starting material needed (microSAGE, amplified antisense RNA-LongSAGE, small amplified RNA-SAGE, SAGE-lite, and polymerase chain reaction [PCR]-SAGE) have now made it possible for either SAGE or LongSAGE libraries to be generated from much smaller amounts of RNA (down to 40 ng). However, these are still not readily applicable to isolates containing fewer than 10⁴ cells. Because of the very low frequency of normal or malignant stem cells in many primary tissues, this limitation still hampers the use of any SAGE approach for characterizing a variety of stem cell populations.
Here, we describe a method that adapts recent technology for amplifying cDNAs from a few nanograms of total cellular RNA [8, 9] in a fashion that meets the requirements for SAGE library construction, minimizes the generation of ambiguous tags, and preserves the initial transcript representation. Using this approach, we have created the first LongSAGE library thus far reported from the CD34+ subset of normal adult human bone marrow cells. Analysis of the tags obtained indicates the capture of many expected transcripts, as well as a number of transcripts not previously known to exist.
Materials and Methods
Normal human cord blood cells were obtained, with consent, from anonymized discarded placentas, and the low-density (<1.077 g/cm³) fraction of cells isolated by centrifugation on Ficoll-Hypaque (Pharmacia, Calgary, AB, Canada, http://www.pfizer.ca) was then cryopreserved. Samples were thawed, and the CD34+ cells were separated immunomagnetically using a CD34+ cell positive selection kit (EasySep; Stem Cell Technologies, Vancouver, BC, Canada, http://www.stemcell.com). The cells were then stained with a phycoerythrin-conjugated anti-human CD34 antibody (8G12; BD Biosciences [BD], San Jose, CA, http://www.bdbiosciences.com) and propidium iodide (PI) (Sigma-Aldrich, St. Louis, http://www.sigmaaldrich.com), and a population of viable (PI−) CD34+ cells was obtained using a FACSVantage machine (BD). Aliquots of 100, 500, 10³, and 10⁵ viable CD34+ cells were collected directly into vials containing 100 μl of RNA extraction buffer from the PicoPure RNA extraction kit (Arcturus, Mountain View, CA, http://www.arctur.com). Cryopreserved normal adult human bone marrow cells obtained with informed consent were provided by the Northwest Tissue Center (Seattle). After thawing, human cells expressing lineage (lin) markers of mature blood cells (CD2, CD3, CD14, CD16, CD19, CD24, CD56, CD66b, and glycophorin A) were removed immunomagnetically using a column (StemSep; Stem Cell Technologies) as recommended by the manufacturer and cryopreserved. The lin− cells were thawed at 37°C and incubated in 50% fetal calf serum in Hanks' balanced salt solution overnight at 4°C to minimize effects of freezing and thawing on the levels of different mRNAs present. This was established by comparing the levels of transcripts for 11 variably expressed genes using quantitative real-time (Q-RT)-PCR.
We also found that the gentle thawing process adopted for previously cryopreserved cells did not perturb the differentiation capabilities of these cells as determined by colony-forming cell (CFC) assays (data not shown). Thawed cells were then stained with allophycocyanin-conjugated anti-human CD34 antibody (8G12; BD), fluorescein isothiocyanate-conjugated lineage marker antibodies, and PI. Viable (PI−) lin−CD34+ cells were then isolated using a FACSVantage flow cytometer. The purity of the sorted cells was determined to be >98% as assessed by a second fluorescence-activated cell sorting (FACS) analysis of an aliquot of the sorted cells. Total RNA extracts were prepared from viable lin−CD34+ cells isolated by FACS in the same manner as described for cord blood cells.
Hematopoietic Progenitor Cell Assays
CFC assays were performed by plating human lin−CD34+ bone marrow cells at 800 cells per milliliter in serum-containing methylcellulose medium (Methocult 4230; Stem Cell Technologies) supplemented with 3 U/ml erythropoietin (Stem Cell Technologies), 50 ng/ml Steel factor (SF) (prepared and purified in the Terry Fox Laboratory), 20 ng/ml each of interleukin-3 (IL-3) and granulocyte-macrophage colony-stimulating factor (both from Novartis International, Basel, Switzerland, http://www.novartis.com), 20 ng/ml granulocyte colony-stimulating factor (G-CSF) (Stem Cell Technologies), and 20 ng/ml IL-6 (Cangene Corp., Mississauga, ON, Canada, http://www.cangene.com). Long-term culture-initiating cell (LTC-IC) assays were performed by culturing 2 × 10⁴ lin−CD34+ bone marrow cells in 2 ml of myeloid LTC medium (Myelocult; Stem Cell Technologies) supplemented with 10⁻⁶ mol/l hydrocortisone sodium hemisuccinate (Sigma-Aldrich) for 6 weeks on pre-established, irradiated feeder layers of mouse fibroblasts genetically engineered to produce human SF, G-CSF, and IL-3. At the end of this time, the number of CFCs present was determined, and the number of input LTC-IC was calculated assuming an average 6-week output of 18 CFCs per LTC-IC.
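The LTC-IC calculation described above is a simple division, and a minimal Python sketch of it is given here for illustration. The function name and the example CFC count are hypothetical and are not values reported in this study; only the conversion factor of 18 CFCs per LTC-IC and the input of 2 × 10⁴ cells come from the text.

```python
def estimate_ltc_ic(total_cfcs, cells_plated, cfcs_per_ltc_ic=18.0):
    """Estimate the number and frequency of input LTC-IC.

    total_cfcs      -- CFCs counted after the 6-week culture
    cells_plated    -- cells used to initiate the culture
    cfcs_per_ltc_ic -- assumed average 6-week CFC output per LTC-IC
    """
    if cells_plated <= 0 or cfcs_per_ltc_ic <= 0:
        raise ValueError("cells_plated and cfcs_per_ltc_ic must be positive")
    n_ltc_ic = total_cfcs / cfcs_per_ltc_ic
    return n_ltc_ic, n_ltc_ic / cells_plated


# Hypothetical example: 1,080 CFCs recovered from a culture of 2 x 10^4 cells.
n_ltc_ic, frequency = estimate_ltc_ic(1080, 20_000)
print(f"{n_ltc_ic:.0f} LTC-IC, frequency {frequency:.2%}")
```

The frequency is expressed per input cell, so it can be compared directly with the CFC frequency obtained from the methylcellulose assays.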
RNA Isolation and cDNA Preparation and Amplification
An RNA extract prepared from undifferentiated H9 human embryonic stem cells was kindly provided by Dr. J. Thomson (University of Wisconsin, Madison, WI). RNA extracts were also prepared separately from 100, 500, 10³, or 10⁵ FACS-purified human CD34+ cord blood cells using the PicoPure RNA extraction kit. To minimize contamination with genomic DNA, RNA isolates were treated with DNaseI (Amplification Grade; Invitrogen, Burlington, ON, Canada, http://www.invitrogen.com) according to the manufacturer's protocol. To quantify the extent of genomic contamination in the final purified cDNA used for SAGE library construction, we used intron-specific primers to amplify sequences for two genes on different chromosomes: forward primer 5′-CCCCATGAGTCAGGTCGG-3′ and reverse primer 5′-CCCAGACTGCATCTCAGCCA-3′ for the DGCR8 gene (22q11.2), and forward primer 5′-AGTTTCTCCTCTCTCCTCCCAAG-3′ and reverse primer 5′-TCACTTCACTTCATTTTCACTTCTC-3′ for the ATP11A gene (13q34), by quantitative PCR (Q-PCR). The results obtained with both pairs of primers showed that genomic DNA accounted for <0.1% of the cDNA sample. RNAs were reverse-transcribed, and the cDNAs obtained were amplified using the switching mechanism at the 5′ end of RNA transcripts (SMART) cDNA synthesis kit (catalog number 635000; Clontech, Mountain View, CA, http://www.clontech.com) following the manufacturer's protocol but using modified template switching (TS) and cDNA amplification primers, as detailed below (Fig. 1A). The first-strand cDNA was synthesized with an oligo(dT) primer (5′-AAG CAG TGG TAA CAA CGC AGG CTA CTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TVN-3′, where V denotes A, C, or G and N denotes A, C, G, or T) and the PowerScript reverse transcriptase provided in the kit, in the presence of the modified TS primer. The TS primer was modified by introducing a sequence containing an AscI digestion site (5′-AAG CAG TGG TAA CAA CGC AGG CGC GCC GGG-3′, which contains the AscI site GGCGCGCC).
The first-strand cDNA was purified with a NucleoSpin column and then amplified using a modified PCR primer that contained a biotin molecule at its 5′ end (5′-biotin-AAG CAG TGG TAA CAA CGC AGG C-3′) and the Advantage II PCR Kit (Clontech). The biotinylated 5′ ends of the amplified cDNAs were then removed by digestion of the initial amplified product with AscI (New England Biolabs, Beverly, MA, http://www.neb.com). The cDNA was purified on a Chroma-Spin 200 Column (Clontech), and its concentration was determined using a spectrophotometer (GeneQuant Pro; Biochrom, Cambridge, U.K., http://www.biochrom.co.uk).
SAGE Library Construction
For the PCR-LongSAGE libraries, the amplified cDNA was first digested with NlaIII and incubated with streptavidin beads (M-280; Invitrogen); the immobilized, truncated cDNAs were then linked to two different adaptors, and LongSAGE libraries were constructed using the I-SAGE kit (Invitrogen) following the manufacturer's protocol. The I-SAGE kit was also used to construct a SAGE-lite library from 400 ng of PCR-amplified cDNA (22 cycles) obtained using a previously described methodology that also uses SMART cDNA technology.
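The in silico counterpart of this step, predicting which tag a given transcript should yield, can be sketched as follows. This is an illustrative outline only: it assumes that the tag is the 17 nucleotides immediately 3′ of the 3′-most NlaIII site (CATG) in the sense-strand transcript sequence, and the function name and example sequence are hypothetical.

```python
NLAIII_SITE = "CATG"   # NlaIII recognition sequence (anchoring enzyme)
TAG_LENGTH = 17        # assumed LongSAGE tag length, excluding the CATG anchor


def extract_longsage_tag(transcript, tag_length=TAG_LENGTH):
    """Return the predicted tag for a sense-strand transcript, or None.

    None is returned when the transcript has no NlaIII site or when fewer
    than `tag_length` bases follow the 3'-most site.
    """
    seq = transcript.upper()
    anchor = seq.rfind(NLAIII_SITE)          # 3'-most CATG
    if anchor == -1:
        return None
    start = anchor + len(NLAIII_SITE)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Hypothetical transcript with two NlaIII sites; only the 3'-most one is used.
example = "GGGCATGAAACCC" + "CATG" + "ACGTACGTACGTACGTA" + "AAAAAAAA"
print(extract_longsage_tag(example))
```

Counting how often each extracted tag occurs in the sequenced library then gives the tag frequencies from which transcript levels are inferred.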
RNA was reverse-transcribed with SuperScriptII (Invitrogen) to generate first-strand cDNA for use as the template for Q-RT-PCR analysis of transcript levels in nonamplified RNA preparations. Q-RT-PCR was performed using SYBR Green PCR MasterMix (Applied Biosystems, Foster City, CA, http://www.appliedbiosystems.com) and an iCycler PCR machine (Bio-Rad, Hercules, CA, http://www.bio-rad.com). After an initial denaturation step at 94°C for 5 minutes, 50 cycles of a three-step PCR with a single fluorescence measurement were undertaken (94°C for 15 seconds, 60°C for 20 seconds, and 72°C for 30 seconds). The PCR products were also subjected to melting curve analysis for verification of single amplicons and absence of primer dimers. Q-RT-PCR and data analysis were performed on an iCycler iQ system, using iCycler iQ real-time detection software (Bio-Rad). The primers used are shown in supplemental online Table 1. Q-PCR assays were used to confirm the expression of the unique tags identified by bioinformatics analysis of the lin−CD34+ human adult bone marrow LongSAGE library. For this purpose, RNA was extracted from lin−CD34+ cells isolated from human bone marrow samples from three different normal adult donors; one of these samples was the same as that used for construction of the PCR-LongSAGE library. cDNA was generated as described, and, as a negative control, the same amount of RNA was processed without adding reverse transcriptase. Primers for detecting novel transcripts were selected from the human genome (Human BLAT Search, http://www.genome.ucsc.edu/cgi-bin/hgBlat) flanking 5′ and 3′ regions of the identified unique tags in such a way that the amplicons would include the unique tag sequences (supplemental online Table 2).
Development of a cDNA Amplification Protocol Suitable for Constructing LongSAGE Libraries
To allow LongSAGE libraries to be constructed from highly PCR-amplified preparations of 3′ cDNAs without major distortion of the original transcript representation, we used the SMART technology developed by Clontech and also used in the SAGE-lite protocol, with two modifications. The original technology makes use of a TS primer containing a short poly-guanine sequence at its 3′ end for the first-strand cDNA synthesis step. We modified the cDNA amplification primer so that it contained a biotin molecule at the 5′ end. In addition, we modified the TS primer by introducing an eight-base (GGCGCGCC) AscI restriction endonuclease recognition sequence into its 3′ end (Fig. 1A). These modifications allowed the biotinylated primers incorporated into the 5′ ends of the cDNA products to later be removed to yield a final product in which the cDNAs were biotinylated exclusively at their 3′ ends, as required for SAGE library construction (Fig. 1A). This approach is a variation of the previously described introduction of a seven-base SapI site for the same purpose. However, from in silico analyses, we found that 24% of Ensembl transcripts contain at least one SapI site, which could result in a potential loss of >600 tag types following SapI digestion. In contrast, only 3% of Ensembl transcripts contain one or more AscI restriction sites, and only 80 contain an AscI site between the first NlaIII site 5′ of the poly(A) tail and the poly(A) tail itself (Fig. 1B). Consistent with the expectation of a minimal loss of tags after digestion with AscI (at 37°C for 1 hour), we found that there was no detectable change in the size distribution of the amplified cDNAs when they were analyzed electrophoretically (Fig. 1C).
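An in silico survey of this kind can be outlined in a few lines of Python. The sketch below is not the pipeline used here, which is not described in detail; it assumes that transcripts are available as plain sense-strand strings, uses the standard recognition sequences for AscI (GGCGCGCC), SapI (GCTCTTC), and NlaIII (CATG), and uses hypothetical function names and toy sequences. Because the SapI site is not palindromic, both strands are checked.

```python
ASCI = "GGCGCGCC"    # AscI recognition sequence (palindromic)
SAPI = "GCTCTTC"     # SapI recognition sequence (non-palindromic)
NLAIII = "CATG"      # NlaIII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the site occurs on either strand of the sequence."""
    seq = seq.upper()
    return site in seq or revcomp(site) in seq


def site_in_tag_fragment(seq, site):
    """True if the site lies between the 3'-most NlaIII site and the 3' end.

    Cutting there would separate the tag-bearing fragment from its 3' biotin.
    """
    seq = seq.upper()
    anchor = seq.rfind(NLAIII)
    if anchor == -1:
        return False
    return has_site(seq[anchor:], site)


def fraction_with_site(transcripts, site):
    """Fraction of transcripts that contain at least one copy of the site."""
    transcripts = list(transcripts)
    if not transcripts:
        return 0.0
    return sum(has_site(t, site) for t in transcripts) / len(transcripts)


# Toy sequences for illustration only (not Ensembl transcripts).
toy = ["AAACATGTTGGCGCGCCTTAAAA", "TTTGAAGAGCAAACATGCCCAAAA", "ACGTCATGACGTAAAA"]
print(fraction_with_site(toy, ASCI), fraction_with_site(toy, SAPI))
print(sum(site_in_tag_fragment(t, ASCI) for t in toy))
```

Applied to a full transcript set, the first function would give the fraction of transcripts carrying each site, and the second would count the subset in which digestion would actually remove a tag.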
To determine the number of cycles of amplification to use, we generated cDNA samples independently from three separate 10-ng aliquots of RNA extracted from undifferentiated human H9 embryonic stem cells (http://www.transcriptomES.org) and then examined the electrophoretically separated products obtained after 18–24 cycles of amplification. The results showed that the PCR amplification reaction had not yet reached a plateau after 21 cycles, by which time there was already sufficient product to construct a SAGE library (supplemental online Fig. 1A). This result was also validated by Q-RT-PCR analyses (supplemental online Fig. 1B).
Evidence of the reproducibility of the cDNA amplification protocol and its ability to preserve relative transcript levels in amplified cDNA products was obtained from separate Q-RT-PCR measurements of the levels of six differentially expressed mRNAs in the H9 cell extract described above on samples taken before and after three independent amplifications of the starting cDNA pool (Fig. 2).
We next asked what would be the minimum number of normal adult human cells from which a suitable amplified cDNA product could be obtained to allow construction of a 200,000-tag LongSAGE library. To address this question, we used a combination of immunomagnetic cell separation and multiparameter FACS procedures to isolate CD34+ cells from a pool of cells from three normal human cord blood harvests (Fig. 3A). cDNA products were prepared from separately collected aliquots of 100, 500, 10³, and 10⁵ of the CD34+ cells isolated, and these were then amplified or, for the 10⁵-cell samples, left unamplified. Figures 3B and 3C show comparisons of the levels of 10 transcripts quantified in these extracts by Q-RT-PCR before and after amplification. All 10 transcript species were detected in the amplified cDNA products obtained from as few as 500 cells, and their levels were highly correlated with those measured in the nonamplified material (R = 0.83 ± 0.04; Fig. 3B). In addition, the RNA extracted from the 500-cell sample yielded more than 400 ng of amplified cDNA, which is more than enough to build a one million-tag LongSAGE library using the I-SAGE protocol (Invitrogen). The cDNA products generated from 100 CD34+ cord blood cells also showed a significant correlation between the levels of the more prevalent transcript species before and after their amplification, although some of the rarer transcript species were not detectable in the amplified products generated in this case (data not shown).
Comparison of Replicate LongSAGE Libraries Prepared from Amplified and Nonamplified cDNAs
We then compared the complete tag profiles from LongSAGE libraries constructed from amplified and nonamplified cDNAs derived from the same original RNA extract. For this analysis, two of the independently amplified H9 cDNA preparations analyzed in Figure 2 were used to prepare replicate libraries. The two PCR-LongSAGE libraries were sequenced to depths of 57,470 (library A) and 112,517 (library B) total tags (all analyses performed using http://www.transcriptomES.org). To minimize effects due to poor-quality tags, we applied sequence quality cut-offs of 95.0% and 99.9% to the nonsingleton and singleton tags, respectively. This reduced the number of tags in the two PCR-LongSAGE libraries to 46,241 (library A) and 83,557 (library B). The library prepared from nonamplified material was a 467,522-tag library constructed from 20 μg of RNA using the standard I-SAGE protocol. Also included in this analysis was a 60,492-tag SAGE-lite library prepared from a 100-ng aliquot of the same RNA extract. All four libraries showed the expected predominance of low-abundance tags and, in this respect, were indistinguishable from one another (data not shown). They also contained readily detectable frequencies of tags unique to transcripts of known relevance to undifferentiated human embryonic stem cells (supplemental online Table 3).
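The quality filter described above can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than statements about the software actually used: each sequenced tag is taken to carry a single quality value (the estimated probability that the tag sequence is correct), a tag type is classed as a singleton from its raw count before filtering, and the function name and data are hypothetical.

```python
from collections import Counter


def filter_tags(observations, nonsingleton_cutoff=0.95, singleton_cutoff=0.999):
    """Apply count-dependent quality cut-offs to sequenced tags.

    observations -- iterable of (tag_sequence, quality) pairs, where quality
                    is the estimated probability that the tag is correct
    Returns a Counter of tag counts that pass the relevant cut-off.
    """
    observations = list(observations)
    raw_counts = Counter(tag for tag, _ in observations)
    kept = Counter()
    for tag, quality in observations:
        # Singletons get the stricter cut-off because no second copy
        # of the tag is available to corroborate the sequence.
        cutoff = singleton_cutoff if raw_counts[tag] == 1 else nonsingleton_cutoff
        if quality >= cutoff:
            kept[tag] += 1
    return kept


# Hypothetical observations: one tag type seen three times, two singletons.
reads = [("TAG_A", 0.99), ("TAG_A", 0.96), ("TAG_A", 0.90),
         ("TAG_B", 0.99), ("TAG_C", 0.9995)]
print(dict(filter_tags(reads)))
```

In this toy input, the low-quality third copy of the repeated tag and the lower-quality singleton are discarded, which mirrors the reduction in library size reported after filtering.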
We then used DiscoverySpace software to compare the tag representation in these four libraries on a pairwise basis. This software uses Audic-Claverie statistics to allow the tag composition of SAGE libraries to be compared independent of library size. This analysis showed the two replicate PCR-LongSAGE libraries to be 98% similar to one another using a 95% confidence interval, that is, only 2% of tag types were present at significantly different levels (p < .05) in one of the two PCR-LongSAGE libraries (Fig. 4A). Comparison of each of these libraries to the conventional LongSAGE library prepared from nonamplified material gave corresponding similarity values of 95% (PCR-LongSAGE library B; Fig. 4B) and 84% (PCR-LongSAGE library A; data not shown). Values for parallel similarity comparisons with the SAGE-lite library were 96% (library B; Fig. 4C) and 97% (library A; data not shown), and the value for comparison of the LongSAGE library with the SAGE-lite library was 97% (data not shown). In fact, only seven tags were consistently over- or under-represented in both of the PCR-LongSAGE libraries compared with the tags from the LongSAGE library prepared from nonamplified material, and none of these mapped to a unique site in the most recent version of the human genome (RefSeq database, build 35, August 26, 2004).
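For readers unfamiliar with the Audic-Claverie statistic, a minimal Python sketch is given below. It implements the published probability of observing y copies of a tag in a library of N2 tags given x copies in a library of N1 tags, p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. How DiscoverySpace applies this statistic internally is not described here, so the tail-probability function and its use as a significance test are assumptions, and the tag counts in the example are hypothetical.

```python
import math


def ac_pmf(y, x, n1, n2):
    """Audic-Claverie probability of count y in library 2 given x in library 1."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def ac_tail_p(x, y, n1, n2):
    """Smaller of P(Y <= y | x) and P(Y >= y | x): a one-tailed probability."""
    lower = sum(ac_pmf(k, x, n1, n2) for k in range(y + 1))
    upper = max(0.0, 1.0 - lower + ac_pmf(y, x, n1, n2))
    return min(lower, upper, 1.0)


# Library sizes after quality filtering (libraries A and B in the text);
# the tag counts themselves are hypothetical.
N_A, N_B = 46_241, 83_557
print(ac_tail_p(12, 20, N_A, N_B))   # counts roughly proportional to library size
print(ac_tail_p(12, 80, N_A, N_B))   # strongly over-represented in library B
```

Because the ratio of library sizes enters the formula directly, counts from libraries of different depths can be compared without prior normalization, which is the property referred to above.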
Pearson correlation analysis of tag frequencies in each pair of libraries generated correlation coefficients of 0.8 for the two PCR-LongSAGE libraries and somewhat lower values when these were compared with the library obtained from nonamplified material (0.61 and 0.65, respectively) or to a corresponding SAGE-lite library (0.61 and 0.66, respectively) (Fig. 4D). This latter method of comparison is more sensitive to differences between higher frequency tags. Hence, to avoid distortion from repetitive sequences, only tags that could be matched to a unique sequence in the most recent version of the human genome (build 35, August 26, 2004) were included in this analysis.
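The correlation analysis described above can be sketched in Python as follows. The sketch makes assumptions that are not stated in the text: tags absent from one library are given a frequency of zero, frequencies are computed as counts divided by the library total, and the set of uniquely mapping tags is supplied by the caller. All names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def library_correlation(counts_a, counts_b, allowed_tags=None):
    """Correlate tag frequencies of two libraries over a chosen tag set."""
    tags = set(counts_a) | set(counts_b)
    if allowed_tags is not None:
        tags &= set(allowed_tags)      # e.g. tags mapping to a unique genomic site
    tags = sorted(tags)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    freq_a = [counts_a.get(t, 0) / total_a for t in tags]
    freq_b = [counts_b.get(t, 0) / total_b for t in tags]
    return pearson(freq_a, freq_b)


# Hypothetical tag counts from two libraries of different depth.
lib_a = {"T1": 50, "T2": 5, "T3": 1, "T4": 2}
lib_b = {"T1": 90, "T2": 12, "T3": 4, "T5": 3}
print(library_correlation(lib_a, lib_b))
print(library_correlation(lib_a, lib_b, allowed_tags={"T2", "T3", "T4", "T5"}))
```

Because the coefficient is built from squared deviations about the mean, a few abundant tags contribute most of the sum, which is why this measure responds mainly to differences among higher frequency tags.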
Construction and Analysis of a PCR-LongSAGE Library from CD34+ Cells Isolated from Normal Adult Human Bone Marrow
We then used this method to construct a library from ∼3,000 highly purified lin−CD34+ cells isolated by FACS from a sample of normal adult human bone marrow cells (Fig. 5A). Functional assays applied to these CD34+ cells demonstrated that 12% had granulopoietic, erythroid, or mixed granulopoietic and erythroid CFC activity in vitro. In addition, 0.3% of these cells were detectable as 6-week precursors of CFCs in LTC-IC assays, as described in Materials and Methods. From this library, 201,106 tags were sequenced, and 42,310 unique tag types were obtained with a typical SAGE tag frequency distribution (Fig. 5B). A complete listing of all the tags is given at http://www.transcriptomES.org. Q-RT-PCR of cDNA preparations generated from extracts of independently purified lin−CD34+ cells from the same bone marrow sample showed a good correlation between the transcript levels measured and those inferred from the PCR-LongSAGE tag counts using DiscoverySpace for tag-to-transcript identification (Fig. 5C).
The tag-to-transcript analysis showed that 8,959 tags in the PCR-LongSAGE library mapped to single RefSeq transcripts or multiple variants of a single gene in the RefSeq database. This included transcripts that are known to be expressed in CD34+ human bone marrow cells, such as transcripts that encode various transcription factors and cell surface receptors [15–18]. A number of these transcripts have not been found in previously published libraries generated from phenotypically similar cell populations using the original 14-mer SAGE protocol (examples highlighted in Table 1) [17, 19]. Nevertheless, when DiscoverySpace was used to compare all of the tags present in our library with those present in the two related published libraries [18, 20], 96% and 94% similarity values, respectively, were obtained (at a 95% confidence interval; Fig. 6A, 6B). When we compared the nonsingleton tags in the newly constructed lin−CD34+ bone marrow library with nonsingleton tags in the other two CD34+ human cell libraries, the result showed that 2,166 tags were present in all three (Fig. 6C). The tag-to-transcript mapping of these 2,166 tags yielded 718 RefSeq transcripts (the tag and annotation information are summarized in supplemental online Table 5). The consistent expression of these transcripts in the three CD34+ libraries suggests that these genes may play important roles in the maintenance and/or differentiation of human hematopoietic stem/progenitor cells.
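The three-way comparison of nonsingleton tags amounts to a set intersection, sketched below in Python. How the longer LongSAGE tags were matched to tags from the shorter published libraries is not described in the text; the sketch simply assumes that tags are truncated to a common length and their counts re-summed before comparison. The function names, the truncation length, and the toy counts are all hypothetical.

```python
from collections import Counter


def collapse_tags(counts, length):
    """Truncate every tag to `length` bases and re-sum the counts."""
    collapsed = Counter()
    for tag, count in counts.items():
        collapsed[tag[:length]] += count
    return collapsed


def nonsingleton_tags(counts):
    """Set of tag types observed more than once."""
    return {tag for tag, count in counts.items() if count > 1}


def shared_nonsingleton_tags(*libraries):
    """Tag types that are nonsingletons in every library supplied."""
    if not libraries:
        return set()
    return set.intersection(*(nonsingleton_tags(lib) for lib in libraries))


# Toy data: one long-tag library and two short-tag libraries (anchor excluded).
long_lib = {"ACGTACGTACGTACGTA": 3, "ACGTACGTACTTTTTTT": 1, "GGGGGGGGGGAAAAAAA": 2}
short_lib_1 = {"ACGTACGTAC": 5, "GGGGGGGGGG": 1}
short_lib_2 = {"ACGTACGTAC": 2, "TTTTTTTTTT": 9}

shared = shared_nonsingleton_tags(collapse_tags(long_lib, 10), short_lib_1, short_lib_2)
print(sorted(shared))
```

Note that truncation can merge distinct long tags into one short tag, as happens for the first two entries of the toy long-tag library, so the shared set should be interpreted at the resolution of the shorter tag.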
Table 1. Transcripts detected in a polymerase chain reaction-long serial analysis of gene expression library prepared from normal adult human lin−CD34+ bone marrow cells
Gene Ontology analysis of these 718 RefSeq transcripts showed the presence of cell death-related genes, with a balance between positive and negative regulators of cell death. We also observed the presence of several positive regulators of cell growth, reflecting the likelihood that some of the cells in the CD34+ subset of human bone marrow are proliferating. In addition, we observed the presence of several transcripts encoding proteasome components and members of the ubiquitination complex (supplemental online Fig. 3). Interestingly, it was recently demonstrated that the proteasomal activity of human hematopoietic progenitor cells prevents their infectability with lentiviral vectors.
We also compared our normal adult lin−CD34+ human bone marrow cell SAGE library to 287 publicly accessible SAGE libraries prepared from multiple types of human cells (available primarily through the Cancer Genome Anatomy Project at http://www.cgap.nci.nih.gov, including the two human CD34+ cell libraries mentioned above). This more extensive comparison revealed 936 tags that appeared only in our lin−CD34+ bone marrow cell library, of which 192 mapped to a single sequence in the human genome and not to any site included in the Mammalian Gene Collection (ftp://ftp.ncbi.nih.gov/repository/MGC/MGC.sequences), RefSeq (ftp://ftp.ncbi.nih.gov/refseq/daily), or Ensembl, version 20. We then estimated the probability of single-base pair errors by combining a library-wide construction error rate and a tag-specific sequencing error probability, which indicated that 190 of the 192 tags could be judged to be error-free (p ≤ .05). Of the 190 tags, 23 mapped to highly conserved regions in mouse, rat, and human genomes and, in the human genome, were located at least 5,000 base pairs away from well-annotated transcripts and were also not present in any human EST database. These 23 novel tags are listed with their chromosomal locations in supplemental online Table 4. Q-RT-PCR was then used to investigate the expression of these 23 novel tags in three cDNA samples prepared independently from three samples of lin−CD34+ adult human bone marrow cells, including one prepared from the same pool of RNA used for making the PCR-LongSAGE library.
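One of the filters applied above, the requirement that a novel tag lie at least 5,000 base pairs from any well-annotated transcript, can be expressed as a simple interval check. The Python sketch below is illustrative only: it assumes that each tag has a single genomic coordinate and that annotated transcripts are given as (chromosome, start, end) intervals, and the names and coordinates are hypothetical rather than taken from supplemental online Table 4.

```python
def distance_to_interval(position, start, end):
    """Distance in base pairs from a position to a closed interval (0 if inside)."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def is_far_from_annotations(chrom, position, annotations, min_distance=5000):
    """True if the position is at least `min_distance` bp from every
    annotated transcript on the same chromosome."""
    return all(
        distance_to_interval(position, start, end) >= min_distance
        for ann_chrom, start, end in annotations
        if ann_chrom == chrom
    )


# Hypothetical annotation track and tag positions.
annotated = [("chr1", 10_000, 20_000), ("chr1", 60_000, 65_000)]
candidates = {"tag_1": ("chr1", 4_000), "tag_2": ("chr1", 22_000), "tag_3": ("chr2", 15_000)}

novel = [name for name, (chrom, pos) in candidates.items()
         if is_far_from_annotations(chrom, pos, annotated)]
print(novel)
```

A linear scan over all annotations is adequate for a few hundred candidate tags; a sorted interval index would be preferable if the same check were applied genome-wide.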
To assess the possibility of genomic DNA contamination and its contribution to the detection of the unique tag expression, we included a strict negative control in which RNA from each bone marrow sample was used as PCR template (described in Materials and Methods). Q-RT-PCR analyses showed 10 of the 23 tags to be consistently detectable in the cDNA samples examined with no detectable amplification in the negative controls. Four of these 10 novel tags were also observed in nine additional PCR-LongSAGE libraries that we have recently prepared from related sources of primitive human hematopoietic cells (i.e., the lin−CD34+CD38−CD7−CD36−CD45RA−CD71− and lin−CD34+CD38+CD7−CD36−CD45RA−CD71− subsets of cells in normal adult human bone marrow, umbilical cord blood, G-CSF-mobilized peripheral blood, and human fetal liver; Y.Z. and C.J.E., unpublished data), and 1 of the 10 novel tags was present in two of these nine libraries (supplemental online Table 4).
SAGE technology offers a powerful approach to global gene expression profiling of defined cell populations and can serve as an important gene discovery tool. It is therefore particularly attractive for investigations of changes in cellular programs, both normal and aberrant. However, the use of SAGE to interrogate many key events is often precluded because these take place in rare cell types from which the amounts of RNA required cannot be obtained. Here, we describe a modified method for preparing amplified cDNA products that enables LongSAGE to be reproducibly applied to samples 10-fold smaller than was previously possible (10³ cells or fewer). This modification makes use of a template switching primer containing a rare (AscI) restriction site and a 21-cycle PCR that yields sufficient cDNA product to allow the construction of SAGE libraries from which millions of tags can be derived by direct sequencing. Here, we used the LongSAGE protocol because of the improved yield of tags obtained from such libraries that can be uniquely mapped to genomic DNA.
Currently, many of the methods available to amplify RNA make use of the error-prone T7 RNA polymerase. If applied to material to be used for SAGE, a high frequency of ambiguous or incorrect tags might be expected. Amplification of cDNAs by the PCR method makes use of Titanium Taq polymerase with a TaqStart antibody to provide automatic hot-start PCR, as well as proofreading activity. These latter features maximize reliability by ensuring that the amplified cDNA contains very little product derived from nonspecific cDNA strand amplification or mismatched sequence errors (estimated at 1/50,000 nucleotides). Here, we validated these predictions by a series of experimental and statistical comparisons of the tag or transcript representation in amplified versus nonamplified cDNA preparations and SAGE libraries prepared from these samples. The results demonstrated that PCR-LongSAGE is a reproducible method for performing SAGE analyses on small numbers of cells without significant distortion or loss of transcripts present in the original RNA extract.
The power of this method is illustrated here for the transcriptome analysis of the small fraction of lin−CD34+ cells present in normal adult human bone marrow. These cells are of particular interest because they are highly enriched in hematopoietic stem and progenitor cells. Comparison of the PCR-LongSAGE library obtained from this subset with published (SAGE) libraries prepared from nonamplified cDNA obtained from similar cells showed extensive similarities in tag composition and the presence of many expected transcripts. In addition, our studies underscore the power of the LongSAGE protocol for identifying novel transcripts and transcripts of potential developmental importance because of their restricted but reproducible detection in closely related primitive cell populations. We therefore expect that this method will broaden the application of SAGE to other purified or microdissected subsets of cells and thereby facilitate the investigation of many processes not previously accessible to global gene expression analysis.
Disclosure of Potential Conflicts of Interest
The authors indicate no potential conflicts of interest.
We are grateful to members of the Stem Cell Research Laboratory and FACS facility in the Terry Fox Laboratory for assistance in the initial processing and FACS isolation of the human cord blood and bone marrow cells used, to Dr. J. Thomson (University of Wisconsin, Madison, WI) for providing the H9 cell RNA extract, and to A. Wanhill and D. Wytrykush for assistance in preparing the manuscript. This work was supported by funds from Genome BC and Genome Canada, the Stem Cell Network, and the Terry Fox Run (as a grant from the National Cancer Institute of Canada). Y.Z. held postdoctoral fellowships from the Stem Cell Network and the Leukemia Research Fund of Canada. A.R. held postdoctoral fellowships from the Canadian Breast Cancer Foundation and the Canadian Institutes of Health Research. D. Kent held Studentships from the Stem Cell Network and the Canadian Institutes of Health Research. M.A.M. and S.J. are Scholars of the Michael Smith Foundation for Health Research, and M.A.M. is a Terry Fox Young Investigator of the National Cancer Institute of Canada. Y.Z. and A.R. contributed equally to this work.