Recent advances in quantitation of mRNA by hybridization to microarrayed gene sequences or by deep sequencing of cDNA (RNA-seq) have provided global views of the abundance of each transcript. Analyses of RNA samples taken at 2 or 4 h intervals throughout development of Dictyostelium discoideum have defined the developmental changes in transcriptional profiles. Comparisons of the transcriptome of wild-type cells to that of mutant strains lacking a gene critical to progression through the developmental stages have defined key steps in the progression. The transcriptional response to cAMP pulses depends on the expression of pulse-independent genes that have been identified by transcriptional profiling with microarrays. Similar techniques were used to discover that the DNA binding protein GBF functions in a feed-forward loop to regulate post-aggregation genes and that expression of a set of late genes during culmination is dependent on the DNA binding protein SrfA. RNA-seq is able to reliably measure individual mRNAs present as a single copy per cell as well as mRNAs present at a thousand fold higher abundance. Using this technique it was found that 65% of the genes in Dictyostelium change twofold or more during development. Many decrease during the first 8 h of development, while the rest increase at specific stages and this pattern is evolutionarily conserved as found by comparing the transcriptomes of D. discoideum and Dictyostelium purpureum. The transcriptional profile of each gene is readily available at dictyBase and more sophisticated analyses are available on DictyExpress.
For the last 50 years it has been recognized that changes in the pattern of gene expression are the basis for establishing specialized cell types during differentiation and embryogenesis (Jacob & Monod 1961; Materna et al. 2010). Initial studies relied on indirect estimation of mRNA levels based on the rate of change in specific enzymes and proteins following inhibition of transcription (Nakada & Magasanik 1964; Sussman & Sussman 1965; Tomkins et al. 1966). Global changes in mRNA could be directly determined from the rates of hybridization to genomic DNA but changes in specific mRNAs were only possible following the discovery of techniques to clone unique genes in the 1970s. Northern blots of size separated RNA could then be probed with labeled DNA of specific genes and the abundance directly visualized (Alwine et al. 1977). The next revolution came with the construction of microarrays on which thousands of cloned DNAs or oligonucleotides carrying gene sequences were robotically positioned in microscopic spots before being hybridized with fluorescently labeled RNA (Schena et al. 1995). Computer-assisted analyses of the fluorescence at each of the microarrayed DNA spots could track changes in thousands of specific mRNAs throughout development. This technique can rapidly generate quantitative data for pure cell populations but is of limited use for embryos when they reach the stage at which multiple cell types have differentiated. However, whole mount in situ hybridization with labeled DNA probes can uncover the patterns of differential gene expression in specific tissues as embryogenesis proceeds (John et al. 1969; Tecott et al. 1988; Escalante & Loomis 1995). More recently, deep sequencing of RNA has become the method of choice to accurately quantitate changes in thousands of mRNAs simultaneously (Mortazavi et al. 2008; Wang et al. 2009).
Microarray hybridization of cDNAs
Transcriptome studies in the social amoeba Dictyostelium discoideum have exceptionally high resolution as the result of the rapid, synchronous development of large numbers of cells (Loomis 1975; Kessin 2001). Moreover, the two major cell types, prestalk and prespore cells, can be separated on the basis of their ionic response in density gradients (Ratner & Borth 1983; Iranfar et al. 2001; Parikh et al. 2010b). The first global transcriptomic studies were carried out with microarrays carrying 5655 cDNAs from the Japanese cDNA Project (Van Driessche et al. 2002; Iranfar et al. 2003). Profiles of mRNA abundance were generated from samples collected at 2 h intervals following the initiation of development. There were changes in the transcriptional profiles at every stage. Hundreds of genes gave robust signals at specific times in the 24 h of development and could be used as molecular markers of progression through the stages. Such markers are highly informative when characterizing developmental mutants or the effects of specific treatments.
In the absence of cAMP, mRNAs from only three genes were found to accumulate significantly (at least threefold) immediately after development was initiated by nitrogen limitation (Iranfar et al. 2003). They encode the surface receptor for cAMP, CAR1; the trimeric G protein subunit, Gα2, that is specific for CAR1; and the secreted cAMP phosphodiesterase, PdsA. Each of these proteins is essential for cells to be able to aggregate in response to the chemoattractant cAMP. These genes were expressed when the cells were suspended in buffer and were further induced by the addition of pulses of 30 nmol/L cAMP every 6 min starting at 2 h. Pulses also induced a set of at least 15 genes that first accumulated between 2 and 4 h of development (Iranfar et al. 2003). This set included the major adenylyl cyclase, ACA, the cell–cell adhesion protein, gp80, as well as the late adhesion proteins TrgB1 and TrgC1 that are involved in self recognition signaling (Benabentos et al. 2009). These genes were expressed in mutant cells lacking ACA as long as the cells were pulsed with cAMP. Later genes were not expressed in these mutant cells unless the cAMP protein kinase was made constitutive (Iranfar et al. 2003).
Expression of the pulse-independent genes prepares the cells for expression of the pulse-dependent genes. The transcriptional state is stabilized by the positive feedback loop in which cAMP synthesized by ACA is secreted and stimulates further cAMP synthesis when bound to CAR1 coupled to trimeric G protein containing Gα2. When cAMP increases internally, PKA is activated, leading to subsequent changes in transcription.
Expression of trgC1 is not only dependent on cAMP pulses but also on the DNA binding protein GBF (Brown & Firtel 2001). Expression of post-aggregative genes is regulated by a feed-forward loop involving both GBF and signaling from TrgC1 (Iranfar et al. 2006). This coherent feed-forward loop has three positive signals and can filter noisy inputs during multicellular development. Many of the genes controlled by this loop are expressed exclusively in prespore or prestalk cells where they are likely to get further transcriptional instructions.
Transcription of later genes does not occur in cells developed in suspension but occurs synchronously in cells allowed to develop as dense lawns on buffer saturated filters (Sasik et al. 2002). Global microarray studies of filter developed cells revealed changes in the transcription profiles during early development that were similar to those seen in suspension developed cells (Van Driessche et al. 2002). The biggest difference in consecutive samples of filter developed cells occurred between 6 and 8 h as the cells were entering aggregates. There was also a considerable change in the transcriptional profiles between 10 and 12 h of development as prestalk and prespore cells sorted out to form tipped aggregates. Profiles changed relatively little during the first 6 h of development or for 4 h prior to culmination. While the number of genes with differences in abundance was low during these periods, it does not necessarily follow that there was little physiological differentiation since changes in a few critical genes can have a marked effect.
Microarray data on the relative abundance of specific mRNAs in different samples are innately noisy, mostly due to variable cross hybridization (Mcmullen et al. 2010). However, RNA-seq technology avoids most of these problems and generates robust, digital data that are accurate over at least three orders of magnitude. For these studies mRNA was purified from total RNA using oligo(dT) beads, fragmented to an average size of 200 bases to avoid snap-back loops, and then used to generate double stranded cDNA. cDNA libraries were sequenced on a high-throughput Illumina Genome Analyzer which generates >20 million reads of 35 bases. These short reads were aligned with the fully sequenced genome of D. discoideum to estimate the abundance of specific mRNA molecules in the cell. It can be calculated that genes with at least 30 hits have generated more than one copy of mRNA per cell (Parikh et al. 2010b). Since only sequences of 35 bases that occur just once in the genome are used to identify genes, the results are unambiguous. A total of 12 713 mappable genes can be recognized in the D. discoideum genome. When all the data from growing cells and from samples collected every 4 h throughout development were considered, a total of 9639 genes were seen to be transcribed (Parikh et al. 2010b). Expression of almost all of these genes changed during development, some increasing and some decreasing at specific stages. Changes in expression of individual genes were found to be highly reproducible and many were confirmed by quantitative reverse transcription–polymerase chain reaction (qRT–PCR). The profile for each gene is readily available at dictyBase (Fey et al. 2009) or directly at DictyExpress (http://www.ailab.si/dictyexpress/), which also allows users to analyze the data in a variety of ways (Rot et al. 2009).
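As a rough illustration of how aligned read counts can be converted into abundance estimates that are comparable between genes and samples, the sketch below computes reads per kilobase per million mapped reads (RPKM), the measure introduced by Mortazavi et al. (2008). It is an assumption that a length- and depth-normalized measure of this kind applies to the data discussed here; the gene names, lengths, and counts are hypothetical and are not taken from the cited studies.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example values (not from the cited data sets).
counts = {"geneA": 30, "geneB": 3000}
lengths_bp = {"geneA": 1500, "geneB": 1500}
total_reads = 20_000_000

abundance = {g: rpkm(counts[g], lengths_bp[g], total_reads) for g in counts}
```

Dividing by gene length matters because a long transcript yields more fragments than a short one present at the same copy number; dividing by the total number of mapped reads allows samples sequenced to different depths to be compared.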
During development, mRNAs from 1779 genes decreased twofold or more and stayed low, those from 3777 genes increased twofold or more and stayed high, and mRNAs from 2822 genes increased twofold or more and then decreased (Parikh et al. 2010b).
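A classification of this kind can be sketched as follows. The criteria are a simplified assumption and not the procedure used by Parikh et al. (2010b): each profile is compared to its first sample, and the function name, category labels, and default twofold threshold are illustrative only. A gene that falls and later recovers is not treated separately in this sketch.

```python
def classify_profile(abundance, fold=2.0):
    """Classify a time course by fold change relative to the first sample.

    abundance: list of positive abundance values ordered by time.
    Returns 'decreased', 'increased', 'transient', or 'unchanged'.
    """
    start, end = abundance[0], abundance[-1]
    peak = max(abundance)
    if peak >= fold * start:
        # Rose at least `fold`; check whether it later fell back by `fold`.
        return "transient" if end * fold <= peak else "increased"
    if end * fold <= start:
        return "decreased"
    return "unchanged"


# Hypothetical profiles, one value per sampled time point.
examples = {
    "falls and stays low": classify_profile([100, 40, 30, 25]),
    "rises and stays high": classify_profile([10, 30, 50, 45]),
    "rises then falls": classify_profile([10, 60, 80, 12]),
}
```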
There are 107 genes in the D. discoideum genome that encode ribosomal proteins. Their mRNAs are all abundant in growing cells but they turn over rapidly following the initiation of development such that they are at least 10-fold reduced after 8 h (Parikh et al. 2010b). This dramatic drop in mRNAs coding for ribosomal proteins is not confined to D. discoideum but also occurs during early development of D. purpureum (Parikh et al. 2010b). Considering that mRNAs for ribosomal proteins make up almost 50% of total mRNAs in growing cells, a 90% decrease could have a significant effect in freeing up ribosomes for translation of other mRNAs (Fig. 1). Assuming that translation is limited by the number of ribosomes, degradation of 40% of the mRNA in vegetative cells will favor translation of the remaining mRNA. Another 1672 mRNAs from genes with mixed functions and lower abundance also decreased significantly during the first 8–12 h of development. One consequence of such differential turnover would be to accelerate translation of mRNA from genes expressed during early development.
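The arithmetic behind this estimate can be checked with a short sketch. It assumes, as a simplification not stated in the source, that a fixed pool of ribosomes is shared equally among all mRNA molecules; the function name and parameters are illustrative only.

```python
def ribosome_gain(rp_fraction, rp_decrease):
    """Fold increase in ribosomes available per remaining mRNA.

    rp_fraction: share of total mRNA that encodes ribosomal proteins.
    rp_decrease: fraction of that mRNA lost during early development.
    Assumes a fixed ribosome pool shared equally among all mRNA molecules.
    """
    removed = rp_fraction * rp_decrease  # fraction of total mRNA degraded
    return removed, 1.0 / (1.0 - removed)


removed, gain = ribosome_gain(rp_fraction=0.5, rp_decrease=0.9)
```

With ribosomal protein mRNAs at 50% of the total and a 90% decrease, 45% of the mRNA is removed and each remaining mRNA has access to about 1.8 times as many ribosomes. Taking "almost 50%" as 45% gives a loss of about 40%, in line with the figure quoted above.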
mRNAs from 6140 genes were found to increase fourfold or more during development (Fig. 2). It would be nice to present the changes on a gene-by-gene basis but that would require a multi-page fold out. Therefore, the genes were sorted on the basis of their temporal pattern of expression and put into pools of 30 genes. The abundance at each stage was calculated relative to the median of the abundance of each gene over time (Fig. 2). While it might be of interest to inspect the absolute abundance of each gene during development, the overall pattern of temporal control of transcription would be skewed towards the most abundant mRNAs. The quantitative results from each gene and the expression patterns of specific gene groups can be readily seen at DictyExpress. Abundantly expressed genes do not necessarily play more important roles than rarely expressed genes; it all depends on the physiological roles of the genes.
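The normalization and pooling described above can be sketched as follows. The source states only that genes were sorted by temporal pattern and placed in pools of 30; sorting by the time of peak expression and averaging the profiles within each pool are assumptions made here for illustration, and the function names are hypothetical.

```python
from statistics import median


def relative_to_median(profile):
    """Divide each time point by the gene's own median over time."""
    m = median(profile)
    if m <= 0:
        raise ValueError("median abundance must be positive")
    return [x / m for x in profile]


def pool_profiles(profiles, pool_size=30):
    """Sort genes by time of peak expression, then average pools of genes."""
    normalized = [relative_to_median(p) for p in profiles]
    # Assumed ordering: index of the time point with maximal expression.
    normalized.sort(key=lambda p: p.index(max(p)))
    pools = []
    for i in range(0, len(normalized), pool_size):
        chunk = normalized[i:i + pool_size]
        # Average the pooled genes at each time point.
        pools.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pools
```

Scaling each gene to its own median puts rare and abundant mRNAs on the same footing, so the pooled pattern reflects timing rather than being dominated by the most abundant transcripts.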
A few general aspects of the changes in mRNA abundance of developmentally-induced genes can be seen in Figure 2. Most of the mRNAs that increase at least fourfold during the first 8 h start to decrease by 16 h and are at low abundance by 20 h. Those that first accumulate between 8 and 16 h stay relatively high until fruiting bodies are formed at 24 h. Individual groups of genes that are expressed in similar manners show that there are a large number of specialized patterns at every stage of development. Some groups of genes are highly expressed at two distinct stages separated by a period of relative quiescence.
Genes that increase at least fourfold during development are likely to encode proteins that provide selective advantages under one or another condition. The number of such genes recognized by RNA-seq is considerably higher than previous estimates based on saturation mutation screens or microarray results and clearly indicates that much more remains to be learnt about Dictyostelium development. The discrepancy from the mutation analysis is probably due to our inability to observe subtle developmental defects that result from mutations in genes that are not essential for morphogenesis. The difference from the microarray data is probably due to the improved accuracy of the RNA-seq method.
Just because a gene is expressed at a specific stage of development does not necessarily mean that it functions at that stage; however, if its transcriptional profile has been conserved over long evolutionary periods, it is likely that it plays an essential role when it is expressed. Dictyostelium discoideum and Dictyostelium purpureum diverged about 400 million years ago but have retained very similar stages in the formation of fruiting bodies (Sucgang et al. 2010; Parikh et al. 2010b). Most, but not all, developmental genes with homologues in the other species have similar developmental profiles. This type of comparison is possible with RNA-seq because the data are quantitative and absolute, whereas the data provided by microarrays could not be compared across species. The comparison can be made for each gene of interest in DictyExpress where data from parallel RNA-seq experiments on the species are conveniently presented (Parikh et al. 2010b).
Although the temporal resolution of the present RNA-seq data is lower than that of the microarray studies, the two very different measurements of mRNA abundance give similar results, at least for the abundant mRNAs. Low abundance mRNAs are more reliably measured by RNA-seq. For instance, RNA-seq showed about 10 times more cell-type specific genes: 915 prestalk specific genes, 850 prespore specific genes. Nevertheless, almost all genes thought to be cell type specific on the basis of microarray studies (Maeda et al. 2003) were confirmed by RNA-seq (Parikh et al. 2010b).
Despite the efforts of many laboratories over the years, no clear central genetic circuit can be traced that can account for the transcriptional changes that occur at well defined stages throughout the 24 h developmental cycle. Likewise, the mechanisms for restricting expression of certain genes to prestalk cells and others to prespore cells are poorly defined. It is assumed that transcription is either induced or repressed as the result of regulatory proteins binding to specific sequences near the target gene. The pattern of cis-acting sequences can either be simple or complex leading to control by a master regulatory gene or by combinatorial control using several different DNA binding proteins. Moreover, regulation can be a multistep process in which protein at one cis-acting site recruits other proteins that modify the histones of adjacent nucleosomes or even the DNA itself.
In a few cases, mutational analyses have partially explained how gene expression patterns change. Cells depleted in the DNA binding protein CbfA as the result of an amber mutation in its gene were unable to develop (Winckler et al. 2001). Northern blot analyses showed that the adenylyl cyclase, acaA, was not expressed in cbfAam mutant cells, which could easily explain why they were unable to aggregate (Winckler et al. 2004). CbfA binds to a homopolymer dT/dA sequence found in the regulatory region of acaA that may present a specialized structure rather than a specific sequence (Siol et al. 2006). CbfA has an AT-hook near the carboxy terminus that may recognize high A/T regions. At the N-terminus of CbfA there is a domain of the jumonji family, which includes chromatin modifying factors. CbfA may modify the regulatory region so that other transcription factors, such as MybB, can bind and induce transcription (Otsuka & Van Haastert 1998).
Microarray studies showed that the abundance of dozens of genes differed in cbfAam mutant cells during growth and that only a few early genes were expressed during development (Lucas et al. 2009; Winckler et al. 2004). The lack of expression of developmental genes was shown to result from the lack of intercellular cAMP signaling since pulsing the cells with exogenous cAMP overcame the block and allowed them to complete development and form fruiting bodies (Winckler et al. 2004). Moreover, pulsed cbfAam cells expressed acaA indicating that cAMP signaling recruits other transcriptional activators to acaA. Such a positive feedback loop can stabilize the commitment to initiate development.
The gene encoding GbfA is expressed shortly after the initiation of development and its mRNA reaches a peak after 12 h (Parikh et al. 2010b). GbfA binds to sites with sequences closely related to CACAC where it acts in conjunction with other DNA binding proteins to either induce or repress transcription (Wang & Williams 2010). Mutants lacking GbfA arrest development at the loose aggregate stage and do not express post-aggregation genes (Iranfar et al. 2006). One of the early GbfA dependent genes is trgC1, which encodes a cell–cell adhesion protein that generates intracellular signals when highly similar cells are encountered. Expression of post-aggregation genes is dependent on a feed-forward loop involving both GbfA and signaling from TrgC1 (Iranfar et al. 2006). This acts as a low-pass filter that ensures synchronous development of cells within an aggregate.
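The filtering behavior of a coherent feed-forward loop can be illustrated with a toy simulation. This is a generic sketch and not a model taken from the cited studies: X stands for the upstream regulator (here GbfA), Y for the intermediate signal (here TrgC1 signaling), and Z for a post-aggregation gene that requires both. The AND logic, rate constants, and threshold are hypothetical.

```python
def simulate_ffl(x_signal, rate=0.2, decay=0.2, threshold=0.5):
    """Coherent feed-forward loop with AND logic, in unit time steps.

    x_signal: sequence of 0/1 values for the upstream regulator X.
    Y is induced by X and decays otherwise; the output Z is on only
    when X is present and Y has accumulated above `threshold`.
    """
    y = 0.0
    z_trace = []
    for x in x_signal:
        if x:
            y += rate * (1.0 - y)  # Y rises toward its maximum of 1
        else:
            y -= decay * y         # Y decays when X is absent
        z_trace.append(1 if x and y >= threshold else 0)
    return z_trace


brief = simulate_ffl([1, 1, 0, 0, 0, 0])      # short input pulse
sustained = simulate_ffl([1, 1, 1, 1, 1, 1])  # persistent input
```

In this sketch the brief input never switches on the output, whereas the sustained input does so after a delay of several steps, which is the sense in which such a loop passes persistent signals and rejects transient ones.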
Several prestalk specific genes are known to be induced by the chlorinated hexaphenone differentiation inducing factor (DIF) when cells are developed as monolayers (Williams et al. 1987). Induction has been shown to depend on the transcription factors DimA, DimB, and MybE (Thompson et al. 2004; Huang et al. 2006; Zhukovskaya et al. 2006; Fukuzawa et al. 2006). A microarray study using chips that covered 8579 genes showed that several dozen genes that were induced by DIF in wild type cells developed as monolayers were not induced by DIF when cells of dimB− or mybE− null strains were developed identically (Yamada et al. 2010). The investigators also found that DimB and MybE negatively regulated a considerable number of genes under these conditions. The microarray data, which were confirmed by qRT–PCR, showed that one of these genes, rtaA, was DIF inducible in mybE− and dimB− null cells but not in wild type cells. This gene was found to be expressed in a subset of anterior-like cells in the prespore region that are subsequently found in the upper cup, which tops the mass of prespore cells as they mount the growing stalk (Yamada et al. 2010).
Dictyostelium accumulates a DNA binding protein similar to the Serum Response Factor, SrfA, late in development that functions during fruiting body formation (Escalante & Sastre 1998; Escalante et al. 2003). It carries a MADS box domain that recognizes sequences related to ATAAG. Microarray studies of srfA− null mutants identified 21 genes that were reduced in the mutant cells (Escalante et al. 2004). All of them were expressed late in wild type development and many were previously characterized components of spores or shown to be involved in germination. Ten of the SrfA-dependent genes are preferentially expressed in prespore cells, while three are preferentially expressed in prestalk cells. It appears that SrfA plays a major role in controlling late gene expression in both of the major cell types. Many spore genes also depend on the GATA-like DNA binding protein StkA that recognizes the sequence AATCAA (Loughran et al. 2000). This gene was first recognized in a mutant that committed all its cells to making a long thin stalk and failed to make any spores (Morrissey & Loomis 1981). Genome-wide studies have not yet been carried out with stkA− null cells but they should be able to determine the role of StkA in the late steps of development.
In situ hybridization studies have uncovered the complex dynamic nature of transcriptional regulation in prestalk cells (Maeda et al. 2003). Specific transcription factors are known to play roles in the expression patterns of certain specific genes but global patterns have yet to be established. These DNA binding proteins include STATa, STATc, DimA, DimB, CudA, MybC and MybE (Shaulsky & Huang 2005). They mediate transcriptional responses to cAMP, DIF, and other intercellular signals but the details have not been worked out. There are about 100 genes in the Dictyostelium genome that encode proteins homologous to known transcription factors. A considerable number of these genes are differentially expressed during development and are likely to be involved in the regulation of transcriptional profiles at various stages as well as restriction of expression to certain cell types. We have only barely scratched the surface of the genetic networks underlying the temporal and cell type specific patterns of transcription, but further work is likely to discover intricate, interconnected pathways and sophisticated feedback loops that tie development together.
The commonly used laboratory strains of D. discoideum are able to grow in axenic medium by fluid uptake in macropinosomes but can also grow by engulfing bacteria in phagosomes. Most of the microarray studies were carried out with cells that had been grown axenically in HL5 medium to avoid bacterial RNA which might interfere with hybridization to the arrayed cDNAs. However, amoebae grow at least twice as fast when fed bacteria and have a fourfold increase in the rate of phagocytosis within a few hours. To characterize the adaptations to bacterial growth, amoebae were incubated with Escherichia coli B/r and collected after 2 h or after they started growing exponentially on the bacteria (Sillo et al. 2008). Genome-wide analyses were carried out on microarrays carrying DNA from 8500 genes. Comparisons were made between axenically grown and bacterially grown cells. mRNAs from 185 genes were seen to increase at least twofold within the first 2 h of incubation in the presence of bacteria while 343 decreased. Only 32 of the upregulated mRNAs remained enriched in cells growing exponentially on bacteria, while 70 of the mRNAs that decreased in the first 2 h remained low when the cells reached steady-state. It seems that adaptation to the new food source required more substantial transcriptional change than maintaining the new state. Manual annotation of the genes that were regulated by phagocytosis revealed that 20 of the 179 proteins identified in a proteomic study of purified phagosomes (Gotthardt et al. 2006) were encoded by the upregulated genes (Sillo et al. 2008). Potential roles for the other genes were indicated by their gene ontology (GO) designations and presented many interesting possibilities to be tested by directed studies.
The presence of pathogenic bacteria in the food can slow the growth of Dictyostelium (Steinert & Heuner 2005). Pseudomonas aeruginosa is a prevalent opportunistic human pathogen that sickens Dictyostelium when it makes up more than 3.5% of the bacterial food source (Carilla-Latorre et al. 2008). The transcriptional response of Dictyostelium was studied 4 h after exposure to either of two clinical isolates using whole genome microarrays (Carilla-Latorre et al. 2008). The less virulent strain PAO1 elicited more than a twofold increase in 24 genes and a twofold or more decrease in 126 genes, while the more virulent strain PA14 elicited an increase in 20 genes and a decrease in 105 genes. Surprisingly, only four genes were found to be upregulated in both populations while 70 of the downregulated genes were seen to be in common. It appears that the host response to different isolates of the same pathogen is quite specific (Carilla-Latorre et al. 2008). Moreover, the response to P. aeruginosa was substantially different from that to another pathogenic bacterium, Legionella (Farbrother et al. 2006). Only eight genes were altered in response to both pathogens.
Legionella is an intracellular parasite that interferes with vesicle traffic and grows within the phagosome. It is the causative agent of Legionnaires’ disease. Transcriptomes of cells exposed to two different species of Legionella were characterized on microarrays carrying 5423 cDNAs and compared to the transcriptome of uninfected cells (Farbrother et al. 2006). Samples were collected at various times during the 48 h period before the cells started to lyse. The most dramatic change in the transcriptional profile occurred 24 h after infection when the bacteria started to multiply within the phagosome. Two hundred and seventy-three genes increased at least 1.5-fold, while 325 genes decreased at least 1.5-fold (Farbrother et al. 2006). By comparing the 24 h transcriptome of cells infected with virulent Legionella pneumophila to that of cells infected with an avirulent mutant of Legionella pneumophila or the weakly virulent Legionella hackeliae, a set of 79 upregulated genes and a set of 52 downregulated genes specific to Legionella could be recognized. Inspection of the GO annotations for these genes for over-represented categories indicated that the pathogen not only induced a stress response but also modified the metabolism of the host for its own needs (Farbrother et al. 2006).
Escherichia coli, P. aeruginosa, and L. pneumophila are all Gram-negative bacteria. It will be interesting to characterize the response of Dictyostelium cells to a diet of Gram-positive bacteria such as Bacillus. Dictyostelium mutants are known that can grow well on Gram-negative bacteria but not on Gram-positives, indicating that there are genes dedicated to each major bacterial kingdom (Newell et al. 1977; Morrissey et al. 1980). It is possible that the transcriptional profiles of cells growing on Gram-positive or Gram-negative bacteria will be found to differ significantly. Other environmental changes, such as drug treatment at sub-lethal doses, may also result in informative transcriptional responses.
It seems clear that transcriptomic studies using RNA-seq techniques provide higher quality data than previous techniques. With the average number of reads in the thousands for a medium abundant mRNA, quantitating changes during development or other alterations in a physiological state can be highly accurate. If appropriate normalization algorithms are used, then the relative abundance of different mRNAs at a given state can be reliably estimated. However, there is the question of cost. Academic genomic facilities charge about $400 per sample for microarray studies. The cost of RNA-seq is higher, but not by much. At the University of California San Diego (UCSD) and Baylor there are facilities that charge less than $1000 per sample for reads that are 35 bases long and usually provide over 35 million individual reads. For just twice the cost, the researcher gets far superior digital data that will be valuable for years to come. Although data are not yet available for Dictyostelium mRNAs analyzed in different laboratories, there is no reason to think there will be any problem in reproducibility or transferability. We have it on good authority that the DictyExpress team would be more than happy to host any RNA-seq data from other laboratories. It would be a great resource for the community and we encourage anyone who is planning to generate RNA-seq data to deposit their data in DictyExpress after publication.
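The link between read number and accuracy can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that the only source of error is Poisson counting noise, which ignores biological and library-preparation variability and is therefore a lower bound; the function name is illustrative.

```python
import math


def poisson_cv(read_count):
    """Relative sampling error (coefficient of variation) of a Poisson count."""
    if read_count <= 0:
        raise ValueError("read count must be positive")
    return 1.0 / math.sqrt(read_count)


cv_rare = poisson_cv(30)        # a gene near the single-copy detection level
cv_medium = poisson_cv(1000)    # a medium abundant mRNA
```

Under this assumption the sampling error is about 18% for a gene with 30 reads and about 3% for a gene with 1000 reads, so a twofold change in a medium abundant mRNA lies far outside the counting noise.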
It will be of considerable interest to determine the developmental transcriptomes of a wide variety of mutant strains using RNA-seq. Mutants of high interest include those lacking the DNA binding proteins that were mentioned above. Although interpretation of the results would have to take into account indirect effects where genes directly affected by the loss of the transcriptional regulator affect expression of other genes in a dependent pathway (Loomis et al. 1977), it would give insight into which genes might or might not be controlled by the specific DNA binding protein.
Preliminary studies of a set of chosen mutant strains using microarray data have recognized a regulon of chemotaxis genes (Booth et al. 2005) and uncovered new components of the Dictyostelium PKA signal transduction pathway (Parikh et al. 2010a). The microarray data used in these studies are innately noisy and so limit the interpretations. Revisiting these analyses with high quality RNA-seq data should uncover further candidates and connections.
We benefited from discussions with Professor Terry Hwa, UCSD, concerning the consequences of ribosomal limitation. This work was supported by the National Institutes of Health (P01-HD39691).