†These authors contributed equally.
UPSC-BASE –Populus transcriptomics online
Article first published online: 8 NOV 2006
The Plant Journal
Volume 48, Issue 5, pages 806–817, December 2006
How to Cite
Sjödin, A., Bylesjö, M., Skogström, O., Eriksson, D., Nilsson, P., Rydén, P., Jansson, S. and Karlsson, J. (2006), UPSC-BASE –Populus transcriptomics online. The Plant Journal, 48: 806–817. doi: 10.1111/j.1365-313X.2006.02920.x
Publication of this paper was delayed until acceptance for publication of Tuskan et al. (2006).
- Issue published online: 8 NOV 2006
- Received 24 August 2005; revised 13 December 2005; accepted 21 December 2005.
- expression profiling;
The increasing accessibility and use of microarrays in transcriptomics has accentuated the need for purpose-designed storage and analysis tools. Here we present UPSC-BASE, a database for analysis and storage of Populus DNA microarray data. A microarray analysis pipeline has also been established to allow consistent and efficient analysis (from small to large scale) of samples in various experimental designs. A range of optimized experimental protocols is provided for each step in generating the data. Within UPSC-BASE, researchers can perform standard and advanced microarray analysis procedures in a user-friendly environment. Background corrections, normalizations, quality-control tools, visualizations, hypothesis tests and export tools are provided without requirements for expert-level knowledge. Although the database has been developed primarily for handling Populus DNA microarrays, most of the tools are generic and can be used for other types of microarray. UPSC-BASE is also a repository of Populus microarray information, providing data from 21 experiments on a total of 407 microarray hybridizations in the public domain of the database. There are also an additional 10 experiments containing 347 hybridizations, where the automatically analysed data are searchable.
Advanced global gene-expression profiling tools, such as Gene-Chips (Lockhart et al., 1996) and spotted microarrays (Schena et al., 1995), are becoming common features in laboratories, used not only by highly trained specialists but increasingly also by researchers who lack training in the sophisticated data-handling procedures required to optimize their use. The goal for biologists using DNA microarrays is to find relevant information and answers to their specific questions as quickly and conveniently as possible, rather than spending large amounts of time identifying the ideal cDNA-synthesis, labelling and statistical techniques for their experiments. Problems with labelling, hybridization, washing, scanning, image analysis, normalization and statistical treatment of data strongly influence the outcome of the analysis, and re-analysis of published data can often lead to results that differ significantly from those obtained originally (Gu and Gu, 2003; Wang et al., 2002). If DNA microarray results from several experiments are to be compared, new problems may appear relating not only to the biological source material, but also to differences in analytical procedures. In recent years, several large-scale array-analysis projects, such as AtGenExpress (Schmid et al., 2005), have been conducted in which a large number of samples have been analysed in a standardized, ordered fashion. These projects create invaluable resources for the scientific community, but require resources far beyond those of a normal research project.
Appropriate logistical tools are required for handling and analysing the tremendous amounts of data produced in even a single microarray experiment. To gain as much knowledge as possible from microarray experiments, at least two types of database are essential: a storage and analysis database of expression data obtained from the analyses; and an annotation database connecting independent array elements with second-level sequence information and possible gene identification, and third-level functional classification. As increasing amounts of data from array experiments are published, the need for public repositories has become increasingly evident to allow re- and meta-analysis of data (Ball et al., 2004). This need is being met by the large numbers of commercial and public DNA microarray database structures now available (Penkett and Bahler, 2004). Such repositories, for instance ArrayExpress (Brazma et al., 2003) and Gene Expression Omnibus (GEO) (Edgar et al., 2002), require data to be submitted in a standardized format. Within the microarray community such standards have been established by the Microarray Gene Expression Data Society, which has presented the ‘Minimum information about a microarray experiment’ (MIAME) standards (Brazma et al., 2001; Stoeckert et al., 2002).
The genome of Populus trichocarpa was the third plant genome to be fully sequenced (Tuskan et al., 2006), making Populus the most important model tree system for plant genomics currently available. Extensive Populus expressed sequence tag (EST) collections have been compiled (Bhalerao et al., 2003; Kohler et al., 2003; Nanjo et al., 2004; Sterky et al., 1998, 2004), which not only are important for annotation of the genome and to confirm the expression of predicted genes, but also can be used to obtain digital expression profiles of genes (Ewing et al., 1999; Sterky et al., 2004). However, these digital expression profiles do not yield very accurate estimates of expression levels. DNA microarrays have much greater potential to provide precise information on gene expression, and have been used in several cases to analyse changes in gene expression in Populus (Andersson et al., 2004; Hertzberg et al., 2001; Israelsson et al., 2003; Kohler et al., 2003; Lafarguette et al., 2004; Moreau et al., 2005; Rishi et al., 2004; Schrader et al., 2004a,b; Smith et al., 2004; Taylor et al., 2005). Furthermore, full-genome arrays, based on the genome sequence, are under production. We were the first to produce Populus cDNA microarrays (Hertzberg et al., 2001), and our most recently generated array is based on a 100 K EST data set (Sterky et al., 2004), estimated to represent 17 345 of the gene models in the Populus genome (B. Segerman, UPSC, Umeå, Sweden personal communication). This corresponds to a significant part of the transcriptome (Tuskan et al., 2006). As a large number of experiments are being performed with our DNA microarrays, we wanted to establish a standard operating procedure that should make the analysis simple and more reliable, especially for less experienced researchers, to allow a higher throughput of array experiments. 
A further advantage of a standard operating procedure is that it should facilitate comparisons of array data generated in different experiments and by different researchers, and thus help make the overall value of the array experiments greater than the sum of the individual experiments. For this reason, we wanted to develop a DNA microarray analysis pipeline and a database to store the results.
We have developed the UPSC-BASE database (http://www.upscbase.db.umu.se) for hosting plant microarray data (more specifically, data from Populus and Arabidopsis arrays). The database provides the user with up-to-date laboratory procedures for microarray experiments, as well as tools for downstream data analysis. It connects to annotation databases (PopulusDB) as well as other gene-expression databases [for Arabidopsis, The Arabidopsis Information Resource (TAIR) (Huala et al., 2001), Nottingham Arabidopsis Stock Centre (NASC) Affymetrix service (Craigon et al., 2004), Gene Expression Omnibus (GEO) (Edgar et al., 2002) and Genevestigator (Zimmermann et al., 2004)]. It is based on the free web-based database solution BASE (BioArray Software Environment) (Saal et al., 2002). In our setup, the intention has been to provide the researcher with logical steps without bottlenecks in the data production and analysis steps, so that data deposition, normalization, transformation, and statistical and hypothesis tests are easy to follow, and generate understandable results that can lead researchers to valid conclusions. All experimental protocols (all tested and optimized), all relevant information on the Populus DNA microarrays, all plug-ins developed, and a description of all modifications of the original BASE package are freely available at the UPSC-BASE website.
Early microarray experiments were typically small-scale and had little biological or technical replication. As the technology has matured, it has become possible to perform more complex experiments examining the effects of several factors (e.g. time, mutations and environmental treatments). Consequently, it has become increasingly important to apply appropriate experimental designs to cDNA microarray analyses to ensure results are reliable (Churchill, 2002; Kerr and Churchill, 2001; Yang and Speed, 2002). UPSC-BASE features an interactive tool for generating experimental designs, referred to as the design advisor. The design advisor calculates optimal design solutions for situations where many biological samples are present and an exhaustive pairwise hybridization scheme is unrealistically labour-intensive and costly. The functionality of the design advisor is described in more detail on the UPSC-BASE website and by Vinciotti et al. (2005), Wit and McClure (2004) and Wit et al. (2005). By using the design advisor prior to hybridization, we believe the final quality of data can be increased while keeping the required number of hybridizations to a minimum, thus reducing both cost and manual effort. Although few published microarray studies have used a loop or factorial design, the published data suggest that this approach is superior to other alternatives (Vinciotti et al., 2005). This is the approach we follow in our recommended pipeline.
To demonstrate the reproducibility of our analysis pipeline and the features and usefulness of UPSC-BASE, we used a data set obtained from rehybridizing six leaf samples collected during various stages of development from a free-growing aspen. The biological samples are described in (and the raw data for the original experiment stored as) experiment UMA-0032 at http://www.upscbase.db.umu.se. The ‘wet’ part of the pipeline and the initial downstream steps were performed by another individual, and the experimental design was completely different from that used during the ‘original’ data collection, to assess the robustness and consistency of our pipeline. In contrast to the common reference in the original experiment, we chose to use an all-versus-all design for the demonstration experiment, with one sample date included in triplicate in the hybridization design (Figure 1a,b). UPSC-BASE features a visualization plug-in for producing overview graphs of the experimental design. The raw data for the all-versus-all experiment, plus the results obtained after each step of the analysis, can be downloaded from experiment number UMA-0013 at http://www.upscbase.db.umu.se. We aimed at demonstrating both the high reproducibility of our array analysis pipeline, and the fact that a good design can make it possible to obtain faithful data from many samples using a minimum of hybridizations. In this design we analysed eight samples using a total of 28 microarrays.
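The number of hybridizations in an all-versus-all design follows directly from the number of sample pairs. The short Python sketch below enumerates every pairing; the sample labels are hypothetical placeholders (not the sampling dates of the experiment), and dye assignment is ignored.

```python
from itertools import combinations

def all_versus_all(samples):
    """Return every unordered pair of samples (one hybridization per pair)."""
    return list(combinations(samples, 2))

# Hypothetical labels: six leaf samples, one of them included in triplicate.
samples = ["S1_rep1", "S1_rep2", "S1_rep3", "S2", "S3", "S4", "S5", "S6"]
design = all_versus_all(samples)
print(len(design))  # 8 * 7 / 2 pairs
```

With eight samples this gives the 28 hybridizations quoted above; a loop or other reduced design would select a subset of these pairs.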
Production of the Populus (POP2) microarray
The microarrays used here constitute the second generation of the global Populus cDNA microarrays and contain, in total, 24 735 cDNA fragments. This array is based on the first-generation 13-k Populus array (Andersson et al., 2004) with clones from seven cDNA libraries, representing the cambial zone (AB), young leaves (C), floral buds (F), tension wood (G), senescing leaves (I), dormant cambium (UA) and active cambium (UB). The 25-k array contains clones from the 13-k array plus 12 additional cDNA libraries, representing the apical shoot (K), cold-stressed leaves (L), roots (R), bark (N), shoot meristem (T), male catkins (V), dormant buds (Q), female catkins (M), petioles (P), fibre death (X), imbibed seeds (S) and virus/fungus-infected leaves (Y). For a detailed description of the construction and sequencing of the cDNA libraries, see Sterky et al. (2004). The arrays were produced and quality-tested as described by Moreau et al. (2005).
Generating the array image
The flow chart of the ‘wet’ part of our analysis pipeline is depicted in Figure 2. We provide several protocols that we have found suitable for extracting RNA from different Populus tissues. The cetyl trimethyl ammonium bromide (CTAB)/lithium chloride (LiCl) method (Chang et al., 1993; Doyle and Doyle, 1987) is a general-purpose RNA-extraction method that is useful if plenty of material is available. Modified protocols are also available for small samples and for extracting RNA from stem tissue. A TRIzol reagent method (Invitrogen, Carlsbad, CA, USA) works less robustly for Populus than for Arabidopsis. A modified RNeasy kit (Qiagen, Valencia, CA, USA) gives lower yields, but the procedure is quicker. The Dynabead-based mRNA-extraction method (Hertzberg et al., 2001) allows extraction from tissue samples as small as 1 mg using direct labelling of the cDNA. If plenty of material is available (>20 μg RNA per hybridization), the standard cDNA synthesis/indirect labelling protocol is the most robust method. More recently, the MessageAmp amplification method (Ambion, Austin, TX, USA) with indirect labelling has also proved useful when smaller amounts of RNA (approximately 1 μg) are available.
Hybridization in an automated slide processor (Amersham Bioscience, Little Chalfont, UK) is a highly reproducible method, suitable for most experiments, which generates very high-quality data. The standard protocols have been optimized in several ways for our material to give stronger hybridization signals, less background and more uniform hybridization results. In our setup, 12 slides can be processed in parallel within a total run time of 24 h. As an alternative, the standard manual hybridization protocol is believed by some researchers to give stronger hybridization signals, but typically at the expense of the evenness of the hybridization.
The linear range of the signal intensity is a limitation of microarrays. It is often impossible to select the scanning parameters in such a way that both very strongly and weakly expressed genes can be faithfully analysed simultaneously. We provide two parallel scanning procedures. First, when the aim is to study a subset of genes, the channel calibration method can be used. The results are limited to genes with signals within the linear range (with moderate expression levels); spots with too low or too high intensity will not give ratio values that are directly correlated with the ‘true’ hybridization signals. In order to standardize the procedure to obtain useful data concerning genes with both high and low expression levels, we have implemented an alternative method, based on multiple scanning of the microarray slides. Each microarray slide is scanned three times with predetermined increases in laser power and photo-multiplier tube settings using a Scanarray 4000 scanner (Perkin-Elmer Life and Analytical Sciences, Wellesley, MA, USA). The data from each physical microarray slide are then merged with a regression method, restricted linear scaling, within the linear range (Ryden et al., 2006). Restricted linear scaling is a method to handle the problems associated with missing and saturated signals, which occur in most types of microarray experiments.
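The idea behind merging scans can be illustrated with a deliberately simplified Python sketch. It is not the restricted linear scaling algorithm of Ryden et al. (2006): it handles only two scans, estimates a single scale factor as a median ratio, and uses made-up intensity thresholds (`lower`, `upper`) that are assumptions for illustration.

```python
from statistics import median

def merge_scans(low, high, lower=500.0, upper=50000.0):
    """Merge a low-gain and a high-gain scan of the same slide.

    Spots whose intensity lies within [lower, upper] in both scans are
    used to estimate the scale factor between the scans. Spots that
    exceed `upper` in the high-gain scan (treated as saturated) are
    replaced by the rescaled low-gain value.
    """
    ratios = [h / l for l, h in zip(low, high)
              if lower <= l <= upper and lower <= h <= upper]
    if not ratios:
        raise ValueError("no spots within the assumed linear range")
    scale = median(ratios)
    merged = [l * scale if h > upper else h for l, h in zip(low, high)]
    return scale, merged

# Hypothetical intensities for four spots; the last one saturates at high gain.
low_scan = [1000.0, 2000.0, 4000.0, 30000.0]
high_scan = [2000.0, 4000.0, 8000.0, 65535.0]
scale, merged = merge_scans(low_scan, high_scan)
print(scale, merged)
```

The saturated spot receives an extrapolated value from the low-gain scan, while the remaining spots keep their high-gain readings.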
For storing and processing the data, we have incorporated two major microarray-analysis packages, BASE (Saal et al., 2002) and Bioconductor (Gentleman et al., 2004), into UPSC-BASE. Compared with the downloadable BASE system, our installation features a large set of local adjustments, primarily related to large-scale data handling and a simplified user interface. A comprehensive description of the modifications is beyond the scope of this paper; however, a brief summary of the most wide-ranging extensions is given below, and additional information is available on the UPSC-BASE website.
Depending on the design of the experiment, data can be analysed in several different ways. In contrast to Bioconductor, which uses the command-line interface of R (Ihaka and Gentleman, 1996), UPSC-BASE is accessed via a web interface and helps the user perform advanced analyses without expert knowledge in statistics and computer programming. Many plug-ins have been implemented in the system (Table 1), most based on published methods, but some developed in-house.
|||Principal components analysis|
|||Principal components analysis (Nonlinear iterative partial least squares)|
|||Pearson/Spearman correlation of signal intensities or ratios|
|||Normal and exponential|
|||UmeaSAMED linear background correction|
|Hypothesis test||t-test/Mann–Whitney test|
|||DEDS (Differential expression via distance synthesis)|
|||MANOVA (fixed model)|
|||Significance analysis of microarrays|
|Normalization||Global median ratio|
|||2D spatial location|
|||Robust neural networks (neural nets normalization)|
|||Optimized local intensity-dependent normalization|
|||Between arrays (scale or quantile)|
|Quality control||Array plots|
|||MA control spots|
|Transformation||UmeaSAMED restricted linear scaling|
|||Gene ontology plots|
Data import en masse is facilitated using a batch import tool suitable for importing complete experiments. Required data files (such as scanned images and data files from the feature extraction) are uploaded by the user via file transfer protocol (FTP) and linked to a suitable experiment based on an experiment description file, a tab-delimited text file that defines various properties of the experiment.
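As an illustration of what a tab-delimited description file might look like and how it could be read, the Python sketch below parses a small example. The column names (`slide`, `cy3_sample`, `cy5_sample`, `data_file`) and file names are hypothetical; the actual fields of the UPSC-BASE description file are documented on the website and are not reproduced here.

```python
import csv
import io

def read_experiment_description(text):
    """Parse tab-delimited text into a list of dictionaries, one per slide."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

# Hypothetical file content: one header line, then one line per slide.
example = (
    "slide\tcy3_sample\tcy5_sample\tdata_file\n"
    "slide01\tleaf_A\tleaf_B\tslide01.gpr\n"
    "slide02\tleaf_B\tleaf_C\tslide02.gpr\n"
)
rows = read_experiment_description(example)
for row in rows:
    print(row["slide"], row["cy3_sample"], row["cy5_sample"], row["data_file"])
```

Keeping the sample assignments in one machine-readable file is what allows uploaded data files to be linked to the correct experiment without manual entry.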
The experiment description file can handle virtually all aspects of a MIAME-compliant experiment (Brazma et al., 2001). For cross-experiment comparisons, a database search tool has been implemented that browses the entire set of available analysed data in the database. The search procedure queries a set of array elements (based on internal ID, annotation information or functional class) and displays matching slides grouped by experiment or array element.
All public data in UPSC-BASE are analysed automatically with a standard procedure, including linear scaling and stepwise normalization (Wilson et al., 2003) to give reliable and standardized data. Analysis procedures have been simplified by passing supplementary information from the laboratory information management system (LIMS) to the analysis tools, making it possible to utilize the complete design of the experiment without user intervention. This is particularly useful for hypothesis tests such as B-statistics (Lonnstedt and Speed, 2002), which would otherwise require the potentially error-prone step of manually inputting the design matrix of the experiment. Furthermore, to provide a basis for standard analysis packages, a batch plug-in feature is available: instead of running analysis tools one at a time, several plug-ins can be queued to run sequentially. This also makes it possible to provide standard analysis pathways in order to bring conformity to the analysis procedures of the data within the database. Currently, several proposed analysis packages have been pre-defined and are available for all regular users.
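The batch feature amounts to applying a fixed sequence of steps to the same data. A minimal Python sketch of such a queue is given below; the two step functions are hypothetical stand-ins operating on a list of log-ratios and do not correspond to actual UPSC-BASE plug-ins.

```python
from statistics import median

def run_queue(data, plugins):
    """Apply each (name, function) step in order, passing the result along."""
    history = []
    for name, plugin in plugins:
        data = plugin(data)
        history.append(name)
    return data, history

# Hypothetical steps standing in for queued analysis plug-ins.
def drop_missing(values):
    """Remove spots without a measured value."""
    return [v for v in values if v is not None]

def centre_on_median(values):
    """Shift the values so that their median becomes zero."""
    centre = median(values)
    return [v - centre for v in values]

queue = [("drop_missing", drop_missing), ("centre_on_median", centre_on_median)]
result, history = run_queue([0.5, None, 1.5, 2.5], queue)
print(history, result)
```

Defining the queue once and reusing it for every experiment is what gives a pre-defined analysis pathway its consistency.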
In addition to enabling cross-experiment comparisons and large-scale data handling, integration with internal and public databases has been a key consideration in the design of the UPSC-BASE microarray database. Annotation information and functional class assignments are updated automatically from PopulusDB (Sterky et al., 2004) on a weekly basis to provide up-to-date annotations. Furthermore, PopulusDB has been extended with links to the microarray database search tool so that information on specific clones can be found quickly. For the Complete Arabidopsis Transcriptome MicroArray (CATMA; Huala et al., 2001), annotations are also downloaded from TAIR on a weekly basis. A more detailed description of database modifications is available on request.
Data generation and quality assessment
An overview of the analysis pipeline, from TIFF image to interpretation, is shown in Figure 3. The image analysis is performed in GenePix 5.0 (Axon Instruments, Union City, CA, USA) with standardized settings. In our experience, analysis of TIFF images with composite pixel intensity (CPI) settings configured to find circular features with a diameter of 80–150% of the expected size, and the composite pixel intensity threshold set to 300, produces the best results. In this way, very weak spots are automatically marked as ‘not found’. The extracted data are stored as plain text files and composite JPEG images. There are three alternative ways to handle bad spots: no flagging, manual flagging or automatic flagging. Although time-consuming, the manual method works well for experiments including relatively few (and not too large) microarrays, while the automatic flagging method MASQOT (microarray spot quality control; Bylesjo et al., 2005) is a reproducible alternative for high-throughput studies.
Microarrays to be included in the analysis are selected by creating a BioAssaySet containing extracted data from the raw files. Raw data can be imported as median- or mean-quantified values for background and foreground. Spatial and intensity visualizations of foreground and background intensities provide a quick check for spatial quality-control problems and unbalanced signal intensities, both of which are common, especially for manual hybridizations. We have implemented several quality-control plug-ins to visualize potential problems, for instance arrayplots (Dudoit and Yang, 2002; Smyth, 2004), bias estimation (Futschik and Crompton, 2004a), rank intensity plot (Kroll and Wolfl, 2002) and UmeaSAMED QC (http://www.umu.se/climi/bact/Microarray/R-libraries.htm). The rank intensity plot plug-in can be an effective tool for deciding whether to remove or keep a microarray in the data set, based on the number of missing spots and intensity distributions.
Cross-hybridization, incomplete washing and dust are all factors that contribute to background noise in the observed intensities. The ordinary local background correction, in which the observed background intensities are subtracted from the foreground intensities, is likely to underestimate the true background noise. We have implemented several methods for advanced background correction (for an overview see Table 1). For example, the linear background correction method (http://www.umu.se/climi/bact/Microarray/R-libraries.htm) combines information from observed background intensities and observations from negative control genes to estimate background-corrected intensities.
A median normalization is capable of coping with linear signal-intensity differences between two channels, but with large differences the systematic error is typically not linear. Loess normalization can remove non-linear intensity dependencies, often visualized as a curvature in an MA plot (Yang et al., 2002). An MA-plot is a plot of log-intensity ratios (M-values) versus log-intensity averages (A-values). Normalization methods, such as optimized local intensity-dependent normalization (Futschik and Crompton, 2004a,b), neural nets normalization (Tarca et al., 2005) or stepwise normalization (Wilson et al., 2003; Yang et al., 2002), are needed to remove spatial problems. To obtain highly reproducible and robust results between microarray experiments, we have chosen to use the stepwise normalization in our pipeline.
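The M- and A-values can be computed directly from the two channel intensities. The Python sketch below assumes the usual base-2 convention, M = log2(R/G) and A = ½ log2(R·G), and shows only the simplest correction, a global median centring of M; the loess and stepwise methods discussed above are not implemented. The intensities are made-up numbers.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return log-ratios (M) and mean log-intensities (A) for paired channels."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def median_normalize(m):
    """Shift the M-values so that their median is zero."""
    centre = median(m)
    return [x - centre for x in m]

# Hypothetical background-corrected intensities for three spots.
red = [256.0, 1024.0, 4096.0]
green = [128.0, 256.0, 512.0]
m, a = ma_values(red, green)
m_norm = median_normalize(m)
print(m, a, m_norm)
```

Median centring removes only a constant offset in M; a bias that changes with A shows up as curvature in the MA plot and needs an intensity-dependent method.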
Analysis and visualization tools
The effects of systematic error are minimized in normalized data sets, and different kinds of hypotheses can be tested to pinpoint the biological implications of the results. UPSC-BASE has several plug-ins for hypothesis testing and visualization. For pairwise comparisons, methods based on different types of t-test can be applied. However, for experiments with three or more samples – or multiple factors – more advanced methods are needed; these avoid multiple pairwise comparisons and use all the information in the multivariate experimental design. In UPSC-BASE, there are two choices: either ANOVA (Kerr et al., 2000) or analysis by linear models (Smyth, 2004).
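For the pairwise case, one common variant is Welch's t statistic, which does not assume equal variances in the two groups. The Python sketch below computes it for a single array element; the choice of Welch's variant and the example log-ratios are assumptions for illustration, not a description of the UPSC-BASE plug-in.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups of log-ratios."""
    var_a = variance(group_a)  # sample variance (n - 1 denominator)
    var_b = variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical replicate log-ratios for one array element in two conditions.
control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0]
t_stat = welch_t(control, treated)
print(round(t_stat, 3))
```

In practice the statistic is computed for every array element, and the resulting p-values need correction for multiple testing.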
Overview plots are helpful to obtain unbiased indications of general trends in the results, but the final goal is biological interpretation. Microarrays can be used to extract a few candidate genes for further studies, but can also be used as a first screening tool to elucidate biological themes (Hosack et al., 2003) or regulatory gene networks (Banerjee and Zhang, 2002). The gene ontology classification (Ashburner et al., 2000), TAIR (Huala et al., 2001), Munich Information Center for Protein Sequences (Mewes et al., 1999; Schoof et al., 2002), and Kyoto Encyclopaedia of Genes and Genomes (Kanehisa and Goto, 2000) classification schemes are very useful resources for plant biology. Gene lists generated in earlier analysis steps can be used to look for over-represented categories. The results can then be visualized as dendrograms or graphs. In Figure 4, over-represented gene ontology cellular component categories are highlighted, based on the list of upregulated genes in the sample collected on 27 May. This represents an early leaf-development stage and was compared against the overall expression profiles from young to mature leaves. The directed acyclic graph structure makes it easy to follow affected processes from general to specific categories. In our demonstration data set of 27 May, over-representation was found, for example, for the tubulin, ribosome and nucleosome cellular component categories.
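Over-representation of a category in a gene list is commonly assessed with a one-sided hypergeometric test. The text does not state which test the plug-in uses, so the Python sketch below is an assumption about one standard approach, with made-up counts.

```python
from math import comb

def enrichment_p_value(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the category."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(comb(annotated, k) * comb(population - annotated, drawn - k)
               for k in range(hits, upper + 1))
    return tail / total

# Hypothetical counts: 20 genes on the array, 5 in the category,
# 5 genes in the upregulated list, 3 of which fall in the category.
p = enrichment_p_value(population=20, annotated=5, drawn=5, hits=3)
print(round(p, 4))
```

Because many categories are tested at once, the resulting p-values would normally be adjusted for multiple testing before categories are highlighted.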
The ‘digital Northern’ tool can compare gene expression in different tissues/treatments, based on the EST data from 19 tissues/treatments (for a detailed description of the source material for the libraries see Sterky et al., 2004), mapped onto the different Populus gene models (B. Segerman, personal communication). For example, the same subset of genes as shown in Figure 4 can be compared with the digital expression profiles of 19 libraries (Figure 5), demonstrating that the leaf transcriptome on that particular day during early leaf development is most similar to the transcriptome of the apical shoot meristem, and most dissimilar to the transcriptome of senescing leaves. This tool can also be used, with some limitations, to provide a rough verification of microarray results, if Northern blotting or real-time RT-PCR is not going to be performed.
The ‘chromosome viewer’ displays the chromosomal locations of a gene list, based on EST mapping (B. Segerman, personal communication) onto the Populus genome sequence (Tuskan et al., 2006). This is particularly useful for analysing features such as the co-localization of differentially expressed genes or quantitative trait loci (Kennedy and Wilson, 2004; Wu and Stettler, 1994).
Integration with external software
Although many different tools are included in the database, compatibility with other analytical tools is an important feature. Several export functions were implemented in the original BASE installation to facilitate advanced analysis in external software. In addition to those, we have added a general function for exporting data in channel-wise intensities or MA values (Dudoit and Yang, 2002). We have also simplified data export from UPSC-BASE to the commercial software GeneSpring (Silicon Genetics, Redwood City, CA, USA) and used the r-genespring package to make a plug-in for this purpose, whereby all information about the biological samples can be transferred to GeneSpring as parameters.
Other useful microarray data visualization software packages for the plant research community are MapMan (Thimm et al., 2004; Usadel et al., 2005) and AraCyc (Mueller et al., 2003). The MapMan/AraCyc export plug-in extracts the microarray data in a format that can be directly imported into these packages. In Figure 6 an overview of the metabolic pathways is presented, visualizing gene-regulation patterns in early leaf development (27 May). Array elements showing positive B-statistics and at least twofold differences were exported to the MapMan software package. As visualized, genes involved in photosynthetic light reactions and in the Calvin cycle were downregulated, while those involved in cell wall degradation and mitochondrial electron transport/ATP synthesis were upregulated.
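The selection rule used for the export (positive B-statistic and at least a twofold difference) can be written as a simple filter. In the Python sketch below the record layout (`id`, `M`, `B`) and all values are made up for illustration; M is taken to be a log2 ratio, so twofold corresponds to |M| ≥ 1.

```python
import math

def select_for_export(elements, min_fold=2.0):
    """Return IDs of elements with B > 0 and at least `min_fold` change."""
    threshold = math.log2(min_fold)
    return [e["id"] for e in elements
            if e["B"] > 0 and abs(e["M"]) >= threshold]

# Hypothetical array elements with log2 ratio (M) and B-statistic (B).
elements = [
    {"id": "elem1", "M": 1.8, "B": 3.2},   # up, passes both criteria
    {"id": "elem2", "M": -1.2, "B": 0.7},  # down, passes both criteria
    {"id": "elem3", "M": 0.4, "B": 2.1},   # change below twofold
    {"id": "elem4", "M": 2.5, "B": -1.0},  # B-statistic not positive
]
selected = select_for_export(elements)
print(selected)
```

Using the absolute value of M keeps both upregulated and downregulated elements in the exported list.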
It is often desirable to obtain alternative confirmation of the biological interpretation of the results from microarray studies. Most commonly, real-time RT-PCR is used for gene-wise confirmation. To simplify the primer design procedure, a FASTA (Pearson and Lipman, 1988) export function writes the sequences of the elements in a gene list to a multi-FASTA file. The exported file can be used directly in various external primer design software packages.
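A multi-FASTA file is simply a series of records, each consisting of a header line beginning with `>` followed by the sequence, usually wrapped to a fixed width. The Python sketch below produces this layout; the clone names and sequences are invented, and the function is an illustration rather than the UPSC-BASE export plug-in.

```python
def to_multi_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)

# Hypothetical clone identifiers and short sequences.
records = [("clone_A", "ATGGCTAGCTAGGATCC"), ("clone_B", "ATGCCGTTAGC")]
fasta_text = to_multi_fasta(records, width=10)
print(fasta_text)
```

Most primer design programs accept this format directly, so no further conversion of the exported file is needed.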
To assess the robustness of the microarray analysis pipeline, we compared the data obtained from the ‘original’ analysis of the leaf development samples with data generated by two individuals using a different experimental design. Most importantly, in the experimental loop for the demonstration data set, we included a triplicate of one sample in the hybridization design. These three samples were virtually indistinguishable from each other (data not shown). When the original data set was compared with the demonstration data set using principal components analysis (Wold et al., 1987), it was clear that the results were very consistent. The total pattern of gene expression from one sample (one date) in the two experiments was always almost identical (Figure 7), demonstrating the quality of the microarray and analysis. Thus variations introduced by differences in experimental design, sample handling and hybridization appeared to be much smaller than the genuine differences found between samples. Therefore we conclude that data generated using the procedure described here generally have reasonable levels of confidence. Data displayed in the public domain of UPSC-BASE provide reliable estimates of expression levels of specific genes.
UPSC-BASE – a public resource for Populus genomics
Data on all experiments performed using our Populus DNA microarrays are stored in UPSC-BASE. The database can be accessed via the web, and has an anonymous (public) login option. The analysis is performed by individual researchers, and at this stage data can be shared with others in the same working unit, but after publication of the results, or after an appropriate time lag, data are transferred to the public domain of the database. In the public domain, anonymous users can access data on several of the performed experiments (currently 21) and 407 microarray hybridizations (Table 2). Also, an additional number of unpublished experiments (currently 10) and 347 microarray hybridizations have data searchable under public access.
|UMA no.||Experiment description||Slides|
|1a||Seasonal variation in gene expression – whole season||37|
|2||Popface experiment; changes in gene expression in elevated CO2||22|
|3||Virus infection of Populus tremula||12|
|5||Analysis of secondary cell wall genes during tension wood formation||8|
|6a||Fungal infection of P. tremula||35|
|7||Global profile of wood-forming tissues||20|
|9||Popyomics EU programme; drought stress||55|
|10||Assessment of impact of elevated CO2 on biomass production in cottonwood||18|
|11||Impact of constitutive expression of CCAAT-binding factor (CBF) on frost tolerance of hybrid aspen||32|
|12||Finding genes involved in regulation of fibre cell death in hybrid aspen wood||5|
|13||Optimization of experimental design in microarray analysis||28|
|17a||Comparative analysis of cambial and bud dormancy||80|
|20a||Transcript profiling of the apical region during primordia/leaf development||33|
|21||Meristem identity in the cambial zone||49|
|25a||Effects of ozone on oxidative stress responses and respiratory processes in poplar leaves||3|
|28a||Analysis of auxin-responsive gene expression during annual cambial cycle||40|
|30a||Global tissue profiling||47|
|31||Transcriptomics of poplar in response to poplar mosaic virus||18|
|32a||Seasonal variation in gene expression – spring 2000 and 2002||26|
|35||Popface experiment; changes in gene expression in elevated CO2 (continuation)||8|
|36||Popyomics EU programme; UK drought-stress extremes||24|
|38||Assessing the effect of altered carbohydrate supply||14|
|42||Dynamics of leaf growth||20|
|43a||Resistance of Salix viminalis to the gall midge Dasineura marginemtorquens||26|
|44||Active versus dormant cambium||6|
|45a||Adventitious root formation||20|
|48||A transcriptional roadmap to wood formation||18|
|49b||Changes in gene expression in the wood-forming tissue of transgenic hybrid aspen||2|
|50||A transcriptional timetable of autumn senescence||16|
|81||Expression analysis of genes encoding putative cellulose synthases in hybrid aspen||8|
In combination with the JGI genome browser (http://genome.jgi-psf.org/Poptr1) and PopulusDB, UPSC-BASE allows Populus researchers not only to access the Populus genome conveniently, but also to obtain expression characteristics for a considerable fraction of the genes. UPSC-BASE is being continuously developed and improved with novel analysis tools, and is growing rapidly as more experiments are transferred into the public domain, where they are accessible to external users. By streamlining the analysis procedure, we are trying to provide the community with a resource containing data generated in a standardized way, which should simplify comparisons between data sets. We must, however, point out that many of the experiments included in the public domain of UPSC-BASE were performed before this analysis pipeline was established. Details of these experiments are stored in the database, but the procedures used may differ from those described here. As the individual experiments, like most other cDNA array experiments, typically use different designs and reference samples, direct between-experiment comparisons are not always possible. A considerable weakness of the Populus model system is the lack of a standardized vocabulary to describe different tissues. This is a problem that the Populus community has to solve if the value and user-friendliness of the microarray data are to be maximized. Despite these limitations, we believe the rapidly increasing number of experiments in UPSC-BASE will constitute a useful resource for the Populus community, and perhaps for the plant science community in general.
We have described the pipeline for DNA microarray analysis that we developed for our Populus microarrays. Most of the tools are generic and can also be used for other microarrays. For example, we use them to analyse Arabidopsis CATMA microarrays (Hilson et al., 2004). These efforts have provided a large set of protocols and generic plug-ins for the microarray community using the BASE system. With minor modifications, the plug-ins can be used in all kinds of BASE installations, regardless of the organism and microarray system concerned. We believe, however, that the most significant value of this contribution is the description of the publicly accessible database, which will make Populus more attractive as a model system for molecular biology, genetics and genomics.
This work was supported by the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research, the Swedish Research Council, Kempestiftelserna and the European Commission through the Directorate General Research within the Fifth Framework for Research – Quality of Life and Management of the Living Resources Programme, contract no. QLK5-CT-2002-00953 (POPYOMICS).
- (2004) A transcriptional timetable of autumn senescence. Genome Biol. 5, R24.
- (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29.
- (2004) Submission of microarray data to public repositories. PLoS Biol. 2, E317.
- (2002) Functional genomics as applied to mapping transcription regulatory networks. Curr. Opin. Microbiol. 5, 313–317.
- (2003) Gene expression in autumn leaves. Plant Physiol. 131, 430–442.
- (2001) Minimum information about a microarray experiment (MIAME) – toward standards for microarray data. Nat. Genet. 29, 365–371.
- (2003) ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68–71.
- (2005) MASQOT: a method for cDNA microarray spot quality control. BMC Bioinformatics, 6, 250.
- (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 11, 113–116.
- (2002) Fundamentals of experimental design for cDNA microarrays. Nat. Genet. 32, 490–495.
- (2004) NASCArrays: a repository for microarray data generated by NASC's transcriptomics service. Nucleic Acids Res. 32, D575–D577.
- (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15.
- (2002) Bioconductor R packages for exploratory analysis and normalization of cDNA microarray data. In The Analysis of Gene Expression Data: Methods and Software (Parmigiani, G., Garett, E.S., Irizarry, R.A., Zeger, S.L. eds). New York: Springer, pp. 73–101.
- (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
- (1999) Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 9, 950–959.
- (2004a) Model selection and efficiency testing for normalization of cDNA microarray data. Genome Biol. 5, R60.
- (2004b) OLIN: optimized normalization, visualization and quality testing of two-channel microarray data. Bioinformatics, 21, 1724–1726.
- (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80.
- (2003) Induced gene expression in human brain after the split from chimpanzee. Trends Genet. 19, 63–65.
- (2001) A transcriptional roadmap to wood formation. Proc. Natl Acad. Sci. USA, 98, 14732–14737.
- (2004) Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications. Genome Res. 14, 2176–2189.
- (2003) Identifying biological themes within lists of genes with EASE. Genome Biol. 4, R70.
- (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 29, 102–105.
- (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314.
- (2003) Changes in gene expression in the wood-forming tissue of transgenic hybrid aspen with increased secondary growth. Plant Mol. Biol. 52, 893–903.
- (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.
- (2004) Plant functional genomics: opportunities in microarray databases and data mining. Funct. Plant Biol. 31, 295–314.
- (2001) Statistical design and the analysis of gene expression microarray data. Genet. Res. 77, 123–128.
- (2000) Analysis of variance for gene expression microarray data. J. Comput. Biol. 7, 819–837.
- (2003) The poplar root transcriptome: analysis of 7000 expressed sequence tags. FEBS Lett. 542, 37–41.
- (2002) Ranking: a closer look on globalisation methods for normalisation of gene expression arrays. Nucleic Acids Res. 30, e50.
- (2004) Poplar genes encoding fasciclin-like arabinogalactan proteins are highly expressed in tension wood. New Phytol. 164, 107–121.
- (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680.
- (2002) Replicated microarray data. Stat. Sin. 12, 31–46.
- (1999) MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 27, 44–48.
- (2005) A genomic approach to investigate developmental cell death in woody tissues of Populus trees. Genome Biol. 6, R34.
- (2003) AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol. 132, 453–460.
- (2004) Characterization of full-length enriched expressed sequence tags of stress-treated poplar leaves. Plant Cell Physiol. 45, 1738–1748.
- (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.
- (2004) Navigating public microarray databases. Comp. Funct. Genomics 5, 471–479.
- (2004) Identification and analysis of safener-inducible expressed sequence tags in Populus using a cDNA microarray. Planta, 220, 296–306.
- (2006) Evaluation of microarray data normalization procedures using spike-in experiments. BMC Bioinformatics, 7, 300.
- (2002) BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 3, SOFTWARE0003.
- (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.
- (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
- (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res. 30, 91–93.
- (2004a) Cambial meristem dormancy in trees involves extensive remodelling of the transcriptome. Plant J. 40, 173–187.
- (2004b) A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell, 16, 2278–2292.
- (2004) The response of the poplar transcriptome to wounding and subsequent infection by a viral pathogen. New Phytol. 164, 123–136.
- (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, No 1, Article 3.
- (1998) Gene discovery in the wood-forming tissues of poplar: analysis of 5692 expressed sequence tags. Proc. Natl Acad. Sci. USA, 95, 13330–13335.
- (2004) A Populus EST resource for plant functional genomics. Proc. Natl Acad. Sci. USA, 101, 13951–13956.
- (2002) Microarray databases: standards and ontologies. Nat. Genet. 32, S469–S473.
- (2005) A robust neural networks approach for spatial and intensity dependent normalization of cDNA microarray data. Bioinformatics, 21, 2674–2683.
- (2005) The transcriptome of Populus in elevated CO2. New Phytol. 167, 143–154.
- (2004) MapMan: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. 37, 914–939.
- (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313, 1596–1604.
- (2005) Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiol. 138, 1195–1204.
- (2005) An experimental evaluation of a loop versus a reference design for two-channel microarrays. Bioinformatics, 21, 492–501.
- (2002) Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study. BMC Bioinformatics, 3, 36.
- (2003) New normalization methods for cDNA microarray data. Bioinformatics, 19, 1325–1332.
- (2004) Statistics for Microarrays: Design, Analysis and Inference. Chichester: Wiley.
- (2005) Near-optimal designs for dual-channel microarray studies. Appl. Stat. 54, 817–830.
- (1987) Principal component analysis. Chemom. Intell. Lab. Syst. 2, 37–52.
- (1994) Quantitative genetics of growth and development in Populus. 1. A three-generation comparison of tree architecture during the first 2 years of growth. Theor. Appl. Genet. 89, 1046–1054.
- (2002) Design issues for cDNA microarray experiments. Nat. Rev. Genet. 3, 579–588.
- (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30, e15.
- (2004) GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol. 136, 2621–2632.