Advances in forest tree genomics


Forest Trees Workshop, Plant and Animal Genome XIII Conference, San Diego, CA, USA, January 2005
What's up in forest tree genomics?

Genomics can be defined as the development and application of genome-wide experimental approaches to assess gene structure and function, which include DNA sequencing, gene mapping and gene expression profiling. Forestry entered the genomic era in the early 1990s, with the introduction of forward (genetic mapping for quantitative trait loci (QTL) detection, reviewed by Cervera et al., 2000; Sewel & Neale, 2000) and reverse (knockout and overexpression; MacKay et al., 2004) genetics approaches. In the late 1990s, the establishment of functional genomics tools in many forest tree research laboratories enabled the simultaneous analysis of thousands of transcripts or proteins, providing an opportunity to understand protein function, gene regulation and eventually how these long-lived organisms are assembled. In 2004, this discipline reached an unprecedented level with the public release of the complete genome sequence of Populus trichocarpa (Tuskan et al., 2004). The Forest Trees Workshop, begun in 1993 and organized within the Plant and Animal Genome meeting, has been an excellent forum for reviewing the current state of knowledge and discussing new directions of research in this community. In January 2002, this workshop also became the annual meeting of the Forest Genomics working party of the International Union of Forest Research Organizations (IUFRO), and it is complemented by the biannual Tree Biotech meeting, which focuses on the molecular biology, genetics and biotechnology of trees as well as more basic aspects of growth, development and environmental and biological forest tree interactions. Here we focus on the 2005 Forest Trees Workshop, where many advances were reported and discussed, covering aspects of tree genomes, environment–genome interactions, genomics of wood formation, molecular ecology and leveraging model systems.

Which genes really matter for forest tree adaptation?

Growth, development and survivorship of long-lived organisms such as forest trees are continuously challenged by biotic and abiotic stresses. Accelerated climatic changes and the resulting biotic challenges (e.g. the rapid colonization of previously hostile niches by pests and pathogens) suggest that if these long-lived, late-reproducing, immobile organisms are to survive in their current physical distributions, they will need to respond via both phenotypic plasticity and selection-driven changes in allele frequencies. An immediate question, now partly testable, is whether the present structures and levels of genetic diversity in existing forest tree species and populations are sufficient to allow adaptation to these future conditions. Addressing this question will help underpin actions to preserve adaptability and possibly avoid major losses, especially in managed forests. Quantitative transcriptomics and proteomics have revealed remarkable variation in gene and protein expression in many forest tree species, thus making available ‘expressional candidate genes’ for many traits of ecological interest (e.g. cold and drought tolerance, disease resistance and phenology). Gene activation tagging and reverse genetics approaches also allow exploration of gene expression–phenotype correlation (but stop well short of demonstrating causation). The second question, concerning whether observed expressional or physiological variation matters (i.e. whether it affects the function and fitness of organisms in natural populations), is the next frontier. Here we report on advances in forest tree genomics and especially on how functional genomics combined with association genetics promises major new insights into forest tree adaptation.

Functional genomics in forest trees

Systematic and genome-wide approaches are now being applied for gene discovery and analyses of gene function by using complementary methods. Work in hardwood trees (angiosperms), including Populus and Eucalyptus, is now poised to benefit directly from the fully sequenced, assembled and annotated Populus genome. By contrast, work on the very distantly related commercial softwoods (gymnosperms) must rely much more heavily on expressed sequence tag (EST) sequence information and comparative genomic approaches, because a whole genome sequence for these species is not expected to be available in the foreseeable future.

From EST sequencing to gene expression profiling

The large-scale sequencing and analysis of ESTs remains a fundamental part of genomics research in most forest tree species. Results were presented for loblolly pine (J. Dean, University of Georgia at Athens, USA), maritime pine (J. Paiva, IBET/INRA, France) and white spruce (J. MacKay, Laval University, Canada). Whereas many EST sequencing projects carried out in forest trees focus on wood formation and secondary xylem, more and more projects involve a broader diversity of tissues and reflect a growing interest in the response to abiotic stresses (including drought stress and frost tolerance). In the loblolly pine ADEPT project (J. Dean), cDNA libraries were made from seedlings during drought stress and during recovery from drought. For each condition, separate libraries were made from three different genotypes, and the comparative analysis of their ESTs identified several putative drought-responsive genes. Furthermore, the transcript profiles of one of the genotypes suggested that it responded more strongly to the stress than the others. If further analysis shows that this genotype displays a greater tolerance to drought, its differentially expressed genes may represent expressional candidate genes for drought response or tolerance. Gene discovery projects are also sequencing cDNAs from both 5′ and 3′ ends as a means to augment the sequence information and improve the annotation of genes, while providing sequences that offer highly robust assemblies. The need for more complete information has been highlighted by recent analyses of pine genes, which show that the majority of contigged sequences that have no sequence similarity to other genomes are very short, and that a large majority of sequences above 1 kb in length give strong matches to other genomes, Arabidopsis in particular (Kirst et al., 2003; Pavy et al., 2005). The need for improved EST resources is also argued by the evidently large number (approximately 75%) of genes predicted from the poplar genome sequence that are not supported by EST sequence information (S. DiFazio, Oak Ridge National Laboratory, USA).

Gene sequences identified through large-scale EST sequencing, through targeted cDNA discovery methods such as suppressive subtractive hybridization (SSH), or from the poplar genome sequence are now being used in several laboratories to characterize gene families and develop microarrays for comprehensive gene expression profiling. Several groups are focusing on transcription factors, which are thought to play regulatory roles in primary and secondary xylem formation in conifers, poplar and Arabidopsis. Systematic prospecting of pine xylem ESTs has identified several sequences from diverse families of well-characterized plant transcription factors, including AP2, MYBs, HDzip, KNOX, LIM-domain, MADS and others (S. Rui, North Carolina State University, USA). Analysis of their transcript abundance by quantitative polymerase chain reaction (PCR) indicated that many of the sequences are expressed more or less ubiquitously and that a minority are preferentially expressed in secondary xylem tissues. A detailed analysis of the KNOX-1 family in conifers revealed that gene evolution and the resulting family structure are clearly distinct from those of angiosperms (Guillet-Claude et al., 2004). Conifer KNOX-1 genes form a single rapidly evolving cluster and are found in only one of the three large clades observed in angiosperms. The report suggests that extrapolation of the biological role of genes in this family based upon sequence similarities alone may be inadequate, and it argues in favor of reverse genetic studies using gain-of-function and loss-of-function approaches. Unfortunately, there have been few reverse genetic experiments in conifers, owing to the lack of simple transformation and regeneration methods. During the meeting it was reported that several transgenic spruce lines misexpressing conifer transcription factors are now being analyzed to delineate more clearly the function and biological role of genes belonging to multimember families (J. MacKay).

The discovery of a large number of micro-RNAs from poplar xylem was also reported, and the potential implications of this important class of regulatory molecules in trees were discussed (V. Chiang, North Carolina State University, USA). Although many studies in Arabidopsis have linked micro-RNAs to the regulation of plant development through their action on transcripts of homeotic transcription factors, it was reported that the putative targets of nearly 50% of the poplar micro-RNAs isolated in this study are structural proteins and enzymes. Transcripts encoding cellulose synthase, mannan biosynthesis enzymes, and enzymes implicated in cell wall phenylpropanoid and flavonoid biosynthesis are thought to represent novel targets for micro-RNAs. This intriguing finding may suggest a specialization of micro-RNAs in the secondary xylem of trees.

Wood formation in forest trees is characterized by its remarkable phenotypic plasticity, especially in conifers, where the morphology and properties of xylem cells change significantly across the growth season, with the age of the cambial meristem, and in response to environmental cues. Macroarray analyses were used in maritime pine to uncover differential gene expression related to the shift between early wood and late wood (formed within a single growing season), between juvenile and mature wood (formed at different developmental stages), and in compression wood (formed in leaning or bent trees; J. Paiva). Microarrays developed with cDNAs isolated from subtractive libraries are being used to investigate the acquisition of frost tolerance in Scots pine (Pinus sylvestris) and European beech (Fagus sylvatica; P. Balk, ATO, the Netherlands). Frost tolerance and cold resistance are phenomena that vary widely across plant species. This study has identified several gene transcripts, related to osmotic stress in particular, which appear to show similar responses in the angiosperm and gymnosperm trees during the acquisition of frost tolerance. Transcripts of several genes (including dehydrins, ABA-responsive genes and PR proteins) were up-regulated, whereas those of tubulin, certain membrane-intrinsic proteins and expansins (among others) were down-regulated. Finally, a major outcome of the poplar genome sequence is the development of oligonucleotide arrays (Y. Sun, North Carolina State University, USA). The high level of specificity of oligo arrays is being used to discriminate systematically between members of multigenic families of transcription factors and cell wall associated proteins, in order to determine which family members are specifically expressed in secondary vascular tissues.


Proteomics methods have yet to gain widespread application to forest trees. Protein profiling using one- and two-dimensional gel electrophoresis has been applied in several studies in the last decade (e.g. Plomion et al., 2000). However, few studies have reported advances based upon recent technological developments that enable the identification of proteins at higher throughput, using much smaller quantities of protein and at much lower cost. D. Lippert (University of British Columbia, Canada) presented a study in which protein profiles of somatic embryos of white spruce were analyzed over a developmental sequence using two-dimensional gel electrophoresis, with protein identification by liquid chromatography–mass spectrometry (LC-MS) analyses (Lippert et al., 2005). Similar methods are now being applied to characterize protein profiles in response to herbivory by insects and are helping to differentiate between the wounding response and the response to weevil infestation in spruce. The rate of identification of proteins in both studies was 60–70%. It was shown that the identification of a majority of proteins was made possible with information derived from EST sequence data. Identical results were recently reported in pine (Gion et al., 2005). These findings argue in favor of developing proteomic studies in conjunction with EST sequencing and transcript profiling research.


The growing need to process, analyze and integrate data in functional genomics is leading to the development of more and more advanced and accessible bioinformatic tools. The flow of data and information between different genomic technologies is at the very heart of several experiments. For example, processed EST sequences must be analyzed to identify and annotate unigene sets, which are then used to design and manufacture microarrays and to discover single nucleotide polymorphisms (SNPs) (Le Dantec et al., 2004). The MAGIC database package (L. Pratt, University of Georgia at Athens, USA) was designed to accommodate this central need for integration. It is a portable package intended for use by biologists with minimal informatic expertise. The database is currently used in conjunction with EST sequencing and microarray manufacture in loblolly pine (J. Dean); it also supports microarray databasing.

Beyond candidate genes

By bringing together ecologists, molecular biologists and population geneticists, a new research area is being developed in the forestry community to identify the genes responsible for forest tree adaptation. The approach seeks to identify functionally important genes from the study of nucleotide diversity patterns and to test these nucleotide polymorphisms for associations with the phenotypic variation in adaptive traits (Neale & Savolainen, 2004). Once a set of candidate (structural or regulatory) genes is determined, mutations of adaptive significance (i.e. those that natural selection has favored) can then be identified, based on analysis of nucleotide diversity patterns within and between species (Kreitman, 2000). To determine whether or not observed variation has adaptive significance, scientists are describing the following:

  • The number (SNPs and insertion/deletion events (INDELs)), nature (silent vs nonsynonymous) and genomic location (coding vs noncoding) of nucleotide polymorphisms (Brown et al., 2004; Pot et al., 2005).
  • The level and the structure of diversity (at the nucleotide and haplotype level). Diversity is one of the key elements that preserve the adaptive potential of organisms and their capacity to adapt to new environmental conditions. It is therefore of prime importance to quantify the level of diversity in genes putatively involved in forest tree adaptation. The structure of diversity (i.e. how the genetic diversity is distributed within and among populations) is also an important feature to be considered.
  • The extent of linkage disequilibrium (LD), i.e. the nonrandom association of alleles at different loci, which is an important parameter both for understanding the genealogical history of populations and for predicting the efficiency of association studies in natural populations (Goldstein & Weale, 2001).
  • The extent to which nuclear diversity patterns are the result of selection. There are a number of methods used to search for ‘molecular signatures’ of natural selection (Ford, 2002). They generally use neutrality as the null hypothesis, and they are based on site/allele frequency distributions or on ratios of synonymous vs nonsynonymous polymorphisms within and between species.
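The quantities listed above can be illustrated with a minimal Python sketch. It assumes a small set of aligned haplotypes from a single candidate gene; the sequences, the polymorphism counts and all function names are hypothetical and chosen for illustration only. It computes the segregating sites, the nucleotide diversity (average number of pairwise differences per site), the r² measure of LD between two biallelic sites, and a neutrality index that compares nonsynonymous and synonymous changes within species (polymorphism) and between species (divergence).

```python
from itertools import combinations

# Hypothetical toy alignment: five haplotypes of one candidate gene (made-up data).
HAPLOTYPES = [
    "ATGCCGTA",
    "ATGCCGTA",
    "ATGTCGAA",
    "ATGTCGAA",
    "ATGCCGAA",
]


def segregating_sites(haps):
    """Return the column indices at which more than one nucleotide is present."""
    return [i for i in range(len(haps[0])) if len({h[i] for h in haps}) > 1]


def nucleotide_diversity(haps):
    """Average number of pairwise differences, expressed per aligned site."""
    pairs = list(combinations(haps, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs) / len(haps[0])


def r_squared(haps, i, j):
    """LD between two biallelic sites, measured as r^2 = D^2 / (pA qA pB qB)."""
    ref_i, ref_j = haps[0][i], haps[0][j]  # alleles of the first haplotype as reference
    n = len(haps)
    p_a = sum(h[i] == ref_i for h in haps) / n
    p_b = sum(h[j] == ref_j for h in haps) / n
    p_ab = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def neutrality_index(pn, ps, dn, ds):
    """(Pn/Ps) / (Dn/Ds); values near 1 are expected under neutrality."""
    return (pn / ps) / (dn / ds)


if __name__ == "__main__":
    sites = segregating_sites(HAPLOTYPES)
    print("segregating sites:", sites)
    print("nucleotide diversity:", nucleotide_diversity(HAPLOTYPES))
    print("r^2:", r_squared(HAPLOTYPES, sites[0], sites[1]))
    # Hypothetical counts of polymorphic (pn, ps) and fixed (dn, ds) differences.
    print("neutrality index:", neutrality_index(pn=2, ps=10, dn=8, ds=10))
```

In practice these statistics would be estimated from many more sequences, with formal significance tests and with populations sampled so that the structure of diversity can be assessed; the sketch only shows how each quantity is defined.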

The validation of putatively important SNPs (or haplotypes) is carried out in a third step, wherein nucleotide polymorphisms are tested for association with the phenotypic variation of adaptive traits in natural populations (Cardon & Bell, 2001). Given the short LD window frequently observed in these largely outbred and highly polymorphic species (in poplar: Ingvarsson, 2005; in pine: Brown et al., 2004; in spruce: presented by E. De Paoli, University of Udine, Italy), it is suggested that this association strategy should lead to the identification of functional variation at a scale that could not be reached by classical QTL mapping experiments.
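As a rough illustration of such an association test, the Python sketch below compares the mean trait value of trees carrying each of two alleles at a single SNP and assesses the difference with a permutation test. This is a deliberately simplified stand-in, not the method used in the studies cited: the allele codes, trait values and function names are hypothetical, each tree is represented by a single allele, and no correction is made for population structure or multiple testing, both of which matter in real association studies.

```python
import random
from statistics import mean

# Hypothetical data: allele carried at one SNP (0 or 1) and a trait value per tree.
ALLELES = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
TRAIT = [10.1, 9.8, 10.4, 9.9, 10.2, 12.0, 11.7, 12.3, 11.9, 12.1]


def allele_effect(alleles, trait):
    """Difference in mean trait value between carriers of allele 1 and allele 0."""
    group0 = [t for a, t in zip(alleles, trait) if a == 0]
    group1 = [t for a, t in zip(alleles, trait) if a == 1]
    return mean(group1) - mean(group0)


def permutation_p_value(alleles, trait, n_perm=2000, seed=1):
    """Two-sided p-value obtained by shuffling trait values among trees."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = abs(allele_effect(alleles, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties with the observed value.
        if abs(allele_effect(alleles, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print("allele effect:", allele_effect(ALLELES, TRAIT))
    print("permutation p-value:", permutation_p_value(ALLELES, TRAIT))
```

A permutation test is used here because it makes no distributional assumption about the trait; a regression or mixed model would be the more usual choice once covariates and population structure are taken into account.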