De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology
Summary
Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next‐generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full‐length cDNAs. Quality of the transcriptome assembly was validated by RT‐PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme‐encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple‐sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecologically and economically important pine species.
Introduction
Forests are essential components of ecosystems, covering approximately one‐third of the Earth's land area and playing a fundamental role in the regulation of terrestrial carbon sinks. Trees represent nearly 80% of the plant biomass (Olson et al., 1983) and 50%–60% of annual net primary production in terrestrial ecosystems (Field et al., 1998).
Conifers are the most important group of gymnosperms. Having diverged from a common ancestor more than 300 million years ago (Bowe et al., 2000), gymnosperms and angiosperms have evolved very efficient and distinct physiological adaptations (Leitch and Leitch, 2012). Coniferous forests dominate large ecosystems in the Northern Hemisphere and include a broad variety of woody plant species, some of which are the largest, tallest and longest living organisms on Earth (Farjon, 2010). Conifer trees are also of great economic importance, as they are the primary source for timber and paper production worldwide. Total timber production in the European Union in 2011 was 427 million m³ (UNECE, 2013). Approximately 22% was used to produce energy, while the rest was used to supply industrial demands. A study by the United Nations Economic Commission for Europe/Food and Agriculture Organization (UNECE/FAO) points out that future needs for forest biomass to meet the demand for industrial wood as an energy source will exceed production by 2020. The development of more productive and sustainable forest plantations is essential to meet the increasing worldwide demand for wood while minimizing environmental impacts (e.g. decreasing pressure on natural forests).
The extant conifers comprise 615 species classified into eight families within the division Pinophyta (Farjon, 2010). Some of the most important conifer trees such as pines, spruces and firs are included in the family Pinaceae. The genus Pinus comprises the largest number of diversified species (113). Maritime pine (Pinus pinaster Aiton) is a broadly planted species (4.2 million hectares) in the southwestern part of the Mediterranean Basin, especially along the Atlantic coast in France, Spain and Portugal where it is the dominant species on more than 2.3 million hectares (Sanz et al., 2006). The maritime pine is particularly tolerant to abiotic stresses showing relatively high levels of intraspecific variability (Aranda et al., 2010). The maritime pine is also one of the most genetically studied conifer species for genomic research in Europe (Mackay et al., 2012; Neale and Kremer, 2011), and a large number of genomic resources and phenotypic data have been generated in the last few years and are available for the conifer research community (http://www.scbi.uma.es/sustainpine; https://w3.pierroton.inra.fr/PinusPortal). Furthermore, knowledge gained from studying this conifer species will potentially help to better understand gene function and diversity in closely related, economically significant species but also in other noneconomic but environmentally important gymnosperm species (Neale and Kremer, 2011).
Until recently, advances in the genomics of conifers were hampered by the large size of their genomes ranging from 20 to 40 Gb, which is more than 200‐fold the Arabidopsis genome and roughly sevenfold the human genome (Mackay et al., 2012; Ritland, 2012). As the conifer genome is extremely large, major research efforts were concentrated on transcriptomic analysis. The first large‐scale EST (Expressed Sequence Tag) projects based on Sanger sequencing revealed that transcriptomes of pines and spruces are highly diverse and complex (Allona et al., 1998; Cairney et al., 2006; Li et al., 2009; Pavy et al., 2005; Ralph et al., 2008).
The emergence of next‐generation sequencing (NGS) has profoundly transformed the landscape of genome analysis. Research efforts in conifers have provided growing catalogues of ESTs for several species of economic and ecological importance, including Picea sitchensis (Ralph et al., 2008), Picea glauca (Rigault et al., 2011), P. pinaster (Fernández‐Pozo et al., 2011) and Picea abies (Chen et al., 2012). Recent landmark studies in conifer genomics reported the first draft genome assemblies for gymnosperm species. Specifically, massively parallel sequencing data were used for assembling the 20 Gb genomes of white (Birol et al., 2013) and Norway (Nystedt et al., 2013) spruces.
A full catalogue of genes and protein coding sequences in maritime pine is necessary to improve our understanding of biological functions and evolutionary relationships with other conifer trees and angiosperms. Comparative genomics provides a powerful means to study gene structure and the evolution of gene function and regulation. Analysis of key genes and pathways allows scientists to better understand how complex biological processes are regulated and evolve (Koonin et al., 2004; Soltis and Soltis, 2003).
In this study, we present results from deep sequencing of the P. pinaster transcriptome. The data were used to create a reference transcriptome in this pine species, containing an extensive set of genes expressed in a variety of actively growing tissues/organs (http://www.scbi.uma.es/sustainpinedb/sessions/new). Comparison to available public databases indicates that the transcriptome of P. pinaster is similar in size to that of previously characterized P. glauca (Rigault et al., 2011) and P. abies (Nystedt et al., 2013).
Results
Sequencing strategy and de novo assembly
Massive scale cDNA sequencing was performed to define a reference transcriptome for maritime pine. Long reads obtained by the 454 sequencing platform and short reads obtained by the Illumina platform were used for assembly of 210 513 putative unigenes from 18 cDNA libraries constructed from a variety of tree sources, tissues and experimental conditions (Table 1). Figure 1 shows a schematic representation of the workflow followed for preprocessing and assembling, as well as the combination of software algorithms utilized to obtain a final catalogue of unigenes. A total of 6 381 011 long reads provided 4 098 257 useful long reads (64.2%, mean length: 282 bp), and 591 174 069 short reads provided 292 596 546 useful short reads (49.5%, mean length: 86 bp). As shown in Figure 2, the recovery of useful 454 long reads was variable among the different libraries (37%–92%) with high recoveries (88%–92%) in EuroPineDB and UAGPF. Overall, recovery was around 70% of the initial reads. Regarding short reads, only 32% of single‐end short reads were kept after data cleaning. The primary reason for filtering out these single‐end sequences was redundancy.
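The overall recovery percentages quoted above follow directly from the reported read counts. The short Python sketch below reproduces them; the function name `recovery_pct` is illustrative and is not part of the original pipeline.

```python
def recovery_pct(useful: int, total: int) -> float:
    """Percentage of raw reads retained after preprocessing."""
    return 100.0 * useful / total

# Read counts as reported in the text
long_reads = recovery_pct(4_098_257, 6_381_011)       # 454 long reads
short_reads = recovery_pct(292_596_546, 591_174_069)  # Illumina short reads

print(f"long reads kept:  {long_reads:.1f}%")
print(f"short reads kept: {short_reads:.1f}%")
```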
| Gene library | Sequencing platform | Sampled plant material | Experimental conditions | SRA code |
|---|---|---|---|---|
| EuroPineDB | Sanger/454 | Bud, xylem, phloem, stem, needles, roots, embryos, callus, cone, male and female strobili | ESTs and SSH libraries from different tissues and conditions as described by Fernández‐Pozo et al., 2011 | SRS479769 |
| Biogeco1 | 454 | Xylem, bud and needle | ESTs from differentiating xylem, swelling bud and young needles | SRX032960, SRX032961, SRX032962, SRX032963 |
| Biogeco2 | 454 | Bud | EST from quiescent buds harvested on 2‐year‐old maritime pine (low growing family) in well‐watered or drought‐stress conditions | SRX031546 |
| Biogeco3 | 454 | Bud | EST from quiescent buds harvested on 2‐year‐old maritime pine (fast growing family) in well‐watered or drought‐stress conditions | SRX031589 |
| UAGPF1 | 454 | Embryome | ESTs from developing, immature embryos (1‐week maturation) | SRX022618 |
| INIA_PPIN | 454 | Bud | ESTs from buds | PRJNA221139 |
| U_root | 454 | Root | ESTs from roots (1‐month‐old seedlings) | SRS480239 |
| U_tip | 454 | Root tips | ESTs from root tips (1‐month‐old seedlings) | SRS480265 |
| U_H | 454 | Hypocotyl | ESTs from hypocotyl (1‐month‐old seedlings) | SRS480236 |
| U_N | 454 | Needle | ESTs from needles (1‐month‐old seedlings) | SRS480237 |
| U_Cot_Os | 454 | Cotyledon | ESTs from cotyledons grown under dark conditions | SRS479771 |
| U_H_Os | 454 | Hypocotyl | ESTs from hypocotyl grown under dark conditions | SRS480236 |
| U_R_6 | 454 | Roots | ESTs from roots (6‐month‐old seedlings) | SRS480238 |
| U_S_8 | 454 | Stem | ESTs from stem (8‐month‐old seedlings) | SRS480261 |
| UAGPF2 | Illumina | Somatic embryo | Paired‐end ESTs from developing, immature embryos (1 week maturation) | SRR609713 |
| BIOGECO4 | Illumina | Bud | ESTs from young and aged buds | SRX031587 |
| BIOGECO5 | Illumina | Root | ESTs from drought‐stressed and control roots in hydropony | SRX031592, SRX031590 |
| BIOGECO6 | Illumina | Bud | ESTs from young and aged buds | SRX031594 |
| IBET | Illumina | Zygotic embryo | Paired‐end ESTs from embryos | SRS481044 |


The assembly strategy combined two completely different algorithms, on the hypothesis that the combination may provide better results than either algorithm alone. MIRA3 and Euler‐SR were selected for long‐read assembly because the former is based on an overlap‐layout‐consensus algorithm, while Euler‐SR is based on de Bruijn graphs resolved by means of an Eulerian path. Furthermore, these two algorithms produced the best results with simulated data (H Benzekri and MG Claros, unpublished results). For short reads, we used de Bruijn‐based algorithms. ABySS was selected for two main reasons: (i) it can efficiently assemble using different k‐mers (Lin et al., 2011) and (ii) it is able to assemble vertebrate‐sized genomes and transcriptomes (Li et al., 2010). Finally, contigs obtained separately from short reads and long reads (355 483 in total) were reconciled using CAP3, resulting in 210 513 contigs to describe a P. pinaster transcriptome. Notably, 1241 unigenes were longer than 3000 nucleotides. Unigene length ranged from 40 to 7876 bp, with an average of 495 bp and a median of 361 bp.
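To make the de Bruijn/Eulerian‐path idea mentioned above concrete, the following toy Python sketch builds a de Bruijn graph from a few error‐free reads and spells a contig by walking an Eulerian path (Hierholzer's algorithm). It only illustrates the principle: it is not the implementation used by Euler‐SR or ABySS, and the reads, the value of k and all function names are invented for the example.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer contributes one edge."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    # Start where out-degree exceeds in-degree by one, else at any node
    start = next((n for n in graph if len(graph[n]) - in_deg[n] == 1),
                 next(iter(graph)))
    edges = {n: list(v) for n, v in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def assemble(reads, k):
    """Spell the contig: first node plus the last base of each later node."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])

# Three overlapping, error-free toy reads (hypothetical data)
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
contig = assemble(reads, k=4)
print(contig)
```

Real assemblers additionally handle sequencing errors, repeats, coverage differences and reverse complements, none of which this sketch attempts.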
The complete set of unigenes was compared using BLAST with the previous P. pinaster transcriptome available at EuroPineDB (Fernández‐Pozo et al., 2011): 128 294 unigenes (61%) were not described in this earlier database, while 935 EuroPineDB unigenes were not present in the current unigene collection. When the P. pinaster transcriptome was compared with the P. glauca and P. abies draft genomes, homology of 87.5% and 99.3% was found, respectively, confirming that most assembled unigenes were pine transcripts.
Annotation of unigenes
Unigene annotation was achieved by combining the results of several annotation processes. Each annotation is associated with an E‐value to enable the empirical assessment of annotation quality. A preliminary analysis of the collection of unigenes using Full‐LengtherNext (FLN) revealed that, of the 98 175 unigenes with an orthologue (46.6% of all unigenes; Table 2), 26 020 were nonredundant transcripts based on orthologue ID. It is also remarkable that 18 667 full‐length (FL) unigenes were reconstructed with a mean length of 1495 nucleotides, representing 19% of the total annotated unigenes (Table 2). Of these, 9641 FL unigenes were different, unique genes (9.8% of the total annotated unigenes, and 37.0% of unique unigenes). The frequency distribution of FL unigenes (Figure 3) indicated a high proportion of unigenes ranging from 500 to 1500 nucleotides with the longest transcript being 7876 nucleotides.
| Category | Absolute number | % |
|---|---|---|
| Unigenes | 210 513 | 100.0 |
| Artefacts | 2010 | 0.95 |
| Unigenes after resolving artefacts | 209 928 | 99.7 |
| Unigenes > 200 nt | 181 100 | 86.0 |
| Unigenes > 500 nt | 52 550 | 25.0 |
| Unigenes > 3000 nt | 1241 | 0.59 |
| Unigenes with orthologue ᵃ | 98 175 | 46.6 |
| Different orthologue ID | 26 020 | 26.5 |
| Complete transcripts (full length) | 18 667 | 19.0 |
| Different full‐length transcripts | 9641 | 9.8 |
| Putative ncRNAs | 176 | 0.08 |
| Unigenes without orthologue ᵃ | 111 577 | 53.0 |
| Coding | 29 736 | 26.6 |
| Putative coding | 23 545 | 21.1 |
| Nonredundant coding | 9799 | 8.8 |
- Numbers in bold can be considered the representative amount of unigenes of the category.
- ᵃ Percentages for this classification are calculated using this file as 100.0%.

Preliminary analysis using FLN also revealed that 111 577 unigenes (53.0%) did not possess significant homology to any other plant gene. This number includes new conifer genes as well as artefactual assemblies. To distinguish between these possibilities, FLN includes a TestCode analysis (Fickett, 1982) and a comparison with the noncoding RNA database (http://www.mirbase.org). As a result, at least 9799 nonredundant coding unigenes can be considered putative new conifer genes. In fact, 4608 unigenes had a homologous EST in the Pine Gene Index 9.0 database (http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=pine). However, only 176 unigenes from this study were determined to be candidate noncoding RNAs. Therefore, the minimal P. pinaster transcriptome can be calculated as 26 020 unigenes having a unique ID. Adding the 9799 unigenes without homology but with coding characteristics, plus the 176 noncoding RNAs, gives an extended estimate of 35 995 unigenes.
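The minimal and extended transcriptome estimates, and several of the percentages in Table 2, can be checked with simple arithmetic. The Python sketch below uses only counts reported in the text and Table 2; the variable and function names are illustrative.

```python
# Counts taken from the text and Table 2
total_unigenes = 210_513
with_orthologue = 98_175
without_orthologue = 111_577
unique_orthologue_ids = 26_020
novel_coding_nonredundant = 9_799
putative_ncrnas = 176

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in Table 2."""
    return round(100.0 * part / whole, 1)

# Minimal transcriptome: unigenes with a unique orthologue ID
minimal = unique_orthologue_ids
# Extended estimate: add novel coding unigenes and candidate ncRNAs
extended = minimal + novel_coding_nonredundant + putative_ncrnas

print(pct(with_orthologue, total_unigenes))        # share with orthologue
print(pct(without_orthologue, total_unigenes))     # share without orthologue
print(pct(unique_orthologue_ids, with_orthologue)) # unique IDs among those
print(extended)
```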
Because this transcriptome was deemed satisfactory, genes annotated as described in Experimental procedures (GO term, a definition or a KEGG code) were subjected to statistical analyses. Of the total unigenes, 62.2% (130 845) were annotated, which indicated that the level of annotation was similar to previously published results (Fernández‐Pozo et al., 2011). Furthermore, the distribution of GO terms at level 2 of biological process and at level 3 of molecular function (Figure S1) shows that the putative transcriptome covers the most important cell functions. A total of 58 296 unigenes possessed unknown sequences that could not be found in existing databases. The annotated transcriptome can be browsed, downloaded and queried at http://www.scbi.uma.es/sustainpinedb/.
Validation of full‐length cDNA sequences in the transcriptome database
Full‐length cDNAs (FLcDNAs) are essential for gene annotation, unambiguous determination of intron–exon boundaries and gene functional analysis. To examine the quality of the FLcDNA collection established in the maritime pine transcriptome database, a number of selected genes coding for proteins of a variety of sizes were validated. Sequences encoding structural and regulatory genes were selected, appropriate primers based on available sequences were designed, and the corresponding FLcDNAs were amplified by RT‐PCR from RNA samples extracted from a variety of maritime pine tissues. Table 3 shows a summary of this work. All selected genes were successfully amplified from cDNA samples using the sequence information available in the database. PCR products matched the theoretical, predicted size for most of the unigenes studied (Canales et al., 2012; Cánovas et al., 2007; Rueda‐López et al., 2013; Villalobos et al., 2012).
| Gene name | Theoretical size of ORF from assembly (bp) | Experimental size of ORF (bp) | Accession number ᵃ |
|---|---|---|---|
| Arginase | 1026 | 1026 | sp_v3.0_unigene23824 |
| Xyloglucan endotransglycosylase | 860 | 860 | sp_v3.0_unigene29476 |
| Phenylalanine ammonia‐lyase | 2265 | 2265 | sp_v3.0_unigene17298 |
| PII‐protein | 714 | 732 | sp_v3.0_unigene23578 |
| Asparagine synthetase 1 | 1782 | 1782 | sp_v3.0_unigene14147, HQ625490 |
| Asparagine synthetase 2 | 1770 | 1773 | sp_v3.0_unigene18231 |
| Glutamate decarboxylase | 1530 | 1530 | sp_v3.0_unigene11755 |
| Glutamate dehydrogenase | 1236 | 1236 | sp_v3.0_unigene15901 |
| Sucrose synthase | 1914 | 1914 | sp_v3.0_unigene34880 |
| Ammonium transporter 1.1 | 1584 | 1584 | KC807907 |
| Ammonium transporter 1.2 | 1464 | 1464 | KC807908 |
| Ammonium transporter 1.3 | 1539 | 1539 | KC807909 |
| Ammonium transporter 2.1 | 1461 | 1461 | KC807910 |
| Ammonium transporter 2.3 | 1446 | 1446 | KC807911 |
| Glutamine dumper 1 | 345 | 345 | sp_v3.0_unigene97635 |
| Glutamine dumper 2 | 384 | 384 | sp_v3.0_unigene20421 |
| MYB1 ᵇ | 1023 | 1023 | sp_v3.0_unigene29297 |
| MYB4 ᵇ | 946 | 946 | sp_v3.0_unigene127348 |
| MYB8 ᵇ | 1605 | 1605 | FN868598 |
| Dof2 ᶜ | 1065 | 1065 | KC688677 |
| Dof3 ᶜ | 1047 | 1047 | KC688678 |
| Dof4 ᶜ | 1053 | 1053 | KC688679 |
| Dof6 ᶜ | 907 | 907 | KC688680 |
| Dof7 ᶜ | 1404 | 1404 | KC688681 |
| Dof8 ᶜ | 822 | 822 | KC688682 |
| Dof9 ᶜ | 897 | 897 | KC688683 |
| NAC1 ᵈ | 1559 | 1545 | sp_v3.0_unigene7635 |
| NAC2 ᵈ | 1476 | 1398 | sp_v3.0_unigene14173 |
| NAC3 ᵈ | 1457 | 1388 | sp_v3.0_unigene18613 |
| NAC4 ᵈ | 1630 | 1278 | sp_v3.0_unigene20354 |
| NAC5 ᵈ | 1824 | 1180 | sp_v3.0_unigene1398 |
| Methionine synthase | 2301 | 2301 | sp_v3.0_unigene34452 |
| S‐adenosylmethionine synthase | 1176 | 1176 | sp_v3.0_unigene3027 |
| S‐adenosylhomocysteine hydrolase | 1458 | 1458 | sp_v3.0_unigene19530 |
| Methylenetetrahydrofolate reductase | 1785 | 1785 | sp_v3.0_unigene15993 |
| Caffeate O‐methyltransferase | 1161 | 1095 | sp_v3.0_unigene17184 |
| Hydroxycinnamoyl‐CoA: shikimate hydroxycinnamoyl transferase | 1302 | 1302 | sp_v3.0_unigene8683 |
| Glycine decarboxylase complex H‐protein a | 504 | 504 | sp_v3.0_unigene30488 |
| Glycine decarboxylase complex H‐protein b | 516 | 516 | sp_v3.0_unigene126824 |
| Mitochondrial serine hydroxymethyltransferase | 1572 | 1572 | sp_v3.0_unigene439 |
| Cytosolic serine hydroxymethyltransferase | 1413 | 1413 | sp_v3.0_unigene17057 |
| D‐3‐Phosphoglycerate dehydrogenase | 1947 | 1947 | sp_v3.0_unigene543 |
| 3‐Phosphoserine aminotransferase | 1302 | 1302 | sp_v3.0_unigene37851 |
| Pinoresinol‐lariciresinol reductase | 939 | 939 | sp_v3.0_unigene17681 |
| Phenylcoumaran benzylic ether reductase | 927 | 927 | sp_v3.0_unigene31659 |
| Phenylpropenal double‐bond reductase | 1056 | 1056 | sp_v3.0_unigene22698 |
- ᵃ Accession number of unigene in Sustainpine and GenBank.
- ᵇ MYB family of TF.
- ᶜ Dof family of TF.
- ᵈ NAC family of TF.
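The comparison summarized in Table 3 amounts to checking, gene by gene, whether the ORF size predicted from the assembly equals the size obtained experimentally. A minimal Python sketch of that check is given below, using a handful of rows copied from Table 3; the helper name `size_discrepancies` is hypothetical.

```python
# (gene, theoretical ORF size from assembly, experimental ORF size), in bp.
# A subset of rows copied from Table 3.
rows = [
    ("Arginase", 1026, 1026),
    ("PII-protein", 714, 732),
    ("Asparagine synthetase 2", 1770, 1773),
    ("Sucrose synthase", 1914, 1914),
    ("NAC4", 1630, 1278),
    ("Caffeate O-methyltransferase", 1161, 1095),
]

def size_discrepancies(rows):
    """Map each gene whose sizes differ to (experimental - theoretical) bp."""
    return {gene: exp - theo for gene, theo, exp in rows if exp != theo}

diffs = size_discrepancies(rows)
for gene, delta in diffs.items():
    print(f"{gene}: {delta:+d} bp")
```

Applied to the full table, the same check flags the genes whose amplified product departs from the assembled ORF, which are the candidates for closer inspection of the assembly.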
Maritime pine regulatory genes
Transcription factors (TF) were specifically searched for in the Sustainpine database. Using Pfam motifs, 877 unique transcripts containing plant TF domains were identified in maritime pine, distributed in 30 families (Table S1). Comparative analysis was performed with other woody plants including white spruce (P. glauca), poplar (Populus trichocarpa) and grapevine (Vitis vinifera), the herbaceous models Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), and the model moss Physcomitrella patens. The total number of TF in maritime pine was similar to the number previously reported for white spruce (Rigault et al., 2011), suggesting that the information reported here could be close to the full representation of the maritime pine transcriptome. In addition, the TF gene number in Physcomitrella (802) and conifer species (877–892) is smaller than in angiosperms (Table S1), either woody (1337–2499) or herbaceous (1407–1828).
Overall, the relative representation of each TF domain among total TF domains was largely conserved in maritime pine and spruces (Figure 4 and Table S1) with only a few exceptions. For example, the SRF family is under‐represented with only four members in maritime pine relative to the 45–71 reported in spruces or the 42–109 members reported in angiosperms. The AP2 family with 106 members is over‐represented with regard to spruces (24–63), but contains a number of TF similar to that in angiosperms (93–209). In contrast, the histone‐like TF (CBF/NF‐Y) family is under‐represented in maritime pine (26) and Norway spruce (20) when compared to white spruce (52), but exhibits similar numbers to those found in angiosperm species.

When we compared this general distribution in maritime pine with that of six plant models, we observed conservation of TF categories. Hierarchical clustering (Figure S2) showed the closest conservation in the number of members of each family for Physcomitrella, spruce and pine, with a second branch for Arabidopsis, rice and grapevine and a separate branch for poplar, which has a greater number of TF. The recent genome duplication in poplar likely increased the number of TF in each family (Tuskan et al., 2006). Families previously reported in white spruce but not found in maritime pine are as follows: RNA polymerase II TF SIII, RNA polymerase III, TF IIIC subunit and alcohol dehydrogenase TF Myb/SANT‐like (Rigault et al., 2011).
Enzyme‐encoding genes
The representation of gene family members involved in various metabolic pathways was determined, and data retrieved from the SustainpineDB were compared with those found in Norway spruce (P. abies), P. trichocarpa and A. thaliana (Figure 5). A first set of genes studied were those participating in nitrogen acquisition and assimilation. The number of genes encoding nitrate reductase and nitrite reductase was similar in the three species (1–2). For genes involved in inorganic nitrogen transport, some important differences were observed among species. For instance, Arabidopsis contains almost twice as many genes encoding nitrate transporters as maritime pine, Norway spruce and poplar. For ammonium transporters, only the poplar genome presents an expanded gene family of 14 members.

Fewer transcripts for genes encoding enzymes of ammonium assimilation were found in conifer species. Only 2–3 transcripts for glutamine synthetase (GS) were identified in gymnosperms (P. pinaster and P. abies), in accordance with previous results (Cánovas et al., 2007). In contrast, genomes of angiosperm species are endowed with GS families with a higher number of members, eight in Populus and six in Arabidopsis. A single expressed gene was found for ferredoxin‐glutamate synthase (Fd‐GOGAT) and NADH‐GOGAT in maritime pine and Norway spruce. In contrast, two genes encode Fd‐GOGAT and NADH‐GOGAT in poplar.
A second group of genes were those encoding enzymes involved in the synthesis of methionine and S‐adenosylmethionine (SAM), the activated form of methionine, which participate in a number of essential metabolic pathways in plants. In particular, we focused on three genes involved in the synthesis and recycling of SAM, a methyl donor in multiple cellular transmethylation reactions (Figure 5). The number of genes encoding cobalamin‐independent methionine synthase and SAM synthase was identical in conifers and Arabidopsis but slightly higher in P. trichocarpa, whereas an additional S‐adenosyl‐L‐homocysteine hydrolase gene was found in pine and Norway spruce relative to angiosperms.
Cellulose, hemicellulose and lignin are major components of the secondary cell wall, a well‐developed structure in woody perennials compared with Arabidopsis, particularly in trees such as maritime pine and poplar. A similar number of genes encoding the cellulose synthase catalytic subunit (CesA) were found in conifers compared with Arabidopsis, whereas the number increased to 18 in poplar. The number of genes encoding sucrose synthase and UDP‐glucose 6‐dehydrogenase in the maritime pine transcriptome was slightly lower compared with the other three plant species.
We also compared the number of genes encoding three key enzymes of the phenylpropanoid pathway that direct the carbon flow from the shikimate pathway to monolignol biosynthesis: phenylalanine ammonia‐lyase, cinnamoyl‐CoA reductase and cinnamyl alcohol dehydrogenase (Figure 5). The number of genes encoding the three enzymes was similar in maritime pine, Norway spruce and Arabidopsis and also in poplar in the case of phenylalanine ammonia‐lyase. In contrast, the number of members of the cinnamoyl‐CoA reductase and cinnamyl alcohol dehydrogenase families is considerably higher in poplar (40 and 21, respectively).
Single nucleotide polymorphism and simple‐sequence repeat identification
Using CLCBio with stringent criteria in terms of both minimum allele frequency (MAF = 20%) and contig depth (minimum of 10 reads), we identified 55 607 in silico single nucleotide polymorphisms (SNPs) in 13 892 unigenes (Table S2), corresponding to an average of 1 SNP per 2 kb surveyed in the assembled transcriptome. With GigaBayes, setting the contig depth parameter to 4 or 10 yielded 542 830 or 20 208 SNPs, respectively. Surprisingly, only 10 829 and 2106 SNPs were detected by both CLCBio and GigaBayes with relaxed (minimum contig depth, CRL = 4) and more stringent (CRL = 10) detection criteria, respectively; that is, only 2% and 10.4% of the GigaBayes SNPs were shared with CLCBio. Despite a fivefold increase in the proportion of commonly detected SNPs using the more stringent detection criteria in GigaBayes, the two programs detected highly divergent sets of SNPs. Given this inconsistency, we used a set of 5138 SNPs validated in natural populations and mapping pedigrees of maritime pine (Lepoittevin et al., 2010; Chancerel et al., 2011; Chancerel et al., 2013) to evaluate the ability of both programs to identify true SNPs. Prior to this analysis, we checked that the flanking sequences associated with the true SNPs were found in the present assembly using BLASTn analysis. Using CLCBio, 36.26% (1863 SNPs) of true SNPs were detected, while this rate dropped to 9.17% (471 SNPs) and 1.36% (70 SNPs) using relaxed and stringent GigaBayes criteria. By normalizing GigaBayes detection rates to the 55 607 putative SNPs identified by CLCBio, the relaxed and stringent GigaBayes detection procedures led to a success rate of 0.94% (48 true SNPs detected) and 3.75% (193 SNPs), respectively. In conclusion, CLCBio was found to perform nearly 10 times better than the most stringent GigaBayes SNP detection procedure with our data set.
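The normalization used above scales the number of validated SNPs recovered by GigaBayes to what would be expected had it reported as many putative SNPs as CLCBio. The Python sketch below reproduces the reported figures from the counts in the text; the function and variable names are illustrative, and the exact rounding convention is an assumption (percentages are computed from the unrounded normalized counts).

```python
TRUE_SNPS = 5_138        # validated SNPs used as the reference set
CLC_PUTATIVE = 55_607    # putative SNPs reported by CLCBio

def detection_rate(true_hits: float, validated: int = TRUE_SNPS) -> float:
    """Percentage of validated SNPs that were recovered."""
    return 100.0 * true_hits / validated

def normalised_hits(true_hits: int, putative: int,
                    reference: int = CLC_PUTATIVE) -> float:
    """Scale true hits to a call set of `reference` putative SNPs."""
    return true_hits * reference / putative

clc_rate = detection_rate(1_863)
relaxed = normalised_hits(471, 542_830)    # GigaBayes, CRL = 4
stringent = normalised_hits(70, 20_208)    # GigaBayes, CRL = 10

print(f"CLCBio: {clc_rate:.2f}%")
print(f"GigaBayes relaxed:   {relaxed:.0f} SNPs, {detection_rate(relaxed):.2f}%")
print(f"GigaBayes stringent: {stringent:.0f} SNPs, {detection_rate(stringent):.2f}%")
print(f"CLCBio vs stringent GigaBayes: {clc_rate / detection_rate(stringent):.1f}x")
```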
A total of 5974 putative simple‐sequence repeats (SSRs) were found, with trinucleotide repeats (3309) being the most common and dinucleotide repeats (479) the least abundant. This is in agreement with previously published P. pinaster SSR abundance (Fernández‐Pozo et al., 2011).
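For readers unfamiliar with in silico SSR detection, the following Python sketch shows one simple way to find perfect tandem repeats with regular expressions. It is not the tool or the parameter set used in this study: the minimum copy numbers in `MIN_REPEATS`, the function name and the test sequence are all assumptions chosen for illustration.

```python
import re

# Minimum number of tandem copies required per motif length.
# Illustrative thresholds only; not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in min_repeats.items():
        # A motif followed by at least (min_copies - 1) exact copies of itself
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1)
        )
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)

# Hypothetical sequence with one dinucleotide and one trinucleotide SSR
seq = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(seq)
print(ssrs)
```

A production tool would also handle compound and imperfect repeats and would avoid reporting, for example, a dinucleotide repeat a second time as a tetranucleotide motif.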
Discussion
Maritime pine transcriptome assembly
Long‐read sequence data sets are required for transcriptome assembly in nonmodel species for which a reference genome is not available. In conifers, 454 sequencing has been recently used to generate well‐defined transcriptomes in several species of ecological and economic interest, namely Pinus contorta (Parchman et al., 2010), P. glauca (Rigault et al., 2011), P. pinaster (Fernández‐Pozo et al., 2011), Pinus taeda and 11 other conifers (Lorenz et al., 2012). In the present work, we used a combination of 454 and Illumina sequencing to define a minimal reference transcriptome for maritime pine (P. pinaster). A similar approach was recently used to characterize, for example, the globe artichoke transcriptome (Scaglione et al., 2012). The nonredundant transcriptome resulting from the assembly contains 26 020 unique transcripts with orthologue ID in public databases, a number very close to the 27 720 unique cDNA clusters reported for the P. glauca transcriptome (Rigault et al., 2011) and higher than the 17 000 unique coding genes obtained in the assembly of the P. contorta transcriptome (Parchman et al., 2010). The number of unique transcripts in maritime pine is also close to the number of genes (28 354) resulting from the draft assembly of the 20‐gigabase genome of P. abies (Nystedt et al., 2013). Considering all the available data, we estimate that coverage of the maritime pine transcriptome is high.
FLcDNAs catalogues as genomic resources
The availability of large collections of FLcDNAs in several conifers, such as Sitka (Ralph et al., 2008) and white spruces (Rigault et al., 2011), as well as Cryptomeria (Futamura et al., 2008), has greatly facilitated the assembly and annotation of FLcDNAs in maritime pine. FLcDNAs are crucial for accurate annotation, comparative analysis with other conifer species and also for functional analysis of relevant genes associated with maritime pine growth, development and response to environmental changes. Furthermore, this genomic resource will greatly facilitate protein identification as well as protein–protein interaction studies through proteomics approaches (Cánovas et al., 2004). For all these reasons, it was of paramount importance to validate the assembly of the FLcDNA collection (9641 different transcripts).
Over the last few years, refined protocols have been developed for Agrobacterium‐mediated genetic transformation of maritime pine embryogenic tissue, cryopreservation of transgenic lines and efficient transgenic plant regeneration through somatic embryogenesis (reviewed in Trontin et al., 2013). These new developments and the availability of a large collection of FLcDNAs have paved the way for the application of reverse genetics towards functional dissection of traits of economic and ecological interest in maritime pine trees. Thus, the availability of both standardized transformation methods and FLcDNA catalogues is expected to significantly increase throughput in candidate gene analysis together with facilitating comparison across laboratories interested in maritime pine genomics. The functional analysis of key genes is crucial for future applications in tree improvement, new variety design and sustainable forest management (e.g. development of marker‐assisted selection).
Maritime pine gene families and genome size
It has been suggested that the increased size and complexity of conifer genomes relative to angiosperms may be explained by the existence of large gene families (Kinlaw and Neale, 1997). However, this assumption is not fully supported by available data as most TF (Figure 4 and Table S1) or other gene families (Figure 5) present in maritime pine (this work) or spruce genomes (Birol et al., 2013; Nystedt et al., 2013; Rigault et al., 2011) were of similar or even lower size compared with angiosperm species (P. trichocarpa, A. thaliana and V. vinifera). While the existence of large gene families in conifers coding for enzymes of secondary metabolism has been reported (Martin et al., 2004), there are other families in primary metabolism that contain a similar, or even smaller, number of functional members in conifers than in angiosperms (Cánovas et al., 2007). High genome size and complexity in conifers may be more readily explained by divergence and accumulation of retrotransposons and pseudogenes (Morse et al., 2009; Nystedt et al., 2013). Retrotransposons and pseudogenes can be expressed and contribute to some extent to the collection of unigenes without orthologue in the maritime pine transcriptome. Accumulation of pseudogenes may have functional advantages in the regulation of gene expression if they are expressed. Recently, Poliseno et al. (2010) reported that expressed pseudogenes compete with authentic target transcripts for miRNA binding and, as such, modulate expression levels of their cognate genes.
Transcription factors
The identification of transcription factors (TF) and subsequent analysis of the composition and organization of TF families are necessary steps to understand the regulatory networks associated with key processes in conifer trees. The number of TF in P. pinaster appears to be similar to that of P. glauca (Rigault et al., 2011), but considerably lower than the numbers found in the genomes of several angiosperms. This is confirmed by studies carried out in specific families. For example, the Dof family has only ten members in maritime and loblolly pines (Rueda‐López et al., 2013); this is twofold to eightfold lower than the gene number in angiosperms, including 36 in A. thaliana, 30 in rice (Lijavetzky et al., 2003), 22 in grape or 46 in poplar (Pérez‐Rodríguez et al., 2010). Moreover, it has also been argued that some subfamilies, such as Knox I, involved in meristem cell identity, have followed a distinct path of evolution in conifers. Three of the four groups delineated by phylogenetic analyses in angiosperms have no conifer sister groups (Guillet‐Claude et al., 2004), suggesting that conifers evolved in a nonlinear fashion compared with angiosperms. Recently, it has been proposed that expansion of the VASCULAR NAC DOMAIN (VND) gene family might be related to xylem vessel complexity in angiosperms (Nystedt et al., 2013). Another example of this divergent pattern of TF evolution is found in the HD‐Zip III gene family, which is involved in the regulation of cambium and of primary and secondary vascular differentiation, and whose structure has also diverged considerably between angiosperms and gymnosperms (Côté et al., 2010). Furthermore, distinct evolutionary trends in angiosperms and gymnosperms are evidenced by the differential expansion of the subgroup 4 R2R3‐Myb gene family, with more recent duplications in P. glauca (Bedon et al., 2010). In this particular case, the higher number of members in conifers appears to be related to isoprenoid‐ and flavonoid‐oriented stress responses.
Most of the existing knowledge of plant TF genes was obtained from studies in Arabidopsis. While Arabidopsis is a useful model for many developmental and environmentally regulated processes in higher plants, it lacks certain traits that are of immense value to agriculture or forestry. To better understand the evolutionary relationship between transcriptional regulation and morphological complexity in Viridiplantae, it is important to analyse how particular families of transcription factors expanded in correlation with the general increase in morphological complexity. In this sense, the availability of full‐length sequences from gymnosperms closes a large gap and will help to understand how land plants managed, in terms of transcriptional regulation, to become multicellular.
Gene families of metabolic pathways
A major contrast was observed for genes involved in nitrogen metabolism, likely related to adaptations to nitrogen availability and use. For example, poplar, Norway spruce and pine possess about half the number (7) of nitrate transporters encoded in the Arabidopsis genome (12). This difference may be due to differences among species in soil nitrate uptake and transport during active growth; in trees, active growth is primarily supported by internal nitrogen storage and mobilization (Cantón et al., 2005). In contrast, the family of ammonium transporters is expanded in P. trichocarpa (14) compared with pine, spruce and Arabidopsis (6). It has been suggested that this specific feature may be related to the peculiar physiology of a perennial and mycorrhizal tree (Couturier et al., 2007). However, the observation of a similar number of ammonium transporters in conifer trees and Arabidopsis no longer supports this assumption. Another major difference is the reduced size of the GS gene family in pine (2) and spruce (3) compared with Populus (8) and Arabidopsis (6). Deep transcriptome analysis in the current study supports previous reports describing only two genes for cytosolic GS in pine and the lack of a plastidic isoform (Cánovas et al., 2007).
In trees, consumption of methyl units during lignification implies the existence of an important carbon sink, and S‐adenosylmethionine availability may affect wood quality through alterations in lignin content and composition (Villalobos et al., 2012). Nevertheless, no major differences were found in the number of genes in poplar and conifers with respect to Arabidopsis.
The number of genes related to carbon partitioning to cellulose and hemicellulose seems to be similar in the four species, with the exception of CesA, which is expanded in P. trichocarpa. The CesA gene family in poplar includes 18 members that form nine phylogenetic groups, eight of which contain a pair of CesA genes with nearly identical sequences, a likely result of recent genome duplication. Some of them showed redundant xylem‐specific expression, and it was suggested that more CesA genes could be required for the massive synthesis of cellulose in trees (Suzuki et al., 2006). However, the number of genes encoding CesA subunits in P. pinaster, P. abies and A. thaliana seems to be similar.
Monolignol biosynthesis is a complex branch of the general phenylpropanoid pathway. The number of genes encoding phenylalanine ammonia‐lyase was similar in all species (4–6). However, an increased number of genes encoding cinnamoyl‐CoA reductase was found in P. trichocarpa, P. pinaster and P. abies compared with Arabidopsis. Previous work in angiosperms showed that the cinnamoyl‐CoA reductase family represents the largest lignin biosynthesis gene family in several species (Xu et al., 2009). In conifers, the transcriptomic data reported here suggest that the cinnamoyl‐CoA reductase family is also expanded. In contrast, it has been previously reported that conifers may have a single copy of the cinnamyl alcohol dehydrogenase gene (Mackay et al., 1997). However, different transcripts were found in maritime pine and Norway spruce.
Overall, gene families involved in metabolic pathways were of similar size in P. pinaster and P. abies. The number of genes in P. pinaster was either similar to or lower than that in Populus or Arabidopsis. These findings should be confirmed when a reference genome for pine is available.
SNP identification
Based on previous unigene sets constructed with significantly fewer sequences obtained essentially from one ecotype (that of the Aquitaine provenance), highly multiplexed SNP arrays were constructed in maritime pine for linkage applications (Chancerel et al., 2011, 2013) and association mapping (Lepoittevin et al., 2012). The present study provides the most comprehensive SNP catalogue for maritime pine, bringing together sequences from a diverse range of ecotypes (France, Spain, Portugal) and therefore allowing the use of high‐throughput genotyping technologies for applications in: (i) genetic diversity and population structure analysis, with less ascertainment bias than was observed in the analysis of previous SNP data sets (Chancerel et al., 2011), thanks to a better coverage of the species diversity; and (ii) genomic selection, because about 14 000 different gene loci are now covered by at least one SNP. Our study reveals a strong influence of the SNP calling algorithm on the number of detected SNPs. Such variability in the number of called SNPs among SNP detection programs has already been reported in another conifer species (Muller et al., 2012) and indicates that results should be interpreted with caution, especially if based on a single detection approach. In our case, using a set of already validated SNPs, we show that CLC‐Bio was able to detect true SNPs at a rate that was ten times higher than that obtained with the most stringent criteria implemented in GigaBayes. Trees have long generation times, and breeding (especially for pine) is a slow process (Mullin et al., 2011). Genomic selection offers the possibility to increase genetic gain per time unit in these long‐lived organisms, as illustrated recently for P. taeda (Resende et al., 2012), and to accelerate their domestication (Harfouche et al., 2012).
Genomic selection will also make it possible to ensure more precisely that a sufficient level of diversity is maintained in future varieties, allowing forest trees to cope with major biotic and abiotic constraints in a rapidly evolving environment.
Concluding remarks
In this work, comprehensive characterization of the P. pinaster transcriptome was performed using a combination of two different next‐generation sequencing platforms, 454 and Illumina. The de novo assembly of the maritime pine transcriptome provides a large catalogue of expressed genes (26 020 unigenes with orthologue, 9799 unigenes with coding characteristics but without orthologue, 176 ncRNA) and a relevant collection of FLcDNAs (9641). In addition, the sequencing data permitted the identification and establishment of robust SSR‐ and SNP‐based databases for genotyping applications and translational integration in maritime pine breeding programmes. These genomic resources will facilitate genome sequencing, functional genomics and applied studies in maritime pine trees.
Experimental procedures
Tree source, tissues and experimental conditions
Pinus pinaster samples of developing xylem were collected from different genotypes of a Corsican clonal population planted in 1986 at the forestry station of INRA‐Pierroton (Aquitaine, France), as described previously (Villalobos et al., 2012). A range of maritime pine tissues (cones, male and female strobili, buds, xylem and phloem) was collected from adult trees located at Sierra Bermeja (Málaga). Roots, stems and needles of 2‐week‐old pine seedlings and zygotic embryos were also sampled, as well as maritime pine embryonal masses. Somatic embryos were produced from an embryonal mass line (AAY06006) of P. pinaster originating from a Landes × Morocco polycross. Embryonal masses were induced from immature zygotic embryos (Park et al., 2006). Embryonal masses and somatic embryos were cultured according to Lelu‐Walter et al. (2006). Zygotic embryos at several developmental stages isolated from immature seeds were also used. All samples were frozen in liquid nitrogen and stored at −80 °C until further use.
RNA isolation, cDNA synthesis and construction of libraries
Total RNA was isolated as described by Canales et al. (2012). Double‐stranded cDNA libraries were constructed using the MINT kit (Evrogen) with the modifications proposed by Babik et al. (2010). The quality of cDNA libraries was checked on 1.5% agarose gels and on an Agilent 2100 Bioanalyzer. cDNA libraries were also constructed from poly(A)‐enriched RNA. Briefly, equal amounts of total RNA (50 μg) from different tissues were combined prior to mRNA purification. Poly(A)+ mRNA was double purified using the Oligotex mRNA Mini Kit (Qiagen), according to the manufacturer's instructions. First and second strand cDNA synthesis was conducted from 10 μg of mRNA following the protocol described by Durban et al. (2011). Five cDNA synthesis reactions were performed to obtain sufficient cDNA for 454 pyrosequencing. RNA isolation, purification and library construction from somatic embryos were carried out by the GATC company (Konstanz, Germany), using the Smart cDNA construction kit and standard procedures for Illumina GA IIx sequencing, and a standard kit for the construction of a normalized cDNA library dedicated to Roche/454 GS FLX sequencing.
DNA sequencing and preprocessing analysis
Libraries were sequenced using different sequencing platforms, including Roche/454 Titanium, Roche/454 GS FLX, Illumina GAIIx and Illumina HiSeq 2000. Chloroplast, mitochondrial and ribosomal sequences in short reads were filtered out by reference mapping using BWA (Li and Durbin, 2009). All reads were preprocessed using the SeqTrimNext pipeline (http://www.scbi.uma.es/seqtrimnext; Falgueras et al., 2010) available at the Genotoul computing facilities (INRA Toulouse, France) and at the Plataforma Andaluza de Bioinformática (University of Málaga, Spain). Low‐quality, ambiguous and low‐complexity stretches, linkers, adaptors, vector fragments, organelle DNA, polyA/polyT tails and contaminating sequences were removed while keeping the longest informative part of each read; sequences shorter than 20 bp (short reads) or 40 bp (long reads) were discarded. The command line for long reads was seqtrimnext -t transcriptomics_plants.txt -Q input_reads.fastq > output.txt, where the configuration file transcriptomics_plants.txt is provided by default with SeqTrimNext and contains the configuration parameters. The command line for short reads was the same as above, but the configuration file also specified that sequences shorter than 20 bp be filtered out and that the preliminary statistics calculations and the read clonality analysis be skipped.
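The final length filter described above can be illustrated with a minimal Python sketch. It is not part of SeqTrimNext; the function names and the (name, sequence) read representation are hypothetical, and only the 20 bp and 40 bp cut‐offs are taken from the text.

```python
# Illustrative sketch only: this is NOT SeqTrimNext code.
# The cut-offs (20 bp for short reads, 40 bp for long reads) come from the text;
# the names below are hypothetical.
MIN_LENGTH = {"short": 20, "long": 40}


def keep_read(trimmed_sequence, read_type):
    """Return True if a trimmed read reaches the minimum length for its type."""
    return len(trimmed_sequence) >= MIN_LENGTH[read_type]


def filter_reads(reads, read_type):
    """Yield the (name, sequence) pairs whose trimmed sequence is long enough."""
    for name, sequence in reads:
        if keep_read(sequence, read_type):
            yield name, sequence


if __name__ == "__main__":
    example = [("read_1", "ACGT" * 12), ("read_2", "ACGT" * 4)]
    # Only read_1 (48 bp) is kept as a long read; read_2 (16 bp) is discarded.
    print([name for name, _ in filter_reads(example, "long")])
```

In this sketch a read exactly at the cut‐off is kept, which is one reading of "shorter than 20 bp or 40 bp"; the actual behaviour is governed by the SeqTrimNext configuration file.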
Transcriptome assembly
The bioinformatics process for obtaining the putative unigenes from the SeqTrimNext preprocessed reads is detailed in Figure 1. Short reads and long reads were assembled using well‐established protocols (see Methods S1). The complexity and redundancy of the short‐read and long‐read assemblies were reduced using the CAP3 assembler (Liang et al., 2000) to obtain the unigenes. A 95% cut‐off for overlap per cent identity was applied to cope with both sequence variation (high heterozygosity of pine genes) and genome heterogeneity between samples, as previously described (Fernández‐Pozo et al., 2011).
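As a simple illustration of what a 95% overlap identity cut‐off means, the following Python sketch computes the per cent identity of two already aligned overlap strings and compares it with the threshold. This is a simplification under stated assumptions: CAP3 computes and scores overlaps internally in its own way, the function names are hypothetical, and gap columns are counted here as mismatches.

```python
# Illustrative sketch only: CAP3 evaluates overlaps internally with its own scoring.
# Assumptions: both strings are already aligned, have equal length,
# and use '-' for gaps; gap columns count as mismatches.
IDENTITY_CUTOFF = 95.0  # per cent, as stated in the text


def overlap_identity(aligned_a, aligned_b):
    """Per cent identity between two equal-length aligned overlap strings."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned overlaps must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def passes_cutoff(aligned_a, aligned_b, cutoff=IDENTITY_CUTOFF):
    """Return True if the overlap identity reaches the cut-off."""
    return overlap_identity(aligned_a, aligned_b) >= cutoff


if __name__ == "__main__":
    contig_a = "A" * 100
    contig_b = "A" * 95 + "C" * 5  # five mismatches in a 100 bp overlap
    print(overlap_identity(contig_a, contig_b), passes_cutoff(contig_a, contig_b))
```

With such a threshold, two contigs differing at up to five positions per 100 aligned bases can still be merged, which is how allelic variation within a heterozygous species can be tolerated.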
Annotation analysis
Unigenes were analysed using Sma3s (Pérez‐Pulido, A., Muñoz‐Mérida, A., Claros, M.G., Trelles, O., submitted; http://www.bioinfocabd.upo.es/node/11) with the command line sma3s_v2.pl -a 123 -d uniprot_plants.dat -i Unigenes.fasta -b Unigenes.blast -p F > output. The inputs were the unigenes (.fasta file), the BLAST result of the unigenes against the plant UniProtKB database (.blast file) and the metadata of the plant UniProtKB database (.dat file); the output provided a gene description, GO terms, EC keys, KEGG maps and InterPro codes for every sequence. Unigenes were also analysed with FLN to provide a gene description, identify which unigenes corresponded to FLcDNAs, detect putative start and stop codons as well as the putative protein sequence, extract which unigenes could be sRNAs, and obtain a quick preview of the pine unigene content. AutoFact (Koski et al., 2005) was also used to provide a third gene description.
Identification of polymorphisms
Because nucleotide variation is considered frequent in plant genes, we screened the unigenes for SNP and SSR variation. SSRs were screened using MREPS (http://bioinfo.lifl.fr/mreps/; Kolpakov et al., 2003) with default parameters. SNP detection was performed based on 4 098 953 Roche/454 and Sanger reads using either CLC‐Bio Workbench, v6.0 (CLC Bio, Aarhus, Denmark) or GigaBayes (http://bioinformatics.bc.edu/marthlab/wiki/index.php/Software), an implementation and expansion of the PolyBayes SNP detection algorithm from Marth et al. (1999). With respect to the former, we first used the reference mapping function to map the reads onto the 210 513 unigenes generated in the present study. The following parameters were used: similarity score = 80%, minimum length fraction = 90% and maximum number of hits for reads = 10. Then, we used the SNP detection function, based on the neighbourhood quality standard (NQS) algorithm, with the following parameters: minimum coverage = 10, minimum central base quality = 20, minimum neighbourhood quality over a window length of 11 nucleotides = 15, maximum gap and mismatch count = 2 and minimum allele frequency = 20%. With respect to the latter, we used the following parameters: ploidy = diploid; CRL = 4 or 10; CAL2 = 2; PSL = 0.9; O = 1; D = 0.001. In both programs, no insertion/deletion variants (InDels) were considered.
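To make the coverage and allele frequency thresholds concrete, the following Python sketch applies them to the base calls covering a single unigene position. It is not the NQS algorithm and does not reproduce CLC‐Bio or GigaBayes: base and neighbourhood quality filtering are assumed to have been applied beforehand, coverage is assumed to count only A/C/G/T calls, and the function name is hypothetical. Only the two thresholds (minimum coverage = 10, minimum allele frequency = 20%) are taken from the text.

```python
# Illustrative sketch only: NOT the NQS algorithm used by CLC-Bio, nor GigaBayes.
# Assumptions: base calls were already quality-filtered, and coverage counts
# only A/C/G/T calls, so gap characters (InDels) are ignored.
from collections import Counter

MIN_COVERAGE = 10            # minimum coverage, from the text
MIN_ALLELE_FREQUENCY = 0.20  # minimum allele frequency (20%), from the text
VALID_BASES = frozenset("ACGT")


def candidate_snp_alleles(base_calls, reference_base):
    """Return the non-reference alleles at one position that pass both thresholds."""
    counts = Counter(b for b in (c.upper() for c in base_calls) if b in VALID_BASES)
    coverage = sum(counts.values())
    if coverage < MIN_COVERAGE:
        return []
    reference = reference_base.upper()
    return sorted(
        base
        for base, n in counts.items()
        if base != reference and n / coverage >= MIN_ALLELE_FREQUENCY
    )


if __name__ == "__main__":
    # Ten reads cover the position: eight carry A (reference) and two carry G.
    print(candidate_snp_alleles("AAAAAAAAGG", "A"))
```

Under these thresholds a variant allele must be seen in at least two of ten reads, so a single discordant read at minimum coverage is not enough to call a SNP.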
Acknowledgements
We are indebted to the anonymous reviewers for their thorough evaluation and constructive recommendations that helped to improve this manuscript. We would like to thank Aranzazu Flores Monterroso, Remedios Crespillo, José Vega Bartol and Marta Simôes for help in the preparation of samples. This work was supported by the SUSTAINPINE Project funded by the Plant KBBE programme, Scientific and Technological Cooperation in Plant Genome Research (PLE2009‐0016). Zygotic embryo data production was supported by FCT (Portugal) through projects PTDC/AGRGPL/102877/2008 and P‐KBBE/AGR‐GPL/0001/2009, and grants PEst‐OE/EQB/LA0004/2011 and SFRH/BD/79779/2011 (AR). Alexandre Aguiar and Isabel Carrasquinho from Instituto Nacional de Investigação Agrária e Veterinária (INIAV) are acknowledged for making zygotic embryos available. Somatic embryo data production was supported by the EMBRYOME project, contract number 33639, funded by the French ‘Conseil Régional de la Région Centre’.
References
- Erik A. Visser, Jill L. Wegrzyn, Emma T. Steenkmap, Alexander A. Myburg, Sanushka Naidoo, Combined de novo and genome guided assembly and annotation of the Pinus patula juvenile shoot transcriptome, BMC Genomics, 10.1186/s12864-015-2277-7, 16, 1, (2015).
- Rafael A. Cañas, Isabel Feito, José Francisco Fuente-Maqueda, Concepción Ávila, Juan Majada, Francisco M. Cánovas, Transcriptome-wide analysis supports environmental adaptations of two Pinus pinaster populations from contrasting habitats, BMC Genomics, 10.1186/s12864-015-2177-x, 16, 1, (2015).
- Rafael A. Cañas, Javier Canales, Carmen Muñoz-Hernández, Jose M. Granados, Concepción Ávila, María L. García-Martín, Francisco M. Cánovas, Understanding developmental and adaptive cues in pine through metabolite profiling and co-expression network analysis, Journal of Experimental Botany, 10.1093/jxb/erv118, 66, 11, (3113-3127), (2015).
- David Velasco, Pedro Seoane, M. Gonzalo Claros, Bioinformatics Analyses to Separate Species Specific mRNAs from Unknown Sequences in de novo Assembled Transcriptomes, Bioinformatics and Biomedical Engineering, 10.1007/978-3-319-16480-9_32, (322-332), (2015).
- C. Lepoittevin, C. Bodénès, E. Chancerel, L. Villate, T. Lang, I. Lesur, C. Boury, F. Ehrenmann, D. Zelenica, A. Boland, C. Besse, P. Garnier‐Géré, C. Plomion, A. Kremer, Single‐nucleotide polymorphism discovery and validation in high‐density SNP array for genetic analysis in European white oaks, Molecular Ecology Resources, 10.1111/1755-0998.12407, 15, 6, (1446-1459), (2015).
- Rosario Carmona, Adoración Zafra, Pedro Seoane, Antonio J. Castro, Darío Guerrero-Fernández, Trinidad Castillo-Castillo, Ana Medina-García, Francisco M. Cánovas, José F. Aldana-Montes, Ismael Navas-Delgado, Juan de Dios Alché, M. Gonzalo Claros, ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome, Frontiers in Plant Science, 10.3389/fpls.2015.00625, 6, (2015).
- Hicham Benzekri, Paula Armesto, Xavier Cousin, Mireia Rovira, Diego Crespo, Manuel Merlo, David Mazurais, Rocío Bautista, Darío Guerrero-Fernández, Noe Fernandez-Pozo, Marian Ponce, Carlos Infante, Jose Zambonino, Sabine Nidelet, Marta Gut, Laureana Rebordinos, Josep V Planas, Marie-Laure Bégout, M Claros, Manuel Manchado, De novo assembly, characterization and functional annotation of Senegalese sole (Solea senegalensis) and common sole (Solea solea) transcriptomes: integration in a database and design of a microarray, BMC Genomics, 10.1186/1471-2164-15-952, 15, 1, (952), (2014).
- Dolores Abarca, Alberto Pizarro, Inmaculada Hernández, Conchi Sánchez, Silvia P Solana, Alicia del Amo, Elena Carneros, Carmen Díaz-Sala, The GRAS gene family in pine: transcript expression patterns associated with the maturation-related decline of competence to form adventitious roots, BMC Plant Biology, 10.1186/s12870-014-0354-8, 14, 1, (2014).
- Pedro Perdiguero, Carmen Collada, Ãlvaro Soto, Novel dehydrins lacking complete K-segments in Pinaceae. The exception rather than the rule, Frontiers in Plant Science, 10.3389/fpls.2014.00682, 5, (2014).
- Adeline Becquer, Jean Trap, Usman Irshad, Muhammad A. Ali, Plassard Claude, From soil to plant, the journey of P through trophic relationships and ectomycorrhizal association, Frontiers in Plant Science, 10.3389/fpls.2014.00548, 5, (2014).




