Functional and morphological evolution in gymnosperms: A portrait of implicated gene families

Abstract Gymnosperms diverged from their sister plant clade of flowering plants 300 Mya. Morphological and functional divergence between the two major seed plant clades involved significant changes in their reproductive biology, water‐conducting systems, secondary metabolism, stress defense mechanisms, and small RNA‐mediated epigenetic silencing. The relatively recent sequencing of several gymnosperm genomes and the development of new genomic resources have enabled whole‐genome comparisons within gymnosperms, and between angiosperms and gymnosperms. In this paper, we aim to understand how genes and gene families have contributed to the major functional and morphological differences in gymnosperms, and how this information can be used for applied breeding and biotechnology. In addition, we have analyzed the angiosperm versus gymnosperm evolution of the pleiotropic drug resistance (PDR) gene family with a wide range of functionalities in plants' interaction with their environment including defense mechanisms. Some of the genes reviewed here are newly studied members of gene families that hold potential for biotechnological applications related to commercial and pharmacological value. Some members of conifer gene families can also be exploited for their potential in phytoremediation applications.


DE LA TORRE ET AL.
The most noteworthy differences between angiosperms and gymnosperms certainly occur at the morphological level. Flowers, the major functional innovation in angiosperms, are assumed to have evolved through the transformation of gymnosperms' separate male and female structures into an integrated hermaphrodite structure (Niu et al., 2016;Pires & Dolan, 2012). Similarly, angiosperms developed a more efficient method of water transport through vessels, while tracheids are present in gymnosperm species (with the exception of gnetales), but also in the basal angiosperm Amborella trichopoda. Gene families involved in secondary metabolism such as terpene biosynthesis or various alkaloid biosynthesis pathways evolved differently in gymnosperms and flowering plants (Chen, Tholl, Bohlmann, & Pichersky, 2011;Hall, Zerbe, et al., 2013b). In this review paper, we aim to understand how a subset of well-studied genes and gene families have contributed to the evolution of major morphological and functional differences between angiosperms and gymnosperms including their reproductive biology, water-conducting xylem tissues, secondary metabolism and stress, and noncoding and small RNAs. In addition, we analyzed the gene family evolution of the pleiotropic drug resistance (PDR) proteins, known to play important roles in plant-environment interactions in angiosperms. Some of the gymnosperm genes reviewed here are newly studied members of gene families such as PDR that hold potential for biotechnological applications with commercial and pharmacological value. Some members of conifer gene families have potential to be exploited for improved growth on marginal or disturbed soils, by increasing the detoxification potential of spruces in phytoremediation applications.

| GENOMIC EVOLUTIONARY DIFFERENCES BETWEEN ANGIOSPERMS AND GYMNOSPERMS
Before the extensive radiation of flowering plants during the late Cretaceous, gymnosperms dominated the world flora for almost 200 million years (Pennisi, 2009;Pires & Dolan, 2012). Extreme climatic shifts over the Cenozoic resulted in major extinctions in the gymnosperm lineage, which may account for the low diversity of extant gymnosperms in comparison with their sister seed plant clade (Crisp & Cook, 2011;Leslie et al., 2012). Extinctions were more pronounced in the Northern hemisphere in which older lineages were replaced by those better adapted to cooler and drier environmental conditions, resulting in higher species turnover rates in Pinaceae and Cupressaceae, compared to southern lineages (Leslie et al., 2012).
More recently, climatic changes during the last Glaciation strongly shaped species distributions and patterns of speciation and adaptation for many Northern hemisphere gymnosperms which went through cycles of contraction and expansion from refugia (Shafer, Cullingham, Côté, & Coltman, 2010).
While angiosperm evolution has been shaped by whole-genome duplication (WGD) events leading to higher speciation rates and the development of key functional innovations, gymnosperm genomes have been less dynamic (Landis et al., 2018;Soltis & Soltis, 2016;Vanneste, Maere, & Van de Peer, 2014). The rarity of WGD, paucity of chromosomal rearrangements, and slow mutation rates have led to low levels of structural genomic and morphological variation among species, and low speciation rates in gymnosperms (De La Torre et al., 2014;De La Torre, Li, Van de Peer, & Ingvarsson, 2017;Jaramillo-Correa, Verdu, & Gonzalez-Martinez, 2010;Leitch & Leitch, 2012;Pavy et al., 2012). In the presence of polyploidy and retro-transposition, angiosperms have developed mechanisms to counteract the increase in genomic DNA by replication or recombination-based errors generating indels, and unequal recombination between sister chromosomes (Grover & Wendel, 2010;Leitch & Leitch, 2012).
Although polyploidy is largely absent in gymnosperms (exceptions are Sequoia, Pseudolarix, and Ephedra), a massive accumulation of long-terminal repeat retrotransposons (LTR-RTs), together with limited removal of transposable elements through unequal recombination, has resulted in very large genome sizes (mean 1C = 18.8 pg; De La Torre et al., 2014; Leitch & Leitch, 2012; Nystedt et al., 2013). Recent studies revealed that transposable elements make up 74%, 76.58%, 79%, and 85.9% of the genomes of Pinus taeda, Ginkgo biloba, Pinus lambertiana, and Gnetum montanum, respectively (Guan et al., 2016; Neale, Martínez-García, Torre, Montanari, & Wei, 2017; Neale et al., 2014; Wan et al., 2018; Wegrzyn et al., 2014). A comparative analysis among six diverged gymnosperms suggested that the diversity and abundance of transposable elements is widely conserved among gymnosperm taxa (Nystedt et al., 2013). However, a more recent study focused on gnetophytes (Gnetum, Welwitschia, Ephedra) suggests that higher frequencies of LTR-RT elimination due to recombination-based processes of genome downsizing may explain the smaller genome sizes of gnetophytes in comparison with other gymnosperms (Wan et al., 2018). Despite significant variations in noncoding regions of angiosperm and gymnosperm genomes, both plant lineages have comparable numbers of genes and gene families. Sequence similarities of expressed genes are 58%-61% between conifers and angiosperms, and 80% within Pinaceae (Prunier, Verta, & MacKay, 2016; Rigault et al., 2011). This suggests that functional differences observed between seed plant lineages may have evolved as a consequence of differences in rates of nucleotide substitution, frequency of copy number variant (CNV) formation (Prunier, Caron, Lamothe, et al., 2017; for a discussion of poplar vs. spruce CNVs see Prunier et al., 2019), and/or differential gene family expansion or contraction (Zhou et al., 2019). A recent analysis of protein-coding genes across a broad phylogeny suggested slower rates of molecular evolution (number of synonymous substitutions dS and mutation rates), but higher substitution rate ratios (dN/dS) in gymnosperms than in angiosperms. Higher levels of dN/dS in gymnosperms suggest stronger and more effective selection pressures, probably due to larger effective population sizes, especially in the Pinaceae. In addition, gymnosperms generally present high levels of within-population genetic diversity, while long-distance gene flow of wind-dispersed pollen between highly outcrossed populations leads to rapid decay of linkage disequilibrium and low among-population genetic diversity (De La Torre et al., 2014; Porth & El-Kassaby, 2014). Higher gene turnover, which probably explains a higher species turnover, has been observed in Pinaceae. Although the cause of this is unknown, it has been suggested that this trend might be explained by an elevated frequency of gene CNVs, although rates of CNV formation in Pinaceae or any other gymnosperms are unknown (Casola & Koralewski, 2018).
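As a rough illustration of the substitution rate ratio mentioned above, the following minimal sketch computes dN, dS, and dN/dS from counts of nonsynonymous and synonymous differences and sites, using the standard Jukes-Cantor correction. It is not the method of the cited analysis; the function names (`jukes_cantor`, `dn_ds`) and the input counts are hypothetical, and real studies typically rely on codon-model software rather than a hand calculation.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from difference and site counts.

    All four inputs are hypothetical counts for one pair of aligned
    coding sequences; they are assumptions made for illustration.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio dN/dS is undefined")
    return dn, ds, dn / ds
```

A ratio below 1 from such a calculation is conventionally read as purifying selection, and a higher ratio in one lineage than another indicates a relatively faster accumulation of amino acid changes per synonymous change.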
Because gymnosperms predate angiosperms, most differential gene family expansions between angiosperms and gymnosperms seem to have occurred either by loss of genes in angiosperms (most likely scenario) or gain in gymnosperms (by neofunctionalization or subfunctionalization). Large expanded paralogous gene families such as leucine-rich repeats, cytochrome P450, MYB, and others (Table 1) have been observed in gymnosperms (De La Torre, Lin, Van de Peer, & Ingvarsson, 2015;Neale et al., 2014;Pavy et al., 2013;Porth, Hamberger, White, & Ritland, 2011;Warren et al., 2015). While comparing differentially expanded gene families using whole-genome data, our study found that Picea abies' larger gene ontologies, compared to those of Arabidopsis thaliana, are the consequence of the species' ability to respond to diverse stimuli (biotic and abiotic stress), transport mechanisms, and a variety of specific metabolic and biosynthetic processes (Figure 1).

| REPRODUCTIVE BIOLOGY
The reproductive biology in gymnosperms is characterized by a largely outcrossing mating system, predominant anemophily (wind pollination) and wind-mediated seed dispersal. Other characteristics that differ between gymnosperms and angiosperms are the presence of uncovered seeds (lack of fruit), a haploid nourishing tissue (megagametophyte) surrounding the diploid embryo in the developing seed, and temporary polyembryony.
In order to facilitate pollen release and dispersal through wind travel, gymnosperms' male reproductive structures (male cones; pollen grains) have evolved an impressive diversity of male cone positioning, and grain shapes (Lu et al., 2011). This is seen as a necessity to overcome the innate constraints from gymnosperm's heavy (but not always exclusive) reliance on anemophily. In contrast, angiosperms evolved flowers with attractive colors and fragrances as signals for pollination by insects and other animals, pollen and nectar rewards as food source for the pollinator, as well as fruits for their seeds' protection but also dispersal by animals. Ovular secretion is also crucial to reproduction in gymnosperms as it fosters pollen germination and pollen tube growth, eventually leading to fertilization of an egg cell within gymnosperms' archegonia. Yet, virtually nothing is known about the molecular genetic basis of ovular secretion, an important characteristic in gymnosperms' megagametophytes (Zhang & Zheng, 2016). A recent study on thirteen species representing all five main lineages of extant gymnosperms (Nepi et al., 2017)  also been an ancient event in angiosperms due to its importance in pioneer habitats (Gottsberger, 1988).
Archegonia develop from initial cells within the female gametophyte of the ovule through subsequent rounds of divisions giving rise to (outward) neck cells, and the central cell. This later develops into the large egg cell and the small ventral canal cell, which degenerates as the egg cell matures. Despite its importance, the molecular regulation of archegonia development in the ovule of cone-bearing gymnosperms has not been extensively studied, and the role of archegonia in egg fertilization is largely unknown (Zhang & Zheng, 2016). The major challenge for such studies is the long duration of the pollination process (up to 13 months for pines) compared to the short period of time required for zygote formation. Archegonia were not found to produce pollen-specific signals, but neck cells might produce these (Zhang & Zheng, 2016). Evidence from lower archegoniatae such as ferns and bryophytes suggests auxin-responsive genes might be involved in reproductive organ morphogenesis, differentiation, and cell turnover related to archegonial development (Zhang & Zheng, 2016). Some evidence also hints at a role for the Finally, a single fertilization event within the ovule produces a diploid embryo that develops within a haploid female gametophyte.
At early seed development, polyembryony is also an important reproductive feature in conifers, whereby multiple archegonia can be fertilized by different pollen grains. In all cases though, only the dominant embryo persists and matures while all others are aborted.
The molecular basis of embryo persistence is unknown (Cairney & Pullman, 2007). The embryo suspensor stage is a critical stage in early embryonic development as it helps the embryo to grow within the female gametophyte, and to benefit from nutrient absorption while it enlarges. Gymnosperms contain genes of very similar sequence to angiosperm embryogenesis-regulating genes (Cairney & Pullman, 2007). The comparative synthesis by Cairney and Pullman revealed that in gymnosperm embryogenesis, subtle molecular interactions, spatially and temporally controlled gene expression, and few unique regulatory proteins can achieve differences in embryonic structure and development. One important example is the abovementioned WOX transcription factor genes.
Recently, transcriptomic studies on embryogenesis in pines (P. sylvestris; Pinus pinaster; P. lambertiana; Araucaria angustifolia) and spruces (P. abies; Picea balfouriana) have been published (reviewed in Rodrigues et al., 2018). Nevertheless, in order to better understand gymnosperms' unique regulatory networks, any functional analysis of conifer developmental genes must be conducted by expressing these genes in a conifer. Therefore, the development of a robust, easy-to-use and broadly applicable transformation system for conifers constitutes a prerequisite to a better understanding of several aspects of this phylum's cell and molecular biology (Cairney & Pullman, 2007). Up until now, this has not been achieved. More recent studies revealed a crucial role for small noncoding RNAs, and identified some of their target genes, in the regulation of seed and embryo development (Rodrigues & Miguel, 2017). Niu et al. (2015) identified such sRNAs specifically for male and female cones of P. tabuliformis, with higher activities in the female than in the male reproductive structures. The miR156-SPLs, miR159-MYBs, miR172-AP2Ls, miR319-TCP, and miR396-GRFs interacting pairs found for this pine species coincided with those in angiosperms' reproductive development, suggesting ancient evolutionary histories of these sRNA regulatory pathways (Niu et al., 2015).

| Cellulose/hemicelluloses synthases and their regulation
genes were found in Pinus radiata (Krauskopf, Harris, & Putterill, 2005), 17 CesAs in Cunninghamia lanceolata (Huang et al., 2012), six in P. taeda, and nine CesAs in G. montanum (Wan et al., 2018), in comparison with 10 and 18 CesA genes in Arabidopsis and Populus trichocarpa genomes, respectively (Suzuki et al., 2006).
Recently, in the G. montanum genome reference paper, it was suggested that large expansions in the CslB/H subfamilies may explain the distinct growth characteristics in Gnetum when compared to other gymnosperms (Wan et al., 2018). It is interesting that CslB/ E/H/G that evolved from ancestral genes in ferns were lost in many gymnosperms, such as P. abies, P. taeda, G. biloba, and other species (Yin et al., 2014).
Cellulose and hemicellulose biosyntheses are regulated at the transcriptional level (Li, Bashline, Lei, & Gu, 2014). In angiosperms, for example, at least 13 out of 126 MYB transcription factors were reported to be involved in cellulose formation by regulating CesA/ Csl gene expression directly or indirectly in Arabidopsis (Zhang, Nieminen, Serra, & Helariutta, 2014). However, in gymnosperms, only 13 Picea glauca and five P. taeda MYB genes were identified, suggesting a much lower number of MYB genes than in Arabidopsis and Populus (Bedon, Grima-Pettenati, & Mackay, 2007). Some gymnosperm MYB genes, which have conserved functions (e.g., PtMYB1 and PtMYB4 in P. taeda), are expressed in the secondary xylem and involved in lignin biosynthesis as their homolog in Arabidopsis (Bedon et al., 2007;Patzlaff et al., 2003). Whether cellulose biosynthesis is regulated by MYB transcription factors is not clear in gymnosperms.
However, the CesA genes' regulation network in gymnosperms might be less complex than in angiosperms. Cellulose biosynthesis is also affected by the content of lignin, another component of the plant cell wall (Endler & Persson, 2011). In Populus, artificial lignin biosynthesis inhibition is coupled with cellulose production and higher growth, suggesting cellulose synthase activity is restricted by substrate content (Hu et al., 1999). In P. taeda, spontaneous mutations in lignin biosynthesis (Songstad, Petolino, Voytas, & Reichert, 2017) also caused fast stem growth, suggesting cellulose synthase activity may be naturally regulated by lignin content in gymnosperms (Gill, Brown, & Neale, 2003).

| Vascular NAC domain
The difference between water-conducting xylem tissues (tracheids vs. vessels) is one of the main differences between gymnosperms and angiosperms (Wan et al., 2018). Tracheids, whose dual function is water transport and mechanical support, constitute the xylem tissue in gymnosperms. In angiosperms, xylem tissue is more complex and consists of vessels, fibers, and rays ( formation (Wan et al., 2018), and at least, the dominant repression of VND7 showed a more severe phenotype than the dominant repression of VND6 (Kubo et al., 2005). In the second one, vessel formation requires VND gene expansion and their co-expression (Nystedt et al., 2013). Although the seven VNDs in Arabidopsis had conserved expression patterns and downstream genes, the expression level in vessels of different organs and activation strength were different (Zhou et al., 2014), suggesting the seven VNDs might coordinately work to regulate vessel formation.

| Abiotic stress-Dehydrins
Dehydrins are a group of proteins belonging to the late embryogenesis abundant (LEA) gene family that are highly hydrophilic and are commonly associated with acclimation to low temperature and other environmental stresses involving cellular dehydration in plants (Rorat, 2006 (Wachowiak et al., 2009).
The specific mode of action of dehydrins is unclear, but some studies suggest that dehydrins stabilize membranes and macromolecules in conditions of low water availability (Hanin et al., 2011). In angiosperms, the size of the dehydrin gene family is highly variable, ranging from two members in Amborella to more than 12 in Malus domestica. Gymnosperms are less studied, but within Pinaceae, the dehydrin family appears to be much larger than in angiosperms, with a total of 53 members identified in P. glauca (Stival Sena, Giguère, Rigault, Bousquet, & Mackay, 2018). Subfunctionalization is thought to be the primary driver for the increased diversity of dehydrins in conifers over angiosperms (Stival Sena et al., 2018). In contrast, extant species of Gnetum have reduced numbers of LEA genes (and dehydrins) when compared to other gymnosperms (Wan et al., 2018). Gnetum also differs from other gymnosperms in that it only exists in warm, mesic habitats (Wan et al., 2018), lending more evidence to the role dehydrins play in adaptation to water stress.
Moreover, crucial biosynthetic genes for pest resistance (e.g., 3CAR; CYP720B4) feature high content of repetitive sequence regions and transposable elements, suggesting that diversification of the conifer TPS and P450 gene families may have been achieved by DNA transposon-mediated translocation mechanisms (Hamberger et al., 2009).
Another important feature of conifer TPSs is their high potential for functional plasticity such that few changes in amino acids can create new potent defense molecules (Keeling, Weisshaar, Lin, & Bohlmann, 2008).
Because plants have a long evolutionary history of interaction with herbivores, hosts have acquired coevolved defenses (Futuyma & Agrawal, 2009). A special case is the gymnosperm G. biloba, which is largely herbivore-free. Ginkgo's foliage produces ginkgolides, a class of terpene trilactones known as a potent antifeeding defense (Mohanta et al., 2012; Pan, Ren, Chen, Feng, & Luo, 2016). In general, the most effective host tree defenses exist against local pests and pathogens, while host defenses weaken under relaxed or absent pathogen pressure. This is a recurrent problem with introduced foreign pests and pathogens, but also with native pests and pathogens expanding their natural ranges. As the climate warms, these native species may expand their ranges northwards or to higher altitudes, where they may encounter "naïve" hosts. Moreover, native species may change their metabolism to support a more aggressive behavior, leading to unprecedented population growth and range expansions, and threatening local and new host trees in a pest's newly invaded habitat. A widely publicized example of current range expansion is the mountain pine beetle (Dendroctonus ponderosae Hopkins). This pest epidemic in western North America is now threatening the boreal forest (Cullingham et al., 2011).
Trees have developed different lines of defense that are more or less effective, and also alternative strategies such as tolerance.
Anatomical and the associated chemical defenses in conifer bark have been described (Franceschi, Krokene, Christiansen, & Krekling, 2005). Strength and rapidity of traumatic resinosis (direct defense) has often been associated with resistance. The physical structures studied in most detail are the parenchyma cells (locations of synthesis and storage of polyphenols), and the resin ducts (synthesis and storage of terpenes) that are located in the secondary phloem and the cambium. The traumatic resin canals are formed in the secondary xylem as a way of active defense. Upon attack, reallocation of resources from primary processes to active defense, or the mobilization of the resources for host tolerance, takes place. Indirect tree defense responses that involve the attraction of predators or herbivore parasitoids have also been documented. Moreover, trade-offs involving defense strategies involve display of chemical defenses, or rely on tolerance (Futuyma & Agrawal, 2009). In a recent study on the genomics of host defenses against the spruce shoot weevil (Pissodes strobi Peck), Porth et al. (2018) concluded that well-established terpenoid-related spruce defenses and tolerance to this herbivore might be mutually exclusive.
It has been postulated that drought-stressed conifers whose metabolism is diverted from growth to secondary compounds can rely more on constitutive, preformed defenses (Turtola, Manninen, Rikala, & Kainulainen, 2003). Also, it is well known that fast growing individuals are biased toward induced defenses (Steppuhn & Baldwin, 2008). Therefore, trade-offs between already established and induced defenses can be expected. These dynamics under different environmental conditions need to be better studied in the future, while current genomic studies usually represent a snap-shot situation aiming to identify few highly upregulated candidate genes from well-annotated conifer defense metabolic pathways such as the phenylpropanoid and methylerythritol phosphate/mevalonate (Hall, Yuen, et al., 2013a;Keeling et al., 2011;Porth et al., 2011;Shalev et al., 2018;Warren et al., 2015;Zhou et al., 2019). In addition, the genetic networks between defenses in conifers and their reproductive development seem to be intricate. With few exceptions, this important relationship has been largely ignored in conifer defense studies, mainly because the conifer reproductive genes (many are also gene family members) were under-studied; thus, their exact functioning remains elusive (see section on Reproductive Biology).
Given the current knowledge about defensive gene family expansion in gymnosperms (Porth et al., 2011(Porth et al., , 2012Warren et  phenolic (Li et al., 2012) compounds to better target effective tree defenses in the future.

| A case study of functional pleiotropy with defense: the PDR ABC transporter family
Pleiotropic drug resistance (PDR) genes belong to a fungus- and plant-specific gene family within the ATP Binding Cassette (ABC) gene superfamily (Crouzet, Trombik, Fraysse, & Boutry, 2006; Higgins, 1992; Lamping et al., 2010). The PDR gene family was named following the observation that its members confer resistance to various drugs; however, PDR genes are also involved in the transport of substrates not related to cell detoxification (Ito & Gray, 2006; Nuruzzaman, Zhang, Cao, & Luo, 2014; Pierman et al., 2017; Sasse et al., 2016). Three recent and completely independent studies on two spruces (P. glauca; P. glauca × engelmannii) and P. taeda suggest that specific PDR genes are key players in defense mechanisms against different herbivores (Mageroy et al., 2015; Porth et al., 2018) and pathogens (De la Torre et al., 2018). For example, research on spruce budworm (C. fumiferana) resistance identified gene WS0269_K02 with high statistical support for its expression upregulation in budworm-resistant versus nonresistant white spruces (Mageroy et al., 2015; information drawn from their Table S1). The same WS0269_K02 gene was found in spruce shoot weevil (P. strobi) resistance (Porth et al., 2018, Figure 2). In pine, a closely related gene family member was identified for pitch canker disease (Fusarium circinatum) resistance (De la Torre et al., 2018). Because the expression of these genes was also correlated with drought resistance (De la Torre et al., 2018) and growth rate (Porth et al., 2018), pleiotropic functioning of conifer PDR genes could be implied. Drought resistance and growth might share a genetic relationship to a certain extent, as trees impaired in drought tolerance and succumbing to drought stress are expected to show decreased growth (Salmon et al., 2019).
It has further been postulated that drought-stressed conifers rely more on constitutive than on induced defenses (Turtola et al., 2003).
Our study found that the PDR family is smaller in gymnosperms than in angiosperms. This may indicate that gymnosperm species require fewer PDR transporters than angiosperms to cope with their environment. The identified conifer PDR gene sequences were further mapped to the PDR genes' phylogenetic tree for improved annotations (Figure 3). In the case of the white spruce gene (identified by Mageroy et al., 2015 and Porth et al., 2018), WS0269_K02 mapped to cluster IV, a gymnosperm-specific clade, and it was found to be putatively identical to the P. abies gene Pab_MA_17319g0010, in enzymes involved in the transport of these metabolites (Yazaki, 2006). Secondly, the differential expression of PDR genes in different tissues or during different developmental stages might have promoted their diversification (subfunctionalization). To better resolve the evolution and relationships of the PDR gene family, more PDR gene sequences from additional species across the plant kingdom are needed (this was beyond the scope of the present study).
FIGURE 2 The white spruce PDR gene family member WS0269_K02 identified as a core gene. Spruce PDR gene (ABC transporter, blue dot) identified as "core gene" (Porth et al., 2018) in the gene regulatory network with growth (yellow dots) or defense phenotypes (against the stem-boring spruce shoot weevil Pissodes strobi; green dots).

| NONCODING AND SMALL RNAS
Noncoding RNAs are a class of RNAs not involved in protein coding, but with very important functions as regulators of the plant life cycle, response to the environment, and phenotypic plasticity (Borges & Martienssen, 2015; Shin & Shin, 2016). Noncoding RNAs can be divided into two categories, long noncoding RNAs (>200 nucleotides, nt) and small noncoding RNAs (sRNAs; 20-24 nt) (Arikit, Zhai, & Meyers, 2013). Differences in sRNA size distribution can be observed between gymnosperms and angiosperms. The 21-nt sRNAs are dominant in gymnosperms such as P. abies (Nystedt et al., 2013), Pinus contorta, Larix leptolepis (Zhang et al., 2013), and P. tabuliformis (Niu et al., 2015), whereas 24-nt sRNAs represent the majority in angiosperms (Morin et al., 2008). For a long time, 24-nt sRNAs were thought to be absent from gymnosperms; we now know that they occur at low frequencies and are mainly restricted to reproductive tissues (Niu et al., 2015; Nystedt et al., 2013; Zhang et al., 2013). Therefore, the presence of 24-nt sRNAs may be important in the regulation of reproduction in gymnosperms (Niu et al., 2015). Because 21-nt sRNAs are involved in target gene silencing or protein translation inhibition and 24-nt sRNAs function in chromatin remodeling (Borges & Martienssen, 2015), it seems that sRNAs may play different regulatory roles in gymnosperm and angiosperm development.
Table S1. Node support from 1,000 replicates is indicated for the basal nodes defining the nine putative PDR sequence clusters. For further details, see Appendix S2.
DCL1 might be the reason why 21-nt sRNAs are dominant in gymnosperms, although the relationship between conifer-specific 21-nt sRNA and short DCL1 is unclear (Gonzalez-Ibeas et al., 2016). A conifer-specific set of DCL1 proteins was found in P. glauca, P. abies, and P. lambertiana (Gonzalez-Ibeas et al., 2016). DCL3, which is involved in 24-nt sRNA biogenesis, was characterized through P. lambertiana transcripts, primarily expressed in reproductive tissues (Gonzalez-Ibeas et al., 2016). A truncated DCL3 was also discovered in P. glauca, and its upregulated expression during seed development indicated that the DCL3 variant and its expression level are responsible for 24-nt sRNA generation in P. glauca. The discovery of variant DCLs partly explains the different sRNA size distributions between gymnosperms and angiosperms, although further confirmation is needed. The 24-nt sRNAs direct DNA methylation and affect histone modification, which are related to chromatin condensation and silencing of transposable elements (Leitch & Leitch, 2012). The different silencing mechanisms were correlated with differences in genome sizes of angiosperms and gymnosperms (Leitch & Leitch, 2012).
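The comparison of sRNA size classes described in this section rests on a simple tabulation of read lengths. The sketch below shows one way such a distribution could be computed from a list of sRNA read sequences; it is an illustrative assumption, not the procedure of the cited studies. The function name `srna_length_fractions` and the 20-24 nt window used as defaults are hypothetical choices based on the size range given in the text.

```python
from collections import Counter


def srna_length_fractions(reads, min_len=20, max_len=24):
    """Return the fraction of reads in each length class from min_len to max_len.

    `reads` is an iterable of sequence strings (hypothetical input, for
    example adapter-trimmed sRNA reads). Reads outside the window are ignored.
    """
    lengths = [len(r) for r in reads if min_len <= len(r) <= max_len]
    if not lengths:
        return {}
    counts = Counter(lengths)
    total = len(lengths)
    # Report every size class in the window, including those with no reads
    return {n: counts.get(n, 0) / total for n in range(min_len, max_len + 1)}
```

Under this tabulation, a library dominated by the 21-nt class would match the pattern reported for gymnosperms, while a dominant 24-nt class would match the pattern reported for angiosperms.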

| APPLICATIONS OF THE STUDY OF GYMNOSPERM GENE FAMILIES
Plant defense molecules are highly complex traits with nutritional value, flavor, and use in traditional medicine (Hamberger & Bak, 2013). Genes encoding natural product pathways often group together in biosynthetic gene clusters (Nützmann, Huang, & Osbourn, 2016). Some of the genes reviewed in this study are newly studied members of gene families that hold great potential for biotechnological applications related to commercial and pharmacological value.
Environmental and developmental factors affect the terpenoid pathway flux; understanding the complexity of the terpenoid pathway network in plants and its regulation remains a major challenge in terpenoid research but will facilitate future molecular breeding of agronomically useful traits (Vranová et al., 2012).
Some members of conifer gene families (such as the PDR gene family) can also be exploited for their potential to improve conifer tree growth on marginal or disturbed soils, thus providing an improved detoxification potential to employ conifers (e.g., spruces) in phytoremediation applications. However, functional characterization of PDR genes is required before biotechnology applications can be performed on the PDR gene family, particularly for long-lived trees (Lefevre, Baijot, & Boutry, 2015). Because PDRs have been shown to act in a variety of plant organs, above ground (foliage and reproductive structures) and below ground (in roots; Crouzet et al., 2006), one of the most intriguing applications besides phytoremediation is the potential of PDRs to confer improved resistance to biotic stressors (De la Torre et al., 2018; Mageroy et al., 2015; Porth et al., 2018). Also, a better knowledge of the genes and gene families conferring phenotypic variation is the first step to create plantations with improved varieties through marker-assisted breeding, genomic selection, or genetic modifications (e.g., CRISPR). For species with ecological importance, the identification of gene families involved in abiotic and biotic stress may help identify species that are candidates for ecological restoration, or that may present increased potential to adapt to specific or changing climatic conditions.

| CONCLUSIONS
In this paper, we aim to understand how genes and gene families

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.