Douglas‐fir LEAFY COTYLEDON1 (PmLEC1) is an active transcription factor during zygotic and somatic embryogenesis

Abstract Douglas‐fir (Pseudotsuga menziesii) is one of the world's premier lumber species and somatic embryogenesis (SE) is the most promising method for rapid propagation of superior tree genotypes. The development and optimization of SE protocols in conifers is hindered by a lack of knowledge of the molecular basis of embryogenesis and limited sequence data. In Arabidopsis, the LEAFY COTYLEDON1 (AtLEC1) gene is a master regulator of embryogenesis that induces SE when expressed ectopically. We isolated the LEC1 homologue from Douglas‐fir, designated as PmLEC1. PmLEC1 expression in somatic embryos and developing seeds demonstrated a unique, alternating pattern of expression with the highest levels during early stages of embryogenesis. PmLEC1 protein accumulation during seed development correlated with its transcriptional levels during early embryogenesis; however, substantial protein levels persisted until 2 weeks on germination medium. Treatment of mature, stratified seeds with 2,4‐epibrassinolide, sorbitol, mannitol, or NaCl upregulated PmLEC1 expression, which may provide strategies to induce SE from mature tissues. Sequence analysis of the PmLEC1 gene revealed a 5′ UTR intron containing binding sites for transcription factors (TFs), such as ABI3, LEC2, FUS3, and AGL15, which are critical regulators of embryogenesis in angiosperms. Regulatory elements for these and other seed‐specific TFs and biotic and abiotic signals were identified within the PmLEC1 locus. Most importantly, functional analysis of PmLEC1 showed that it rescued the Arabidopsis lec1‐1 null mutant and, in the T2 generation, led to the development of embryo‐like structures, indicating a key role of PmLEC1 in the regulation of embryogenesis.


Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) is an economically valuable conifer that is native to western North America and cultivated throughout Europe, New Zealand, and Australia for its superior growth performance, desirable wood qualities, and ability to withstand the abiotic stresses of a changing climate (Spiecker, 2019).
Douglas-fir is in high demand, but reforestation and afforestation initiatives are severely strained by its long reproductive cycle (17 months), unreliable seed production, and germination efficiencies as low as 40% (Allen & Owens, 1972).
In Douglas-fir somatic embryogenesis (SE), dedifferentiated calli proliferate as proembryogenic masses (PEMs), and the embryogenic sequence is recapitulated by subculturing on stage-specific combinations of plant growth regulators (PGRs) and media to produce singulated, precotyledonary, cotyledonary, and mature embryos (Gupta et al., 1991). Singulation, a process unique to Douglas-fir SE, is the separation of the callus into individual somatic embryos. SE ensures consistent and unlimited production of quality embryos irrespective of the time of year and shortens embryo production from 17 months to 4-6 months (Gupta et al., 1991). Forests composed of superior genotypes will yield genetic gains much faster than those that could occur by natural selection (Pullman et al., 2003). However, the molecular regulatory mechanisms that induce SE in conifers remain unknown (Gautier et al., 2019; Pullman et al., 2015), and many conifer genotypes are recalcitrant to SE induction (Pullman et al., 2005). More than 50% of Douglas-fir SE cultures stop growing after 6 months, and significant losses occur between culture initiation and somatic seedling establishment (Gupta et al., 1991). Moreover, conifer SE can be induced only from juvenile tissues (e.g., megagametophytes, zygotic embryos) that are not readily available throughout the year or by the time the superiority of a tree has been proven.
The paucity of knowledge about the genes and molecular events responsible for embryogenesis in conifers, and the lack of molecular markers of embryogenic potential, have permitted only trial-and-error testing of medium components, PGRs, and stress treatments for improving SE. Researchers urgently need gene expression profiles to understand natural embryo development (Pullman et al., 2015) and a molecular characterization of the embryogenic state before the full potential of SE can be realized (Gautier et al., 2019).
Research efforts are hampered by limited sequence information owing to large conifer genomes, lengthy life cycles, the absence of a reliable transformation system for reverse genetics, and small embryos buried in maternal tissues (Cairney & Pullman, 2007). Progress in understanding conifer embryogenesis therefore depends on angiosperm model plants combined with rigorous investigation and inference. Angiosperm SE is easily manipulated, and hundreds of genes that function during angiosperm embryogenesis have been characterized. EST studies suggest that 70% of embryo-specific genes are shared with conifers (Cairney & Pullman, 2007).
In angiosperms, the most prominent transcription factor (TF) with a high-order regulatory role in embryogenesis is Arabidopsis LEAFY COTYLEDON1 (AtLEC1). AtLEC1 is a subunit of a CCAAT box-binding TF and is also named AtHAP3 or AtNF-YB9; the Arabidopsis NF-YB family comprises 13 members. AtLEC1 acts during both early and late embryogenesis and is a central regulator of embryo and endosperm development in Arabidopsis (Lotan et al., 1998). AtLEC1 promotes epigenetic reprogramming and controls morphogenesis, photosynthesis, and storage compound accumulation by regulating different sets of genes at specific stages of embryogenesis (Jo et al., 2019).
AtLEC1 rescues the Arabidopsis lec1-1 null mutant, and its ectopic expression induces embryonic programs and leads to spontaneous formation of embryo-like structures from vegetative tissues (Lotan et al., 1998).
We hypothesized that Douglas-fir has its own homologue of the AtLEC1 gene that plays a critical role in conifer embryogenesis and might be used to induce SE. Hence, the aim of our study was to identify, isolate, and characterize the LEC1 gene from Douglas-fir and evaluate its potential as a candidate gene for applications in SE. Phylogenetic analysis of the putative amino acid sequence deduced from PmLEC1 cDNA revealed that PmLEC1 grouped with the AtLEC1 clade. PmLEC1 gene expression during seed development demonstrated a unique, alternating pattern, with the highest levels occurring during early embryogenesis. Conversely, PmLEC1 protein accumulation persisted at substantial levels until the germination stage. PmLEC1 expression was upregulated in mature, stratified seeds by treatment with 2,4-epibrassinolide, sorbitol, mannitol, or NaCl. Acquisition of the flanking upstream region and sequence analysis of the PmLEC1 locus revealed a 5′ UTR intron containing binding sites for TFs that are known to interact with LEC1 and are critical to angiosperm embryogenesis. Thus, homologous TFs are expected to participate in conifer embryogenesis. Functional analysis of PmLEC1 via ectopic expression in the Arabidopsis lec1-1 null mutant showed that PmLEC1 complemented the mutation, generated embryo-like structures in T2 seedlings, and induced embryonic programs in vegetative tissues. These findings expand our understanding of the molecular biology of conifer embryogenesis and may lead to improved SE protocols for Douglas-fir and other gymnosperms.
Developing seeds, vegetative buds, and pollen cones were collected from open-pollinated coastal Douglas-fir trees growing in Saanich, BC, Canada, during May-August, immediately frozen in liquid nitrogen and stored at −80℃.
Douglas-fir seeds for imbibition, stratification, and germination were obtained from an open-pollinated seed orchard (Sorrento Nurseries Ltd., Sorrento, BC, Canada). The seeds were imbibed in distilled water with slow agitation for 24 hr at 4℃, then stratified at 4℃ for 3 weeks between layers of Whatman Grade 1 filter paper placed over water-soaked sponges. To expose stratified seeds to germination conditions, the seeds were placed over Kimpack absorbent cellulose wadding (LPS Industries) cut to the size of 9-cm Petri dishes, soaked with water and covered with two layers of filter paper.
The seeds were placed on top of the filter paper. The Petri dishes were sealed with Parafilm (Bemis, Neenah, WI, USA) and placed in a growth chamber under a 16-hr light photoperiod (16 μmol m−2 s−1) at 24℃. Germinating seeds were collected 2, 6, 10, 12, 14, 45, and 90 days after exposure to germination conditions (DAEG), frozen in liquid nitrogen, and stored at −80℃ until further analysis.

| Primers
PCR primers used in this work were designed by MV and synthesized at Invitrogen and are listed in Table S2.

| RNA isolation
Total RNA was isolated from somatic embryos, developing seeds, imbibed seeds, and stratified seeds by the TRIzol method modified for plants (Invitrogen). Isolation of total RNA from seeds exposed to germination conditions, seedlings, vegetative buds, and pollen cones was performed using the modified hot-phenol extraction (Verwoerd et al., 1987).

| Isolation and cloning of the conserved LEC1 sequence
Total RNA from somatic embryos at the maintenance and singulated stages, and from immature zygotic seeds, was used in separate RT-PCR reactions. Total RNA was treated with Invitrogen Amplification Grade DNase I, and the absence of DNA contamination was confirmed by conventional PCR. First-strand cDNA synthesis was performed with 5 μg total RNA, 1 μl oligo(dT)12VN (V = A, C, or G; N = A, C, G, or T), and SuperScript II RNase H− reverse transcriptase (Invitrogen).
Degenerate PCR primers based on the conserved sequence of AtLEC1 from Arabidopsis thaliana ecotype Wassilewskija (WS) and on an EST sequence of Pinus taeda were designed to amplify a homologous Douglas-fir sequence from PEM-derived cDNA. PCR was performed using the QIAGEN Taq PCR Master Mix (Qiagen, Mission, ON, Canada) with the following thermocycle program: 5 min at 94℃, 40 cycles of denaturation (94℃ for 30 s), annealing (60℃ for 30 s), and elongation (72℃ for 1 min), followed by a 10-min extension at 72℃. The products were resolved by agarose gel electrophoresis and visualized after ethidium bromide (EtBr) staining.
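Degenerate primers are pools of oligonucleotides written with IUPAC ambiguity codes, and the size of the pool (its degeneracy) determines how many distinct sequences compete in the reaction. The sketch below counts and enumerates the variants encoded by a degenerate primer. The primers used in this study are listed in Table S2 and are not reproduced here, so apart from the oligo(dT)12VN anchor primer described above, the example primer string is hypothetical.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# oligo(dT)12VN: twelve T residues followed by one V and one N position.
print(degeneracy("T" * 12 + "VN"))   # 3 x 4 = 12 anchored variants

# Hypothetical degenerate primer (not from Table S2), for illustration only.
print(degeneracy("ATGGAYCARGTNAA"))  # 2 x 2 x 4 = 16
```

Each ambiguous position multiplies the pool size, so the degeneracy grows quickly with every added N, which is why degenerate primers are usually designed over the most conserved stretches of an alignment.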
The single, amplified PCR product (~200 bp) was extracted from the gel using the QIAquick Gel Extraction Kit (Qiagen), ligated into the pCR2.1 TOPO vector, and transformed into E. coli using the TOPO TA cloning kit (Invitrogen). Plasmid DNA was purified from transformed colonies using QIAprep Spin Miniprep kit (Qiagen), and the inserted DNA was sequenced at the University of Victoria DNA Sequencing Centre. Database searches for similar sequences were performed using BLAST.

| Northern blot analysis
RNA (20 μg) was resolved by denaturing formaldehyde gel electrophoresis, and equal loading and integrity were checked by ethidium bromide staining. The RNA was transferred to a Biodyne B nylon membrane and hybridized with a 32P-labeled PmLEC1 cDNA probe.

| RACE-PCR for obtaining full-length cDNA
To isolate sequences upstream and downstream of the conserved domain, RACE-PCR was performed with 1 μg total RNA isolated from ~200 mg singulated somatic embryo masses harvested from liquid medium. The SMART RACE cDNA Amplification Kit (BD Biosciences) was used according to the manufacturer's instructions.
The 5′- and 3′-RACE-PCR products were resolved in 1% agarose gels, purified, cloned, and sequenced as described above. A new set of primers was designed based on the start and stop codons identified in the individual RACE-PCR sequences, and RT-PCR with these primers was performed with RNA isolated from developing zygotic seeds. The product was resolved and visualized on an agarose gel, extracted from the gel, cloned, and sequenced as described above.

| Phylogenetic analysis
The full-length PmLEC1 sequence was queried using TBLASTX (Altschul et al., 1990). Full-length LEC1 or LEC1-like putative protein sequences were aligned with MUSCLE (http://www.ebi.ac.uk/Tools). The phylogenetic tree was constructed by importing the MUSCLE alignment into MEGA X (Kumar et al., 2018) and applying the neighbor-joining method with bootstrapping (1,000 replicates).

| Real-time quantitative reverse transcription PCR (qRT-PCR) analyses
First-strand cDNA synthesis was performed with 1 μg DNase I-treated RNA, 1 μl oligo(dT)12VN primer, and Invitrogen SuperScript II RNase H− Reverse Transcriptase in 20-μl reactions for each of five biological replicates. The cDNA was diluted 20-fold, and 2 μl of the dilution was used in 14-μl reactions with Invitrogen Platinum SYBR Green qPCR SuperMix-UDG, performed in quadruplicate (four technical replicates) in a Stratagene Mx4000 real-time thermocycler (Stratagene). The amplification program consisted of 9-min enzyme activation at 94℃ followed by 40 cycles of denaturation (94℃ for 15 s), annealing (60℃ for 30 s), and elongation (72℃ for 45 s). Reaction specificity was confirmed by melting curve analysis. Data were normalized to the expression of the invariant ribosomal protein L8 gene, and relative expression was calculated using the 2^−ΔΔCt method. Statistical analyses were performed with SPSS (SPSS Version 12.0) using Kruskal-Wallis and two-tailed Mann-Whitney U tests for nonparametric data.
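The 2^−ΔΔCt calculation normalizes the target Ct to the reference gene (here, ribosomal protein L8) and then to a calibrator sample. The sketch below averages technical replicates before computing ΔCt and assumes, as this method does, that both genes amplify with approximately 100% efficiency. The function name and the Ct values are invented for illustration.

```python
from statistics import mean


def fold_change(target, reference, target_cal, reference_cal):
    """Relative expression of a sample versus a calibrator by the 2^-ΔΔCt method.

    Each argument is a list of Ct values from technical replicates:
    target/reference for the sample, target_cal/reference_cal for the calibrator.
    """
    delta_sample = mean(target) - mean(reference)              # ΔCt of the sample
    delta_calibrator = mean(target_cal) - mean(reference_cal)  # ΔCt of the calibrator
    delta_delta = delta_sample - delta_calibrator              # ΔΔCt
    return 2 ** (-delta_delta)


# Invented Ct values (four technical replicates each), for illustration only.
sample_target = [24.0, 24.5, 23.5, 24.0]         # PmLEC1 in a seed sample
sample_reference = [18.0, 18.0, 18.0, 18.0]      # L8 in the same sample
calibrator_target = [32.0, 32.5, 31.5, 32.0]     # PmLEC1 in the calibrator
calibrator_reference = [18.0, 18.0, 18.0, 18.0]  # L8 in the calibrator

print(fold_change(sample_target, sample_reference,
                  calibrator_target, calibrator_reference))  # 256.0
```

With these invented values, ΔCt is 6 for the sample and 14 for the calibrator, so ΔΔCt = −8 and the fold change is 2^8 = 256.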

| Treatment of stratified seeds with stress factors and plant growth regulators; qRT-PCR
Stratified seeds were placed in Petri dishes on filter paper soaked with each treatment solution and kept in the dark at 24℃ for 24 hr. The treatment solutions were 0.7 M sorbitol, 0.7 M mannitol, 0.7 M sucrose, 0.3 M NaCl, 0.6 mM CdCl2, 23.75 mM PEG 8000, 10 μM 2,4-epibrassinolide, 50 μM 2,4-D in combination with 20 μM BAP, and 7.2 μM GA3 in combination with 38 μM ABA. For the hypotonic treatment, seeds were fully immersed in water. Water-moistened filter paper served as the control treatment. Following treatment, seeds were excised from the seed coats, immediately frozen in liquid nitrogen, and stored at −80℃ until RNA isolation.
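Because the osmotic and ionic treatments are specified as molar concentrations, the mass of solute per liter follows from m = c × M. The sketch below uses standard molar masses, plus the nominal 8,000 g/mol average for PEG 8000, which is only an approximation for a polydisperse polymer. The gram amounts it prints are derived arithmetic, not values reported in this study.

```python
# Approximate molar masses (g/mol). PEG 8000 is polydisperse; 8000 is its nominal average.
MOLAR_MASS = {
    "sorbitol": 182.17,
    "mannitol": 182.17,
    "sucrose": 342.30,
    "NaCl": 58.44,
    "CdCl2": 183.32,
    "PEG 8000": 8000.0,
}

# Treatment concentrations from the text, expressed in mol/L.
TREATMENTS = {
    "sorbitol": 0.7,
    "mannitol": 0.7,
    "sucrose": 0.7,
    "NaCl": 0.3,
    "CdCl2": 0.6e-3,       # 0.6 mM
    "PEG 8000": 23.75e-3,  # 23.75 mM
}


def grams_per_liter(compound: str, molarity: float) -> float:
    """Mass of solute (g) needed for 1 L of solution: m = c (mol/L) x M (g/mol)."""
    return molarity * MOLAR_MASS[compound]


for name, molarity in TREATMENTS.items():
    print(f"{name}: {grams_per_liter(name, molarity):.2f} g/L")
```

For example, 0.7 M sorbitol corresponds to about 127.5 g/L, and 23.75 mM PEG 8000 to about 190 g/L (roughly 19% w/v).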
Each treatment group consisted of three seeds (three biological replicates). RNA was isolated separately from each seed and used in quadruplicate qRT-PCR reactions (four technical replicates) with the ribosomal protein L8 gene as the normalizer. qRT-PCR and statistical analyses were performed as described above.
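To make the two-group comparison concrete, the sketch below computes the Mann-Whitney U statistic from midranks and an exact two-sided p-value by enumerating every way of splitting the pooled values into groups of the observed sizes, which is feasible for groups as small as those used here. This permutation formulation and the function names are assumptions for illustration; SPSS may report an asymptotic p-value or compute its exact p-value differently.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, exact two-sided permutation p-value)."""
    pooled = list(x) + list(y)
    r = midranks(pooled)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # expected U under the null hypothesis

    def u_stat(indices):
        return sum(r[k] for k in indices) - n1 * (n1 + 1) / 2

    u_obs = u_stat(range(n1))
    splits = list(combinations(range(n1 + n2), n1))
    # Count splits at least as far from the center as the observed U.
    extreme = sum(
        1 for s in splits if abs(u_stat(s) - center) >= abs(u_obs - center) - 1e-12
    )
    return u_obs, extreme / len(splits)


# Two invented groups of five with completely non-overlapping values.
print(mann_whitney_u([1.0, 1.2, 0.9, 1.1, 1.3], [2.0, 2.4, 1.8, 2.2, 2.6]))
```

For two groups of five with no overlap, U is 0 and the exact two-sided p-value is 2/252 (about 0.0079), because only the two most extreme of the 252 possible splits are as far from the null expectation.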

| Antibody production
A peptide corresponding to the first 18 amino acids of the putative PmLEC1 protein, with an additional cysteine residue at the C-terminus (MMSEVGSPTSQDSRNSEDC), coupled to the KLH carrier protein, was synthesized by GenScript Corporation. Antibody production was performed at Immuno-Precise Antibodies, Ltd. Four BALB/c mice were each immunized with 25 μg of KLH-coupled peptide mixed with Freund's complete adjuvant. Six additional immune boosts of 25 μg peptide-KLH in Freund's incomplete adjuvant followed at 3-week intervals. Dilutions of the polyclonal mouse antiserum were tested by ELISA against the free peptide and against protein extracts from Douglas-fir developing seeds. The polyclonal antiserum from two mice showed a significant response against the peptide at a dilution of 1:1,000. Blood was drawn from these two mice on four dates over a 2-month period. The antisera were obtained by centrifugation and combined, yielding a total of 6 ml of polyclonal antiserum for western blotting.

| Western blot analysis
Total proteins were extracted from frozen, ground developing seeds, imbibed seeds, stratified seeds, seeds exposed to germination conditions, vegetative buds, and pollen cones by suspension in extraction buffer (65 mM Tris (pH 6.8), 1% SDS, 5% glycerol, and 2.5% EtSH) at a ratio of 1 mg tissue per 3 μl buffer. Protein concentrations were determined by the Bradford assay (Bradford, 1976).
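The Bradford assay estimates protein concentration by comparing the absorbance of each extract at 595 nm with a standard curve, commonly prepared with bovine serum albumin (BSA). The sketch below fits a straight line to hypothetical BSA standards by ordinary least squares, interpolates an unknown, and computes the volume needed to load 20 μg of protein, the amount loaded per lane in the western blots. Because the Bradford response is only approximately linear, the linear fit is an assumption that holds only for standards spanning a narrow concentration range; all readings are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance, slope, intercept):
    """Protein concentration (mg/ml) interpolated from A595 on the standard curve."""
    return (absorbance - intercept) / slope


def loading_volume(micrograms, conc_mg_per_ml):
    """Volume (μl) containing the requested protein mass; 1 mg/ml equals 1 μg/μl."""
    return micrograms / conc_mg_per_ml


# Hypothetical BSA standards (mg/ml) and their A595 readings.
standards = [0.0, 0.25, 0.5, 0.75, 1.0]
a595 = [0.0, 0.15, 0.30, 0.45, 0.60]

slope, intercept = fit_line(standards, a595)
sample_conc = concentration(0.36, slope, intercept)
print(f"{sample_conc:.2f} mg/ml; load {loading_volume(20, sample_conc):.1f} μl for 20 μg")
```

With these invented readings, an extract with A595 = 0.36 is estimated at 0.60 mg/ml, so about 33.3 μl would carry 20 μg of protein.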

| Isolation of the genomic sequence upstream of the coding region
Total DNA was isolated from somatic embryos using the Sigma GenElute Plant Genomic DNA kit (Sigma) and from Douglas-fir needles using the modified cetyltrimethylammonium bromide method (De Verno et al., 1989).
The GenomeWalker Universal Kit (Clontech) was used with Douglas-fir genomic DNA to create four genomic libraries corresponding to digestion by the restriction enzymes Dra I, Eco RV, Pvu II, and Stu I. PCR reactions were performed using QIAGEN Taq PCR Master Mix (Qiagen). Only the Dra I and Pvu II libraries yielded specific, major products in the secondary PCR reaction. These two DNA products were purified from agarose gels, cloned, and sequenced as described above. The sequences were identical where they overlapped, but the sequence from the Dra I library was longer.
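Genome-walking libraries are made by digesting genomic DNA with blunt-cutting enzymes and ligating adaptors to the fragment ends, so the length of each walking product depends on how far the nearest upstream restriction site lies from the gene-specific primer (GSP). The sketch below locates the standard recognition sites of the four enzymes used here (DraI TTT^AAA, EcoRV GAT^ATC, PvuII CAG^CTG, StuI AGG^CCT) and estimates the walking product length. The toy sequence and GSP position are hypothetical, and adaptor and primer lengths are ignored.

```python
# Recognition site and cut offset within the site; all four enzymes cut in the
# middle of a palindromic 6-bp site, leaving blunt ends.
ENZYMES = {
    "DraI": ("TTTAAA", 3),
    "EcoRV": ("GATATC", 3),
    "PvuII": ("CAGCTG", 3),
    "StuI": ("AGGCCT", 3),
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based cut positions; the sites are palindromic, so one strand suffices."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def walk_product_length(seq: str, enzyme: str, gsp_end: int):
    """Distance from the nearest cut upstream of the GSP to the GSP end, or None."""
    upstream = [cut for cut in cut_positions(seq, enzyme) if cut < gsp_end]
    return gsp_end - max(upstream) if upstream else None


toy = "ACGTTTTAAACGCAGCTGAC"  # hypothetical sequence, not from the PmLEC1 locus
for enzyme in ENZYMES:
    print(enzyme, cut_positions(toy, enzyme), walk_product_length(toy, enzyme, gsp_end=19))
```

In this toy sequence the DraI site lies farther upstream of the GSP than the PvuII site, so the DraI library gives the longer product, while EcoRV and StuI have no site and yield no product. Differences in site spacing of this kind could account for the longer sequence recovered from the Dra I library.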

| Sequence analysis for cis-and trans-acting regulatory elements
The Dra I sequence, GenBank accession number FJ418169, provided the longest nucleotide sequence upstream of the coding region.

| Arabidopsis plant material and generation of transgenic plants
Wild-type and heterozygous LEC1/lec1 seeds of A. thaliana (L.) Heynh. ecotype Wassilewskija (stock numbers CS2360 and CS8101, respectively) were obtained from the Arabidopsis Biological Resource Center. Homozygous lec1-1 plants were generated by planting the heterozygous seeds and growing them in a greenhouse under standard conditions, where the plants self-pollinated to produce homozygous progeny. Immature seeds were removed from green siliques and cultured on half-strength MS medium (Murashige & Skoog, 1962). Only homozygous lec1-1 seeds germinate under these conditions. The genotype was confirmed by PCR analysis using AtLEC1 gene-specific primers.
Two expression cassettes were constructed, one containing the PmLEC1 gene and the other containing the AtLEC1 gene. The PmLEC1 and AtLEC1 coding sequences were directionally inserted into the corresponding restriction sites of the pBI221 vector (Jefferson et al., 1987) between the 35S-35S-AMV promoter, which combines the cauliflower mosaic virus 35S promoter with a duplicated enhancer and the alfalfa mosaic virus untranslated leader sequence (Datla et al., 1993), and the NOS terminator. Each vector contained the NPTII kanamycin-resistance gene for plant selection. The vectors were transferred into Agrobacterium tumefaciens strain M90 as described by Datla et al. (1993). Transformed colonies were selected on antibiotic-containing medium, and the presence of the insert was confirmed by sequencing.
Arabidopsis lec1-1 null mutant and wild-type plants were transformed using the floral dip method (Clough & Bent, 1998). Flowering plants (T0 generation) were dipped into Agrobacterium suspension and produced the first generation of transgenic seeds (T1).
Transgenic T1 plants were selected on kanamycin-containing MS medium, and transgene integration was confirmed by PCR. T1 plants were grown in a greenhouse and self-pollinated to produce T2 seeds.
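A transgene inserted at a single locus is expected to segregate roughly 3 kanamycin-resistant : 1 kanamycin-sensitive seedlings among the T2 progeny of a self-pollinated hemizygous T1 plant, and a chi-square goodness-of-fit test is a common way to check this. Such a test is not reported here; the sketch below is a hypothetical illustration with invented counts. It relies on the fact that, for one degree of freedom, the chi-square survival function equals erfc(√(χ²/2)).

```python
import math


def chi_square_3_to_1(resistant: int, sensitive: int):
    """Chi-square goodness-of-fit to a 3:1 ratio; returns (chi2, p) with 1 df."""
    total = resistant + sensitive
    expected = (0.75 * total, 0.25 * total)
    observed = (resistant, sensitive)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Invented T2 counts on kanamycin medium, for illustration only.
chi2, p = chi_square_3_to_1(290, 110)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

For these invented counts, χ² is about 1.33 and p is about 0.25, so a 3:1 ratio would not be rejected at α = 0.05.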
T2 plants were selected on kanamycin-MS medium, genotyped, and used in gene expression studies. The 5′ UTR contained a stop codon but no start codons. The absence of start codons within the 5′ UTR correlates with strong translational efficiency (Rogozin et al., 2001).

| Accession numbers
In Arabidopsis, two of the NF-YB genes, LEC1 and LEC1-LIKE (L1L or NF-YB6), are active during embryogenesis and share significant sequence identity within the central, conserved B domain, including the specific residues that are necessary for activity in embryogenesis. L1L is expressed in all vegetative organs, and its expression peaks later in embryogenesis (Kwong et al., 2003). The putative PmLEC1 amino acid sequence showed 55% identity and 71% similarity with AtLEC1, and 53% identity and 70% similarity with AtL1L. Phylogenetic analysis found PmLEC1 to be more closely related to AtLEC1 than to AtL1L, and the conifer sequences grouped within the LEC1 clade (Figure 1). The B-domain Asp-55 residue required for LEC1 activity in embryogenesis was found in PmLEC1 and all aligned sequences (Figure 2). In the C domain, Gln, Asp, and Glu residues are critical for protein-protein interactions and have roles in development (Li et al., 1992). Comparison of the C domain of PmLEC1 with those of AtLEC1 and AtL1L revealed that PmLEC1 shares 41% identity with AtLEC1 but only 19% with AtL1L. MUSCLE (Edgar, 2004) alignment of the LEC1 clade revealed six Gln, Asp, or Glu residues that are conserved between conifers and Arabidopsis (α), and three that are unique to conifers (β) (Figure 2). In the Arabidopsis C domain, there were three segments of one, two, and three residues that were absent in conifers (γ) (Figure 2).
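Percent identity and similarity depend on which alignment columns are counted and on which residue pairs are considered similar. The sketch below counts only gap-free columns of a pairwise alignment and uses a small, hypothetical set of similar-residue groups; BLAST-style similarity is usually defined instead by positive BLOSUM62 scores, and other tools divide by alignment or sequence length, so the numbers from this sketch need not match those reported above. The toy alignment is invented.

```python
# Hypothetical groups of "similar" residues, for illustration only.
SIMILAR_GROUPS = ["ILMV", "FWY", "KR", "DE", "NQ", "ST"]


def _similar(a: str, b: str) -> bool:
    """True if the residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str):
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in columns)
    similar = sum(_similar(a, b) for a, b in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


print(identity_similarity("MKV-LDE", "MRVALEE"))
```

In the toy alignment, four of the six gap-free columns are identical (66.7% identity), and the K/R and D/E pairs raise similarity to 100%.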

| PmLEC1 expression alternates between high and low levels during zygotic embryogenesis
To accurately quantify the expression levels of the PmLEC1 gene, qRT-PCR analysis was performed using RNA isolated from tissues representing each stage of zygotic embryogenesis (ZE), mature seeds, seeds exposed to germination conditions, vegetative buds, and pollen cones (Figures 3 and 4). The highest levels of expression occurred during seed development, but trace amounts were observed in juvenile tissues, such as seeds exposed to germination conditions. PmLEC1 transcripts were detected in 1.5-month-old seedlings, the youngest pollen cones, and vegetative buds, whereas no PmLEC1 expression was observed in 3-month-old seedlings, pollen cones >2 mm in diameter, or vegetative buds (Figure S1).

FIGURE 1 PmLEC1 is more closely related to AtLEC1 than to AtL1L. Evolutionary relationships among LEC1-type sequences deduced from crop and tree species were determined via alignment in the MUSCLE program and the neighbour-joining tree method in MEGA X. AtLEC1 (NF-YB9) and AtL1L (NF-YB6) each have two sequences available in GenBank. The A. thaliana Wassilewskija ecotype has a LEC1 sequence of 208 amino acids, while the Columbia ecotype has a LEC1 sequence of 238 amino acids. The two published sequences of AtL1L comprise 234 and 205 amino acids. Bootstrap values were obtained from 1,000 replications. The conifer sequences grouped within the LEC1 clade.

FIGURE 2 MUSCLE alignment of deduced LEC1 sequences highlights the conservation of the B domain, residues important for protein-protein interactions in the C domain, and potential post-translational modification sites. α: Gln, Asp, and Glu residues conserved between conifers and Arabidopsis. β: Gln, Asp, and Glu residues present in conifers but not Arabidopsis. γ: Gln, Asp, and Glu residues present in Arabidopsis but not conifers. G: N-glycosylation site. M: N-myristoylation site. P: phosphorylation site.

FIGURE 4 qRT-PCR analysis of PmLEC1 expression in Douglas-fir embryonic and vegetative tissues. Total RNA was isolated from 5 biological replicates at each developmental stage. DNase I treatment and reverse transcription were carried out with 5 samples of 1 μg from each stage. The qRT-PCR reactions were performed in quadruplicate with Douglas-fir ribosomal L8 transcripts as internal controls. Relative gene expression was calculated according to the 2^−ΔΔCt method (Dorak, 2006) by comparison to transcript levels in pollen cones, which showed the lowest expression. Statistical analysis was performed on the pooled groups by the Mann-Whitney U test. Results are shown as means ± SE (n = 5). Asterisks indicate PmLEC1 transcript levels that are significantly different (p < .05) from pollen cones. DAEG: days after exposure to germination conditions.

| PmLEC1 protein persists at substantial levels until germination
To define the temporal distribution of PmLEC1 protein and determine the relationship between transcript and protein levels, we performed western blot analysis on protein extracts from all stages of ZE; imbibed, stratified, and germinating seeds; 1.5- and 3-month-old seedlings; vegetative buds; and pollen cones. During early embryo development and late germination, the antiserum reacted with a 59-kD molecular species (Figure 7, Figure S1). A transition to 36 kD was apparent on July 17 (Figure 7, Figure S1), with three species being recognized. The 36-kD form was recognized from late embryogenesis until 10 days after exposure to germination conditions (DAEG); there were no reactive bands in the stratified seed samples (Figure 7). At 12 and 14 DAEG, the 59-kD species was again detected. No protein accumulation was observed in 1.5- and 3-month-old seedlings, vegetative buds, or pollen cones (Figure 7).

FIGURE 5 Northern blot analysis of PmLEC1 gene expression demonstrates an alternating pattern of induction and repression during both ZE and SE. Total RNA samples (20 μg each) were resolved by denaturing formaldehyde-agarose gel electrophoresis, blotted onto nylon membranes, and hybridized with 32P-labeled PmLEC1 cDNA. A single band was visible at approximately 1.2 kb. Labels above the blots indicate the stage of development. Labels below the blots represent dates of seed collection during ZE, and the media and timing for generating somatic embryos during SE. PEM: proembryogenic masses. BM4: culture media in which somatic embryos reach the indicated stage after the specified number of days.

FIGURE 6 Evaluation of the polyclonal anti-PmLEC1 antiserum for specificity and cross-reactivity. Transgenic lec1-1 AtLEC1 and WT PmLEC1 plants were generated via the floral dip method and grown to the T2 generation. Total protein extracts (20 μg each) from leaf and stem tissues of transgenic and wild-type A. thaliana plants, as well as from developing seeds harvested from Douglas-fir trees at the indicated dates, were resolved by SDS-PAGE, transferred to PVDF membranes, and detected with anti-PmLEC1 antiserum. The arrows indicate PmLEC1-specific bands observed in lanes of WT PmLEC1 plants at ~34 kD, and in Douglas-fir developing seeds at ~36 kD and ~59 kD. WT: Arabidopsis wild-type plant. lec1-1: Arabidopsis lec1-1 null mutant. lec1-1 AtLEC1: Arabidopsis lec1-1 mutant transformed with AtLEC1. WT PmLEC1: Arabidopsis lec1-1 mutant transformed with PmLEC1.
In conclusion, the PmLEC1 protein appears to be abundant during all phases of embryogenesis (morphogenesis, maturation, desiccation, and germination), with a possible change in MW during the time of seed dormancy (when storage protein accumulation and desiccation take place).
Accumulation of PmLEC1 protein correlated with mRNA levels during early embryogenesis: both protein (Figure 7) and RNA (Figure 5) were high, low, and high on May 24, June 7, and June 20, respectively. During late embryogenesis, from July 31 until seed imbibition, protein accumulation remained at similarly high levels (Figure 7).
No protein was detected in stratified seeds. Stratification is a cold, moist pretreatment (here, 3 weeks at 4°C) that promotes germination once the seeds are moved to 24°C. Imbibed seeds, stratified seeds, and seeds exposed to germination conditions had very low levels of PmLEC1 transcripts (Figures 4 and 5), whereas protein levels at these stages oscillated from high to low and also changed in apparent MW (Figure 7). Intron-mediated enhancement (IME) can affect translation more profoundly than transcription (Laxa, 2017), which may contribute to this discordance. Protein levels also appeared to cycle between high and low levels from 2 to 14 DAEG and shifted to the 59-kD form at 12 DAEG (Figure 7). In contrast, beginning on July 11, the RNA levels exhibited another cycle of repression, induction, and repression and then gradually declined to undetectable levels by August 30 (Figure 4).
The protein-RNA discordance at the later stages of seed development may be explained by the need to stabilize PmLEC1 during desiccation and seed dormancy to allow it to perform its functions.

| Sorbitol, mannitol, NaCl, and brassinosteroid treatments induce PmLEC1 expression in stratified seeds
In conifers, embryogenic competence is restricted to specific tissues and genotypes, and SE is most easily induced from immature or cotyledonary zygotic embryos, both of which exhibit high PmLEC1 expression (Figures 4 and 5). Tissues of increasing maturity, such as harvested seeds or vegetative tissues, are progressively more recalcitrant to SE, which is a major obstacle to the commercialization of SE technology. In plants, SE induction is facilitated by 2,4-D/BA, gibberellic acid/abscisic acid (GA3/ABA), brassinosteroids, plasmolysing stressors (sorbitol, mannitol, or sucrose), and other stressors such as heavy metals or salt.
To assess whether PmLEC1 expression may be upregulated in recalcitrant tissues, stratified seeds were exposed to stress and PGR treatments, followed by qRT-PCR analysis. Each treatment group consisted of three individual seeds held at room temperature for 24 hr on filter paper soaked with the specified compounds (Figure 8a).

FIGURE 7 Western blot analysis of PmLEC1 protein during embryonic and vegetative growth reveals a ~59 kD form during early embryogenesis and late germination, and a ~36 kD form during late embryogenesis and early germination. Total protein was extracted from developing seeds harvested from Douglas-fir trees at the indicated dates and ZE stages; mature imbibed seeds; mature stratified seeds; mature seeds incubated on germination medium for 2, 6, 10, 12, 14, 45, and 90 days; and vegetative buds and pollen cones. The protein extracts (20 μg each) were resolved by SDS-PAGE, transferred to a PVDF membrane, and probed with PmLEC1-specific polyclonal antibodies. DAEG: days after exposure to germination conditions.

To verify whether these treatments were perceived as stresses, we measured the transcript levels of the heat shock protein gene PmHSP18.2A, which shows specific responses to PGRs and stress after four days (Kaukinen et al., 1996). Treatment with 2,4-D/BA up-

FIGURE 8 Total RNA was isolated from 3 seeds of each treatment group, treated with DNase I, and used for reverse transcription (1 μg RNA each). The qRT-PCR reactions were carried out in quadruplicate with Douglas-fir ribosomal L8 transcripts as internal controls. Relative gene expression was calculated according to the 2^−ΔΔCt method (Dorak, 2006) by comparison to transcript levels in mature stratified seeds. Statistical analysis was performed on the pooled groups (n = 3) by the Mann-Whitney U test. Results are shown as means ± SE (n = 3). Asterisks indicate transcript levels that are significantly different (p < .05) from stratified seeds.

FIGURE 9 Complete PmLEC1 nucleotide sequence, 5′ UTR intron, and deduced protein sequence with putative sites for post-translational modifications. The nucleotide sequence upstream of the coding region was obtained by genome walking. The green line corresponds to the complementary DNA sequence and represents GSP1, used in the primary PCR reaction of genome walking, while the blue line corresponds to the complementary DNA sequence and represents GSP2, used in the secondary PCR reaction. The 5′ UTR and 3′ UTR sequences are displayed in bold. The 5′ UTR is split into two sections by the 5′ UTR intron. The potential intron splice sites are in green. The deduced protein sequence is shown in red and numbered on the right. Putative sites of post-translational modification are indicated by underlined amino acid residues: dark green, N-myristoylation; light blue, phosphorylation; violet, N-glycosylation.

FIGURE 10 The Douglas-fir PmLEC1 gene map with regulatory elements and corresponding transcription factors. The nucleotide sequence of the sense strand is shown in black and numbered on the right in black. +1, transcriptional start site. The 5′ and 3′ UTRs are represented by bold letters. Intron splice site sequences are shown in green. Regulatory elements (blue sequences, elements on the coding strand; black sequences, elements on the template strand) and the names of the corresponding binding factors were identified via PlantPAN3.0. The putative protein sequence is shown in pink and numbered on the right in pink.

| The regulatory landscape of PmLEC1 contains a 5′ UTR intron, regulatory elements for physiological, biotic, and abiotic signals, and seed-specific transcription factors
To gain insight into the transcriptional regulation of PmLEC1, we isolated 1,400 bp of genomic DNA sequence upstream of the coding sequence. Alignment of the genomic sequence with the PmLEC1 cDNA sequence revealed the presence of a 5′ UTR intron (Figure 9).
The presence of an intron was also suggested by an in-frame stop codon within the 5′ UTR of the initial cDNA sequence, signifying a splicing event (Soccio et al., 2002).
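Upstream ATG codons and stop codons in a 5′ UTR can be located with a simple scan. The sketch below assumes that the 5′ UTR string ends immediately before the annotated start codon, so a codon beginning at position i is in frame with the main open reading frame when len(utr) − i is divisible by 3. The function name and example UTRs are hypothetical; they are not the PmLEC1 sequence shown in Figure 9.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_5utr(utr: str):
    """Return (upstream ATG positions, in-frame stop codon positions), 0-based.

    Assumes the UTR ends immediately before the main ATG, so a codon at
    position i is in frame with the main ORF when (len(utr) - i) % 3 == 0.
    """
    utr = utr.upper()
    codon_starts = range(len(utr) - 2)
    upstream_atgs = [i for i in codon_starts if utr[i:i + 3] == "ATG"]
    in_frame_stops = [
        i for i in codon_starts
        if utr[i:i + 3] in STOP_CODONS and (len(utr) - i) % 3 == 0
    ]
    return upstream_atgs, in_frame_stops


print(scan_5utr("GCCTGACCAGCC"))  # ([], [3]): no uATG, one in-frame TGA
print(scan_5utr("AATGCC"))        # ([1], []): one uATG, no stop codon
```

Both conditions discussed in the text, an in-frame stop codon and the absence of upstream start codons, correspond to the first example's output.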
Because 5′ UTR introns demonstrate promoter-like activity (Kim et al., 2006), we queried the PmLEC1 locus in PlantPAN3.0 (Chow et al., 2019), a resource for identifying regulatory elements in plant promoters. Putative TATA boxes were identified at −28 bp and +998 bp (Figure 10). Over 700 cis-regulatory elements, with high similarity scores, were identified within the PmLEC1 genomic sequence. The full list of regulatory elements, their positions and similarity scores, their associated TFs, source organisms, and the TF gene ontology biological processes are presented in Table S1.
Approximately 50 cis-regulatory elements or TFs with well-characterized functions in angiosperm embryogenesis were identified (Figure 10), suggesting potential culture medium manipulations that may facilitate conifer SE and pointing to TFs that may regulate PmLEC1 transcription. The gene ontology biological processes represented by these regulatory elements and associated TFs include responses to salt stress, water stress, osmotic stress, metal ions, sugars, and PGRs; defense responses to microbes and pests; chromatin remodeling; nucleosome assembly; and intron splicing (Table 1, Table S1). Photoperiodism, responses to specific wavelengths of light, circadian rhythm, and photomorphogenesis are also prominent. Regulatory elements were also identified for several TFs that are well characterized in angiosperms and participate in SE, embryo development, and seed maturation, including AGL15, ABI3, LEC2, FUS3, WUS, VP1, and MYB118 (Table 1, Figure 10). Notably, the cis-elements for ABI3, LEC2, and FUS3 are colocated on the same nucleotide segment in four instances in the PmLEC1 gene (Figure 10).
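PlantPAN3.0 reports each element with a position and a similarity score; the sketch below is a much simpler exact-consensus scan and does not reproduce that scoring. It only illustrates what searching both strands involves: a motif is matched directly on the coding strand and, via the reverse complement, on the template strand. The function names, the example motif CATGCA (the core RY element commonly associated with B3-domain factors such as ABI3, FUS3, and LEC2), and the toy sequence are assumptions chosen for illustration, not PmLEC1 data.

```python
"""Toy two-strand cis-element scan (illustrative sketch, not PlantPAN3.0)."""

# IUPAC nucleotide codes -> allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(site: str, motif: str) -> bool:
    """True if each base in `site` is allowed by the IUPAC code in `motif`."""
    return len(site) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(site, motif)
    )


def scan(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, strand) for every exact consensus hit.

    '+' = motif read on the coding (sense) strand.
    '-' = motif read on the template strand; the start is reported in
          sense-strand coordinates (leftmost base of the site).
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        if matches(site, motif):
            hits.append((i + 1, "+"))
        if matches(revcomp(site), motif):
            hits.append((i + 1, "-"))
    return hits


if __name__ == "__main__":
    toy = "GGCATGCAAATTTGCATGCC"  # hypothetical sequence, not PmLEC1
    print(scan(toy, "CATGCA"))   # prints [(3, '+'), (13, '-')]
```

In the toy sequence, CATGCA occurs once on the coding strand (position 3) and once on the template strand, where the sense strand reads TGCATG (position 13). A palindromic motif such as CACGTG would be reported on both strands at the same position.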

| Functional analysis of PmLEC1 via ectopic expression in the Arabidopsis lec1-1 null mutant confirms somatic embryogenesis induction by PmLEC1
To elucidate the function of PmLEC1, we transformed the A. thaliana lec1-1 null mutant with a DNA construct designed to express the PmLEC1 gene under the control of a strong constitutive promoter.
The lec1-1 null mutant is embryo-lethal due to the deletion of the AtLEC1 gene, which renders lec1-1 seeds intolerant to desiccation and nonviable (Lotan et al., 1998).
The PmLEC1 transgene complemented the lec1-1 mutation and produced viable, desiccated T1 seeds from transformed lec1-1 mutants. Transgenic T1 seeds that produced viable and fertile plants were self-pollinated to produce T2 seeds, which were germinated on kanamycin-containing medium. We performed molecular and morphometric analyses of this T2 generation (T2 lec1-1 PmLEC1). PCR analysis confirmed stable integration of the PmLEC1 transgene into all selected lec1-1 PmLEC1 lines. The T2 lec1-1 PmLEC1 lines displayed various phenotypes, ranging from the normal morphological characteristics of wild-type A. thaliana to seedlings with recurrent, embryo-like structures on vegetative tissues (Figures 11 and 12). These embryo-like lec1-1 PmLEC1 seedling phenotypes were similar to those observed by Lotan et al. (1998).

| DISCUSSION
The LEC1 homologue from Douglas-fir, PmLEC1, was isolated and characterized to gain insight into conifer embryogenesis.
Phylogenetic analysis revealed that PmLEC1 was most closely related to AtLEC1, a central regulator of embryogenesis in Arabidopsis.
Similarly to AtLEC1, ectopic expression of PmLEC1 in the Arabidopsis lec1-1 null mutant rescued the mutant, led to the appearance of embryo-like structures and embryo-like seedlings in the T2 generation, and induced embryonic programs in mature T2 tissues. Unique features of PmLEC1 include a 5′ UTR intron, an alternating pattern of gene expression, and protein levels that persist into the germination stages. Our results identify PmLEC1 as an embryo-regulatory TF whose expression in mature tissues is upregulated by brassinosteroids, sorbitol, mannitol, and NaCl. The isolation and characterization of PmLEC1 is a crucial first step not only in understanding the molecular basis of conifer embryogenesis but also in overcoming recalcitrance to SE induction. Homologues of specific TFs critical to angiosperm embryogenesis, deduced from the cis-regulatory elements in the 5′ UTR intron, are likely to be involved in conifer embryogenesis. Their identification and characterization may unlock the enigmatic process of conifer embryogenesis that has eluded researchers for the last 50 years. The cis-regulatory elements also provide pertinent insights into culture medium components and TFs that may induce SE or promote robust embryo production.

| PmLEC1 as a master regulator of embryogenesis in Douglas-fir
Functional analysis of PmLEC1 via ectopic expression in the Arabidopsis lec1-1 mutant confirmed that PmLEC1 complemented

TABLE 1 Regulatory elements of the PmLEC1 locus and associated transcription factors most relevant to embryo development and somatic embryogenesis (SE)

Expression of the maturation program in seed development; potentiation of seed-specific hormone response.

MERISTEM L1 (A. thaliana): Cotyledon development; plant epidermal cell differentiation; seed germination; cell specification and pattern formation during embryogenesis.

MYB-related CCA1 (A. thaliana): Circadian rhythm; long-day photoperiodism, flowering; response to ABA, auxin, cadmium ion, cold, ethylene, jasmonic acid, GA, organonitrogen compound, salicylic acid, salt stress.

(Stasolla et al., 2004; van Zyl et al., 2003) that closely resembles the alternating pattern of PmLEC1 gene expression we observed (Figures 4 and 5). The relationship between PmLEC1 activity and embryogenic transcript levels will provide insight into its regulatory capacity. The functional differences between PmLEC1 and AtLEC1 remain to be discovered, yet the use of PmLEC1 is a promising strategy for SE induction.

| PmLEC1 expression as a molecular marker of embryogenic potential
The mechanisms underlying pattern formation during plant embryogenesis remain largely unknown, and the characterization of essential genes should remain a focus of research. Because many of these genes are active throughout embryogenesis, knowledge of their precise expression patterns will help to define "normal" embryogenesis and/or identify embryogenic tissues.
A precise and consistent pattern of PmLEC1 gene expression was demonstrated in both ZE and SE by northern blot analysis and qRT-PCR (Figures 4 and 5), with peaks in transcript levels corresponding to the onset of the morphological stages (proembryo, early embryo, late embryo) of ZE. The large error bar on June 20 (Figure 5) may reflect asynchronous fertilization, which leads to embryos that are not uniformly at the same stage of development (Owens, 1995). The alternating pattern of induction and repression observed in both ZE and SE is reminiscent of a phenomenon described by Cairney and Pullman (2007) Arabidopsis seedlings (Huang et al., 2015). However, the expression of trace amounts of LEC1 in germinating seeds and in the youngest vegetative buds and pollen cones may also indicate their embryogenic competence and the window for inducing SE from these juvenile tissues.
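The transcript levels compared in this study are relative values. For the seed-treatment experiments, the 2^−ΔΔCt method (Dorak, 2006) was used, with Douglas-fir ribosomal L8 as the internal control and mature stratified seeds as the calibrator. The sketch below shows that arithmetic under the method's standard assumption of near-100% and roughly equal amplification efficiency for the target and reference genes; the helper name `relative_expression` and all Ct values are hypothetical, not data from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the 2^-ΔΔCt method.

    ct_target, ct_ref:          replicate Ct values for the sample
                                (target gene, reference gene such as L8)
    ct_target_cal, ct_ref_cal:  replicate Ct values for the calibrator
    Assumes target and reference amplify with ~100% efficiency.
    """
    delta_sample = mean(ct_target) - mean(ct_ref)       # ΔCt of the sample
    delta_cal = mean(ct_target_cal) - mean(ct_ref_cal)  # ΔCt of the calibrator
    delta_delta = delta_sample - delta_cal              # ΔΔCt
    return 2 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical quadruplicate Ct values (not data from this study).
    treated_target = [23.75, 24.25, 24.0, 24.0]
    treated_l8 = [18.0, 18.25, 17.75, 18.0]
    calibrator_target = [26.0, 26.25, 25.75, 26.0]
    calibrator_l8 = [18.0, 18.0, 18.0, 18.0]
    fold = relative_expression(treated_target, treated_l8,
                               calibrator_target, calibrator_l8)
    print(f"fold change vs calibrator: {fold:.2f}")  # prints 4.00
```

Here the sample's ΔCt (6.0) is two cycles lower than the calibrator's (8.0), so ΔΔCt = −2 and the fold change is 2^2 = 4.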

| 5′ UTR introns enhance gene expression and protein accumulation
Our results are the first to describe a LEC1 gene with a 5′ UTR intron. Some of the known LEC1 genes contain introns within the coding sequence, but no 5′ UTR introns have been reported (Cagliari et al., 2014; Han et al., 2017). Only a few conifer genomes have been sequenced; most LEC1 sequence data are derived from cDNAs and ESTs, which do not retain intron sequences. 5′ UTR introns are associated with promoter-like features, developmental regulation and increased gene expression via intron-mediated enhancement (IME; Kim et al., 2006).
Typically, 5′ UTR introns are located 80-300 bp from the beginning of a 5′ UTR and 1-40 bp upstream of the start codon, and they are usually longer than introns within coding sequences. Large 5′ UTR introns function as spacers within the genome and provide AT-rich stimulatory elements between coding and noncoding sequences of the genome. The PmLEC1 5′ UTR intron splice sites are located 123 bp from the beginning of the 5′ UTR and 30 bp upstream of the start codon. The PmLEC1 intron is 1,176 bp in length, more than twice the size of the 543 bp coding sequence.
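The lengths quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 543 bp coding sequence includes the stop codon (180 codons plus one stop codon gives 543 bp) and that the 123 bp and 30 bp distances are exonic 5′ UTR lengths on either side of the intron, so the spliced 5′ UTR would be about 153 nt; both are inferences from the numbers in the text, and the constant and function names are illustrative.

```python
# Values quoted in the text for the PmLEC1 locus.
PROTEIN_AA = 180      # deduced protein length (residues)
CDS_BP = 543          # coding sequence length
INTRON_BP = 1176      # 5' UTR intron length
UTR_BEFORE_BP = 123   # 5' UTR bases before the intron
UTR_AFTER_BP = 30     # 5' UTR bases between the intron and the start codon

# Typical 5' UTR intron positions cited in the text.
TYPICAL_FROM_UTR_START = (80, 300)
TYPICAL_UPSTREAM_OF_ATG = (1, 40)


def check_layout():
    """Return simple consistency checks for the PmLEC1 5' UTR intron."""
    codons_with_stop = PROTEIN_AA + 1             # assumes the CDS includes the stop codon
    cds_consistent = codons_with_stop * 3 == CDS_BP
    intron_to_cds = INTRON_BP / CDS_BP            # "more than twice" means > 2
    lo1, hi1 = TYPICAL_FROM_UTR_START
    lo2, hi2 = TYPICAL_UPSTREAM_OF_ATG
    typical_position = (lo1 <= UTR_BEFORE_BP <= hi1
                        and lo2 <= UTR_AFTER_BP <= hi2)
    spliced_utr_nt = UTR_BEFORE_BP + UTR_AFTER_BP  # assumes distances are exonic lengths
    return cds_consistent, intron_to_cds, typical_position, spliced_utr_nt


if __name__ == "__main__":
    ok, ratio, typical, utr = check_layout()
    print(ok, round(ratio, 2), typical, utr)  # prints: True 2.17 True 153
```

Under these assumptions, both intron positions fall inside the typical ranges, and the intron is about 2.17 times the length of the coding sequence.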
Multiple factors and signals appear to regulate the PmLEC1 locus (Figure 9, Table 1). Large numbers of elements mediating responses to a variety of signals couple brief stimuli to long-term adaptive responses (Finkbeiner, 2001). Therefore, the use of multiple signals for synergistic activation may be one strategy to induce SE at any time and in any tissue. Multiple regulatory sites on introns can lead to synergistic rather than additive activation, and specific biological responses may be achieved combinatorially (Finkbeiner, 2001).

FIGURE 12 Ectopic expression of PmLEC1 in the A. thaliana lec1-1 null mutant produces diverse T2 phenotypes and induces expression of embryo-specific genes in mature tissues. (a) Plant morphology ranged from the normal phenotype of A. thaliana (1) to seedlings with cotyledon-like organs at positions normally occupied by leaves (2), and seedlings with masses of recurrent embryo-like structures emerging from the cotyledons (3). (b) Gene expression analysis. Total RNA was isolated only from vegetative tissues (leaves and stems). RT-PCR was performed with 0.1 μg DNase I-treated RNA to evaluate gene expression of AtLEC1, PmLEC1, oleosin and cruciferin. Actin served as the control.

| PmLEC1 protein profile, electrophoretic mobility, and possible posttranslational modifications
The apparent sizes of PmLEC1 estimated from electrophoretic mobility (59 kD and 36 kD) are greater than the MW of 21 kD predicted from its 180 amino acid residues, but they are likely correct if the predicted posttranslational modifications of PmLEC1 are taken into account.
Similarly, murine HAP3 (NF-YB) proteins are known to migrate at higher-than-expected MWs. Both rat and mouse HAP3 proteins contain 207 amino acids and have expected MWs of 25 kD, but they migrate at 32 kD and 36 kD, respectively (Gilthorpe et al., 2002; Vuorio et al., 1990). The apparent MW of PmLEC1 protein extracted from vegetative tissues of transgenic WT PmLEC1 Arabidopsis plants, based on SDS-PAGE mobility, is 34 kD and is comparable to that of HAP3 proteins purified from other species.
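Apparent sizes such as these are conventionally read from a standard curve in which log10(MW) of marker proteins is fitted as a linear function of their relative mobility (Rf). The text does not state how the PmLEC1 sizes were calculated, so the sketch below only illustrates that general approach. Because the curve maps mobility, not sequence, to mass, a protein that migrates more slowly than expected (as reported for HAP3 proteins) is assigned a higher apparent MW. The marker sizes, Rf values, and function names are hypothetical.

```python
import math


def fit_standard_curve(rf, mw_kd):
    """Least-squares fit of log10(MW) = intercept + slope * Rf.

    rf:    relative mobilities of the marker bands (0-1)
    mw_kd: marker molecular weights in kD
    """
    n = len(rf)
    logs = [math.log10(m) for m in mw_kd]
    x_bar = sum(rf) / n
    y_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in rf)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rf, logs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def apparent_mw(rf_band, intercept, slope):
    """Apparent MW (kD) of a band from its Rf using the fitted curve."""
    return 10 ** (intercept + slope * rf_band)


if __name__ == "__main__":
    # Hypothetical marker ladder (not data from this study).
    marker_rf = [0.10, 0.22, 0.35, 0.50, 0.66, 0.82]
    marker_kd = [100, 75, 50, 37, 25, 20]
    b0, b1 = fit_standard_curve(marker_rf, marker_kd)
    for band_rf in (0.30, 0.52):
        print(f"Rf {band_rf:.2f} -> ~{apparent_mw(band_rf, b0, b1):.0f} kD")
```

A single least-squares line is used here for simplicity. The log-linear relationship generally holds only over part of a gel's separation range, so sample bands are best bracketed by marker bands.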
The change in PmLEC1 electrophoretic mobility from July 31 until 10 DAEG (Figure 8) coincides with the maturation and desiccation phases of embryogenesis, and seed dormancy. The maturation phase is characterized by an interruption in patterning, proliferation and differentiation, accumulation of storage proteins and lipids and a reduction in water content (Jo et al., 2019). In Arabidopsis, AtLEC1 inhibits precocious germination and is necessary for the acquisition of desiccation tolerance and accumulation of storage molecules (Jo et al., 2019). The embryo remains developmentally arrested until conditions become favorable for germination (Jo et al., 2019), which involves increased water uptake and metabolic activity. The change from 36 to 59 kD occurring at 12 DAEG could signify that favorable germination conditions were finally perceived.
Posttranslational modifications could allow PmLEC1 to function during all stages of embryogenesis. The sequential transition captured on the blot is likely due to seeds at different stages of late embryogenesis (each sample contained proteins from several seeds), and the two visible steps on July 17 and July 31 are suggestive of the two predicted N-myristoylations. N-myristoylation functions in protein-protein interactions, protein stability, adaptation to salt stress, cellular localization, and membrane targeting, and it can act as a molecular switch (de Jonge et al., 2000; Wright et al., 2010).

| Potential regulators of PmLEC1 suggest molecular events coordinating conifer embryogenesis
Regulatory elements within the 5′ UTR intron are specific for TFs that have well-characterized functions in angiosperm embryogenesis and suggest that homologues of AGL15, LEC2, FUS3, VP1, ABI3, and/or WUS regulate PmLEC1 expression. Because AGL15 is present prior to and during early embryogenesis, it may be the most significant factor responsible for LEC1 induction. In Brassica, maize and Arabidopsis, AGL15 is the only MADS box protein that preferentially accumulates in developing embryos and associated maternal tissues, as well as in somatic embryos (Harding et al., 2003; Perry et al., 1996). Ectopic expression of AGL15 in angiosperms promotes and supports somatic embryo production (Harding et al., 2003).
During middle and late angiosperm embryogenesis, transcripts of the seed-specific transcriptional activators VP1 and ABI3 are the most abundant (Kurup et al., 2000;Suzuki et al., 2007).
The presence of regulatory elements responsive to brassinosteroids, salt, and mannitol supports our findings on PmLEC1 induction by these compounds (Figure 8). For example, RAV1, HBI1, and WRKY46 mediate responses to brassinosteroids, and their target sites are also present in PmLEC1 (Table 1). Computational analysis of the PmLEC1 locus identified key TFs that are likely to have roles in conifer embryogenesis and should be characterized in future studies.
In summary, PmLEC1 is the first early embryo regulatory gene characterized in Douglas-fir. This gene is highly expressed during early embryogenesis and may be induced by salt, osmotic stress, and brassinosteroids in mature tissues. The induction of PmLEC1 in mature tissues may lead to improved SE protocols and, most importantly, the induction of SE from vegetative conifer explants. The cis-regulatory elements in the 5′ UTR intron of PmLEC1 point to potential genes involved in conifer embryogenesis. The function of PmLEC1 may involve integrating multiple signals perceived through the intron and initiating major reprogramming to coordinate development and responses to the environment. Finally, since LEC1 is known to be part of a regulatory network (Han et al., 2017; Rupps et al., 2016), other TFs must be characterized before embryogenesis in conifers can be fully controlled.

ACKNOWLEDGMENTS
We thank the Yevtushenko and Misra lab members at the University of Lethbridge and the University of Victoria, respectively, for technical assistance and stimulating discussions during this work. We also thank Dr. David Pearce (University of Lethbridge) for critical reading of our manuscript.

CONFLICT OF INTEREST
The authors are not aware of any conflicts of interest.