Author for correspondence: Neil W. Ashton Tel: +1 306 585 4252 Fax: +1 306 585 4894 Email:email@example.com
• Structural and phylogenetic studies of KNOX genes identified in the bryophyte Physcomitrella patens are reported here, to provide insights into the evolution of class 1 and class 2 KNOX genes.
• Three KNOTTED1-like homeobox (KNOX) genomic clones were isolated and sequenced from P. patens. Corresponding cDNAs from a library, prepared from mRNA transcripts isolated from gametophytic tissues, were also sequenced.
• Conceptual translation and analysis of the bryophyte coding sequences revealed a domain pattern and secondary structures typical of higher plant KNOX proteins. Intron number and positions within the genes were also highly conserved between moss and angiosperm loci, providing further support for their homology. Structural and phylogenetic analyses indicated that moss clones (MKN2 and MKN4) represent class 1 KNOX genes and the remaining clone (MKN1–3) is a class 2 KNOX gene.
• We conclude that the observed protein domain pattern is encoded by homeobox genes that evolved after separation of the plant lineage from that of fungi and animals, and must have been present in the common ancestor to mosses and seed plants. It is proposed that gene duplication and diversification, which created class 1 and 2 KNOX gene subfamilies, occurred after separation of this common ancestor from its algal progenitor (since a characterized algal KNOX gene cannot be assigned to class 1 or 2), but before the moss and higher plant lineages diverged.
The homeobox was identified as a conserved sequence motif shared by several Drosophila homeotic genes (McGinnis et al., 1984) with crucial regulatory roles in morphogenesis. Since then, several thousand homeobox genes have been discovered in fungi, animals and plants (for a review see Gehring, 1994). All possess a conserved homeobox encoding a homeodomain, approx. 60 amino acids long, within proteins that are believed to be transcription factors. The first plant homeobox gene to be cloned was KNOTTED1 (KN1) from maize (Vollbrecht et al., 1991). Many homeobox genes have since been isolated from various plant groups and, based upon structural considerations and phylogenetic analyses, have been classified into four distinctive groups (Kerstetter et al., 1994). Genes belonging to two of these groups, KNOX and BELL, are distinguished from the other gene types by encoding proteins with homeodomains that are 63 amino acids long. The additional three residues occur between the first and second helices of the homeodomain. Thus, KNOX and BELL genes belong to the three amino acid loop extension (TALE) superclass of homeobox genes that also includes genes representing metazoa, for example MEIS genes, and fungi, for example CUP genes (Bürglin, 1995, 1997; Bharathan et al., 1997). KNOX and MEIS proteins possess large domains upstream from the homeodomain, which are conserved within each class. The two domains share significant sequence similarity, indicating that they may have evolved from a common ancestral (MEINOX) domain (Bürglin, 1997, 1998), already present when plants and animals diverged (Bharathan et al., 1997; Bürglin, 1997, 1998).
KNOX proteins possess a common structural organization consisting of six regions arranged in the following sequence: (i) a non-conserved N-terminal region characterized in some, although not all, cases by homopolymeric regions (9–14 amino acids long) comprising proline, glutamine, histidine or alanine residues (Vollbrecht et al., 1991; Tamaoki et al., 1995; Serikawa et al., 1996; Janssen et al., 1998), perhaps corresponding to proline- or glutamine-rich or other transcriptional activation domains; (ii) a conserved KNOX domain with a secondary structure consisting of a helix-loop-helix motif separated from a third helix by a linker sequence (it has been suggested that the third helix may function as a transcriptional activation domain; Sakamoto et al., 1999); (iii) a GSE domain defined by a predominance of glycine (G), serine (S) and glutamate (E) residues (Bürglin, 1997), in which threonine and proline residues are also common and, in many cases, a proline (P)-, glutamate (E)-, serine (S)- and threonine (T)-rich (PEST) motif can be discerned, indicating that this region may be involved in determining the longevity of KNOX proteins; (iv) a highly conserved glutamate (E)-, leucine (L)- and lysine (K)-rich (ELK) domain, which secondary structure analysis predicts may form an amphipathic α-helix with an offset hydrophobic face (Kerstetter et al., 1994), perhaps facilitating protein–protein interactions (Vollbrecht et al., 1993); (v) a homeodomain responsible for DNA binding; and (vi) a short, non-conserved C-terminal region.
KNOX genes can be subdivided on the basis of sequence homology and expression patterns into two classes (Kerstetter et al., 1994). Class 1 genes include maize KNOTTED1 and the Arabidopsis orthologue, SHOOTMERISTEMLESS (STM). They are expressed in vegetative, axillary, inflorescence and floral shoot apical meristems (SAMs) (Long et al., 1996), and also in internodal intercalary meristems (IMs) (Sato et al., 1999). The results of studies on mutants, homozygous for recessive, loss-of-function alleles, have provided clear insights into the developmental roles of class 1 genes (Long et al., 1996; Kerstetter et al., 1997; Sato et al., 1999). They appear to be needed for the generation of SAMs and IMs, and for maintenance of the cells comprising them in an undifferentiated, undetermined state. Class 2 genes also include representatives from maize and Arabidopsis as well as from other angiosperms. They exhibit less restricted patterns of expression (Kerstetter et al., 1994; Serikawa et al., 1996). To date, no function has been assigned to them.
The results of phylogenetic analyses of higher plant KNOX gene and protein sequences (Bürglin, 1997; Bharathan et al., 1999) support the classification of KNOX genes based upon the structural and functional considerations discussed above.
This paper reports the cloning of three bryophyte KNOX genes, describes their genetic architecture and the organization of the proteins they encode, and presents the results of phylogenetic analyses of a representative subset of plant KNOX genes incorporating those from P. patens.
Development and evolution are mutually interrelated phenomena, and genes which are significant for the development of particular somatic structures may have been instrumental in the evolution of these structures (Theißen & Saedler, 1995). Therefore, it is reasonable to suppose that the duplication and diversification of ancestral KNOX genes were essential steps in the evolution of higher plant SAMs. It follows that structural, phylogenetic and functional studies of KNOX genes identified in non-spermatophyte plant groups should be illuminating with respect to the evolution of class 1 and class 2 KNOX genes and the roles they have played in the evolution of SAMs and plant form.
Materials and Methods
Physcomitrella patens (Hedw.) Bruch, Schimp and W. Gümbel, recently renamed Aphanoregma patens (Hedw.) Lindb., was cultured using standard medium and conditions (Ashton et al., 1985; Knight et al., 1988). Gametophytes (protonemata and gametophores) were grown for 3–4 wk, ground into a fine powder in liquid nitrogen, and stored at −85°C until used for DNA extraction.
Plant origins and GenBank accession numbers are provided in the list of sequences within the legend of Fig. 4.
Genomic DNA (gDNA) extraction
DNA was isolated from gametophytic tissues according to Doyle & Doyle (1990) with the following minor modifications: inclusion in the extraction buffer of 1% polyethylene glycol (PEG) 6000; and incubation of the tissue/extraction buffer mixture at 72°C for 20 min. RNA was removed during DNA extraction with RNase A (100 µg ml−1 extraction buffer) or, after DNA isolation, with RNase A (20 µg ml−1 Tris-EDTA (TE), pH 8, containing the redissolved DNA). DNA samples were purified by anion exchange chromatography using Genomic-tip Maxi DNA Purification Columns (Qiagen Inc., CA, USA) according to the manufacturer's instructions.
Extraction of DNA from λ clones
DNA was extracted from amplified λ clones containing MKN (Moss KNOX) gDNA sequences using a QIAGEN Lambda DNA Midi Preparation kit, according to the manufacturer's (Qiagen Inc.) instructions, and used in sequencing reactions without additional purification.
Using degenerate primers based on the KNOTTED1 homeodomain, five short P. patens MKN gene fragments (MKN1A to MKN5A inclusive) had previously been cloned by PCR, and sequenced (Ashton et al., 2000). Gene-specific primer pairs, each consisting of a plus strand (P) primer and a minus strand (M) primer, were designed for amplification of segments of each of these MKN fragments: MKN1A – MKN1P2 (GTATTGGTGACATGCTGATTTTTC) + MKN1M20 (AATTCCAAACCTGTCTCCTGAATC); MKN2A – MKN25P2 (CGCWCCATCTTGAAGGACTGG) + MKN2M1 (CAAGCCGCAGATTCTTTGCAGG) or MKN2M22 (ATTCTGACAGTGCCCAAGTGAAAC); MKN3A – MKN3P1 (GGAACTACAACAGTCTTGAAGGC) + MKN3M1 (TTACGTGAACTAGAACATAAAGC); MKN4A – MKN4P1 (CGTCAGATCTTGAAGGACTGG) or MKN4P3 (CCTTCTCATTGTAAAGTTGGGACC) + MKN4M2 (TTAAGCCACAGAGTCGCTGC); MKN5A – MKN25P2 (CGCWCCATCTTGAAGGACTGG) + MKN5M1 (TAACTTGGTATGGGCTACTTTAAC). In each case, the 5′ end of the primer is on the left. Typically, each 50 µl PCR reaction mixture contained approximately 100 ng of gDNA (as template), 1.25 units AmpliTaq Gold™ (Perkin-Elmer Corporation, CT, USA), PCR Buffer II and 2.5 mM MgCl2 (also provided by Perkin-Elmer), 50 µM each dNTP (Roche Molecular Biochemicals, Laval, QC, Canada), and 500 nM each primer. The thermocycling regime comprised: 9 min at 96°C; 35 cycles of 30 s at 96°C, 45 s at 50°C, 60 s at 72°C; terminated by 5 min at 72°C.
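The degenerate position in primer MKN25P2 can be made explicit. The following minimal sketch expands the IUPAC ambiguity code W (A or T) into the concrete sequences it stands for; the function name `expand_degenerate` and the reduced code table are illustrative assumptions and are not part of the original protocol.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes. W (A or T) is the only
# degenerate code that occurs in the primers listed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer (5' to 3')."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

mkn25p2 = "CGCWCCATCTTGAAGGACTGG"
variants = expand_degenerate(mkn25p2)
for seq in variants:
    print(seq)
```

A primer with a single W therefore corresponds to an equimolar mixture of two oligonucleotides, which is consistent with the same primer being used for both MKN2A and MKN5A.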
To corroborate the identity of the PCR reaction products, they were resolved electrophoretically using 1.2% agarose gels and visualized by staining with ethidium bromide. Resulting DNA fragments of the predicted sizes were extracted from agarose gels using Wizard® PCR Preps (Promega, WI, USA), cloned into pGEM®-T Vector System II (Promega), propagated in E. coli JM109, and the recombinant plasmids isolated from bacterial clones using Wizard® Plus Minipreps (Promega), each kit being used according to the respective manufacturer's instructions. The resulting plasmids were diluted and used as templates in PCR reactions with M13 lacZ forward and reverse primers in order to produce template for manual sequencing. Unused primers were removed using a QIAquick PCR Purification Kit (Qiagen Inc.).
Non-isotopic MKN probes for use in Southern analysis were prepared by PCR using the gene-specific primers described above with previously amplified, but unlabeled MKN fragments as template. PCR was performed also as described above except for inclusion in the reaction mixtures of DIG (digoxigenin)-11-dUTP, in accordance with the supplier's (Roche Molecular Biochemicals, Laval) instructions.
P. patens gDNA inserts (9–23 kb) in λ FIX II clones were amplified in two parts by using primer pairs, in each case, comprising one of the MKN gene-specific primers referred to above and a vector-specific primer designed from the T3 (AATTAACCCTCACTAAAGGG) or T7 (GTAATACGACTCACTATAGGGC) regions of the λ arms flanking the insert. Typically, each 50 µl PCR reaction mixture contained 1 µl of phage suspension (as template), 2.6 units Expand DNA polymerase (Roche Molecular Biochemicals, Laval), PCR Buffer I and 1.75 mM MgCl2 (also provided by Roche), 350 µM each dNTP, and 300 nM each primer. The thermocycling regime comprised: 2 min at 94°C; 10 cycles of 10 s at 94°C, 30 s at 52°C, X min at 68°C; 15 cycles of 10 s at 94°C, 30 s at 52°C, X min + 20 s cycle−1 at 68°C; 7 min at 68°C. The variable, X, was assigned values between 2 and 15 min depending upon the size of the fragment being amplified.
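The two-phase extension schedule can be tabulated as in the sketch below. It assumes that the 20 s increment accumulates over the second block of cycles (X min + 20 s in the first of the 15 cycles, X min + 40 s in the second, and so on); that reading of the regime, and the function name `extension_times`, are assumptions made for illustration.

```python
def extension_times(x_min: float, fixed_cycles: int = 10,
                    ramp_cycles: int = 15, step_s: int = 20) -> list[float]:
    """Extension time (s) at 68°C for every cycle of the long-distance PCR."""
    base_s = x_min * 60
    fixed = [base_s] * fixed_cycles
    # Assumed: the increment is cumulative across the second block of cycles.
    ramp = [base_s + step_s * (i + 1) for i in range(ramp_cycles)]
    return fixed + ramp

times = extension_times(x_min=2)
print(len(times), times[0], times[10], times[-1])
```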
PCR products from MKNλ clones were purified using a QIAquick PCR Purification Kit (Qiagen Inc.). Alternatively, they were resolved by gel electrophoresis and, following excision of portions of agarose gels containing the desired products, the DNA was purified using a QIAquick Gel Extraction Kit, for 1–10 kb products, or a QIAEX II DNA Purification Kit (Qiagen Inc.), for > 10 kb products. Typically, the purified products were precipitated with ethanol, and resuspended in water prior to sequencing.
New MKN gene-specific primers (sequences available on request), designed from exonic regions of sequenced MKN1–3 and MKN4 gDNAs, were used to amplify segments of MKN cDNAs in a P. patens cDNA Library (Library 2) described previously (Krogan & Ashton, 2000). A 10-µl aliquot of library, diluted 1 : 250, was used as template. The thermocycling regime comprised: 9 min at 96°C; 40 cycles of 30 s at 96°C, 45 s at 55°C, 60 s at 72°C; terminated by 5 min at 72°C. PCR products were purified using a QIAquick Gel Extraction Kit, cloned into pGEM-T Vector System II (Promega) and propagated as described earlier.
Restriction of gDNA
Approx. 10 µg of gDNA were incubated at 37°C for 24 h with 40 units of restriction enzyme and an appropriate buffer in a final volume of 200 µl. The digestion products were precipitated with ethanol, dried and resuspended in water.
Restricted gDNA samples (10 µg) were resolved electrophoretically through 1% agarose gels and transferred to Hybond-N+ membranes (Amersham, Baie d’Urfé, QC, Canada). Hybridization procedures were as described by Engler-Blum et al. (1993). Denatured DIG-labeled probe was used at 5–10 ng of probe ml−1 of hybridization buffer. Four stringency washes were performed at 65°C. Probe detection following hybridization was also as described by Engler-Blum et al. (1993), with the exception of utilizing the chemiluminescent substrate, CDP Star (Roche Molecular Biochemicals), in place of Lumigen PPD.
Manufacture of a Lambda (λ) FIX II genomic library
Wild-type P. patens gDNA was partially digested with Sau3A1 (New England Biolabs Ltd, Mississauga, ON, Canada). The 5′ sticky-ends (GATC) of the resulting restriction fragments were partially end-filled using dGTP and dATP and DNA Polymerase I Large (Klenow) Fragment (New England Biolabs). Following purification by phenol extraction, the Sau3A1 restriction fragments were size-fractionated by sucrose density (10–40% linear) gradient centrifugation. Genomic DNA fragments, between 9 and 23 kb, were ligated into λ FIX II/XhoI Partial Fill-in vector (Stratagene, La Jolla, CA, USA). The resultant library was packaged using Gigapack III XL Packaging Extract (Stratagene) and amplified in XL1-Blue MRA (P2) cells (Stratagene) according to the supplier's instructions. The amplified library was harvested and stored at −85°C in 7% dimethyl sulfoxide (DMSO). The total number of recombinants in the constructed library, before amplification and storage, was determined to be between 1.4 × 10^5 and 2.2 × 10^5 (based on two titrations). Given that the genome size for P. patens is 6 × 10^8 bp (Resci et al., 1994), and the average insert size in the library is 17 kb, the probability that any given sequence is present is between 98% and 99.8%.
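The quoted probabilities can be reproduced with the relation commonly used for genomic libraries, P = 1 − (1 − f)^N, where f is the ratio of insert size to genome size and N is the number of recombinants. The text does not state which formula was applied, so the sketch below assumes this one; the function name `library_coverage` is illustrative.

```python
def library_coverage(n_clones: float, insert_bp: float, genome_bp: float) -> float:
    """Probability that a given sequence is present among n_clones recombinants."""
    fraction = insert_bp / genome_bp  # share of the genome carried by one clone
    return 1.0 - (1.0 - fraction) ** n_clones

GENOME_BP = 6e8      # P. patens genome size used in the text
INSERT_BP = 17_000   # average insert size in the library

p_low = library_coverage(1.4e5, INSERT_BP, GENOME_BP)
p_high = library_coverage(2.2e5, INSERT_BP, GENOME_BP)
print(f"{p_low:.1%} to {p_high:.1%}")
```

Under these assumptions the two titration estimates give approximately 98.1% and 99.8%, in agreement with the range stated above.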
Screening of the genomic library to detect MKN clones
Genomic library (typically c. 2 × 10^5 pfu) was used to infect XL1 Blue MRA cells (Stratagene), which were then grown for 8 h on Lennox L. Broth medium in six 15 cm Petri plates. Following the formation of plaques, duplicate plaque lifts were made from each plate using Hybond-N+ membrane disks (Amersham, Canada). Both disks from each plate were placed successively on chromatography paper soaked in denaturation solution (5 min), neutralization solution (5 min) and 2X saline sodium citrate (15 min) (solution compositions as recommended by the supplier (Stratagene, USA) of the λ FIX II kit). The procedures for hybridization and detection of clones of interest are as outlined for Southern blots. The resulting primary and secondary chemiluminograms were compared to distinguish positive signals from background noise. Positive clones were isolated, amplified and stored at −85°C in 7% DMSO.
DNA sequencing strategies and methods
MKN gDNA and cDNA fragments were partially sequenced manually to establish their identity. Sequencing template was used with an AmpliCycle Sequencing Kit (Perkin-Elmer, USA). The resulting DNA fragments were electrophoresed through a 6% polyacrylamide gel (19 : 1 acrylamide : bis-acrylamide), visualized by silver staining and the sequences read by eye.
Complete sequencing of gDNA and cDNA clones was achieved by automated sequencing in conjunction with a ‘primer walk’ strategy. In the case of gDNA clones, either DNA extracted directly from the λ clones or amplified PCR derivatives were used as sequencing template; for cDNA clones, amplified derivatives cloned in pGEM-T were employed. The MKN gene-specific primers described earlier were utilized and new gene-specific primers were designed as necessary. Complete sequences without gaps were generated by merging the overlapping nucleotide sequences thus obtained. Primer sequences are available upon request.
Analysis of DNA and protein sequences
EUKPROM in PC Gene (IntelliGenetics, Inc., Mountain View, CA, USA), version 6.7, was used to search DNA sequences for the most common eukaryotic promoter elements: TATA-box, cap signal, CCAAT-box and GC-box. The PEST program of the same software suite was used to search protein sequences for PEST motifs. CHARGPRO, also in PC Gene (IntelliGenetics, Inc.), was used to calculate protein isoelectric points (pI).
Potential α-helical regions in MKN proteins were predicted with the program PHDsec obtained from the world wide web site, http://dodo.cpmc.columbia.edu/predictprotein/ncbi.nlm.hih.gov/ (Rost & Sandler, 1993, 1994).
Protein sequences derived by conceptual translation of MKN1–3, MKN2 and MKN4 and a representative subset of KNOX gene sequences, obtained from the GenBank database, were aligned initially by employing the Clustal W program in OMIGA 1.1 (Oxford Molecular, Campbell, CA, USA) and then manually adjusted by eye. The aligned protein sequences were used as a guide for alignment of the corresponding DNA sequences. Aligned sequences were imported into MacClade (Maddison & Maddison, 1992). Nucleotides, corresponding to parts of the KNOX domain (amino acid positions 270–289 and 332–353 – see Fig. 1), and the complete ELK domain and homeodomain (positions 443–527), were utilized in subsequent phylogenetic analyses since they could be aligned accurately with confidence. Nucleotide sequences corresponding to other regions of the proteins were excluded because their unambiguous alignments were extremely difficult or impossible. Partition homogeneity tests (Farris et al., 1995) were performed (1000 replicates), using PAUP 4.0b4a (Swofford, 1999), to validate the legitimacy of combining the several aligned regions chosen for phylogenetic analysis.
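The size of the resulting character matrix follows directly from the amino acid ranges retained. The short sketch below converts each inclusive range into nucleotide columns (three per residue); the region labels and the function name are illustrative assumptions, while the coordinates are those given above.

```python
# Inclusive amino acid ranges (alignment numbering of Fig. 1) retained for analysis.
REGIONS = {
    "KNOX domain, part 1": (270, 289),
    "KNOX domain, part 2": (332, 353),
    "ELK domain and homeodomain": (443, 527),
}

def nucleotide_columns(regions: dict) -> dict:
    """Map each inclusive amino acid range to its number of nucleotide columns."""
    return {name: (end - start + 1) * 3 for name, (start, end) in regions.items()}

columns = nucleotide_columns(REGIONS)
total = sum(columns.values())
print(columns, total)
```

The three regions contribute 60, 66 and 255 columns, giving the 381 characters reported for the phylogenetic analyses.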
Phylogenetic relationships were inferred using standard and weighted (Farris, 1969) maximum parsimony (MP) and neighbor joining (NJ) methods in PAUP 4.0b4a (Swofford, 1999). The robustness of derived trees was assessed by the bootstrap method with 5000 replications. The terms ‘strongly supported’, ‘moderately supported’ and ‘weakly supported’ were used to refer to bootstrap values in the ranges: > 80%, 66–80%, and 50–65%, respectively. Hasegawa, Kishino and Yano's model (HKY85) (Hasegawa et al., 1985) of sequence evolution was used to generate NJ trees. Deletions were ignored. The Heuristic search algorithm in PAUP was used in MP analyses. Gaps were treated as missing data. Following the example of Bharathan et al. (1999), all trees were rooted with BELL1, an Arabidopsis TALE homeobox gene encoding a homeodomain protein that lacks KNOX and ELK domains, and three nonplant TALE genes, MEIS2, XMEIS1–1 and CEH25-A, all encoding proteins that lack an ELK domain but have a homeodomain and MEIS domain. To facilitate alignment of KNOX sequences with the four outgroups, gaps were inserted in the outgroup sequences in regions corresponding to the ELK domain (for all outgroup sequences) and KNOX region (for BELL1).
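The bootstrap terminology defined above amounts to a simple threshold rule, sketched below. The label returned for values under 50% is an assumption, since the text does not name that category, and the function name is illustrative.

```python
def support_category(bootstrap_percent: float) -> str:
    """Translate a bootstrap percentage into the descriptive terms used here."""
    if not 0 <= bootstrap_percent <= 100:
        raise ValueError("bootstrap value must lie between 0 and 100")
    if bootstrap_percent > 80:
        return "strongly supported"
    if bootstrap_percent >= 66:
        return "moderately supported"
    if bootstrap_percent >= 50:
        return "weakly supported"
    return "unsupported"  # assumed label; not defined in the text

for value in (95, 80, 66, 65, 50, 42):
    print(value, support_category(value))
```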
Gene names are written in uppercase, italicised letters, e.g. KNOTTED1, while the protein products of genes are given in uppercase, non-italicised letters, for example KNOTTED1. A consensus nucleotide sequence is designated as a ‘box’, e.g. a homeobox, while the corresponding consensus amino acid motif is referred to as a ‘domain’, for example a homeodomain.
The conventional use, adopted in this paper, of 5′ and 3′ to distinguish one end of a gene sequence from the other, refers strictly to the respective ends of the plus DNA strand.
Analysis of P. patens MKN gDNA fragments: MKN1A to MKN5A inclusive
Segments of each of five small MKN gene fragments, previously isolated and sequenced (Ashton et al., 2000), were amplified from P. patens total gDNA using fragment-specific primer pairs. Amplified DNA segments of the predicted sizes were sequenced to authenticate their identity. Two of these MKN gene fragments, MKN2A and MKN4A, comprise a portion of homeobox containing a complete intron. MKN1A consists of the 3′ end of an intron followed by the 5′ end of an exon. MKN3A and MKN5A each consists of the 3′ end of an exon followed by the 5′ end of an intron. Since no overlapping sequences are apparent between the 3′ end of MKN3A or MKN5A and the 5′ end of MKN1A, the possibility existed that either MKN3A or MKN5A is part of the same gene as MKN1A. This was tested using the following primer pairs: MKN3P1 (specific for MKN3A) + MKN1M20 (specific for MKN1A); MKN25P2 (specific for the 5′ end of MKN2A and 5A) + MKN1M20. Only the first primer combination supported amplification from total gDNA, indicating that MKN1A and MKN3A are from the same gene, MKN1–3, and that MKN5A is from a separate gene. This conclusion was confirmed by Southern blot analysis using fragment-specific probes. Probing with MKN1A and MKN3A-specific probes produced identical Southern blot profiles. Probing for MKN2, 4 and 5 sequences yielded three distinct profiles, none of which matched that for MKN1–3 (Fig. 2). All four profiles are consistent with the respective MKN genes being present as single copy sequences.
Isolation and sequencing of complete MKN genomic clones
Preliminary probing of small aliquots of the P. patens λ FIX II gDNA library by PCR, using MKN1–3, 2, 4, and 5 gene-specific primer pairs, indicated that the first three genes are represented in the library but that MKN5 is absent. Subsequently, screening the library yielded one complete MKN1–3 clone (from c. 2 × 10^5 plaques screened), three incomplete clones of MKN2 (from three separate experiments and a total of c. 8 × 10^5 plaques), and three complete MKN4 clones (from c. 3 × 10^5 plaques). No MKN5 clones were detected in c. 3 × 10^5 plaques screened, an observation consistent with the contention made above that MKN5 may not be represented in this library. Each of the clones obtained was characterized initially by long-distance PCR using primer pairs consisting of an appropriate MKN gene-specific primer together with a vector-specific primer. The most significant result from this analysis was that the three MKN2 clones appear to be identical: each clone supported amplification of a 2-kb band with the primer pair, MKN2M1 + T7, and a large band of approximately 15 kb with the primer pair, MKN25P2 + T3. Similarly, all three MKN4 clones were apparently identical. Therefore, one MKN2, one MKN4 and the sole MKN1–3 clone were chosen and sequenced (GenBank accession numbers: AF285147, AF284817 and AF285148, respectively). The plus strand of each sequence was translated conceptually in all three reading frames to discern open reading frames (ORFs) and intron-exon junctions (pre-mRNA splice sites).
Confirmation of putative intron-exon junctions
To authenticate intron-exon junctions identified from an examination of gDNA sequences, cDNA fragments were amplified by PCR from a previously prepared cDNA library, Library 2 (Krogan & Ashton, 2000). Three overlapping fragments for MKN1–3, encompassing all putative splice junctions, and two overlapping fragments from MKN4, also spanning all splice junctions, were generated and sequenced. All putative splice sites were shown to be functional and sequencing the cDNAs provided verification of the exonic sequences obtained directly from gDNAs.
Genetic architecture of KNOX gDNAs
The genetic architectures, including the positions and sizes of introns, of MKN1–3, MKN2 and MKN4 are depicted in Fig. 3. The corresponding cDNAs from processed mRNAs lack the introns indicated by vertical arrows.
Each of the sequenced MKN1–3 and MKN4 clones comprises six exons and five introns and includes all protein coding sequences. Thus, both clones contain what is believed to be their entire respective genes, including the promoter regions. The MKN2 clone comprises three exonic regions and two introns. It is an incomplete clone lacking a substantial portion of the KNOX box and other 5′ regions.
The base composition of introns within the MKN genes was compared to that of their exons. The average A/T contents (A + T percentages of the total number of bases) for introns, with corresponding values for exons enclosed in brackets, in MKN1–3, MKN2 and MKN4 are 63.3% (52.8%), 60.5% (51.6%) and 58.6% (54.4%), respectively. The corresponding T-contents are 34.9% (23.1%), 28.2% (21.7%) and 32.1% (23.9%).
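Base composition of this kind can be computed as in the following sketch. The two sequences are hypothetical toy examples, not MKN intron or exon sequences, and the function name `base_percentage` is an illustrative assumption.

```python
def base_percentage(seq: str, bases: str) -> float:
    """Percentage of unambiguous bases in seq that belong to the set `bases`."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(seq.count(b) for b in bases.upper()) / total

# Hypothetical toy sequences, used only to show the calculation.
intron = "GTAAGTTTATTTCATTTAG"
exon = "ATGGCGGAGCTCAAGCGC"

for label, seq in (("intron", intron), ("exon", exon)):
    at = base_percentage(seq, "AT")   # A/T content
    t = base_percentage(seq, "T")     # T content
    print(f"{label}: A+T = {at:.1f}%, T = {t:.1f}%")
```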
Consensus intron-exon junction recognition sequences (splice sites) were calculated from sites identified in MKN1–3, MKN2 and MKN4. The 5′ intron splice site consensus is A/C(67%)G(75%):G(100%)T(75%)A(83%)A(50%)G(67%)T(67%). The 3′ intron splice site consensus is T(50%)G(58%)C(100%)A(100%)G(100%):G(67%)A(42%). In each case the colon marks the intron-exon junction; nucleotides within introns lie to the right of the colon at the 5′ site and to the left of it at the 3′ site.
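A consensus of this form is obtained by taking the most frequent nucleotide, and its frequency, at each aligned position. The sketch below illustrates the calculation on four hypothetical 5′ splice sites; these sequences and the function name `column_consensus` are assumptions for illustration and are not the MKN junctions themselves.

```python
from collections import Counter

def column_consensus(sites: list[str]) -> list[tuple[str, float]]:
    """Most frequent base and its percentage at each aligned position."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("all splice-site sequences must have equal length")
    consensus = []
    for column in zip(*sites):
        base, count = Counter(column).most_common(1)[0]
        consensus.append((base, 100.0 * count / len(sites)))
    return consensus

# Hypothetical 5' splice sites: 2 exonic followed by 6 intronic nucleotides.
five_prime_sites = ["AGGTAAGT", "CGGTAAGT", "AGGTGAGT", "ATGTAAGA"]
result = column_consensus(five_prime_sites)
print("".join(f"{base}({pct:.0f}%)" for base, pct in result))
```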
The first 1000 bp at the 5′ ends of MKN1–3 and MKN4 were examined for the presence of the commonest eukaryotic promoter elements: TATA-box, cap signal, CCAAT-box and GC-box.
Four potential TATA boxes, CCTTAAATCCAGAAG (corresponding to nucleotides 267–281 in GenBank sequence AF285148), TCATAAAACTCGAAT (nucleotides 459–473), GTATAGATATCGCGA (nucleotides 722–736) and CTATAATTGGTGACG (nucleotides 955–969), were identified in the MKN1–3 sequence. Three of them differ from the eukaryotic core consensus sequence, TATAAA, by a single nucleotide residue, and the remaining one differs by two residues. Nucleotides matching the core consensus sequence are shown in bold. The fourth TATA-box was assigned the best score by EUKPROM in PC Gene (IntelliGenetics, Inc.). However, no other promoter elements are associated with it. Both the second and third potential TATA-boxes have nearby cap signals. The first TATA-box, assigned the lowest score by EUKPROM, has two cap signals and a GC-box associated with it.
Four potential TATA-boxes were identified in MKN4. The one most likely to represent the authentic promoter, TTATAAAACTTGTGG (nucleotides 770–784 in GenBank sequence AF284817) contains the sequence TATAAA, which matches exactly the core consensus sequence for TATA boxes in eukaryotic promoters. A cap signal is located 31 bp 3′ to this TATA-box.
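The number of positions at which each candidate departs from the core consensus can be checked by sliding TATAAA along the 15-nucleotide sequence and recording the smallest mismatch count. The sketch below does this for the sequences quoted above; it is not a reimplementation of EUKPROM scoring, and the function name is an illustrative assumption.

```python
CORE = "TATAAA"

def min_mismatches(candidate: str, core: str = CORE) -> int:
    """Smallest Hamming distance between the core and any window of candidate."""
    width = len(core)
    windows = (candidate[i:i + width] for i in range(len(candidate) - width + 1))
    return min(sum(a != b for a, b in zip(window, core)) for window in windows)

mkn1_3_candidates = [
    "CCTTAAATCCAGAAG",  # nucleotides 267-281
    "TCATAAAACTCGAAT",  # nucleotides 459-473
    "GTATAGATATCGCGA",  # nucleotides 722-736
    "CTATAATTGGTGACG",  # nucleotides 955-969
]
mkn4_candidate = "TTATAAAACTTGTGG"  # nucleotides 770-784

mkn1_3_scores = [min_mismatches(c) for c in mkn1_3_candidates]
mkn4_score = min_mismatches(mkn4_candidate)
print(mkn1_3_scores, mkn4_score)  # [2, 1, 1, 1] 0
```

The result agrees with the description: three MKN1–3 candidates differ from the core by one residue, the first differs by two, and the MKN4 candidate matches exactly.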
Examination of exon 1 (807 bp) of MKN1–3 and exon 1 (608 bp) of MKN4 revealed nine potential in-frame start codons for MKN1–3 and seven for MKN4. Analysis, using criteria developed by Cavener & Ray (1991), of the first 10 nucleotides 5′ to potential start codons and two nucleotides 3′ to them was undertaken in an attempt to identify the real start codons. Based on this analysis, the most probable start codons (in relation to the 5′ end of each exon) are the ninth for MKN1–3 and the seventh for MKN4.
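Enumerating in-frame start codons together with their flanking context is the first step of such an analysis. The sketch below lists each ATG in a chosen reading frame along with up to 10 nucleotides 5′ and two nucleotides 3′ of it; it does not apply the Cavener & Ray (1991) criteria, and the test sequence and function name are hypothetical.

```python
def in_frame_starts(exon: str, frame: int = 0,
                    upstream: int = 10, downstream: int = 2) -> list[tuple[int, str]]:
    """Return (0-based position, context) for each ATG in the given frame."""
    exon = exon.upper()
    hits = []
    for pos in range(frame, len(exon) - 2, 3):
        if exon[pos:pos + 3] == "ATG":
            start = max(0, pos - upstream)          # truncate at the exon start
            context = exon[start:pos + 3 + downstream]
            hits.append((pos, context))
    return hits

# Hypothetical sequence: ATGs at positions 9 and 15 are in frame 0,
# the ATG at position 22 is out of frame and is therefore skipped.
toy_exon = "GCCGCCACCATGGCGATGTTTAATGCC"
starts = in_frame_starts(toy_exon)
for pos, context in starts:
    print(pos, context)
```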
Structural features of conceptual P. patens KNOX proteins
The calculated sizes of MKN1–3 and MKN4 depend upon which start codon is assumed to be correct in each case. When the first codon is chosen, MKN1–3 and MKN4 comprise 533 and 488 amino acid residues, respectively (Fig. 1). When the most probable, also the last in both cases, start codon is chosen, the predicted proteins consist of 325 and 338 residues.
The homeodomains of MKN1–3, 2 and 4 contain the four invariant amino acid residues (WF-N-R) of homeodomains from all non-yeast eukaryotes and, also like them, are highly basic (Scott et al., 1989) (Fig. 1). The calculated pI values for the MKN2, MKN4 and MKN1–3 homeodomains are 10.9, 11.0 and 11.1, respectively. Additionally, each moss homeodomain possesses 12 of the 13 residues that are invariant in the third α-helix of homeodomains of higher plant KNOX proteins (Fig. 1).
The ELK domains of MKN1–3, 2 and 4 exhibit leucines or other hydrophobic amino acids at every fourth or fifth position, interspersed among charged or polar residues (Fig. 1), providing the potential in each case for formation of an amphipathic α-helix with an offset hydrophobic face, as has been proposed for other KNOX proteins (Kerstetter et al., 1994). Other predicted helical regions in the three moss proteins, within the KNOX and homeodomains, coincide with those of higher plant KNOX proteins (Fig. 1).
Each of the MKN protein sequences was analyzed using programs within the software suite, PC Gene (IntelliGenetics, Inc.), and by eye, for PEST regions (Rogers et al., 1986), nuclear localization signals (NLS) (Meisel & Lam, 1996) and transcriptional activation domains (Gerber et al., 1994).
Candidate PEST regions were detected in both MKN2 and MKN4 at a similar location in the GSE domain to that observed in KNOTTED1. The MKN2 PEST region, HYVDTTPDEDNCGFDIGPLEYGAQEGDDLDTLGDENVMYPLDIDESVIVDPMASDEDIK (positions 380–445; Fig. 1) contains 13 PEST residues, shown in bold in the 59 residue region above. The MKN4 PEST region, HYIETTPDEEDNFGSDIGTK (positions 380–406) contains eight PEST residues in the 18 residue region shown. The PC Gene program did not identify any PEST regions within MKN1–3 with acceptable scores. However, examination of MKN1–3 by eye revealed a possible PEST sequence, HNLTGVSAGESTGATMSEEDEDYDSDYGAYDAH (positions 375–408) at the location corresponding to the site containing the PEST region in MKN2 and MKN4. It contains 11 PEST residues although proline is not included among them.
All three MKN proteins have stretches of basic amino acid residues within the ELK domain and the N and C-terminal portions of the homeodomain, which can be considered potential NLSs (for a review see Raikhel, 1992) (Fig. 1). Additionally, five of the six residues at the C-terminal end of MKN1–3 are basic amino acids that could function as a SV40-like NLS or, in combination with one of the basic regions within the homeodomain, as a bipartite NLS.
Examination of MKN1–3 and MKN4 revealed no obvious examples of the motifs, e.g. glutamine-rich, proline-rich and acidic, typical of eukaryotic transcriptional activation domains. However, the N-terminal region of MKN4 contains three small sections rich in one type of residue, one containing five alanine residues, ASAAAA (positions 224–230), another with four aspartate residues, DRDDD (positions 245–249) and the last containing four glutamate residues, ENEEE (positions 262–266) (Fig. 1), reminiscent of the longer homopolymeric sequences in the N-terminal regions of some higher plant KNOX proteins. Additionally, MKN2 and MKN4, but not MKN1–3, display the amphipathic α-helix in the KNOX domain for which an activation role has been proposed in other KNOX proteins (Sakamoto et al., 1999).
Protein and DNA sequence alignments
A subset of the aligned KNOX sequences analyzed in this study is depicted in Fig. 1. It includes the P. patens proteins MKN1–3, MKN2 and MKN4, representatives of higher plant class 1 and class 2 proteins and the algal protein AAKNOX1. The complete set of alignments included all KNOX proteins for which the corresponding gene sequences can be obtained from GenBank. Observed cases of the conservation of intron positions contributed to our confidence in the correctness of the alignments. Subsequent phylogenetic analyses were performed using nucleotide sequences corresponding to the boxed regions shown in Fig. 1. The validity of combining these regions was verified by the non-significant P-value, 0.577, derived by testing for homogeneity.
Phylogenetic trees, based upon nucleotide sequences derived from 39 angiosperm genes, three gymnosperm genes and one algal gene obtained from GenBank, the three moss genes reported here and four outgroups, were constructed using distance and parsimony methods. Of the 381 characters analyzed, 302 are parsimony-informative. Initially, MP generated two most parsimonious trees of length 2411 (consistency index (CI) = 0.286); reweighting yielded a single most parsimonious tree of length 433 (CI = 0.452). MP and NJ methods produced trees with very similar topologies and bootstrap support values. Therefore, a composite tree, which summarizes the trees obtained, is depicted in Fig. 4.
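As an illustration of what "parsimony-informative" means for the character counts above, the following minimal sketch applies the standard definition: a site is informative when at least two character states each occur in at least two sequences. The function names and the toy alignment are hypothetical and are not taken from the software or data used in this study.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?N"):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(alignment):
    """Count parsimony-informative columns in equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


# Hypothetical toy alignment, not data from this study.
toy_alignment = ["ACGTA", "ACGTC", "ATGAC", "ATGAA"]
n_informative = count_informative(toy_alignment)

# Proportion reported in the text: 302 of 381 characters.
reported_fraction = 302 / 381
```

Gaps and ambiguity codes are ignored here by assumption; phylogenetic packages differ in how they treat them.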
The genome of P. patens contains at least four different KNOX genes, each probably present as a single copy. Complete clones for two of the genes and an incomplete clone for another have been isolated from a genomic library and sequenced. Although the calculated probability of any given sequence being in the genomic library is between 98% and 99.8%, no MKN5 clone was obtained and the three MKN2 clones isolated were incomplete. This may indicate that MKN5 and the 5′ end of MKN2 are not represented in the library or, alternatively, that partial clones of these sequences are present but lack the probe site.
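The probability that a given sequence is represented in a genomic library is commonly estimated with the Clarke and Carbon relationship, P = 1 − (1 − f)^N, where f is the ratio of insert size to genome size and N is the number of independent clones. Whether this exact formula was used here is an assumption, and the insert size, genome size and clone number in the sketch below are hypothetical placeholders rather than values from this study.

```python
import math


def library_coverage(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence is present in a random library."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one clone
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(probability, insert_bp, genome_bp):
    """Smallest number of clones giving at least the requested probability."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


# Hypothetical library parameters (assumptions, not values from the paper).
INSERT_BP = 15_000
GENOME_BP = 500_000_000
N_CLONES = 150_000

p_present = library_coverage(N_CLONES, INSERT_BP, GENOME_BP)
n_for_99 = clones_needed(0.99, INSERT_BP, GENOME_BP)
```

The calculation assumes that every region of the genome is cloned with equal efficiency, which is one reason why a sequence such as MKN5 can be absent despite a high calculated probability.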
Architecture of P. patens KNOX genes
Most higher plant KNOX genes possess four or five introns at conserved locations. Typically, class 1 genes possess two introns at conserved locations within the KNOX-box, one intron within the GSE-box, and one intron at a conserved location within the homeobox. Class 2 genes have an additional intron at a conserved site within the ELK-box. Deviations from these arrangements appear to be rare in higher plant KNOX genes. The only two cases reported are the Arabidopsis STM gene (Long et al., 1996) and the tomato TKN2 gene (Parnis et al., 1997). Both lack the second intron normally present in the KNOX-box. In other respects, they are typical class 1 genes.
The moss KNOX sequences exhibit a genetic architecture similar to that described above. MKN2 and MKN4 possess an intron at the conserved site within the homeobox and an intron in the GSE-box. Neither gene has an intron in the ELK-box. MKN4 has both KNOX-box introns at the expected locations and it possesses an additional intron 5′ to the KNOX-box. No other information is available for MKN2 since its sequence was derived from an incomplete clone. Nevertheless, both MKN4 and MKN2 bear a strong resemblance to higher plant class 1 genes. MKN2 and MKN4 share with higher plant class 1 genes another conserved feature (Vollbrecht et al., 1991; Matsuoka et al., 1993; Parnis et al., 1997), namely that the intron in the GSE-box is significantly larger than the others (Fig. 3). Like all KNOX genes, MKN1–3 possesses an intron at the conserved site within the homeobox and an intron in the GSE-box. It also has an intron in the ELK-box at the characteristic site for class 2 genes. MKN1–3 lacks the first intron normally present in the KNOX-box but possesses the second KNOX-box intron at the expected location. As in the case of MKN4, MKN1–3 has an additional intron 5′ to the KNOX-box. Thus, MKN1–3 appears to be more closely related to higher plant class 2 genes than are MKN2 and MKN4.
The sizes and base compositions of introns in the moss KNOX genes fall within the ranges recorded for angiosperm genes (Simpson & Filipowicz, 1996). It is worth noting that the smallest intron, that 5′ to the KNOX-box of MKN4, is 146 nucleotides long, exceeding the commonly accepted minimum figure (70 nucleotides) for efficient splicing in dicots (Simpson & Filipowicz, 1996). The introns are AT and T-rich compared with the exons, a feature shared with introns from other P. patens genes (Krogan, 1999; Krogan & Ashton, 2000) and from higher plants (Ko et al., 1998), and also linked to the efficiency of splicing (Goodall & Filipowicz, 1989; Luehrsen & Walbot, 1994; Gniadkowski et al., 1996), particularly in dicots. AT and T-richness are not characteristic of yeast or mammalian introns, suggesting that splicing in plants has some unique requirements that distinguish it from splicing in mammals and yeast (Simpson & Filipowicz, 1996). The consensus sequences of the 5′ and 3′ splice sites are also very similar to those derived from angiosperm genes (Simpson & Filipowicz, 1996) and other P. patens genes (Krogan & Ashton, 2000). Probably the most significant difference between the moss and higher plant sequences is at the second nucleotide in the introns at the 5′ splice site. In higher plants, this is almost invariably T (98–99%). In P. patens, it is usually T (75% for KNOX genes and 88–89% for other P. patens genes). In the moss KNOX genes, C is substituted for T at this position in the 5′ splice sites in 25% of cases. All the splice sites are functional and the overall similarity between splice site sequences in moss and higher plant genes suggests that intron splicing occurs in a similar way in both plant groups.
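The comparison of intron and exon base composition described above can be sketched as follows. The helper name and both toy sequences are hypothetical and serve only to show how AT and T content might be tallied; they are not sequences from the MKN genes.

```python
def base_fractions(seq):
    """Return the AT and T fractions of a nucleotide sequence (A, C, G, T only)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    n = len(bases)
    n_a = bases.count("A")
    n_t = bases.count("T")
    return {"AT": (n_a + n_t) / n, "T": n_t / n}


# Hypothetical toy sequences, not taken from the moss KNOX genes.
toy_intron = "GTAAGTTTTATTTCTAATTTGCAG"
toy_exon = "ATGGCGGAGCTCAAGGCC"

intron_comp = base_fractions(toy_intron)
exon_comp = base_fractions(toy_exon)

# The second nucleotide of the intron is the position discussed in the text
# (usually T, occasionally C in the moss KNOX genes).
second_base = toy_intron[1]
```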
Several potential promoter elements are recognizable within the first 1000 bp at the 5′ ends of MKN1–3 and MKN4. For MKN1–3, we cannot discern which of them are functional. However, a strong candidate for the promoter of MKN4 was identified. It comprises a TATA-box element containing the sequence TATAAA, which matches exactly the core consensus sequence for TATA boxes in eukaryotic promoters, and an appropriately located cap signal.
Structural organization of P. patens KNOX proteins
Virtual translation of the coding regions of the moss KNOX genes reveals that the encoded KNOX proteins have the typical plant domain pattern: a non-conserved N-terminal region, followed in turn by a KNOX domain, a GSE domain, an ELK domain, a homeodomain, and a short, non-conserved C-terminal region (Fig. 1).
The homeodomains of MKN1–3, MKN2 and MKN4 contain the four invariant amino acid residues (WF-N-R) of homeodomains from all non-yeast eukaryotes and, also like them, are highly basic (Scott et al., 1989) (Fig. 1). Additionally, each moss homeodomain possesses 12 of the 13 residues that are invariant in the third α-helix of homeodomains of higher plant KNOX proteins (Fig. 1). Thus, in MKN1–3, valine replaces isoleucine, a conservative substitution, at the third position of this sequence, and in MKN2 and MKN4, glutamate replaces glutamine at the tenth position. The moss proteins also possess the three-amino-acid insert between helices 1 and 2 observed in all TALE homeodomains (Bürglin, 1997).
The ELK domains of MKN1–3, 2 and 4 exhibit leucines or other hydrophobic amino acids at every fourth or fifth position, interspersed among charged or polar residues (Fig. 1), providing the potential in each case for formation of an amphipathic α-helix with an offset hydrophobic face, as has been proposed for other KNOX proteins (Kerstetter et al., 1994). Other predicted helical regions in the three moss proteins, within the KNOX and homeodomains, coincide with those of higher plant KNOX proteins (Fig. 1). When plotted on a helical wheel, the hydrophobic residues of the α-helix from the second half of the KNOX domain cluster to one side of the helix in both MKN2 and MKN4, indicating that these helices are amphipathic, as are the corresponding helices in many higher plant KNOX proteins (Bharathan et al., 1997; Sakamoto et al., 1999). The equivalent α-helix in MKN1–3 is not amphipathic, a feature noted in several other KNOX proteins (Sakamoto et al., 1999).
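Amphipathicity judged by eye on a helical wheel can also be expressed numerically as a hydrophobic moment, in which each residue contributes a vector scaled by its hydrophobicity and rotated by 100° per position, the periodicity of an α-helix. This is a different technique from the visual inspection used in this study. The sketch below assumes the Kyte-Doolittle hydropathy scale, and the two test peptides are artificial sequences, not MKN sequences.

```python
import math

# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(peptide, angle_deg=100.0):
    """Mean hydrophobic moment per residue for an ideal helix."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    delta = math.radians(angle_deg)
    sin_sum = 0.0
    cos_sum = 0.0
    for i, aa in enumerate(peptide.upper()):
        h = KYTE_DOOLITTLE[aa]  # raises KeyError for non-standard residues
        sin_sum += h * math.sin(i * delta)
        cos_sum += h * math.cos(i * delta)
    return math.hypot(sin_sum, cos_sum) / len(peptide)


# Artificial peptides with identical composition (9 Leu, 9 Lys).
amphipathic = "LKKLLKKLLKLLKKLLKK"  # Leu placed on one face of the wheel
segregated = "LLLLLLLLLKKKKKKKKK"   # Leu and Lys in two blocks

moment_amphipathic = hydrophobic_moment(amphipathic)
moment_segregated = hydrophobic_moment(segregated)
```

A higher value indicates that hydrophobic residues cluster on one face of the helix. Any threshold separating amphipathic from non-amphipathic helices would depend on the scale chosen and is not specified here.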
Amino acids at 18 positions in higher plant KNOX proteins, referred to here as signature residues, exhibit perfect within-class conservation in class 1 and/or class 2 proteins but differ between classes. Eleven signature residues are in the KNOX domain, three in the ELK domain and four in the homeodomain. MKN1–3 possesses all of the class 2, and none of the class 1, signature residues. MKN4 has all the class 1, and none of the class 2, residues. The partial MKN2 sequence also conforms precisely to the class 1 signature. By contrast, the algal protein, AAKNOX1, possesses class 1 or class 2 signature residues at only four of the 18 positions: two are class 1 residues and two are class 2 residues (Table 1).
Table 1. Class 1 and class 2 KNOX protein signature amino acid residues
The signature amino acid residue positions have been assigned numbers following alignments of the 50 proteins analyzed in this study.
Residues are named according to the accepted convention. Only those residues that are absolutely conserved in higher plant class 1 or class 2 KNOX proteins are shown. Absence of a letter at any given position indicates that there is no completely conserved residue at that position in the class concerned.
Only conserved signature residues are shown for the moss MKN proteins. Dashes indicate unidentified residues in MKN2.
For the algal protein, AAKNOX1, all residues at signature positions are displayed. Those in bold correspond to either class 1 or class 2 signature residues.
The MKN proteins and all higher plant KNOX proteins possess a GSE domain between the KNOX and ELK domains. The algal protein, AAKNOX1, appears to lack this domain (Serikawa & Mandoli, 1999), strengthening further the contention that AAKNOX1 belongs to neither class 1 nor class 2.
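The class assignment logic based on signature residues can be sketched as a simple lookup. The positions and residues below are hypothetical placeholders and are not the entries of Table 1. A protein is assigned to a class only if every observed signature position matches that class and none matches the other, which also accommodates partial sequences such as MKN2.

```python
def signature_hits(residues, signature):
    """Return (matches, observed) over the signature positions present in residues."""
    observed = [pos for pos in signature if residues.get(pos) is not None]
    matches = sum(residues[pos] == signature[pos] for pos in observed)
    return matches, len(observed)


def classify(residues, class1, class2):
    """Assign 'class 1', 'class 2' or 'unassigned' from signature residues."""
    m1, n1 = signature_hits(residues, class1)
    m2, n2 = signature_hits(residues, class2)
    if n1 > 0 and m1 == n1 and m2 == 0:
        return "class 1"
    if n2 > 0 and m2 == n2 and m1 == 0:
        return "class 2"
    return "unassigned"


# Hypothetical signatures: alignment position -> conserved residue.
CLASS1 = {1: "L", 2: "K", 3: "D", 4: "Q"}
CLASS2 = {1: "M", 2: "R", 4: "E"}  # no fully conserved residue at position 3

full_class1 = {1: "L", 2: "K", 3: "D", 4: "Q"}
full_class2 = {1: "M", 2: "R", 3: "S", 4: "E"}
partial_class1 = {1: "L", 2: "K"}           # positions 3 and 4 not sequenced
mixed = {1: "L", 2: "R", 3: "T", 4: "A"}    # one hit for each class

results = [classify(p, CLASS1, CLASS2)
           for p in (full_class1, full_class2, partial_class1, mixed)]
```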
Protein and DNA sequence alignments
The alignments of residues comprising the ELK domains and homeodomains, generated using the Clustal W program in OMIGA 1.1 (Oxford Molecular, Campbell, CA, USA), are unambiguous and needed no adjustment. However, alignment of the complete KNOX domains is problematic. For higher plant sequences, two different solutions to this problem have been proposed previously (Bürglin, 1997; Bharathan et al., 1999). We attempted to resolve this issue by comparing the KNOX domains to MEIS domains of several nonplant TALE proteins, since it is believed that these two domains may have evolved from a common ancestral (MEINOX) domain (Bürglin, 1997). This approach provided a partial resolution by strengthening confidence in the alignments of some regions within the KNOX domains. Residues of the GSE domains are also difficult to align. Therefore, a conservative approach was adopted and only regions of the KNOX proteins that can be aligned unambiguously (Fig. 1) were used for phylogenetic analyses. Confidence in the correctness of these alignments was enhanced by the observations that the positions of included intron sites and amino acid signature residues were perfectly conserved (Fig. 1).
Phylogenetic analysis and the evolution of KNOX genes
We provide in this report the results of phylogenetic analyses of KNOX genes based upon nucleotide sequences derived from 39 angiosperm genes, three gymnosperm genes and one algal gene obtained from GenBank, the three moss genes reported here and four outgroups.
MP and NJ methods produced trees with very similar topologies and bootstrap support values. The following observations, apparent from an examination of Fig. 4 depicting a composite tree, are pertinent. (1) All spermatophyte class 1 KNOX gene sequences fall within a monophyletic clade supported by strong bootstrap values. (2) The three gymnosperm (conifer) genes fall within a monophyletic clade with maximum bootstrap support. In turn, this clade is included in a larger monophyletic clade together with a subset of the angiosperm class 1 genes. However, only weighted MP strongly supports the latter association. (3) The moss class 1 sequences, MKN2 and MKN4, form a single, maximally supported group that is included, in a basal position, within a large monophyletic clade containing all the spermatophyte class 1 genes. (4) Inclusion of the moss class 2 gene, MKN1–3, within a monophyletic clade containing all angiosperm class 2 sequences, which themselves constitute a moderately to strongly supported monophyletic clade, is very strongly supported by bootstrap analysis. (5) All sequences distinguished as class 1 or class 2 based upon structural characteristics of the genes or the proteins they encode and, in some cases, upon knowledge of their expression pattern and/or function, are assigned accordingly by phylogenetic analysis to two monophyletic superclades. (6) The class 1 and 2 superclades in turn comprise a strongly supported monophyletic clade including all KNOX genes except the single algal sequence. (7) Finally, the algal gene AAKNOX1 is included, in a basal position, within a clade containing all KNOX genes analyzed. Observations (1), (5) and (6) are congruent with the findings of Bharathan et al. (1999) although these authors reported that the monophyly of class 1 sequences was not strongly supported. Our other observations are novel since these are the first analyses to include moss and algal sequences.
The phylogenetic data reinforce conclusions derived from structural analyses of KNOX proteins. The three moss genes, MKN1–3, MKN2 and MKN4, and the algal gene, AAKNOX1, are homologs of spermatophyte KNOX genes. MKN2 and MKN4 are class 1 homologs; MKN1–3 is a class 2 homolog. AAKNOX1 belongs to neither class. We propose therefore that the gene duplication and diversification events that created the class 1 and 2 KNOX gene subfamilies, discernible in extant angiosperm and other spermatophyte groups, occurred at least 400 million years ago (MYA), that is, before separation of the moss lineage from that which produced the higher plants. We tentatively suggest that these events did not precede the evolution of land plants from aquatic green algal ancestors, which occurred during the Ordovician period (435–500 MYA). The latter suggestion must be tempered by recognition that searches for algal KNOX genes have not been exhaustive and therefore the possibility remains that algal class 1 and/or class 2 genes may be discovered in the future.
In keeping with the common assumption that KNOX proteins are transcription factors, the conceptual MKN proteins possess several potential NLSs, some candidate sequences for transcriptional activation domains and some secondary structure features that may facilitate interactions with other proteins. Additionally, MKN2 and 4 resemble KNOTTED1 in having a strong PEST motif within the GSE domain. It has been suggested that the possession of PEST motifs in class 1 KNOX proteins implies that the cellular levels of these proteins may be tightly regulated (Vollbrecht et al., 1991).
The conservation of sequence and structure within a protein set implies conservation of function. The angiosperm KNOX sequences, and implicitly the proteins encoded by them, segregate into two monophyletic superclades, each of which appears to be composed of orthologs and paralogs with similar expression patterns and functions. Thus, class 1 genes are expressed in shoot meristems and play a role in the generation and maintenance of these meristems. Class 2 genes are expressed more globally and have been assigned no specific function. The above-mentioned concept (conservation of sequence, structure and therefore function) was given credence by the cloning and functional analysis of a spruce class 1 KNOX gene, HBK1 (Sundås-Larsson et al., 1998). In addition to sequence similarity, HBK1 exhibits expression patterns and functions similar to those of angiosperm class 1 genes. Thus, it is expressed in the spruce shoot apical meristem in undifferentiated regions but not in organ primordia developing at the flanks of the meristem. Furthermore, ectopic expression of HBK1 in Arabidopsis causes aberrations in leaf development similar to those observed when angiosperm KNOX genes are over-expressed in this plant. It seems therefore that class 1 KNOX genes acquired a regulatory role in the formation and/or maintenance of shoot meristems prior to the divergence of angiosperms and gymnosperms. The discovery of bryophyte class 1 and 2 homologs raises the intriguing possibility that the functions of these gene classes may have been conserved for an even longer period of time.
This is an especially seductive idea in the case of class 1 genes since both MKN2 and 4 transcripts have been detected in a cDNA library derived from gametophytic tissues of various ages (6–28-d-old), representing one-dimensional (protonemal filaments), two-dimensional (one cell thick leaves) and three-dimensional (gametophore stems and gametangia) structures containing a variety of apical meristematic cells and tissues. MKN1–3 is the first class 2 gene reported for a non-angiosperm plant. Given its striking similarity to angiosperm class 2 genes, it is tempting to infer that its function may be similar to theirs.
We propose that the correlation observed to date between multicellular plants with meristems and their possession of class 1 and 2 KNOX genes, in contrast to the presence in the unicellular alga, A. acetabulum, of a KNOX homolog that appears to belong to neither class 1 nor class 2, may be more than coincidental. Therefore, we suggest that the duplication and diversification of ancestral KNOX genes were essential steps in the evolution of plant shoot apical meristems (SAMs).
This study was funded by a Natural Sciences and Engineering Research Council of Canada (NSERC) operating grant awarded to N.W. Ashton and NSERC postgraduate scholarships (PGSA & B) provided to C.E.M. Champagne. We wish to thank W. Chapco and G. Litzenberger for invaluable advice concerning phylogenetic analysis.