Evolution of a symbiotic receptor through gene duplications in the legume–rhizobium mutualism


  • Stéphane De Mita,

    1. Laboratory of Molecular Biology, Department of Plant Science, Wageningen University, Wageningen, the Netherlands
    2. INRA Nancy-Lorraine, UMR Interactions Arbres/Micro-organismes, Champenoux, France
  • Arend Streng,

    1. Laboratory of Molecular Biology, Department of Plant Science, Wageningen University, Wageningen, the Netherlands
  • Ton Bisseling,

    1. Laboratory of Molecular Biology, Department of Plant Science, Wageningen University, Wageningen, the Netherlands
  • René Geurts

    Corresponding author
    1. Laboratory of Molecular Biology, Department of Plant Science, Wageningen University, Wageningen, the Netherlands


  • The symbiosis between legumes and nitrogen-fixing rhizobia co-opted pre-existing endomycorrhizal features. In particular, both symbionts release lipo-chitooligosaccharides (LCOs) that are recognized by LysM-type receptor kinases. We investigated the evolutionary history of rhizobial LCO receptor genes MtLYK3-LjNFR1 to gain insight into the evolutionary origin of the rhizobial symbiosis.
  • We performed a phylogenetic analysis integrating gene copies from nonlegumes and legumes, including the non-nodulating, phylogenetically basal legume Cercis chinensis. Signatures of differentiation between copies were investigated through patterns of molecular evolution.
  • We show that two rounds of duplication preceded the evolution of the rhizobial symbiosis in legumes. Molecular evolution patterns indicate that the resulting three paralogous gene copies experienced different selective constraints. In particular, one copy maintained the ancestral function, and another specialized into perception of rhizobial LCOs. It has been suggested that legume LCO receptors evolved from a putative ancestral defense-related chitin receptor through the acquisition of two kinase motifs. However, the phylogenetic analysis shows that these domains are actually ancestral, suggesting that this scenario is unlikely.
  • Our study underlines the evolutionary significance of gene duplication and subsequent neofunctionalization in MtLYK3-LjNFR1 genes. We hypothesize that their ancestor was more likely a mycorrhizal LCO receptor than a defense-related receptor kinase.


The symbiosis between plants of the legume family (Fabaceae) and the nitrogen-fixing bacteria referred to as rhizobia appeared c. 60 million yr ago (Sprent, 2008; Doyle, 2011). This event may be one of the reasons explaining the adaptive radiation that gave rise to c. 19 500 legume species (Lewis et al., 2005; Sprent, 2007; Legume Phylogeny Working Group, 2013). Molecular and genetic studies suggest that rhizobia co-opted signaling and cellular pathways from the more widespread and much older endomycorrhizal symbiosis (Parniske, 2008; Geurts & Vleeshouwers, 2012; Ivanov et al., 2012). In addition, at least some of these features are shared between these symbioses and the actinorhizal symbiosis involving various plants from the Fagales, Rosales and Cucurbitales orders and nitrogen-fixing bacteria of the genus Frankia (Pawlowski & Demchenko, 2012).

Recent studies have revealed that mycorrhizal fungi of the genus Glomus and rhizobia secrete very similar lipo-chitooligosaccharide (LCO) signal molecules, which are termed Nod factors in rhizobia and Myc-LCOs in endomycorrhizal fungi (Maillet et al., 2011). Here we investigate the evolutionary history of a rhizobial Nod factor receptor gene named LjNFR1a and MtLYK3 in the legume model species Lotus japonicus and Medicago truncatula, respectively.

In legumes, perception of rhizobial Nod factors requires two genes encoding proteins of the LysM-domain receptor kinase family, named LjNFR1a and LjNFR5 in L. japonicus, and MtLYK3 and MtNFP in M. truncatula (Limpens et al., 2003; Radutoiu et al., 2003; Arrighi et al., 2006). These proteins have a similar structure: an extracellular domain containing three LysM domains, a transmembrane domain and an intracellular kinase domain. Mutations in either of these genes can block Nod factor-induced responses, suggesting that these proteins function in conjunction, a hypothesis that is supported by biochemical studies (Madsen et al., 2011; Broghammer et al., 2012; Pietraszewska-Bogiel et al., 2013).

The finding that Myc-LCOs and Nod factors are structurally very similar reinforces the hypothesis that the rhizobial symbiosis is derived from the endomycorrhizal symbiosis. It is plausible that, at a certain point in evolution, nitrogen-fixing rhizobia gained the capacity to synthesize molecules imitating Myc-LCOs, allowing them to hijack endomycorrhizal infection mechanisms. Interestingly, in both model legumes neither of the Nod factor receptors MtLYK3-LjNFR1a or MtNFP-LjNFR5 is essential for the endomycorrhizal symbiosis (Catoira et al., 2001; Radutoiu et al., 2003). This suggests that other proteins are involved in the perception of Myc-LCOs, allowing legume hosts to discriminate between the two symbionts. The structural resemblance between Nod factors and Myc-LCOs suggests that Nod factor receptors evolved from ancestral Myc-LCO receptors (Zhang et al., 2007, 2009). Recent studies indicate that this is indeed the case for MtNFP-LjNFR5 (Op den Camp et al., 2011b; Young et al., 2011). Phylogenetic reconstruction has revealed that putative orthologs of MtNFP-LjNFR5 are present in many nonlegume species (Zhu et al., 2006), with the exception of Arabidopsis thaliana which is unable to establish endomycorrhizal symbiosis (Streng et al., 2011). In legumes, the MtNFP-LjNFR5 gene experienced a duplication event that was possibly driven by a whole genome duplication (WGD; Streng et al., 2011; Young et al., 2011). This WGD occurred early in the history of the Papilionoideae subfamily, but after evolution of the rhizobial symbiosis (Cannon et al., 2010). Transcriptome profiling studies in M. truncatula showed that MtLYR1, a paralog of MtNFP, is induced specifically during mycorrhization (Gomez et al., 2009; Young et al., 2011). In addition, these studies revealed that, although not essential for the mycorrhizal symbiosis, MtNFP is also still involved in Myc-LCO-induced gene expression (Czaja et al., 2012).
Taken together, these results suggest the following model: the gene ancestral to MtNFP acted as a Myc-LCO receptor that was co-opted for Nod factor perception when the rhizobial symbiosis evolved in legumes. When this ancestral gene was duplicated, the two functions were separated by subfunctionalization and MtNFP became a Nod factor receptor. Then, MtLYR1 could have retained the ancestral function, and would then be a Myc-LCO receptor gene. Additional evidence supporting this model has emerged from studies on Parasponia (Cannabaceae), the only genus outside the legume family able to form the rhizobial symbiosis. Given that Parasponia is relatively distant from the legume family, the Parasponia–rhizobium symbiosis most likely evolved independently from the legume–rhizobium symbiosis. Furthermore, based on the phylogenetic position of the Parasponia genus as a single nodulating lineage amongst non-nodulating relatives, symbiosis in this genus most likely evolved relatively recently. In Parasponia, the rhizobial symbiosis requires Nod factor signaling as in the vast majority of legume–rhizobium interactions. Studies on the putative ortholog of MtNFP showed that Parasponia andersonii has only a single copy of this gene that is required for both rhizobial and endomycorrhizal interaction (Op den Camp et al., 2011b).

Previous phylogenetic analysis has shown that the MtLYK3-LjNFR1a gene family experienced several duplication events, including series of tandem duplications (Zhang et al., 2007, 2009; Lohmann et al., 2010). This resulted in small clusters of paralogous genes. Interestingly, A. thaliana also contains a gene similar to MtLYK3-LjNFR1a, called AtCERK1, which encodes a receptor of chitin oligomers mediating defense against fungal pathogens (Zhu et al., 2006; Miya et al., 2007; Wan et al., 2008; Liu et al., 2012). As LCOs are themselves acylated chitin oligomers, this suggests a close functional relationship between AtCERK1 and MtLYK3-LjNFR1a. These observations raise the hypothesis that the ancestor of MtLYK3-LjNFR1a was a defense-related gene rather than a Myc-LCO receptor. This hypothesis is strengthened by the fact that Nod factor application transiently induces the expression of defense-related genes in L. japonicus, a response that is dependent on LjNFR1a (Nakagawa et al., 2011; Serna-Sanz et al., 2011). Recently, the mechanisms of the putative transition from the innate immune receptor function of AtCERK1 to the symbiotic receptor function of LjNFR1 were investigated (Nakagawa et al., 2011). Using chimeric gene constructs, it was shown that the kinase domain of AtCERK1 is unable to complement the L. japonicus LjNFR1a loss-of-function mutant. This is due to divergence of two small domains in the kinase: the activation loop (AL) and a three-amino acid motif (YAQ) in the αEF helix. This led to the hypothesis that these changes mediated the shift from defense to symbiosis (Nakagawa et al., 2011).

The legume family is separated into three subfamilies: the basal and polyphyletic Caesalpinioideae (containing, according to the current taxonomy, c. 2250 species), the Mimosoideae (c. 3270 species) and the Papilionoideae (c. 13 800 species; Legume Phylogeny Working Group, 2013). All three subfamilies contain nodulating lineages, albeit at very different frequencies (Sprent, 2007). The rhizobial symbiosis is assumed to have evolved a few million years after the emergence of legumes either as a single evolutionary event before the split in subfamilies or, alternatively, in parallel events in different legume lineages (Doyle, 2011). Within the Caesalpinioideae subfamily, the tribes Cercideae and Detarieae are consistently placed in a basal position of the legume phylogeny, and therefore are very likely to be external to the evolutionary event of acquisition of the rhizobial symbiosis (Young & Johnston, 1989; Doyle, 1998; Wojciechowski et al., 2004; Lavin et al., 2005; Sprent, 2007; Legume Phylogeny Working Group, 2013). These species therefore constitute the closest outgroup with respect to nitrogen-fixing legumes. By investigating genes similar to MtLYK3-LjNFR1a in the basal legume Cercis chinensis, a representative of the Cercideae tribe, we aimed to determine whether the initial duplication event in this gene cluster predates the emergence of the nitrogen-fixing rhizobial symbiosis in legumes. Additionally, we sought signatures of functional divergence between paralogous copies in order to identify signatures of neofunctionalization and to provide insights into the ancestral function of this class of LysM domain containing receptor-like kinases.

Materials and Methods

Bioinformatics analysis

In the rest of this article, the term homologs will be used generically for genes belonging to the same gene family, whether or not they belong to the same species, were generated by duplication or speciation, or are functionally related. The coding sequence of MtLYK3 homologs in three legumes (Medicago truncatula Gaertn., Lotus japonicus (Regel) K. Larsen and Glycine max (L.) Merr.) including PsSYM37, the ortholog of MtLYK3 in Pisum sativum L. (Zhukov et al., 2008), and two nonlegumes (Arabidopsis thaliana (L.) Heynh. and Vitis vinifera L.) were retrieved from GenBank (Limpens et al., 2003; Arrighi et al., 2006; Zhu et al., 2006; Zhang et al., 2007; Lohmann et al., 2010). In addition, we used genome annotation data of the legume Cajanus cajan (L.) Huth (Varshney et al., 2012) and five additional nonlegume species: Malus domestica Borkh. (Velasco et al., 2010), Populus trichocarpa Torr. & A. Gray ex Hook. (Tuskan et al., 2006), Prunus persica (L.) Batsch (International Peach Genome Initiative et al., 2013), Fragaria vesca L. (Shulaev et al., 2011) and Cannabis sativa L. (Van Bakel et al., 2011). All genes and accession numbers used are listed in Table 1. We identified homologous genes using BLASTN from the BLAST+ package (Altschul et al., 1990) against predicted coding sequence databases and selected MtLYK3-LjNFR1a homologs based on preliminary phylogenetic analysis.

Table 1. List of genes included in this study
  1. Location is given as a chromosome or scaffold identifier when available; position and strand are given relative to this chromosome or scaffold (position is the start of the coding sequence). Length is given in base pairs (coding sequence, including the stop codon). Exon numbers are preceded or followed by an asterisk if the sequence is 5′ or 3′ partial, respectively.

| Family | Species | Gene | Accession | Location, position/strand | Length (bp) | Exons | Reference | Group |
|---|---|---|---|---|---|---|---|---|
| Brassicaceae | Arabidopsis thaliana | AtCERK1 | AB367524 | chr. 3, 7615 530/− | 1854 | 1–12 | Zhu et al. (2006) | CERK1 |
| Cannabaceae | Cannabis sativa | CasLYK1 | PK21129 | scaf. 298735118/+ | 1016 | 5–12* | This study | CERK1 |
| | | CasLYK2 | HG426464 | scaf. 236148827/− | 1800 | 1–12 | This study | Outgroup |
| | | CasLYK3 | PK06694 | scaf. 2038559 827/− | 1743 | 1–12 | This study | Outgroup |
| Rosaceae | Fragaria vesca | FvLYK1 | gene00531 | chr. 6, 31 071 242/− | 1863 | 1–12 | This study | CERK1 |
| | | FvLYK2 | gene19496 | chr. 3, 750 197/− | 1866 | 1–11 | This study | Outgroup |
| | Malus domestica | MdLYK1 | MDP0000175360 | chr. 17, 8605 950/− | 1566 | 1–10* | This study | CERK1 |
| | | MdLYK2 | MDP0000136494 | chr. 9, 7623 570/− | 1959 | 1–12 | This study | CERK1 |
| | | MdLYK3 | HG426465 | chr. 5, 1019 451/+ | 1755 | 1–11 | This study | Outgroup |
| | Prunus persica | PpLYK1 | ppa019968m | scaf. 3, 16 367 422/+ | 1827 | 1–12 | This study | CERK1 |
| | | PpLYK2 | ppa003023m | scaf. 3, 16 353 922/+ | 960 | 6–12* | This study | CERK1 |
| | | PpLYK3 | ppa017142m | scaf. 4, 760 712/+ | 1866 | 1–12 | This study | Outgroup |
| Salicaceae | Populus trichocarpa | PtLYK1 | XM_002301574 | chr. II, 19 188 799/+ | 1845 | 1–12 | This study | CERK1 |
| | | PtLYK2 | XM_002321105 | chr. XIV, 8263 253/− | 1839 | 1–12 | This study | CERK1 |
| | | PtLYK3 | XM_002317109 | chr. XI, 735 066/− | 1863 | 1–11 | This study | Outgroup |
| Vitaceae | Vitis vinifera | VvLYK1 | GSVIVT01030482001 | chr. 12, 6054 569/− | 1845 | 1–12 | Zhang et al. (2009) | CERK1 |
| | | VvLYK2 | GSVIVT01012662001 | chr. 10, 416 409/+ | 1872 | 1–12 | Zhang et al. (2009) | Outgroup |
| | | VvLYK3 | GSVIVT01012665001 | chr. 10, 423 032/+ | 1863 | 1–12 | Zhang et al. (2009) | Outgroup |
| Fabaceae | Cajanus cajan | CacLYK1 | C.cajan_09999 | chr. 3, 20 062 965/+ | 1803 | 1–12 | This study | A |
| | | CacLYK2 | C.cajan_15801 | chr. 8, 4755 526/+ | 1974 | 1–12 | This study | C |
| | | CacLYK3 | C.cajan_16785 | chr. 8, 15 783 159/+ | 1883 | 1–12 | This study | Outgroup |
| | | CacLYK4 | HG426463 | scaf. 126590114 028/− | 1923 | 1–12 | This study | B |
| | Cercis chinensis | CecLYK1 | HG426462 | Unknown | 1614 | 1–12 | This study | B |
| | | CecLYK2 | HG426462 | Unknown | 1914 | 1–12 | This study | C |
| | Glycine max | GmLYK2 | GmW2098N11.15 | chr. 2, 48 547 625/+ | 1854 | 1–12 | Zhang et al. (2007) | B |
| | | GmNFR1a | GmW2098N11.16 | chr. 2, 48 554 799/+ | 1860 | 1–12 | Zhang et al. (2007) | A |
| | | GmNFR1b | GmW2098N15.9 | chr. 14, 3509 324/− | 1860 | 1–12 | Zhang et al. (2007) | A |
| | | GmLYK3 | GmW2026N19.18 | chr. 15, 8678 938/− | 1884 | 1–12 | Zhang et al. (2007) | Outgroup |
| | | GmLYK2b | Gm0062x00016 | chr. 20, 16 207 633/− | 1848 | 1–12 | Zhang et al. (2009) | C |
| | Lotus japonicus | LjNFR1a | AJ575248 | chr. 2, 38 634 189/+ | 1866 | 1–12 | Zhu et al. (2006) | A |
| | | LjNFR1b | AJ575249 | chr. 2, 38 618 061/+ | 1893 | 1–12 | Zhu et al. (2006) | B |
| | | LjNFR1c | AB503681 | chr. 2, 38 611 968/+ | 1803 | 1–12 | Zhu et al. (2006) | B |
| | | LjLYS6 | AB503687 | chr. 6, 13 743 066/− | 1863 | 1–13 | Lohmann et al. (2010) | C |
| | | LjLYS7 | AB503688 | chr. 6, 21 859 958/+ | 1866 | 1–12 | Lohmann et al. (2010) | Outgroup |
| | Medicago truncatula | MtLYK1 | AY372401 | chr. 5, 36 372 823/+ | 1772 | 1–11 | Limpens et al. (2003) | B |
| | | MtLYK2 | AY372420 | chr. 5, 36 297 139/+ | 1839 | 1–12 | Limpens et al. (2003) | A |
| | | MtLYK3 | AY372406 | chr. 5, 36 224 960/+ | 1863 | 1–12 | Limpens et al. (2003) | A |
| | | MtLYK6 | AY372404 | chr. 5, 36 189 086/+ | 1725 | 1–11 | Limpens et al. (2003) | B |
| | | MtLYK7 | AY372405 | chr. 5, 36 184 066/+ | 1863 | 1–12 | Limpens et al. (2003) | B |
| | | MtLYK8 | MtD06512 | Unknown | 1797 | 1–12 | Arrighi et al. (2006) | Outgroup |
| | | MtLYK9 | XM_003601328 | chr. 3, 25 682 831/+ | 1866 | 1–13 | Arrighi et al. (2006) | C |
| | Pisum sativum | PsSYM37 | EU564096 | Unknown | 1854 | 1–12 | Zhukov et al. (2008) | A |

Identification of MtLYK3-LjNFR1a homologs in Cercis chinensis

A BAC library of the Cercis chinensis Bunge individual NA63335, obtained from the USDA/ARS U.S. National Arboretum, was constructed. The genome size was determined at 350 Mbp with a ploidy level of 2n = 14 (data not shown). High molecular weight DNA, isolated from nuclei, was partially digested with HindIII and ligated in the pCC1BAC cloning vector (Epicentre, Madison, WI, USA). The average insert size was 165 kb and a total number of 36 814 clones were picked, resulting in a 9.5× genome coverage. BAC clone DNA was spotted on two Hybond N+ filters (Amersham Biosciences, Pittsburgh, PA, USA) and the filters were screened for the presence of MtLYK3-LjNFR1a homologs. To this end, two probes were designed and amplified on C. chinensis root cDNA using the following primers: CsProbe_1_F (GGTGCAAATTGCTCTGGATT), CsProbe_1_R (TTGGGTAGTTATCGCCAAGC), CsProbe_2_F (CTGCTCAAGATGGGAAGGTC) and Cs_Probe_2_R (GCAACTTTCTCGCCTCTCAG). BAC clones harboring MtLYK3-LjNFR1a homologs were identified and grouped in contigs based on RFLP analysis. Two BACs were shotgun sequenced using next-generation sequencing. Clustering and DNA sequencing of the BACs was performed according to manufacturers' protocols through paired-end sequencing using the Illumina Genome Analyzer II. A total of 8 pmol of DNA was used. The BAC vector sequence and contaminating E. coli sequence tags were removed from the dataset by collecting reads using either the BOWTIE 10.3 short read aligner or NextGENe v1.65 software (Softgenetics, State College, PA, USA). After filtering, the reads were assembled into contigs using the short-read assembler PEassembly from the NextGENe software package.

Molecular phylogeny and molecular evolution analyses

The coding sequences of MtLYK3-LjNFR1a and MtLYK8-LjLYS7 (which was used as outgroup) homologs in all studied species were translated and then aligned manually. The coding sequence alignment was derived from the amino acid alignment. The alignment length was 2793 bp for 42 sequences (including stop codons). We manually removed uninformative regions (regions of poor homology or containing mostly alignment gaps) and used the resulting 1503-bp alignment. Sequences, unfiltered and filtered alignments are available in Supporting Information Notes S1. Phylogenetic trees were reconstructed using maximum likelihood with PHYML v20130329 (Guindon et al., 2009) using the GTR+Γ6 model of evolution of nucleotide sequences, a BIONJ starting tree and 1000 bootstrap repetitions. We divided the phylogeny into five unrooted subtrees based on the identified clades (see the 'Results' section) and split the alignment into an extracellular domain (amino acid residues 1 to 191 of the cleaned alignment, corresponding to amino acid residues 24–223 of the MtLYK3 sequence) and an intracellular domain (amino acid residues 192 to 501 of the cleaned alignment, corresponding to amino acid residues 303–619 of the MtLYK3 sequence). Codon evolution models M1a and M2a were adjusted using CODEML from the PAML package version 4.6 (Yang, 2007) for both the extracellular and intracellular part of the alignment and for each of the five subtrees. Statistical significance of M2a relative to M1a was evaluated by the likelihood ratio test based on a χ2 distribution with two degrees of freedom. In the case where M2a was significant, codon sites likely to fall into the positive selection category were identified using results of the Bayes empirical Bayes procedure (Yang et al., 2005) reported by CODEML.
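
Deriving a coding sequence alignment from an amino acid alignment amounts to threading each unaligned coding sequence onto its gapped protein row, one codon per residue and three gap characters per protein gap. The following is a minimal sketch of that idea in Python. The function name `thread_codons` and the toy sequences are hypothetical and are not taken from the study's data or tools.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto its gapped protein alignment row."""
    # Split the coding sequence into consecutive codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) % 3 != 0 or residues != len(codons):
        raise ValueError("CDS length does not match the aligned protein")
    codon_iter = iter(codons)
    # Each residue consumes one codon; each protein gap becomes three gaps.
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Hypothetical toy input, not a sequence from the study.
print(thread_codons("M-KL", "ATGAAACTG"))  # ATG---AAACTG
```

Because the gaps are inherited from the protein alignment, the reading frame is preserved in every row, which is what codon models such as M1a and M2a require.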

Results

Phylogenetic reconstruction of the MtLYK3-LjNFR1a gene family

In order to decipher the evolutionary history of the MtLYK3-LjNFR1a gene family in relation to the origin of the rhizobial symbiosis in the Fabaceae, we included a caesalpinioid legume species of the tribe Cercideae in our evolutionary analysis. To this end, Cercis chinensis was selected as it has a relatively small genome size of c. 350 Mbp. A C. chinensis bacterial artificial chromosome (BAC) library with a 10× genome coverage was constructed and screened by Southern blotting using MtLYK3-LjNFR1 homologous sequences as probes. The identified BAC clones grouped in two contigs, and a representative clone of each contig was sequenced and subsequently annotated. This revealed that both BAC clones presented heterozygous copies of the same region containing a small gene cluster of LysM-type receptor kinases (Fig. 1).

Figure 1.

Genomic organization of MtLYK3-LjNFR1a homologs and pseudogenes. All gene names include a two- or three-letter prefix indicating the species of origin. The representation is not to scale. Distances between genes (based on coding sequence) and gaps are given in base pairs (bp) or kilobase pairs (kb). Colors represent groups defined after the phylogenetic analysis (see Fig. 2). Pseudogenes and partial sequences are represented by dashed lines and stripes at gene ends, respectively. Species of origin: At, Arabidopsis thaliana; Cas, Cannabis sativa; Fv, Fragaria vesca; Md, Malus domestica; Pp, Prunus persica; Pt, Populus trichocarpa; Vv, Vitis vinifera; Cac, Cajanus cajan; Cec, Cercis chinensis; Gm, Glycine max; Lj, Lotus japonicus; Mt, Medicago truncatula.

Only a phylogenetic analysis can unambiguously resolve the evolutionary history of gene duplications. To reconstruct the phylogeny of the MtLYK3-LjNFR1a gene family, we included the homologous protein sequences from C. chinensis, five papilionoid legume species (Medicago truncatula, Pisum sativum with only one sequence available, Lotus japonicus, Glycine max and Cajanus cajan), three Rosaceae species (Fragaria vesca, Malus domestica and Prunus persica), one Cannabaceae species (Cannabis sativa), one Salicaceae species (Populus trichocarpa), and two species outside of the Fabidae clade (Arabidopsis thaliana and Vitis vinifera). In addition to MtLYK3-LjNFR1a-like genes, we also included homologs of MtLYK8-LjLYS7, a LysM-domain receptor kinase gene that is closely related (Arrighi et al., 2006). In legumes, many pseudogenes can be identified due to disrupted reading frame or incomplete coding sequences. These pseudogenes were not included in the phylogenetic analysis. MtLYK4 is likely a chimeric copy of other paralogs as its extracellular domain is nearly identical to MtLYK3 (Limpens et al., 2003) and its intracellular domain is most similar to other copies (Zhu et al., 2006). A preliminary phylogenetic analysis supported that view (data not shown). Tandem duplications of MtLYK3-LjNFR1a genes can be identified in all legume species and, outside the legume family, in P. persica (taking into account a pseudogene). In total, we retrieved 42 genes from 13 different species, with one to seven copies per species (Table 1; Notes S1). These were used to construct a phylogenetic tree based on an alignment of coding sequences. This resulted in a tree with two main clades representing MtLYK8-LjLYS7 homologs and MtLYK3-LjNFR1a homologs (Fig. 2). Except for A. thaliana, which has no MtLYK8-LjLYS7 homolog, all included species have at least one gene in both groups. The topology of both groups is consistent with species phylogeny: V. vinifera genes emerge first, followed by P. trichocarpa genes, then genes from the Rosales species forming a clade, and finally legume genes. This pattern strongly suggests that the initial duplication that gave rise to MtLYK8-LjLYS7 and MtLYK3-LjNFR1a is the oldest point of the tree and allows us to deduce the location of the root (Fig. 2).

Figure 2.

Maximum-likelihood phylogenetic tree of the MtLYK3-LjNFR1a and MtLYK8-LjLYS7 gene families. All gene names include a two- or three-letter prefix indicating the species of origin. Robustness of the topology was assessed by 1000 bootstrap replications. Portions of the tree poorly supported (< 700 replications) are represented by dotted lines. Duplication nodes are indicated by white circles and were inferred by strict parsimony-based reconciliation of gene and species trees. Most recent common ancestors are represented by black diamonds (ancestor of all legume genes) and black (ancestor of a monophyletic group) and white (ancestor of a paraphyletic group) squares. CERK1 is the group gathering all nonlegume homologs of MtLYK3-LjNFR1a, and A, B and C are legume-specific groups predating the origin of the legume family. Species of origin: At, Arabidopsis thaliana; Cas, Cannabis sativa; Fv, Fragaria vesca; Md, Malus domestica; Pp, Prunus persica; Pt, Populus trichocarpa; Vv, Vitis vinifera; Cac, Cajanus cajan; Cec, Cercis chinensis; Gm, Glycine max; Lj, Lotus japonicus; Mt, Medicago truncatula.

Two duplications predate the origin of the legume family and of the nitrogen-fixing symbiosis

We identified the nodes that correspond to duplication events through reconciliation of the gene tree with the species phylogeny (Fig. 2). Except for the ancestral duplication that generated the MtLYK8-LjLYS7 and MtLYK3-LjNFR1a clades, duplications that are shared by at least two of the species we examined are restricted to the MtLYK3-LjNFR1a clade in legumes. The legume-specific group within the MtLYK3-LjNFR1a clade is divided into three subclades, all strongly supported by the bootstrap analysis. We named these groups A, B and C, with the Nod factor receptor genes represented by group A. All three groups contain at least one gene copy of the symbiotic papilionoid legumes M. truncatula, L. japonicus, G. max and C. cajan. The single P. sativum sequence clusters with MtLYK3, confirming that the two genes are functional orthologs and that the MtLYK2–MtLYK3 duplication predates the Medicago–Pisum divergence. Groups B and C also both contain a C. chinensis gene, whereas a representative of group A was not identified in this species. To rule out the possibility that an A gene exists in C. chinensis but had been missed in the analysis, a PCR experiment was conducted using degenerate primers and genomic DNA of C. chinensis and a related species, Cercis siliquastrum L. This experiment also failed to identify the A gene, but yielded both B and C gene copies in C. siliquastrum (data not shown). Therefore, it seems that Cercis does not have a homolog of Nod factor receptor genes.

As the Cercideae are the most basal legume tribe, and as Cercis has genes belonging to both the B and C groups, we can conclude that the two rounds of duplication preceded the origin of the legume family, giving rise to three copies represented by group A, B and C (A being the most basal), followed by the loss of the A gene in Cercis species. Therefore the duplications occurred earlier than the legume WGD (which is specific to the Papilionoideae), but also before the evolution of the rhizobial symbiosis.
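
The reconciliation used above to identify duplication nodes can be illustrated with the standard last-common-ancestor (LCA) mapping: each gene-tree node is mapped to the smallest species-tree clade containing all species found below it, and a node is called a duplication when it maps to the same clade as one of its children. The sketch below is a minimal Python illustration under that assumption. It may differ in detail from the procedure actually used for Fig. 2, and the trees, species labels (`sp1`, `sp2`, `out`) and gene labels are hypothetical.

```python
def species_clades(tree):
    """Return every clade of the species tree as a frozenset of species names."""
    clades = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def lca(clades, species):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in clades if species <= c), key=len)


def find_duplications(gene_tree, gene_to_species, clades):
    """List gene-tree nodes mapped to the same species clade as one of their children."""
    duplications = []

    def walk(node):
        if isinstance(node, str):
            return lca(clades, frozenset([gene_to_species[node]]))
        child_maps = [walk(child) for child in node]
        here = lca(clades, frozenset().union(*child_maps))
        if here in child_maps:
            duplications.append(node)
        return here

    walk(gene_tree)
    return duplications


# Hypothetical toy trees as nested tuples: two paralogs (g1, g2) shared by sp1 and sp2.
species_tree = (("sp1", "sp2"), "out")
gene_tree = ((("g1_sp1", "g1_sp2"), ("g2_sp1", "g2_sp2")), "g_out")
gene_to_species = {"g1_sp1": "sp1", "g1_sp2": "sp2",
                   "g2_sp1": "sp1", "g2_sp2": "sp2", "g_out": "out"}

dups = find_duplications(gene_tree, gene_to_species, species_clades(species_tree))
print(dups)  # the node joining the g1 and g2 subtrees
```

In this toy case the duplication is placed before the sp1/sp2 split, which mirrors the logic of the argument above: finding both B and C copies in Cercis and in papilionoid legumes places the corresponding duplication before their divergence.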

The legume-specific copies of MtLYK3 might result from tandem duplications

In M. truncatula, L. japonicus and G. max, group A, B and C genes are always organized as two separated loci. The genes from group A, including Nod factor receptors, are downstream of one or more genes from group B. In G. max this locus is duplicated, probably due to a recent WGD event. In one of the paralogous regions the group B gene copy has been lost. However, the pseudogene can still be located (Fig. 1, chr. 14). The cluster of A and B genes has been more widely extended in M. truncatula with two group B genes, two group A genes and a chimeric copy (MtLYK4), all interspersed with multiple pseudogenes. With the exception of MtLYK5, pseudogenes are truncated and seem to belong to group B (data not shown). In all papilionoid legumes investigated, a single C gene has been identified, located at a different, unlinked locus. By contrast, in the caesalpinioid legume C. chinensis, group B and C genes are linked in a single locus. The most parsimonious scenario to explain this divergence between papilionoid and caesalpinioid legumes is that the original duplications which gave birth to the A, B and C groups were initially in tandem. For the B and C genes this organization remained conserved in C. chinensis, whereas in an ancestral papilionoid legume the chromosomal localization of the C gene changed. This could have been caused by a translocation event, by gene conversion, or by a WGD followed by complementary gene loss (loss of the C copy on one homologous chromosome, and loss of A and B copies on the other).

Gain of conserved motifs in kinase domain is not specific for legume MtLYK3-LjNFR1a receptors

Domain swapping experiments between AtCERK1 and LjNFR1a showed that two regions of the kinase domain of LjNFR1a, namely the activation loop and a three amino acid (YAQ) motif, are required for symbiotic signaling. This has led to the hypothesis that adaptation of the activation loop and gain of the YAQ motif mediated the shift in kinase function from defense to symbiosis (Nakagawa et al., 2011). We aimed to consider this hypothesis from a phylogenetic perspective.

We used WebLogo (Crooks et al., 2004) in order to summarize the conservation of the kinase region containing the activation loop and the YAQ motif (corresponding to positions 464–500 of the MtLYK3 sequence). We defined five clades based on the phylogenetic tree (Fig. 2): the MtLYK8-LjLYS7 homologs (outgroup), the nonlegume MtLYK3-LjNFR1 homologs including AtCERK1 (CERK1) and the three legume-specific clades A, B and C. We generated one logo for each of those clades (Fig. 3). Strikingly, the YAQ motif is perfectly conserved not only in group A (containing Nod factor receptors), but also in the MtLYK8-LjLYS7 outgroup, not known to play a role in the rhizobial symbiosis. Furthermore, we found that the YAQ motif is also strongly conserved in the CERK1 clade, except for AtCERK1 itself and for PpLYK2 which has a YAR motif instead.
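
In a sequence logo, the height of each column reflects its conservation, commonly expressed as information content in bits: the maximum entropy of the alphabet minus the observed entropy of the column. The sketch below computes this quantity for a toy set of three-residue motifs. It is a simplified illustration that omits the small-sample correction applied by logo software, and the function name `column_information` and the toy clade are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # No small-sample correction is applied in this sketch.
    return math.log2(alphabet_size) - entropy


# Hypothetical toy clade: three sequences with YAQ and one with YAR.
toy_clade = ["YAQ", "YAQ", "YAR", "YAQ"]
bits = [column_information("".join(col)) for col in zip(*toy_clade)]
print([round(b, 2) for b in bits])
```

The first two columns reach the maximum for a 20-letter alphabet (log2 20, about 4.32 bits), whereas the third column is lower because one of the four sequences carries R instead of Q.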

Figure 3.

Sequence logos showing amino acid conservation of the activation loop and YAQ regions of the kinase domain of MtLYK3-LjNFR1a and MtLYK8-LjLYS7 (outgroup) homologs within each of the five clades defined after the phylogenetic analysis. The left and right dotted frames indicate the activation loop and YAQ regions, respectively. Letter size is proportional to the conservation of the corresponding residues. Gaps in logos indicate gaps in the alignment.

Next we focused on the activation loop domain. In line with previous analysis (Nakagawa et al., 2011), we found that the activation loop domain in Nod factor receptors of group A is well conserved, with a consensus motif xEVGxSTLxTRLV. The CERK1 clade shows a similar consensus motif TEVGSxSLPTRLV, which again is strongly degraded in AtCERK1 (seven substitutions with respect to this consensus) and shows some variation in PtLYK2 (P replaced by A) and PpLYK2 (E replaced by I). Consistently, a similar motif was found in the MtLYK8-LjLYS7 group, xExGSxSLxTRLV, with two exceptions: E replaced by V in VvLYK3 and first L replaced by I in VvLYK2, the latter likely to be functionally neutral. We conclude that the activation loop and YAQ motif are essentially conserved in the Nod factor receptor clade (group A), the CERK1 clade as well as the MtLYK8-LjLYS7 outgroup, except for minor variation in duplicated genes and, more significantly, for AtCERK1. The latter appears to have undergone a lineage-specific degradation of these two motifs. Therefore, it is likely that the regions necessary for symbiotic signaling predated the MtLYK3-LjNFR1a–MtLYK8-LjLYS7 duplication and a fortiori the origin of the Rosids clade of plants, and have been lost secondarily in the A. thaliana lineage rather than gained in the legume family.
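
Counting substitutions with respect to a consensus such as TEVGSxSLPTRLV can be made explicit by comparing a sequence to the consensus position by position while skipping the variable positions marked x. The following minimal Python sketch shows one way to do this. The function name `substitutions` and the test sequences are hypothetical: they are constructed from the consensus and the P to A replacement described above, and are not the actual protein sequences.

```python
def substitutions(sequence: str, consensus: str, wildcard: str = "x") -> int:
    """Count positions where the sequence departs from a consensus motif.

    Positions marked with the wildcard in the consensus are not scored.
    """
    if len(sequence) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")
    return sum(1 for s, c in zip(sequence, consensus)
               if c != wildcard and s != c)


CERK1_CONSENSUS = "TEVGSxSLPTRLV"  # consensus motif as given in the text

# Hypothetical sequences built from the consensus, with an arbitrary residue at x.
print(substitutions("TEVGSGSLPTRLV", CERK1_CONSENSUS))  # 0
print(substitutions("TEVGSGSLATRLV", CERK1_CONSENSUS))  # 1 (P replaced by A)
```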

We also examined the conservation of the activation loop and YAQ motif region in the legume-specific group B and C genes. Both regions are perfectly conserved in group C, and highly similar to those of group A (Fig. 3). By contrast, both regions are strongly degenerated and highly variable in proteins encoded by group B genes. Interestingly, the YAQ motif is never found in this clade while other regions of the kinase are better conserved. This shows that the symbiotic motifs present in the activation loop and YAQ domain were also lost in clade B genes following gene divergence.

Molecular evolution shows that legume paralogs have undergone functional divergence

Changes in protein function, such as those occurring during episodes of neofunctionalization, are likely to modify the patterns of selective pressure on protein sequences. Selective constraints can be measured by determining the ratio of the nonsynonymous to synonymous rates of nucleotide substitutions, a ratio usually termed ω. On the one hand, stronger constraints on the protein sequence owing to purifying selection reduce the number of allowed amino acid substitutions and consequently lower the value of ω. On the other hand, fast rates of adaptive nonsynonymous changes (positive selection) can result in increased ω ratios. Positive selection is more confidently detected through the presence of positions with ω > 1 rather than by an elevation of the global ω ratio (Yang & Bielawski, 2000). To investigate possible changes in overall evolutionary constraint in the protein sequences in the different clades of the phylogeny of MtLYK3-LjNFR1a homologs following legume-specific duplications, we fitted two models of Yang et al. (2000): M1a is a two-ratio model allowing constrained (ω < 1) and neutral (ω = 1) positions and M2a is a three-ratio model allowing constrained (ω < 1), neutral (ω = 1) and positively selected (ω > 1) positions. These models allowed us to evaluate patterns of purifying constraints in the different clades and test for the occurrence of positive selection. We conducted the analysis separately for the extracellular and intracellular portions of the alignment (Table 2).
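Because M1a is nested within M2a, the two models are usually compared with a likelihood ratio test. The sketch below shows how such a test could be computed, under the common assumption (not stated in the text) that twice the log-likelihood difference is compared to a chi-square distribution with two degrees of freedom, M2a having two more free parameters than M1a. The function name and the log-likelihood values are hypothetical and serve only to illustrate the arithmetic.

```python
import math
from typing import Tuple


def lrt_m1a_vs_m2a(lnl_m1a: float, lnl_m2a: float) -> Tuple[float, float]:
    """Likelihood ratio test of M2a (selection) against M1a (nearly neutral).

    Returns the test statistic 2 * (lnL_M2a - lnL_M1a) and its P-value,
    ASSUMING a chi-square reference distribution with 2 degrees of freedom.
    """
    statistic = 2.0 * (lnl_m2a - lnl_m1a)
    # M1a is nested in M2a, so a negative value can only come from
    # imperfect optimization; clamp it to zero.
    statistic = max(statistic, 0.0)
    # For 2 degrees of freedom the chi-square survival function has the
    # closed form exp(-x / 2), so no external library is needed.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_m1a_vs_m2a(lnl_m1a=-5230.0, lnl_m2a=-5215.0)
print(f"2*delta_lnL = {stat:.1f}, P = {p:.2e}")

# When M2a does not improve the fit at all, the statistic is 0 and P = 1.
print(lrt_m1a_vs_m2a(lnl_m1a=-5230.0, lnl_m2a=-5230.0))
```

When the extra ω > 1 category brings no gain in likelihood, the statistic is zero and the P-value is 1, which is the situation summarized by P-values of "> 0.99".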

Table 2. Result of evolutionary analyses based on the nonsynonymous to synonymous evolutionary rates ratio (ω)

Clade      P-value    ω0     ω1    ω2    p0     p1     p2    Predicted positions

Extracellular domain
Outgroup   > 0.99     0.11   1.0   -     0.66   0.34   -
A          < 10−…                                            …, 45, 77
C          > 0.99     0.06   1.0   -     0.77   0.23   -

Intracellular domain
Outgroup   > 0.99     0.08   1.0   -     0.92   0.08   -
CERK1      > 0.99     0.07   1.0   -     0.96   0.04   -
A          > 0.99     0.06   1.0   -     0.89   0.11   -
B          > 0.99     0.10   1.0   -     0.69   0.31   -
C          > 0.99     0.…

The P-value gives the result of the test comparing the three-ratio (M2a) to the two-ratio (M1a) models. If the test is significant (one instance: clade A for the extracellular domain), results of the M2a are provided. Otherwise, the results of M1a are provided. ω0, ω1 and ω2, the three rate categories which are constrained to be < 1, = 1 and > 1, respectively; p0, p1 and p2, their respective frequencies. ω2 and p2 are not defined in M1a (shown as -). In case M2a is significant, positions with > 0.95 posterior probability of falling into category ω2 are given with respect to the MtLYK3 sequence.

First we analyzed the MtLYK8-LjLYS7 clade and the nonlegume CERK1 clade. In both cases, most positions were fairly conserved, with nearly two thirds of the positions in the extracellular domain and over 90% of the positions in the intracellular domain falling into the constrained positions category. Among the three legume clades (groups A, B and C), proteins of the C clade exhibited the pattern most similar to CERK1, with no positive selection and a high proportion of strongly constrained positions. By contrast, clade B proteins showed a marked relaxation of selective constraints, with up to 54% of extracellular and 31% of intracellular positions classified as neutral (Table 2). The elevation of evolutionary rates in the kinase domain is particularly notable. However, despite this elevation of ω ratios, no position under positive selection was detected. By contrast, for the group A proteins we detected highly significant signatures of positive selection, which were all restricted to the extracellular domain. The estimated proportion of positions under positive selection (4%) corresponds to eight residues. In fact, only three positions were predicted with high statistical confidence to fall in the category where ω > 1 (positions in the MtLYK3 protein sequence: 43Q, 45R and 77G). These are all within the first LysM chitin oligomer binding motif. Apart from these signatures of positive selection, clade A proteins show patterns of purifying constraints that are similar to CERK1.
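The contrast between clades can be condensed into a single number per clade by averaging ω over the site categories, that is p0·ω0 + p1·ω1 under M1a. The sketch below does this for the M1a rows that are legible in Table 2. This average is not reported in the study and is offered only as an illustrative summary; the dictionary layout and function name are assumptions made for the example. The last lines back-calculate the number of extracellular alignment positions implied by the statement that 4% of positions corresponds to eight residues.

```python
# (omega0, p0, p1) under model M1a, as read from Table 2.
# omega1 is fixed at 1 in this model, so it is not stored.
TABLE2_M1A = {
    ("extracellular", "Outgroup"): (0.11, 0.66, 0.34),
    ("extracellular", "C"): (0.06, 0.77, 0.23),
    ("intracellular", "Outgroup"): (0.08, 0.92, 0.08),
    ("intracellular", "CERK1"): (0.07, 0.96, 0.04),
    ("intracellular", "A"): (0.06, 0.89, 0.11),
    ("intracellular", "B"): (0.10, 0.69, 0.31),
}


def mean_omega(omega0: float, p0: float, p1: float) -> float:
    """Average omega over sites under M1a: p0 * omega0 + p1 * 1."""
    return p0 * omega0 + p1 * 1.0


for (domain, clade), params in TABLE2_M1A.items():
    print(f"{domain:14s} {clade:9s} mean omega = {mean_omega(*params):.2f}")

# Back-calculation (an inference, not a reported value): if 4% of the
# extracellular positions amounts to eight residues, the analysed
# extracellular alignment spans roughly 8 / 0.04 positions.
implied_sites = 8 / 0.04
print(f"implied extracellular positions: {implied_sites:.0f}")
```

With these values the average for the intracellular domain of clade B is several times that of CERK1, which is the relaxation of constraint described in the text.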

The patterns of molecular evolution strongly suggest that the three copies of MtLYK3-LjNFR1a homologs in legumes have diverged functionally, supporting a model of neofunctionalization. This model assumes that group C proteins fulfill the ancestral function associated with CERK1 proteins in other species (with the notable exception of the A. thaliana gene), while clade A and B proteins have evolved new functions, namely as Nod factor receptors in the case of clade A. The occurrence of positive selection in the ligand-binding domains of group A proteins strongly points to coevolution with Nod factors.

Discussion

We investigated the evolutionary history of Nod factor receptors from the MtLYK3-LjNFR1a gene family in order to understand the mechanisms that allowed them to gain their legume-specific function. We found that this gene family underwent two duplications in a relatively short time window that predates the origin of the legume family and a fortiori of the rhizobial symbiosis. We documented signatures of functional divergence between the resulting three paralogous copies, signatures of adaptive evolution between orthologous genes that encode rhizobial Nod factor receptors, and the loss of this gene copy in the basal and non-nodulating Cercis lineage. Based on these results, we argue that an event of gene duplication and subsequent neofunctionalization in the MtLYK3-LjNFR1a lineage was fundamental for the evolutionary gain of a legume-specific Nod factor receptor.

The phylogeny of LysM-domain receptor kinases in plants has been studied extensively (Arrighi et al., 2006; Zhu et al., 2006; Zhang et al., 2007, 2009; Lohmann et al., 2010). However, none of these studies was specifically focused on the MtLYK3-LjNFR1a clade. Instead, they all considered the whole LysM receptor family. By focusing specifically on the Nod factor receptor clade, a more reliable reconstruction of the phylogeny of this clade was achieved. The phylogenetic tree presented here is robust and distinct from those presented in earlier studies. By including a representative species of the most basal legume tribe (the Cercideae), we demonstrated that two rounds of duplication in the MtLYK3-LjNFR1a clade are ancestral to all speciation events in the legume family. This means that these duplications predate the probable origin of the legume–rhizobium symbiosis.

A whole-genome duplication (WGD) is known to have occurred in the Papilionoideae subfamily shortly after the origin of the rhizobial symbiosis in this lineage (Cannon et al., 2010; Young et al., 2011). As a result, this WGD did not affect more basal legume lineages of the Caesalpinioideae and Mimosoideae subfamilies, although nodulating lineages exist in both. Several genes involved in the rhizobial symbiosis have been maintained as paralogous gene pairs in papilionoid legumes after the WGD (Op den Camp et al., 2011a; Young et al., 2011; Ivanov et al., 2012). Many of those paralogous gene pairs show signatures of functional redundancy, suggesting that the ancestral gene (before the WGD event) already had a symbiotic function. Mechanisms such as complementary degenerative mutations may be progressively separating the functions of the two paralogs (subfunctionalization), a process that may occur independently in the different legume lineages. As the duplications in the MtLYK3-LjNFR1a clade are ancestral to the WGD event in papilionoid legumes, this greater age may have allowed complete functional divergence (neofunctionalization) of the paralogous genes (Lynch & Conery, 2000).

A line of strong evidence supports the hypothesis that the three clades (A, B and C) in the MtLYK3-LjNFR1a lineage underwent functional divergence. Functional analyses in L. japonicus and M. truncatula demonstrated that LjNFR1a and MtLYK3 (group A) have a symbiotic function. It is likely that one or both of the copies GmNFR1a and GmNFR1b generated by a G. max-specific WGD event fulfill the same function. Strikingly, C. chinensis lacks a putative ortholog of this gene. Because this species contains gene copies of the other two clades (B and C), and because the emergence of the A clade predates the emergence of the other two, the Cercis lineage must have lost the ortholog of MtLYK3-LjNFR1a Nod factor receptors. The lack of large-scale genomic or transcriptomic sequences in these nonmodel species makes it impossible to definitively rule out the presence of an A-clade copy. However, we studied a second species of the same genus (C. siliquastrum) in which we isolated homologs falling in the B and C clades, but not in the Nod factor receptor clade A. The loss of this gene in the Cercis lineage makes sense because these plants do not establish the rhizobial symbiosis.

It is currently unclear whether the rhizobial symbiosis evolved once or several times independently within legumes (Doyle, 2011). The single evolution hypothesis would place the event at the position of the common ancestor of Papilionoideae and Mimosoideae, including most of the Caesalpinioideae but not the Cercideae. The finding that the ancestor of the Cercideae had an A-clade gene supports this scenario. However, our results do not rule out the multiple evolution hypothesis. Both hypotheses clearly imply several events of loss of the rhizobial symbiosis. Furthermore, we cannot rule out less parsimonious models where the evolution of nodulation occurred even earlier, as we have no means of determining whether the Cercis lineage never evolved the rhizobial symbiosis or, alternatively, inherited it from the common ancestor of legumes and lost it subsequently.

One notable feature of the rhizobial symbiosis in legumes is its complex pattern of host–symbiont specificity, which exhibits a wide spectrum ranging from generalists to specialists (Perret et al., 2000). Among other factors, structural characteristics of Nod factors were shown to control specificity at early stages of the symbiosis (Downie, 2010). Therefore, it is likely that the extracellular domain of Nod factor receptors coevolved with the Nod factor structure of the bacterial symbiont. Our phylogenetic analysis singled out a few sites in the first LysM domain evolving rapidly in MtLYK3-LjNFR1a proteins, which was not observed in the other two clades. Therefore, we hypothesize that these residues may be instrumental for host specificity. However, the precise role of the first LysM domain in ligand binding remains elusive. Biochemical and structural characterization of the extracellular region of AtCERK1 suggested that ligand binding involves exclusively the second LysM domain (Liu et al., 2012). This is supported by studies on the MtNFP-LjNFR5 Nod factor receptor that underlined the importance of the second LysM domain in ligand specificity (Radutoiu et al., 2007; Bensmihen et al., 2011). However, one of the sites we detected (43Q in both MtLYK3 and PsSYM37 sequences) was shown to be linked with within-species variation of sensitivity to Nod factor structure in P. sativum (Li et al., 2011), supporting the hypothesis that the first LysM domain contributes to ligand specificity.

In the remaining two clades, opposite evolutionary constraints were observed. LysM receptors of the B clade (LjNFR1b, LjNFR1c, MtLYK1, MtLYK6 and MtLYK7) displayed a striking relaxation of selective constraints, especially in the kinase domain, resulting in a substantial increase in rates of amino acid evolution. The symbiotic-specific conserved region of the kinase domain (activation loop and YAQ motif) was strongly degenerated in all genes of the B clade. One may argue that these genes might actually include pseudogenes and therefore bias evolutionary rate estimates. However, several gene copies within the B clade are shared between species that diverged tens of millions of years ago, and divergent gene expression patterns support the functionality of these genes (Limpens et al., 2003; Zhang et al., 2007; Benedito et al., 2008; Lohmann et al., 2010).

Comparative functional analysis of AtCERK1 and LjNFR1a revealed that the activation loop and YAQ motif of the kinase domain of LjNFR1a are required for symbiotic signaling (Nakagawa et al., 2011). It was concluded that these regions appeared during the evolution of legumes and were instrumental for the evolution of the rhizobial symbiosis. However, we present evidence that the amino acid residues constituting symbiotic regions in the activation loop and YAQ motif are actually ancestral to CERK1, and can even be found in the more distal clade of MtLYK8-LjLYS7 homologs. Therefore, we argue that the symbiotic regions in the activation loop and YAQ motif are an ancestral feature, but were lost in the A. thaliana lineage. In parallel, a second loss of the same motifs occurred in the legume-specific B clade. In line with this, we argue that an ancestral function of AtCERK1-like genes in innate defense induction is unlikely, and that this function instead evolved secondarily and specifically in the A. thaliana lineage. In addition, our results suggest that clade C in legumes has taken over the ancestral function of CERK1, which is conserved in most nonlegumes (except A. thaliana). It seems reasonable to suggest that this ancestral function is perception of symbiotic signals from endomycorrhizal fungi.

It is interesting to note that the proteins from the MtLYK8-LjLYS7 clade also contain the activation loop and YAQ motif that are associated with symbiotic signaling. This observation points to an even older origin of symbiotic functions in the LysM receptor kinase gene family. This ancestral function seems to have been inherited by all species we studied, with the exception of A. thaliana. Most of these species are known to be able to form successful endomycorrhizal interactions, including Cercideae species (Alexander, 1989; Wang & Qiu, 2006). Arabidopsis thaliana is a clear exception, but P. trichocarpa might be another, as it is reported to form exclusively ectomycorrhizae (Harley & Harley, 1987). However, many Populus species are able to form both types of mycorrhizal symbioses and some results suggest that P. trichocarpa can establish both mycorrhizal symbioses (Baum & Makeschin, 2000).

Recently it was demonstrated that the establishment of endomycorrhizae requires Myc-LCOs and that a homolog of the other Nod factor receptor MtNFP-LjNFR5 is necessary for the perception of both rhizobia and endomycorrhizal fungi in Parasponia andersonii (Maillet et al., 2011; Op den Camp et al., 2011b). This strongly suggests that Nod factor perception was derived from Myc-LCO perception. As Nod factor perception most probably requires a heterodimeric receptor complex involving both MtNFP-LjNFR5 and MtLYK3-LjNFR1 (Madsen et al., 2011; Pietraszewska-Bogiel et al., 2013), this suggests that besides a homolog of MtNFP-LjNFR5 a second receptor is also needed for Myc-LCO perception. Based on the presence of the kinase symbiotic domains, their expression in root tissue and loss of these genes in A. thaliana, we suggest that genes among the MtLYK8-LjLYS7, CERK1 or legume C clades fulfill such a function. The expression profile of MtLYK8 (probeset ID: Mtr.45170.1.S1_at) in the Medicago Gene Atlas (Benedito et al., 2008) further supports this view, with moderate but consistent induction of expression in endomycorrhizae over several experiments (Gomez et al., 2009; Hogekamp et al., 2011; Ortu et al., 2012). Consistently, MtLYR1 (probeset ID: Mtr.19870.1.S1_at) exhibits the same pattern but not MtNFP (probeset ID: Mtr.15789.1.S1_at). This could mean that the ancestral gene of the gene family comprising MtLYK8-LjLYS7 and MtLYK3-LjNFR1a genes was involved in symbiotic perception in endomycorrhizae.

Although none of the species we examined was actinorhizal, their common ancestor predates the common ancestor of all plants establishing the actinorhizal symbiosis with Frankia spp. (Doyle, 2011). The closest outgroup to actinorhizal plants would be P. trichocarpa. We note that, at this point of evolution, the ancestors of MtLYK8-LjLYS7 and MtLYK3-LjNFR1a genes both already contained the specific kinase domain associated with the symbiotic function. This may have allowed the evolution towards the perception of postulated Frankia signaling molecules in actinorhizal species (Hocher et al., 2011; Pawlowski & Demchenko, 2012).

The study presented here provides an example of a major functional innovation that was preceded by duplication events in a key gene. It supports the hypothesis that gene duplications in MtLYK3-LjNFR1a LysM domain receptor kinases have played a key role in the evolution of the rhizobial symbiosis in legumes. Furthermore, our evolutionary analysis provides working hypotheses for the ongoing functional characterization of members of this gene family.

Acknowledgements

We thank Douglas Cook and Margaret Pooler for collaborating on the construction of the Cercis chinensis genomic library. Three anonymous referees improved the quality of the manuscript through constructive suggestions. This research was funded by grants NWO-VIDI-864.06.007, ERC-2011-AdG-294790 and NWO-NSFC-846.11.005. S.D.M.'s laboratory is supported by the Laboratory of Excellence ARBRE (ANR-2011-LABXARBRE-01).