Tracing the origin and evolutionary history of plant nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes


  • Jia-Xing Yue,

    1. State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China
    2. Department of Ecology and Evolutionary Biology, Rice University, Houston, TX 77005, USA
  • Blake C. Meyers,

    1. Department of Plant and Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
  • Jian-Qun Chen,

    1. State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China
  • Dacheng Tian,

    1. State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China
  • Sihai Yang

    1. State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China

Authors for correspondence:
Sihai Yang
Tel: +86 25 83686406

Dacheng Tian
Tel: +86 25 83686406


  • Plant disease resistance genes (R genes) encode proteins that function to monitor signals indicating pathogenic infection, thus playing a critical role in the plant’s defense system. Although many studies have been performed to explore the functional details of these important genes, their origin and evolutionary history remain unclear.
  • In this study, focusing on the largest group of R genes, the nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes, we conducted an extensive genome-wide survey of 38 representative model organisms and obtained insights into the evolutionary stage and timing of NBS-LRR genes.
  • Our data show that the two major domains, NBS and LRR, existed before the split of prokaryotes and eukaryotes but their fusion was observed only in land plant lineages. The Toll/interleukin-1 receptor (TIR) class of NBS-LRR genes probably had an earlier origin than its nonTIR counterpart. The similarities of the innate immune systems of plants and animals are likely to have been shaped by convergent evolution after their independent origins.
  • Our findings start to unravel the evolutionary history of these important genes from the perspective of comparative genomics and also highlight the important role of reorganizing pre-existing building blocks in generating evolutionary novelties.


Plant disease resistance (R) genes encode proteins that serve to sense the invasion of viral, bacterial or fungal pathogens and subsequently trigger a series of downstream immune responses, thereby playing a critical role in the plant innate immune system (Dangl & Jones, 2001). There are two major classes of immune receptors in plants: nonspecific transmembrane pattern recognition receptors (PRRs), for example, receptor-like kinases (RLKs), and specific cytoplasmic receptors, for example, the nucleotide-binding site–leucine-rich repeat (NBS-LRR) proteins. Proteins in the former class recognize conserved pathogen-associated or microbe-associated molecular patterns (PAMPs or MAMPs), whereas those in the latter class recognize pathogen-specific effectors (Chisholm et al., 2006; Jones & Dangl, 2006; Maekawa et al., 2011). In 1992, Johal & Briggs isolated the first plant R gene, Hm1, from maize (Zea mays). Since then, > 70 R genes have been cloned. Among them, NBS-LRR genes comprise the largest class and account for more than half of plant R genes (McHale et al., 2006).

NBS-LRR genes are characterized by encoding an N-terminal variable domain, a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) domain. Based on whether they also encode an N-terminal Toll/interleukin-1 receptor (TIR) domain, NBS-LRR genes can be further divided into two subclasses, the TIR subclass and the nonTIR subclass (Meyers et al., 1999). In addition to their different domain architectures, NBS-LRR genes from these two subclasses also differ considerably in their phyletic distribution and downstream signaling pathways, suggesting possible functional divergence between them (Aarts et al., 1998; Tarr & Alexander, 2009). In the plant innate immune system, receptors encoded by NBS-LRR genes are localized in the cytoplasm and can specifically recognize effectors secreted by pathogens, either directly or indirectly; this recognition subsequently activates a complex downstream signaling pathway usually leading to rapid local cell death, termed the hypersensitive response (HR), around infected sites (Chisholm et al., 2006). Genome-wide investigations conducted in Arabidopsis thaliana, Oryza sativa, Medicago truncatula, Vitis vinifera, and Populus trichocarpa revealed that there are generally hundreds of NBS-LRR genes in plant genomes, reflecting the important roles of these genes (Meyers et al., 2003; Zhou et al., 2004; Ameline-Torregrosa et al., 2008; Yang et al., 2008). Moreover, NBS-LRR genes are often polymorphic between individuals of a host population, and the complete set of these genes defines the repertoire for the detection of polymorphic pathogen effectors (Bakker et al., 2006; Zhang et al., 2009; Maekawa et al., 2011). It is therefore of particular interest to elucidate the origin and evolutionary history of this important gene family and also the evolutionary relationship between TIR- and nonTIR-NBS-LRR genes.

There is another line of plant defenses represented by the LRR-RLK transmembrane receptors which are localized on the plant cell surface. Containing an extracellular LRR domain and an intracellular Pkinase domain, LRR-RLKs are typically able to recognize highly conserved PAMPs, such as flagellin, lipopolysaccharides (LPSs), cold-shock protein, chitin, and β-glucans (Nurnberger & Brunner, 2002). These molecular patterns are often common components that participate in important functions for various microbial species (pathogenic or not). Therefore, resistance driven by LRR-RLKs is generally non-host-specific and may represent a more basal defense strategy. If so, one unanswered question about the plant innate immune system is whether LRR-RLK genes have a more ancient origin than NBS-LRR genes.

Analogous to the case for plants, animal innate immune systems also rely on two groups of receptors, the transmembrane Toll-like receptors (TLRs) and cytosolic Nod-like receptors (NLRs) (Ausubel, 2005). TLRs are characterized by an extracellular LRR domain and an intracellular TIR domain, whereas NLRs often demonstrate a tripartite domain architecture consisting of a variable N-terminal domain, a central NACHT domain and also a C-terminal LRR domain, like NBS-LRR proteins in plants (Ausubel, 2005; Staal & Dixelius, 2007). In addition, NBS-LRR and NACHT-LRR proteins are members of the signal-transduction ATPases with numerous domains (STAND) class of ATPases (Leipe et al., 2004; Rairdan & Moffett, 2007; Maekawa et al., 2011) which share similar biological functions and protein structures; however, the NACHTs have been placed in a separate phylogeny, suggesting different evolutionary histories of these two domains (Leipe et al., 2004; Rairdan & Moffett, 2007; Maekawa et al., 2011). Therefore, it is interesting to ask: what forces drove such striking similarities between the plant and animal innate immune systems? Did they come from the same common ancestor (divergent evolution) or did they evolve independently (convergent evolution), as suggested by a number of reviews (Ausubel, 2005; Rairdan & Moffett, 2007; Staal & Dixelius, 2007)?

The recent avalanche of whole-genome data from diverse species offers us an unprecedented opportunity to explore this series of questions via comparative analysis at the genomic level. In this study, we sampled 38 representative genomes, covering all major kingdoms of organisms (eubacteria, archaebacteria, fungi, protists, plants and animals) and identified all homologous genes of NBS-LRR, LRR-RLK, TLR, and also NLR genes in these genomes. For these well-defined immune receptor genes, we conducted a set of comparative analyses on their phyletic distribution, domain architecture, phylogenetic topology, and conserved motif evolution patterns. These data shed light on the origin and history of NBS-LRR genes and their evolutionary relationship with LRR-RLK genes in plants, and also those of the TLR and NLR genes in animals. In addition to R genes, a number of important chaperones and regulators involved in the plant immune response have been identified through genetic screens; these include WRKY in the LRR-RLK defense pathway and SGT1, RAR1, EDS1, PAD4, SAG101, HSP90 and SID2 in the NBS-LRR defense pathway (Rusterucci et al., 2001; Azevedo et al., 2002; Hubert et al., 2003; Leister et al., 2005; Eulgem & Somssich, 2007). Because of the essential functions of these signaling components for plant immunity, they were also incorporated in our study.

The clarification of the evolutionary relationships and timelines of plant R genes, especially NBS-LRR genes, in this study paves the way for elucidation of how plant disease resistance originated and evolved, and how this evolutionary innovation helped plants to survive and thrive in diverse terrestrial habitats. In addition, the findings of our study provide a good example of how recruiting and reorganizing of pre-existing building blocks can lead to functional innovation, thus shedding new light on the origin of evolutionary novelties.

Materials and Methods

Data sampling

For this study, we selected a total of 38 different whole-genome sequenced species, including six eubacteria, six archaebacteria, six fungi, six protists, seven plants and seven animals. These species were sampled because they represent all six major taxonomic kingdoms of organisms on earth. Their complete genomic sequences and corresponding annotation information were downloaded from online databases (Supporting Information Table S1).

For each species, protein entries matching the NBS domain (Pfam: PF00931) in the Pfam database v23.0 (Finn et al., 2009) were identified as NBS-encoding genes using hmmer searches with an E-value cut-off of 10⁻⁴ (Eddy, 1998). Likewise, we identified all NACHT-encoding, TIR-encoding, and Pkinase-encoding genes in these 38 species (Table S1). For the two dubious TLR genes found in grape (Vitis vinifera) and poplar (Populus trichocarpa), we used their nucleotide sequences to run BLAST against the GenBank expressed sequence tag (EST) database to seek EST support for these two genes with > 85% sequence similarity. In addition, the nucleotide sequences of these two genes, together with their 5′ and 3′ flanking regions (5000 bp each), were used for re-annotation with fgenesh, and the domain architecture of the resulting predicted proteins was assessed using Pfam (Finn et al., 2009).

In addition, we randomly selected 30 genes for each of four further sets drawn from several other eubacterial and fungal species: NBS-encoding genes in eubacteria, NACHT-encoding genes in eubacteria, NBS-encoding genes in fungi, and NACHT-encoding genes in fungi. Their amino acid sequences were downloaded from the European Bioinformatics Institute (EBI) UniProt database (Supporting Information Notes S1). Fifteen NBS-encoding genes in gymnosperms (Pinus monticola) were also downloaded from EBI UniProt and National Center for Biotechnology Information (NCBI) GenBank and incorporated into our data set (Notes S1). Finally, 43 NBS- and NACHT-encoding genes of known function were selected for further analysis, and their amino acid sequences were downloaded from NCBI GenBank (Notes S1). The presence or absence of our domains of interest within these genes (e.g. NBS, NACHT, LRR, TIR, Pkinase, WD40, TPR, and HEAT) was verified based on both the Pfam (Finn et al., 2009) and SMART (Letunic et al., 2009) databases with an E-value cut-off of 10⁻⁴.
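The domain screening described above can be illustrated with a minimal sketch. It assumes the domain hits have already been parsed into (protein, domain, E-value) tuples; that tuple layout and the function names are hypothetical and are not the pipeline used in the study. Only the 10⁻⁴ cut-off and the domain abbreviations (T, NB, NA, L, Pk) come from the text.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-4  # cut-off stated in the Methods


def domains_per_protein(hits, cutoff=E_VALUE_CUTOFF):
    """Collect, for each protein, the set of domains whose hit passes the cut-off.

    `hits` is an iterable of (protein_id, domain_name, e_value) tuples.
    This layout is an assumption made for illustration only.
    """
    found = defaultdict(set)
    for protein_id, domain, e_value in hits:
        if e_value < cutoff:
            found[protein_id].add(domain)
    return dict(found)


def architecture_label(domains):
    """Abbreviate a domain set with the codes T, NB, NA, L and Pk.

    The codes are joined in a fixed order, not in N-to-C-terminal order.
    """
    codes = [("TIR", "T"), ("NBS", "NB"), ("NACHT", "NA"),
             ("LRR", "L"), ("Pkinase", "Pk")]
    return "-".join(code for name, code in codes if name in domains)


if __name__ == "__main__":
    # Hypothetical hits, invented for this example.
    example_hits = [
        ("gene1", "TIR", 1e-20), ("gene1", "NBS", 1e-30), ("gene1", "LRR", 1e-6),
        ("gene2", "NBS", 1e-10), ("gene2", "LRR", 0.5),  # LRR hit fails the cut-off
        ("gene3", "LRR", 1e-8), ("gene3", "Pkinase", 1e-40),
    ]
    for gene, domains in domains_per_protein(example_hits).items():
        print(gene, architecture_label(domains))
```

A protein whose hits all fail the cut-off does not appear in the result, which corresponds to treating the domain as absent.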

Additionally, all currently available sequence data (nucleotides, ESTs and peptides) for 5126 species from nine major early plant lineages (Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, Charales, liverworts, bryophytes, hornworts and lycophytes) and the trace data for a liverwort species, Marchantia polymorpha, were downloaded from NCBI GenBank (Table S2). For the nucleotide, EST, and trace sequences, TBLASTN was first used to find potential hits for the NBS domain under the threshold of 1. For significant hits, the corresponding amino acid sequences given by high-scoring segment pairs (HSPs) were retrieved to assess the presence or absence of the NBS domain by Pfam. For the peptide sequences, Pfam searches were performed directly to detect their domain architecture. The amino acid sequences of the verified NBS domains were then analyzed by BLASTP against all other NBS-encoding genes used in this study to assess their relatedness to TIR- and nonTIR-type NBS domains.

For those other essential genes in plant disease resistance response pathways (Table S2), we first identified their key functional domains using Pfam. Then all known accessions encoding those domains were downloaded from EBI UniProt based on Pfam species distribution analysis results (Finn et al., 2009).

Sequence alignment and phylogenetic reconstruction

For all NBS/NACHT-encoding genes in our data set, the amino acid sequences of their NBS or NACHT region were aligned using muscle v3.8 (Edgar, 2004). All alignments used in this study are shown in Notes S1. Phylogenetic trees were constructed based on the neighbor-joining (NJ) method with a Poisson correction model using mega v4.0 (Tamura et al., 2007). The confidence for each branching node was assessed by bootstrap analysis with 1000 replicates (Felsenstein, 1985).
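As an illustration of the distance measure named above, the Poisson correction converts the observed proportion of differing residues p between two aligned sequences into an estimated number of substitutions per site, d = −ln(1 − p). The sketch below is not the mega implementation. Skipping columns where either sequence has a gap (pairwise deletion) is an assumption, since the text does not state the gap treatment, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap.

    Pairwise deletion of gapped columns is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm clusters into a tree. Bootstrap support is then obtained by resampling alignment columns with replacement and repeating the procedure.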

Identification and analysis of conserved motifs within each characteristic domain

For all NBS/NACHT-encoding genes sampled in our study, meme 4.0.0 (Bailey & Elkan, 1994) was employed to identify and analyze conserved motifs among amino acid sequences of NBS/NACHT regions. Based on expectation maximization, conserved motifs were identified in a group of sequences without a priori assumptions about their alignments. The individual profile for each conserved motif was described and a summary report was generated after motif summary tiling. Only those conserved motifs with position P-values ≤ 10⁻⁴ and no overlap with each other were reported.

The motif occurrence profile of the NBS/NACHT region for each gene in the meme motif summary report was assigned to different groups corresponding to ten phylogenetic clades (Eubacteria, Protista, Fungi, Bryophyta, Lycopodiophyta, Gymnospermophyta, Angiospermophyta, Protostomia, Echinodermata, and Chordata). The motif occurrence percentages of these ten motifs were determined for each group as the motif usage frequencies.
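The usage-frequency calculation described above amounts to counting, within each clade, the fraction of sequences in which a given motif occurs. A minimal sketch follows. The input dictionaries and the function name are hypothetical, and the motif occurrences are assumed to have been extracted from the motif report beforehand.

```python
from collections import defaultdict


def motif_usage_frequencies(motif_hits, clade_of, n_motifs=10):
    """Return {clade: [frequency of motif 1, ..., frequency of motif n_motifs]}.

    motif_hits: {sequence_id: set of motif numbers (1-based) found in that sequence}
    clade_of:   {sequence_id: clade name}
    Both layouts are assumptions made for illustration.
    """
    counts = defaultdict(lambda: [0] * n_motifs)
    totals = defaultdict(int)
    for seq_id, clade in clade_of.items():
        totals[clade] += 1
        for motif in motif_hits.get(seq_id, set()):
            if not 1 <= motif <= n_motifs:
                raise ValueError(f"motif number out of range: {motif}")
            counts[clade][motif - 1] += 1
    # Each frequency is the share of the clade's sequences carrying the motif.
    return {clade: [c / totals[clade] for c in counts[clade]] for clade in totals}
```

Each returned list corresponds to one ten-bar fingerprint, with values between 0 and 1.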

For other essential genes that participate in the plant disease resistance response, similar analyses were performed. The number of conserved motifs in our initial setting varied from three to ten according to the length of the corresponding domains.

Ancestral state reconstruction

The evolutionary age of the domain inferred by ancestral state reconstruction (ASR) was used to give further support to inferences regarding when and where the major domain fusion events (NBS with LRR, NACHT with LRR and also LRR with Pkinase) occurred. ASR using maximum likelihood (ML) methods was performed in Mesquite (v2.74) (Maddison & Maddison, 2010). For the ML-based ASR, the Markov k-state 1 parameter (Mk 1) model was employed. Character states and the corresponding relative likelihood supports were mapped onto the phylogeny of those major evolutionary lineages.
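To make the Mk1 model concrete: for a character with k states and a single rate q for every change, the probability of remaining in the same state along a branch of length t is 1/k + ((k − 1)/k)·exp(−kqt), and the probability of ending in any particular other state is 1/k − (1/k)·exp(−kqt). The sketch below applies this to a binary character (fusion absent = 0, present = 1) for a single node. It is not the Mesquite implementation, the function names are hypothetical, and the normalization shown corresponds to a root node with equal prior weight on both states.

```python
import math


def mk1_transition(rate, t, same, k=2):
    """Mk1 probability of ending in the same state, or in one given different state,
    after a branch of length t with per-change rate `rate`."""
    decay = math.exp(-k * rate * t)
    if same:
        return 1.0 / k + (k - 1.0) / k * decay
    return 1.0 / k - decay / k


def conditional_likelihoods(children, rate):
    """One pruning step for a binary character.

    children: list of (branch_length, [L(state 0), L(state 1)]) per child;
              a tip observed in state 1 is given as [0.0, 1.0].
    Returns [L(state 0), L(state 1)] for the parent node.
    """
    parent = [1.0, 1.0]
    for state in (0, 1):
        for t, child_l in children:
            parent[state] *= sum(
                mk1_transition(rate, t, same=(state == s)) * child_l[s]
                for s in (0, 1)
            )
    return parent


def relative_likelihoods(cond):
    """Normalize conditional likelihoods so that the two states sum to 1."""
    total = sum(cond)
    return [x / total for x in cond]
```

For example, `relative_likelihoods(conditional_likelihoods([(0.1, [0.0, 1.0]), (0.1, [0.0, 1.0])], rate=1.0))` gives the proportions for a node whose two descendants both show the fusion. In a full analysis the rate would itself be estimated by maximizing the likelihood over the whole tree.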


Genome-wide identification of NBS-LRR and LRR-RLK genes in multiple representative species

All genes encoding R gene characteristic domains (e.g. NBS, LRR and Pkinase) were identified in our sampled genomes (Table S1) based on a Pfam homology search. Interestingly, all these characteristic domains can be detected in several eubacteria and archaebacteria (Table 1). For example, in our six sampled eubacterial genomes, 15 NBS-encoding, 153 Pkinase-encoding, and seven LRR-encoding genes were identified (Table 1). This observation, together with the wide distribution of these domains in nearly all eukaryotic phyla, suggests very ancient origins of these functional modules. Yet no domain fusion events were detected in our sampled eubacterial, archaebacterial and fungal genomes (Table 1). A BLAST scan of all currently available sequence data for eubacteria, archaebacteria and fungi confirmed this finding. Therefore, the domain fusion events between NBS and LRR and also between LRR and Pkinase may have occurred exclusively in eukaryotes after the divergence of fungi. In our sampled genomes, the first structurally complete LRR-RLK gene that encodes both LRR and Pkinase domains was observed in protists. This observation suggests that the origin of the LRR-RLK genes occurred at an early stage of eukaryotic evolution, which is supported by our ML-based ASR (Fig. 1). In contrast to their loss in fungi and metazoans, LRR-RLK genes in land plants were generally one or two orders of magnitude more numerous than in other phyla (Table 1). Very possibly, this substantial expansion of the gene family in land plants was driven by functional specialization in plant immunity (Tang et al., 2010).

Table 1.   Genes encoding NBS, NACHT, Toll/interleukin-1 receptor (TIR), Pkinase and leucine-rich repeat (LRR) domains identified in a genome-wide survey
Kingdom    Phylum or clade    Species    NBS    NACHT    TIR    Pkinase    LRR
  a Data from Liu & Ekramoddoullah (2003, 2007).
  b The question mark indicates that data are currently not available.
  c Data from Yang et al. (2008).
  d Data from the Sea Urchin Genome Sequencing Consortium (Sodergren et al., 2006).
  The threshold of the hmmpfam search is E-value < 10⁻⁴.
  T, TIR domain; NB, nucleotide-binding site (NBS) domain; NA, NACHT domain; L, LRR domain; Pk, Pkinase domain.
  The number of genes encoding L-Pk and encoding only L may be underestimated because of the difficulty of detecting the LRR domain.


Eubacteria Escherichia coli00000000011
Streptomyces coelicolor000603030330
Rhodococcus opacus000001000250
Trichodesmium erythraeum000408000403
Nostoc punctiforme000503210544
  Synechococcus sp. RCC30700000000010
Archaebacteria Methanocaldococcus jannaschii00000000000
Natronomonas pharaonis00000000010
Methanospirillum hungatei00000000060
Sulfolobus tokodaii00000000070
Methanosarcina barkeri00020200001
  Methanosarcina acetivorans00000200001
ProtistaAlgae-like (plant-like)Cyanidioschyzon merolae000000000599
Chlamydomonas reinhardtii00000002244771
 Volvox carteri00000000136660
Protozoa-like (animal-like)Paramecium tetraurelia0000030032363172
Tetrahymena thermophila000002001870110
  Monosiga brevicollis000003021233854
Fungi Saccharomyces cerevisiae0000000001149
Schizosaccharomyces pombe0000000001059
Aspergillus terreus0002040001169
Coprinus cinereus00000300013912
Magnaporthe grisea00020200000
  Chaetomium globosum00010110001039
PlantaeBryophytaPhyscomitrella patens402200004105536170
LycopodiophytaSelaginella moellendorffii020140001158848261
GymnospermophytaPinus monticola> 67a> 61a?b?b?b?b?b?b?b?b?b
AngiospermophytaArabidopsis thaliana93c53c17c4c00039220783264
Vitis vinifera97a362c14c62c00129274986346
Populus trichocarpa78a252c10c78c0011163531348471
Oryza sativa0a464c0c52c00013021152173
  Sorghum bicolor01840610002228927179
AnimaliaProtostomiaNematostella vectensis0001313071327129
 Caenorhabditis elegans00020115156173
EchinodermataStrongylocentrotus purpuratus0002208d41222d493386171
ChordataBranchiostoma floridae000840692864208431399
Ciona intestinalis0000217011295107
Mus musculus0002271212114662232
Homo sapiens0005281011106702281
Figure 1.

Maximum likelihood (ML) ancestral state reconstruction for the domain fusion events in the evolution of plant and animal immune receptors. White circles, without domain fusion; black circles, with domain fusion. The pie chart at each node shows the relative likelihood of these two possible states at that particular node. LRR, leucine-rich repeat; NBS, nucleotide-binding site; NLR, Nod-like receptor; RLK, receptor-like kinase; TLR, Toll-like receptor.

Compared with LRR-RLK genes, NBS-LRR genes showed a much later origin and are probably specific to land plants according to both our genome-wide survey and the ASR (Table 1 and Fig. 1). In the land plant lineage, NBS-LRR genes were first detected in the moss Physcomitrella patens, which has four such genes, each encoding a TIR domain at its N terminus (Table 1). This observation is consistent with previous studies showing the presence of TIR-NBS-like and TIR-encoding genes in this species (Akita & Valkonen, 2002; Meyers et al., 2002). As P. patens diverged at an early stage of land plant evolution, these NBS-LRR genes might represent the archetype of NBS-LRR genes in land plants. In spikemoss (Selaginella moellendorffii), a total of 16 NBS-encoding genes were detected and two of them also encoded a C-terminal LRR domain (Table 1). In contrast to the limited number of NBS-LRR genes in these two basal plants, substantial lineage-specific gene expansion occurred thereafter and hundreds of NBS-LRR genes were found in gymnosperm and angiosperm genomes. In addition, as shown in Table 1, TIR-NBS-LRR genes were first observed in P. patens, whereas nonTIR-NBS-LRR genes were first found in S. moellendorffii, both of which branched before the split of gymnosperms and angiosperms. As P. patens represents an evolutionarily older lineage (bryophytes) than S. moellendorffii (lycophytes), this observation also hints at an earlier origin of TIR-NBS-LRR genes than of their nonTIR counterparts.

Further investigation of NBS-LRR genes in early land plants

To improve the resolution of the evolutionary history of NBS-LRR genes in early land plants, we further scanned all currently available early plant sequence data (nucleotides, ESTs, peptides and traces) in GenBank, covering 5126 species from nine major early plant lineages (Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, Charales, liverworts, bryophytes, hornworts and lycophytes) (Qiu & Palmer, 1999) (Table S2). Based on this data set, 16 NBS-encoding fragments were identified, and the existence of the NBS domain in the plant lineage can be traced back to the Coleochaetales (Table S3). However, no evidence of fusion between the NBS domain and either the LRR or the TIR domain was found in this data set. To characterize these 16 NBS-encoding fragments, their amino acid sequences were used to run BLASTP against all other NBS-encoding genes used in our study. The best hits for the three NBS-encoding fragments in Coleochaetales were NBS domains from one NBS-WD40 gene in archaebacteria and two TIR-NBS-LRR genes in dicot plants (Table S4). Of the 12 NBS-encoding fragments in liverworts, half had their best hits with TIR-NBS-LRR or TIR-NBS genes and the other half showed the highest similarity with two NBS genes, three PK-NBS genes, and one NBS-LRR gene (Table S4). Although the latter six best-hit genes do not encode TIR domains, their NBS domains were either within or adjacent to the monophyletic clades of the TIR-type NBS domain in our later phylogenetic analysis (Fig. 2). By contrast, the NBS-encoding fragment found in lycophytes (Isoetes lacustris) had its best hit with a typical nonTIR-type NBS domain from a rice CC-NBS-LRR gene (Table S4). As Coleochaetales and liverworts diverged before the split of bryophytes and lycophytes, this result provides further support for the hypothesis that the TIR-type NBS is more closely related to the NBS domains in early land plants and that the nonTIR-type NBS is a later, derived innovation that arose before the split of lycophytes.

Figure 2.

Phylogenetic analyses for nucleotide-binding site (NBS) and NACHT domains. NBS/NACHT domains of representative NBS/NACHT-encoding genes in multiple species were used to construct this phylogenetic tree. (a) The entire tree combining the NBS and NACHT domains and separate trees for (b) NBS and (c) NACHT domains are shown. NBS and NACHT domains are denoted by triangles and squares, respectively, with different colors to represent different phylogenetic clades. Blue branches represent the Toll/interleukin-1 receptor (TIR) clades and the light green branches represent the TIR-type NBS-like PpC NBS gene family in Physcomitrella patens. The red branches denote those function-verified NBS- or NACHT-encoding genes.

Independent domain recruitments in the evolution of plant and animal immune receptor genes

The genomic investigation of the TLR and NLR genes of animal genomes revealed a similar story. First, as with our observations for Pkinase, NBS and LRR domains, the phyletic distribution pattern of NACHT and TIR domains suggested that these two domains also had ancient origins before the split of prokaryotes and eukaryotes (Table 1). Moreover, it is worth noting that almost all of our identified TLRs and NLRs exist only in metazoans. There are a total of four nonmetazoan genes that encode both the LRR and TIR domains, two from the bacterium Nostoc punctiforme and two from land plants (grape and poplar). For those two bacterial genes, some other domains, such as Ras and Miro, were found between the LRR and TIR domains. Therefore, these two genes cannot be considered as typical TLR genes like those identified in metazoans. For these two genes in grape and poplar, we ran a BLAST search against the GenBank EST database and found no hits with > 85% sequence similarity. In addition, we re-annotated these two genes and found no support for their LRR-TIR domain architecture. Therefore, they probably result from annotation error.

In metazoans, NLR genes were first detected in the sea anemone Nematostella vectensis, and TLR genes were first observed in the nematode Caenorhabditis elegans (Table 1). Based on the presence/absence pattern of these two genes and also our ASR (Fig. 1), we propose that the fusion events between the LRR and TIR domains and also between the NACHT and LRR domains both occurred in the early evolution of metazoans, before the split of protostomia and deuterostomia. In deuterostomia, the numbers of both TLR and NLR genes appear to reach a relatively stable level of approximately two dozen, with striking exceptions in the sea urchin (Strongylocentrotus purpuratus) and sea squirt (Ciona intestinalis) (Table 1). For the sea urchin, > 200 TLR and NLR genes were reported, showing substantial expansion for these two immune system genes (Rast et al., 2006; Sodergren et al., 2006). In the genome of the sea squirt, however, a considerable number of gene loss events have been observed, leaving only two NLRs and no TLRs in this genome (Table 1).

While these findings are similar to our findings for the plant NBS-LRR and LRR-RLK genes, it is worth noting that their shared TIR and LRR domains seemed to be recruited into animal and plant immune receptor genes independently. To further test this hypothesis, we examined the phyletic distribution pattern of domain organization in different evolutionary lineages (Fig. 3). We found that all the characteristic domains (e.g. NBS, NACHT, TIR, LRR, TPR, HEAT and WD40) originated before the split of prokaryotes and eukaryotes. In prokaryotes, the organizations of NBS-TPR and NBS-WD40 were fairly common, as were the organizations of NACHT-HEAT and NACHT-WD40. By contrast, the novel organization of NBS-LRR and NACHT-LRR dominated the land plant and metazoan lineages, respectively. The organization of LRR-Pkinase and LRR-TIR also emerged in different evolutionary lineages. Therefore, the innovations of domain organization patterns in plant and animal immune receptor genes seemed to be two independent processes that occurred in different evolutionary lineages and stages.

Figure 3.

Phyletic distribution and domain organization patterns of nucleotide-binding site (NBS)-, NACHT- and Pkinase-encoding genes. The distribution and domain organization patterns of NBS-, NACHT- and Pkinase-encoding genes were mapped to the phyletic tree. Plus (+) and minus (–) indicate whether the domain or organization pattern was detected in the corresponding phylogenetic clade under the cut-off of E-value < 10⁻⁴ in the hmmpfam search. It is worth noting that two major innovations took place in the shift from prokaryotes to eukaryotes (denoted by light-gray blocks) and from unicellular organisms to multicellular organisms (denoted by dark-gray blocks), respectively. LRR, leucine-rich repeat; TIR, Toll/interleukin-1 receptor.

Contrasting evolutionary patterns of NBS and NACHT domains

Two clearly separated clades, the NBS clade and the NACHT clade, were observed in the phylogenetic tree based on the NBS/NACHT region (Fig. 2 and Notes S1). In particular, the NBS and NACHT sequences from eubacteria and archaebacteria were assigned to the corresponding clades without mixture. This observation suggests that there were already some fundamental differences between the NBS and NACHT domains before the split of eubacteria, archaebacteria, and eukaryotes. Therefore, these two apparently similar domains either diverged a long time ago or originated independently. In addition, in contrast to the substantially diversified nonTIR NBS subclass, a striking monophyletic clade of the TIR-type NBS subclass was observed both in the phylogenetic tree based on the NBS/NACHT combined alignment and in the tree based on the NBS-only alignment, which further suggests a distinct evolutionary history and probable functional divergence of TIR and nonTIR NBS subclass genes.

To further explore the evolutionary relationship between the NBS and NACHT domains, the ten most conserved motifs within these two domains were identified by meme using the NBS/NACHT combined data (Table 2 and Fig. 4). Consistent with previous studies on the NBS domain, a number of functionally critical motifs were found in our analysis, including the P-loop (motif 1), kinase-2 (motif 5), GLPL (motif 4), RNBS-B (motif 3) and other previously described motifs (Meyers et al., 1999), as shown in Table 2. Previous studies generally compared these motifs only between TIR- and nonTIR-type NBS domains in land plants; the much larger data set in our study enabled us to map the motif combination and usage frequency patterns of these conserved motifs onto a phylogeny that included all major kingdoms of organisms. Clearly contrasting patterns were revealed in our analysis: while the NACHT domain remains quite conserved from prokaryotes to eukaryotes, the NBS domain shows considerable flexibility in different evolutionary lineages (Fig. 4).

Table 2.   Conserved motifs identified in the nucleotide-binding site (NBS) and NACHT regions of aligned sequences
Motif ID (a)    Motif name (b)     Consensus sequence (a)
Motif 1         Kinase a/P-loop    IVGMGGIGKTTLAK
Motif 9         Unknown            xDxILPILKLSYDD

  (a) Motif IDs and consensus sequences were generated by meme analysis.
  (b) Motif names are adapted from Meyers et al. (1999).

Figure 4.

Fingerprint analyses for conserved motifs within nucleotide-binding site (NBS) and NACHT domains. The usage frequency of each conserved motif in the NBS and NACHT domains from different evolutionary clades was counted and is shown in the bar plot. Bars 1 to 10 represent different conserved motifs (motifs 1 to 10) and the height of each bar represents its specific usage frequency (from 0 to 1). The characteristic motifs of the Toll/interleukin-1 receptor (TIR) and nonTIR subclasses of the NBS domain are highlighted in red and blue, respectively.

For the NBS domain, motifs 1, 3, 4, and 5 were well preserved, with > 80% usage frequency throughout the eukaryotic lineages (Fig. 4). The remaining motifs, however, showed considerable variation in their usage frequency across different lineages. Within the land plant lineage, motifs 2, 6 and 7 appear to be specific to the nonTIR-type NBS, while motif 10 was detected only in the TIR-type NBS. Notably, judging from the motif usage frequency pattern, the NBS domain of the TIR subclass is more similar to the prokaryotic NBS domain than is that of its nonTIR counterpart (Fig. 4). This observation provides another clue that the TIR-type NBS-LRR genes in plants may be evolutionarily more ancient than the nonTIR-NBS-LRR genes.

For the NACHT domain, a much simpler pattern was observed, in which motifs 1, 3, and 5 were detected with consistently high representation across different evolutionary lineages. These clearly contrasting patterns of motif combinations within the NBS and NACHT domains agree with our phylogenetic analysis above; both lines of evidence support an independent origin and evolution of the NBS and NACHT domains. In addition, the finding that motif 1 (P-loop) is shared between the NBS and NACHT domains is consistent with a previous observation of the P-loop in multiple ATP- or GTP-binding domains (Saraste et al., 1990), thus highlighting the structural and functional importance of this element.
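The usage frequencies discussed above for both domains can be illustrated with a minimal sketch. It assumes that each domain sequence has already been reduced to the set of motif IDs (1 to 10) detected in it; the function name, the clade labels and the toy records are hypothetical and are not data from this study.

```python
from collections import defaultdict

def motif_usage_frequency(records, n_motifs=10):
    """Return {clade: [usage frequency of motif 1..n_motifs]}.

    records: iterable of (clade, motif_ids) pairs, one pair per domain
    sequence, where motif_ids is the set of motif IDs found in it.
    The usage frequency of a motif in a clade is the fraction (0 to 1)
    of that clade's sequences in which the motif occurs.
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: [0] * n_motifs)
    for clade, motif_ids in records:
        totals[clade] += 1
        for m in motif_ids:
            if not 1 <= m <= n_motifs:
                raise ValueError(f"motif ID out of range: {m}")
            counts[clade][m - 1] += 1
    return {c: [k / totals[c] for k in counts[c]] for c in totals}

# Hypothetical toy input, for illustration only.
toy_records = [
    ("clade_A", {1, 3, 4, 5}),
    ("clade_A", {1, 3, 5, 10}),
    ("clade_B", {1, 2, 3, 4, 5, 6}),
    ("clade_B", {1, 2, 4, 5}),
]
freqs = motif_usage_frequency(toy_records)
print(freqs["clade_A"])
```

Each list returned here corresponds to one bar plot of the kind shown in Fig. 4, with position i holding the bar height for motif i + 1.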

Other essential genes in the plant disease resistance response pathway

We extended our investigation to a number of important regulator and cofactor genes in the plant disease resistance response pathway (Table 3 and Fig. S1) and found considerable variation with respect to the origins and evolutionary patterns of their characteristic domains. The key domains of NDR1, HSP90, EDS1, PAD4, SAG101, and SID2 were found in multiple eubacteria species, suggesting that their origins predate the split of prokaryotes and eukaryotes. The corresponding domains of SGT1, RAR1, and WRKY were found to first appear in protists and fungi, and these domains were widespread throughout the eukaryotic lineage. The AvrRpt cleavage domain encoded by RIN4, however, seems to be specific to land plants, and may have a co-evolutionary relationship with plant NBS-LRR genes. Nevertheless, it is worth noting that the origin of these key domains occurred no later than that of the corresponding immune receptor genes in the respective signaling pathway (e.g. RIN4 vs NBS-LRR and WRKY vs LRR-RLK). Therefore, it seems that, concurrently with the origin of the NBS-LRR and LRR-RLK genes, other necessary components in the respective signaling pathways were established, thus ensuring successful pathogen recognition and downstream signal transduction in plants.
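The reasoning used here, reading a domain's minimum age from the broadest set of clades in which it is detected, can be sketched as follows. This is a simplified illustration only: it ignores horizontal gene transfer and incomplete sampling, the function and domain names are hypothetical, and the toy counts are not values from Table 3.

```python
PROKARYOTES = {"Eubacteria", "Archaebacteria"}
LAND_PLANTS = {"Bryophyta", "Lycopodiophyta",
               "Gymnospermophyta", "Angiospermophyta"}

def infer_origin(species_counts):
    """Classify a domain by the clades in which it is detected.

    species_counts: {clade name: number of species carrying the domain}.
    """
    present = {clade for clade, n in species_counts.items() if n > 0}
    if not present:
        return "not detected"
    if present & PROKARYOTES:
        # Found in at least one prokaryotic clade.
        return "predates the prokaryote-eukaryote split"
    if present <= LAND_PLANTS:
        # Found only in land plant clades.
        return "land plant specific"
    return "eukaryote specific"

# Hypothetical toy distributions, for illustration only.
toy = {
    "domain_X": {"Eubacteria": 12, "Protista": 3, "Angiospermophyta": 9},
    "domain_Y": {"Protista": 2, "Fungi": 5, "Chordata": 4},
    "domain_Z": {"Bryophyta": 1, "Angiospermophyta": 6},
}
origins = {name: infer_origin(counts) for name, counts in toy.items()}
print(origins)
```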

Table 3.   Phyletic distribution of characteristic domains within some essential signaling components in the plant disease resistance response pathway across multiple species
Family name  Pfam domain  Eubacteria  Archaebacteria  Protista  Bryophyta  Lycopodiophyta  Gymnospermophyta  Angiospermophyta  Fungi  Protostomia  Echinodermata  Chordata
Rar1  CHORD PF04968  Species (a)  0017111173461814
Lipase_3 PF01764  Species (a)  3391261113191407
SID2 (EDS16)  Chorismate_bind PF00425  Species (a)  16101311?(d)1859000

(a) ‘Species’ is the number of species in which the domain is found.
(b) ‘Sequences’ is the number of unique sequences in which each domain is found.
(c) ‘Regions’ is the number of domains that are found in all sequences in the alignments.
(d) The question mark here indicates that data are currently not available.

Conserved motifs within these key domains were also identified using the MEME program, and their combinatorial patterns and usage frequencies were analyzed (Fig. S2). We found diverse patterns of motif evolution for these domains. For some domains, such as those of RIN4 and RAR1, the motif usage patterns remained quite constant during evolution (Fig. S2). In contrast, the motif usage patterns of some other domains changed dramatically across different evolutionary lineages (Fig. S2).

Evolutionary conservation of the flagellin receptor in land plants

FLS2, a member of the LRR-RLK protein family, has been well studied in Arabidopsis thaliana. Experiments have shown that this protein can detect a 22-amino acid fragment (flg22) of a microbial flagellin polypeptide; detection initiates a cascade of immune defenses (Zipfel et al., 2004). Mutations in the FLS2 receptor lead to susceptibility to bacterial pathogens in A. thaliana (Chinchilla et al., 2006). Interestingly, Takai et al. (2008) found that the homologous gene in rice remains capable of recognizing flg22 and can also induce a corresponding immune response. In our study, we found a uniform copy number of FLS2-homologous genes in all our sampled land plants, usually as a single copy (Fig. 5). The protein IDs of these FLS2-homologous genes are GSVIVP00003414001 and GSVIVP00003415001 in V. vinifera, 197404 in P. trichocarpa, 5057881 in Sorghum bicolor, 119860 in S. moellendorffii and 151614 in P. patens. As shown in Fig. 5, these genes clustered as a single clade in the phylogenetic tree with high bootstrap support. This observation reflects the considerable evolutionary conservation of the flagellin perception strategy in the plant innate immune system. It will be interesting to investigate the functional conservation of these FLS2-homologous genes in other land plants in future experiments.

Figure 5.

The FLS2 clade in the phylogenetic tree of leucine-rich repeat (LRR)-Pkinase genes across multiple species. Representative LRR-receptor-like kinase (RLK) genes across multiple species were used to construct this phylogenetic tree. Both a snapshot of the homologous gene group that includes the FLS2 gene (FLS2 clade) (a) and the entire tree (b) are shown. Different colors represent different phylogenetic clades. Blue branches in the LRR-RLK gene tree denote the FLS2 clade.


The origin of plant NBS-LRR and LRR-RLK genes

Numerous experimental studies have revealed that all these major domains (e.g. NBS, LRR and Pkinase) are indispensable for the function of plant R genes (Dangl & Jones, 2001), suggesting that domain origin and fusion events are critical steps in the origin of plant R genes. Until whole-genome sequencing data recently became available for a series of land plant species, the two major approaches for making inferences about domain appearance and fusion history were BLAST-based homology searches and degenerate primer-based PCR. For example, in 2002, the discovery of two TIR-encoding sequences in the bryophyte P. patens by a BLAST search against an EST database highlighted the widespread distribution of this domain in land plants (Meyers et al., 2002). In the same year, Akita & Valkonen (2002) identified several TIR-type-like NBS-encoding fragments (without the TIR and LRR regions) in this species by degenerate primer-based PCR. However, the inherent bias underlying these two approaches meant that the resulting data were incomplete. Without the full-length NBS-LRR and TIR-NBS-LRR genes, it is still difficult to make inferences about the history of domain fusions. In this study, which included not only an array of representative land plant genomes but also all currently available early land plant sequence data, we were able to identify and date these important domain fusion events with much higher confidence and better resolution. First, the NBS, LRR, and TIR domains are all very ancient, with their origins probably predating the split of prokaryotes and eukaryotes. In plant lineages, the emergence of the NBS domain can be dated back to the origin of the Coleochaetales. Secondly, the NBS-LRR and TIR-NBS domain fusion events both occurred no later than the divergence of bryophytes, and no fusion event was found in the currently available liverwort sequence data. With more whole-genome sequencing data from early land plants becoming available in the future, we should be able to further narrow the window of time in which these domain fusion events could have occurred.
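Dating a fusion event in this way amounts to asking which is the earliest-diverging clade in which a given domain architecture is seen. A minimal sketch of that bookkeeping is given below, assuming that each protein has already been annotated as an ordered list of domain labels from the N- to the C-terminus. The labels, function names and toy annotations are hypothetical; the toy input merely mimics the qualitative picture described in the text and is not our data set.

```python
def classify_architecture(domains):
    """Classify a protein from its domain labels in N- to C-terminal order."""
    if "NBS" not in domains:
        return "no NBS"
    i = domains.index("NBS")
    has_tir = "TIR" in domains[:i]       # TIR upstream of the NBS
    has_lrr = "LRR" in domains[i + 1:]   # LRR downstream of the NBS
    if has_tir and has_lrr:
        return "TIR-NBS-LRR"
    if has_tir:
        return "TIR-NBS"
    if has_lrr:
        return "NBS-LRR (no TIR)"
    return "NBS only"

def first_clade_with(architecture, proteins_by_clade, clade_order):
    """Return the first clade in clade_order that has the architecture."""
    for clade in clade_order:
        proteins = proteins_by_clade.get(clade, [])
        if any(classify_architecture(p) == architecture for p in proteins):
            return clade
    return None

# Hypothetical toy annotations, ordered from early to late divergence.
order = ["Coleochaetales", "liverworts", "bryophytes", "lycophytes"]
toy = {
    "Coleochaetales": [["NBS"]],
    "liverworts": [["NBS"], ["LRR"]],
    "bryophytes": [["TIR", "NBS", "LRR"], ["NBS"]],
    "lycophytes": [["NBS", "LRR"], ["TIR", "NBS", "LRR"]],
}
print(first_clade_with("TIR-NBS-LRR", toy, order))
print(first_clade_with("NBS-LRR (no TIR)", toy, order))
```

Because absence from a sampled clade may simply reflect incomplete sequence data, the clade returned is an upper bound on the timing of the fusion rather than an exact date.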

A rather controversial issue in the plant R gene community is the chronological order of the origin of TIR-type and nonTIR-type NBS-LRR genes. According to the phylogenetic tree, nonTIR-NBS-LRR genes appear to be much more diverse than their TIR counterparts, and it is therefore tempting to speculate that nonTIR-NBS-LRR genes are more ancient than the TIR type (Cannon et al., 2002). In 2003, Meyers et al. found that nonTIR-NBS-LRR genes also show greater diversity in their intron positions and phases, which makes this hypothesis even more appealing. However, our analysis showed that the NBS domains found in early land plants (Coleochaetales, liverworts, and bryophytes) are closer to the TIR-type NBS domain at the sequence level (Table S4 and Fig. 4). Moreover, across the early land plant lineages, complete TIR-NBS-LRR genes were first identified in bryophytes, whereas complete nonTIR-NBS-LRR genes were first found in lycophytes, which diverged c. 50 million yr later than bryophytes. Finally, compared with the nonTIR-type NBS domain, the motif usage frequency of the TIR-type NBS domain in land plants showed much higher similarity to that of the NBS domain in several distant evolutionary lineages such as eubacteria, archaebacteria and fungi, suggesting that the TIR-type NBS domain is closer to the ancestral state. Collectively, therefore, our findings support an earlier origin of the TIR-type NBS than of the nonTIR type. A higher selective constraint on TIR-NBS-LRR genes or a higher diversifying selection pressure on nonTIR-NBS-LRR genes may explain the different patterns of these two types of NBS-LRR genes found in phylogenetic studies.
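The comparison of motif usage frequency patterns made above was judged from the patterns themselves. One way in which such a comparison could be made quantitative, offered here only as an assumption and not as the procedure used in this study, is to treat each pattern as a vector of ten frequencies and to compute a distance between vectors. All names and numbers in the sketch below are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length frequency profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def closest_profile(reference, candidates):
    """Return the name of the candidate profile nearest to the reference."""
    return min(candidates, key=lambda name: euclidean(reference, candidates[name]))

# Hypothetical usage frequencies for motifs 1 to 10, for illustration only.
reference = [1.0, 0.0, 0.9, 0.9, 1.0, 0.0, 0.0, 0.2, 0.1, 0.3]
candidates = {
    "type_A": [1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.3, 0.1, 0.8],
    "type_B": [1.0, 0.9, 1.0, 1.0, 1.0, 0.8, 0.9, 0.3, 0.1, 0.0],
}
nearest = closest_profile(reference, candidates)
print(nearest)
```

Other measures, such as cosine similarity or a correlation coefficient, would serve equally well; the choice of Euclidean distance is arbitrary.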

Compared with NBS-LRR genes, LRR-RLK genes seem to be more ancient and probably originated in the early-branching unicellular eukaryotes, after the split of the fungal lineage. The timeline of the origin of plant R genes revealed in our study provides further support for the widely known PAMP-triggered immunity–effector-triggered immunity (PTI-ETI) model (Jones & Dangl, 2006). The first layer of defense, provided by transmembrane receptors such as LRR-RLKs, can recognize conserved features of certain pathogen-/microbe-associated molecules (e.g. lipopolysaccharide, flagellin and peptidoglycans) and initiate downstream self-protection responses, collectively known as PAMP-triggered immunity (PTI) (Jones & Dangl, 2006). A further investigation of LRR-RLK genes in these early-branching unicellular eukaryotes will shed light on the ancestral function of these genes. However, one layer of defense alone does not provide plants with sufficient protection. Some successful pathogens gradually evolved the ability to suppress or evade the surveillance of LRR-RLKs by deploying certain virulence effectors (Jones & Dangl, 2006). This new challenge explains why another layer of defense, the intracellular NBS-LRR receptors, emerged in land plant genomes during the arms race. These new weapons are capable of specifically sensing the presence of corresponding virulence effectors and triggering the plant hypersensitive response. The consequent striking expansion of this gene family throughout the land plant lineages, together with frequent recombination among its members, provides a powerful arsenal for land plants, allowing them to evolve novel resistance specificities quickly and effectively to fight diverse pathogens in the terrestrial environment (Leister, 2004).

Convergent evolution of the plant and animal innate immune systems

Noting the constitutional and structural similarities of plant and animal immune receptors, it is tempting to speculate that plant and animal innate immune systems have a common ancestry. However, further reflection suggests that this may not be the case. There are some noteworthy differences between plant and animal innate immune systems. First, while both transmembrane TLRs and cytoplasmic NLRs in animals are able to recognize conserved PAMPs (or MAMPs), such recognition in plants is exclusively localized to the cell membranes, and is performed by LRR-RLKs and other transmembrane PRRs (Girardin et al., 2002; Jones & Takemoto, 2004). Instead of performing intracellular recognition of PAMPs like animal NLRs, the NBS-LRRs in plants can recognize pathogen-specific effectors and mount effector-triggered immunity (ETI) (Jones & Dangl, 2006). Secondly, consistent with different recognition strategies, NBS-LRRs in plants and NLRs in animals differ considerably in copy number and the types of molecules that they can recognize; plant NBS-LRRs show much higher diversity and can recognize a much broader spectrum of pathogens than their counterparts in animals. Thirdly, the downstream signaling pathways in the animal and plant innate immune systems are also unlikely to be derived from a common origin. A large set of signaling components identified in these two systems are different. For example, the WRKY transcription factors critical in downstream activation of LRR-RLKs are absent from metazoans, whereas plants lack nuclear factor (NF)-κB, an indispensable regulator for the animal immune response (Ausubel, 2005). These fundamental differences, together with the seemingly independent recruitment of those common domains encoded in plant and animal immune receptors, collectively suggest that plant and animal innate immune systems have different origins and their striking resemblance was shaped by convergent evolution.

Given the compelling scenario of convergent evolution for plant and animal innate immune systems, it is of interest to ask why apparently similar or even common functional modules were chosen for these immune systems and why they are assembled in such an analogous way. A noteworthy fact is that the NBS- and NACHT-encoding genes in plants, animals and also fungi, especially when found together with a C-terminal tandemly repetitive domain such as an LRR or WD40, are frequently associated with the ability to discriminate self and non-self, and they play a role in triggering immune responses such as programmed cell death (PCD) (Leipe et al., 2004; Fedorova et al., 2005; Petrilli et al., 2005). Similarly, most TIR-encoding genes identified in plants and animals have been shown to function in immune responses, and their TIR domains play important roles in transducing signals to downstream signaling partners (Meyers et al., 2002; Akira & Takeda, 2004). LRR- and Pkinase-encoding genes, in contrast, are involved in the signal transduction pathways of many important biological processes. Therefore, there are likely to be some common functional constraints on the choice and assembly of these modules into immune receptor genes in both plant and animal innate immune systems. Future functional studies on these genes and their regulatory networks should further elucidate these aspects of the immune systems.

Evolutionary novelty based on reorganization of pre-existing building blocks

How morphological or physiological novelties are generated is one of the most exciting yet also most challenging questions in evolutionary biology. It is commonly believed that gene duplications play a major role in this process by providing additional copies of raw materials and functional redundancy for evolutionary modification (Ohno et al., 1968; Hughes, 1994; Des Marais & Rausher, 2008; Flagel & Wendel, 2009). Alternatively, recent progress in ‘evo-devo’ has suggested that alterations in gene expression patterns could also lead to considerable phenotypic changes, especially in morphology, without the need for new genes (Carroll, 2000; Wray, 2007). We believe that our studies may provide an excellent example of how reorganization of pre-existing functional modules, at both domain and motif levels, can lead to evolutionary novelties.

Domains are considered to be the basic units of proteins, and reorganizing these building blocks may lead to significant changes in the physical structure and biochemical activity of the corresponding proteins. Therefore, the reshuffling of pre-existing protein domains could, to a large extent, contribute to functional innovation in the course of evolution. Taking NBS-encoding genes as an example, the NBS domain is a member of the STAND superfamily, which has ATPase activity and exists in multiple genes that undertake diverse functions (Leipe et al., 2002). In our study, NBS-TPR and NBS-WD40 organization patterns were frequently detected in eubacteria, archaebacteria, and also in higher animals, suggesting that such organizations may be quite ancient and undertake important biological functions. Surprisingly, they were entirely lost in plant lineages and replaced by a novel organization pattern, the NBS-LRR. Such innovation may have led land plants to evolve a second layer of defense that helped them to better adapt to their new habitat. The NLR genes in the animal innate immune system have a similar story. The results of several other studies also suggest that this domain reorganization process is fairly common for a substantial number of functionally novel genes (Apic et al., 2001; Basu et al., 2008; Deshmukh et al., 2010). Moreover, it is worth noting that the critical domain organization changes leading to NBS-LRR and NLR genes both occurred during the shift from unicellular to multicellular organisms. Is this merely a coincidence or does it hint at a more general principle? In fact, consistent with our observation of the reorganization of domains, a series of studies in animal evolution have revealed a correlation between such domain recruitment and reorganization events and the evolution of biological complexities, including multicellularity (Itoh et al., 2007; Kawashima et al., 2009).

Further focusing on the evolution of the protein domain itself, we propose that the strategy of reorganizing pre-existing building blocks also works at the motif level. These conserved motifs may contain important co-factor binding sites or enzyme catalytic sites, and thus alterations in the motif combinatorial pattern will have a considerable effect on the functional specificities of the corresponding domains. In this study, we examined the evolution of the motif combinatorial patterns of several immune-related domains, and a total of five main patterns were discerned (Fig. 6). First, for some highly conserved domains, as illustrated by pattern 1, the motif combinatorial patterns are well preserved across different evolutionary lineages. More typically, however, the combination of conserved motifs within a given domain undergoes modification during evolutionary history. For example, pattern 2 shows that some domains, such as SGS and lipase_3, experience motif loss during this process. Conversely, a copy of a duplicated motif or an entirely new motif can also be added to the original motif combination, as indicated by patterns 3 and 4, respectively. Finally, for entirely novel domains, the motif combinations are generally formed by ab initio establishment (pattern 5).
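The five patterns described above can be expressed as a simple comparison between an ancestral and a derived motif combination. The sketch below is an illustration under stated assumptions: motif order within the domain is ignored, an entirely novel domain is represented by an empty ancestral combination, and the function name and example inputs are hypothetical.

```python
from collections import Counter

def classify_motif_change(ancestral, derived):
    """Label the change from an ancestral to a derived motif combination.

    Both arguments are lists of motif IDs; repeated IDs denote duplicated
    motifs. Returns a list of labels, because several kinds of change can
    occur in the same lineage.
    """
    anc, der = Counter(ancestral), Counter(derived)
    if not anc:
        return ["pattern 5: ab initio establishment"]
    if anc == der:
        return ["pattern 1: conserved"]
    labels = []
    if any(der[m] < anc[m] for m in anc):
        labels.append("pattern 2: motif loss")
    if any(der[m] > anc[m] for m in anc):
        labels.append("pattern 3: motif duplication")
    if any(m not in anc for m in der):
        labels.append("pattern 4: new motif gain")
    return labels

# Hypothetical examples, one per pattern.
print(classify_motif_change([1, 2, 3], [1, 2, 3]))
print(classify_motif_change([1, 2, 3], [1, 3]))
print(classify_motif_change([1, 2], [1, 1, 2]))
print(classify_motif_change([1, 2], [1, 2, 7]))
print(classify_motif_change([], [4, 5]))
```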

Figure 6.

The evolution of conserved motif combinations within domains. Five basic patterns of motif combination evolution are shown in cartoons. Different color blocks represent different motifs. A brief description and examples are also given for each pattern.

The strategy of reorganizing pre-existing building blocks is not merely a theoretical model: a growing number of experiments have demonstrated its efficiency in generating new functions. For example, a number of synthetic proteins composed of fused regulatory or catalytic modules can display novel signaling input/output relationships and thus generate the potential to rewire the corresponding cellular pathways (Dueber et al., 2003; Yeh et al., 2007). Similarly, Peisajovich et al. (2010) investigated the effect of domain reorganization on the behavior of the yeast mating pathway and observed great diversity in pathway response dynamics generated by chimeric domain recombinants. Therefore, many functional novelties in evolution may not require the invention of entirely new genetic materials, but instead can arise through the recruitment and reorganization of pre-existing building blocks and the establishment of novel linkages among them. Simple, economical but also powerful, this may be an eternal theme in evolution.


This work was supported by the National Natural Science Foundation of China (30930008, 30970198 and 30930049), the Fundamental Research Funds for the Central Universities and the Qing Lan Project. We thank the Editor and two anonymous reviewers of New Phytologist for their critical comments on the manuscript.