PHYLOGENETIC ANALYSES UNRAVEL THE EVOLUTIONARY HISTORY OF NAC PROTEINS IN PLANTS

Authors

  • Tingting Zhu,

    1. Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture and Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei 430074, China
    2. Graduate University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing 100049, China
  • Eviatar Nevo,

    1. Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
  • Dongfa Sun,

    1. College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
  • Junhua Peng

    1. Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture and Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei 430074, China
    2. Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado 80523
    3. E-mail: junhuapeng@yahoo.com

Abstract

NAC (NAM/ATAF/CUC) proteins are one of the largest groups of transcription factors in plants. Although many NAC proteins based on Arabidopsis and rice genomes have been reported in a number of species, a complete survey and classification of all NAC genes in plant species from disparate evolutionary groups is lacking. In this study, we analyzed whole-genome sequences from nine major lineages of land plants to unveil the relationships between these proteins. Our results show that there are fewer than 30 NAC proteins present in both mosses and lycophytes, whereas more than 100 were found in most of the angiosperms. Phylogenetic analyses suggest that NAC proteins consist of 21 subfamilies, most of which have highly conserved non-NAC domain motifs. Six of these subfamilies existed in early-diverged land plants, whereas the remainder diverged only within the angiosperms. We hypothesize that NAC proteins probably originated sometime more than 400 million years ago and expanded together with the differentiation of plants into organisms of increasing complexity possibly after the divergence of lycophytes from the other vascular plants.

Transcription factors are a group of proteins that control cellular processes by regulating the expression of downstream target genes. A large proportion of the plant-specific proteins are transcription factors, indicating the importance of these proteins for the evolution of the plant lineage (Olsen et al. 2005). The NAC (NAM/ATAF/CUC) domain is a highly conserved amino acid motif that defines one of the largest groups of plant-specific transcription factors. The first characterized NAC proteins were petunia NAM (no apical meristem) (Souer et al. 1996) and Arabidopsis ATAF and CUC (cup-shaped cotyledon) (Aida et al. 1997), after which “NAC” was named. NAC was later identified in protein sequences of other plant species, including rice, wheat, tomato, potato, and pumpkin (John et al. 1997; Ruiz-Medrano et al. 1999; Xie et al. 1999; Kikuchi et al. 2000; Collinge and Boller 2001). Proteins that contain the NAC domain are involved in diverse regulatory processes during the plant's development, such as mediating lateral root formation and auxin signaling (Xie et al. 2000); regulating senescence, cell division, and wood formation (Kubo et al. 2005; Uauy et al. 2006; Demura and Fukuda 2007; Mitsuda et al. 2007; Willemsen et al. 2008; Kato et al. 2010; Zhong et al. 2010); or biotic and abiotic stress responses (Hegedus et al. 2003; Fujita et al. 2004; Kim et al. 2007; Lu et al. 2007). The complex regulation of NAC transcription factors includes microRNA (miRNA)-mediated cleavage of mRNAs and ubiquitin-dependent proteolysis (Xie et al. 2002; Mallory et al. 2004) and can be of importance for crosstalk between different pathways (Olsen et al. 2005).

The NAC domain consists of 120–130 amino acids that form four distinct subdomains: A, B, C, and D. Structural analysis of the Arabidopsis thaliana NAC domain indicated that it consists of a twisted antiparallel β-sheet that packs against an N-terminal α-helix on one side and a short helix on the other (Ernst et al. 2004). A comparative analysis showed the NAC domain does not possess any known DNA-binding motif, but one face of the NAC domain is rich in positive charges and is probably involved in DNA binding (Ernst et al. 2004). The central part of the NAC domain does share some structural similarity with the large subdomain of the GCM (glial cells missing) DNA-binding domain, which is present in metazoans but not plants (Cohen et al. 2003). However, the mode of DNA recognition by the NAC domain is still unknown.

Previous phylogenetic analyses of NAC proteins were mainly based on the genome sequences of A. thaliana and Oryza sativa (Kikuchi et al. 2000; Ooka et al. 2003; Fang et al. 2008; Nuruzzaman et al. 2010), which provided a useful, but limited, phylogenetic framework for the classification of NAC proteins in flowering plants. Studies on rice NAC protein sequences first classified NAC into three typical subfamilies (OsNAC3, ATAF, and NAM) (Kikuchi et al. 2000). Comprehensive analyses using whole-genome sequences showed that NAC could be further classified into several smaller subfamilies that are highly conserved across A. thaliana and O. sativa (Ooka et al. 2003; Fang et al. 2008; Nuruzzaman et al. 2010). However, these analyses illustrated the classification of the subfamilies without shedding light on the complex evolutionary history of NAC proteins. Moreover, characterizing the evolution of NAC diversity requires phylogenetic analysis of NAC proteins from more diverse evolutionary groups of plants, including algae, bryophytes, and different lineages of vascular plants.

According to Olsen et al. (2005), the NAC gene family is not limited to monocots and eudicots, but is also found in conifers and mosses. Thus, we decided to broaden the current knowledge of the NAC protein family. The completion of several high-quality plant-genome sequencing projects provided us with a unique opportunity to make a complete assessment and thorough comparative analysis of the NAC proteins encoded in plants. The analysis of the relatively full set of NAC proteins in genomes from diverse species across plant phylogeny allows for a definitive classification of NAC proteins and an assessment of their origins, evolutionary relations, patterns of differentiation, and proliferation in the various phylogenetic groups. In this study, based on the analysis of 837 NAC protein sequences, we show that the NAC family is present in major lineages of land plants and has undergone a major radiation after the evolution of vascular plants, probably due to whole-genome duplication (WGD) and other duplication events. The NAC family, established in early-diverged land plants over 400 million years ago (Mya), was conserved during subsequent plant evolution, although many gene duplications and losses occurred. Our analysis defines 21 subfamilies that represent intricate evolutionary relationships among NAC proteins.

Materials and Methods

SEQUENCE RETRIEVAL AND ALIGNMENT

NAC protein sequences were retrieved from published studies and the publicly available databases. The A. thaliana NAC proteins were retrieved from the Arabidopsis Information Resource (http://www.arabidopsis.org/). A dataset of predicted O. sativa L. ssp. japonica NAC proteins was retrieved from the PlantTFDB (plant transcription factor database) (Guo et al. 2008) and combined with the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/). A dataset of predicted Zea mays NAC proteins was retrieved from MaizeSequence (http://www.maizesequence.org). Datasets of predicted NAC proteins in Physcomitrella patens, Selaginella moellendorffii, Vitis vinifera, and Carica papaya were retrieved from the PlantTFDB (Guo et al. 2008). HMMsearch (Eddy 1998) was used to screen the genome assemblies of Chlamydomonas reinhardtii version 4.0 (Merchant et al. 2007) (http://www.phytozome.net/chlamy), Sorghum bicolor version 1.0 (Paterson et al. 2009), Populus trichocarpa version 2.0 (http://www.phytozome.net/poplar), Cyanidioschyzon merolae (Matsuzaki et al. 2004), Ostreococcus tauri version 2.0 (Palenik et al. 2007) (http://www.jgi.doe.gov/), and draft assemblies of Chlorella vulgaris C-169 version 2.0 and Volvox carteri version 1.0 (http://www.jgi.doe.gov/) with the PFAM profile hidden Markov model (pHMM) NAM.hmm (http://pfam.sanger.ac.uk/). For Cyanophora paradoxa and Glaucocystis nostochinearum, we did BLAST searches against the EST database in NCBI (http://www.ncbi.nlm.nih.gov/) using multiple NAC-like genes from different lineages as queries. For C. paradoxa and G. nostochinearum, 28,043 and 8755 expressed sequence tag (EST) sequences were scanned, respectively.
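The profile-HMM screening step can be illustrated with a minimal sketch. It assumes the search was run with HMMER 3 and its tabular output option (for example `hmmsearch --tblout hits.tbl NAM.hmm proteome.fasta`); the file names, the two example result lines, and the E-value cutoff are illustrative assumptions, since no threshold is stated in the text.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Return (target, E-value) pairs from HMMER3 --tblout text.

    max_evalue is an assumed cutoff chosen for illustration only.
    """
    hits = []
    for line in lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]          # column 1: target sequence name
        evalue = float(fields[4])   # column 5: full-sequence E-value
        if evalue <= max_evalue:
            hits.append((target, evalue))
    return hits


# Invented example lines in the --tblout column layout (not real results).
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "protA - NAM PF02365.10 3.2e-45 150.1 0.1 4.0e-45 149.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical",
    "protB - NAM PF02365.10 0.02 12.3 0.0 0.03 11.9 0.0 1.0 1 0 0 1 1 1 1 hypothetical",
]
hits = parse_tblout(example)
```

Here only `protA` passes the assumed cutoff; candidates retained this way would still need the manual domain check described in the next paragraph.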

The resulting datasets were then screened to remove sequences with many ambiguous amino acid calls and putative DNA sequences sampled from the same species with identities higher than 95% and no indels (Zhang et al. 2001). All nonredundant putative NAC protein sequences were manually checked for the NAC domain through CDD (http://www.ncbi.nlm.nih.gov/sites/cdd), with PFAM NAM domain (PF02365) as guidance. For simplicity, all sequences were renamed according to Table S1.
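The redundancy filter described above can be sketched as follows. This is a simplified illustration, not the procedure actually used: it treats two sequences from the same species as redundant when they have equal length (standing in for "no indels") and more than 95% identical positions. The record layout and the toy sequences are assumptions.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def remove_redundant(records, threshold=0.95):
    """Keep the first of any same-species, near-identical sequence pair.

    records: list of (name, species, sequence) tuples (assumed layout).
    """
    kept = []
    for name, species, seq in records:
        redundant = any(
            species == k_species
            and len(seq) == len(k_seq)          # equal length: no indels
            and identity(seq, k_seq) > threshold
            for _, k_species, k_seq in kept
        )
        if not redundant:
            kept.append((name, species, seq))
    return kept


# Toy 40-residue sequences, invented for the example.
base = "MKLVGFRFHPTDEELVVHYLKRKVLGKPLPAVIAEVDLYK"
one_sub = base[:5] + "A" + base[6:]     # 39/40 identical (97.5%)
four_sub = "AAAA" + base[4:]            # 36/40 identical (90%)
records = [
    ("a1", "speciesA", base),
    ("a2", "speciesA", one_sub),    # dropped: same species, >95% identity
    ("a3", "speciesA", four_sub),   # kept: below the threshold
    ("b1", "speciesB", base),       # kept: different species
]
kept_names = [name for name, _, _ in remove_redundant(records)]
```

A real pipeline would compare aligned sequences rather than rely on equal length, but the decision rule is the same.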

Protein sequences were aligned using the HMMalign program in the HMMer package version 3.0 (Eddy 1998) and the pHMM NAM.hmm from PFAM (http://pfam.sanger.ac.uk/). The NAC region was then extensively adjusted manually in BioEdit (http://www.mbio.ncsu.edu/BioEdit/BioEdit.html). Unambiguously aligned positions were used for subsequent phylogenetic analyses.

PHYLOGENETIC ANALYSIS

The Jones, Taylor, and Thornton (JTT) model was selected as the best-fitting amino acid substitution model by ProtTest (Abascal et al. 2005). The maximum likelihood (ML) analysis was performed with the program PhyML version 3.0.1 (Guindon and Gascuel 2003), using the JTT model of amino acid substitution, an estimated gamma distribution parameter, and a Shimodaira-Hasegawa-like approximate likelihood ratio test. ML searches were initiated with a BIONJ tree (Guindon and Gascuel 2003). The PHYLIP package version 3.69 (Felsenstein 1989) was used to perform 100 bootstrap replicates of a neighbor joining (NJ) tree based on a JTT distance matrix. The Bayesian analysis was performed with MrBayes version 3.1.2 (http://mrbayes.csit.fsu.edu/): two independent runs were computed for 10 million generations, at which point the standard deviation of split frequencies was less than 0.01; one tree was saved every 100 generations. All trees were visualized using the program FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
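How bootstrap support is obtained from the 100 NJ replicates can be sketched as follows. The sketch is a simplification: each replicate tree is represented as a set of clades (sets of taxon labels), whereas an unrooted analysis counts bipartitions. The taxon labels and replicate trees are invented, and only four replicates are used to keep the example short.

```python
def clade_support(reference_clades, replicate_trees):
    """Percentage of replicate trees that contain each reference clade."""
    n = len(replicate_trees)
    return {
        clade: 100.0 * sum(clade in tree for tree in replicate_trees) / n
        for clade in reference_clades
    }


# Hypothetical clades over four taxa A-D.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})

# Four invented bootstrap replicates, each a set of recovered clades.
replicates = [{AB, CD}, {AB, CD}, {AB}, {AC}]
support = clade_support([AB, CD], replicates)
```

In this toy case clade {A, B} is recovered in three of four replicates and {C, D} in two of four; with 100 replicates the count of replicates containing a clade is itself the percentage.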

DETECTION OF CONSERVED MOTIFS

The MEME program (Bailey and Elkan 1994) was used to predict potential patterns in the complete amino acid sequences of NAC proteins. All motifs discovered by MEME with expected values lower than 1E-30 were searched in the InterPro database with InterProScan (Mulder et al. 2005). Each motif was individually checked so that incorrect or insignificant matches were discarded.
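The E-value filter applied to the MEME output amounts to a simple threshold test, sketched below. The motif names and E-values are invented for illustration; only the 1E-30 cutoff comes from the text.

```python
E_VALUE_CUTOFF = 1e-30  # cutoff stated in the text


def significant_motifs(motifs, cutoff=E_VALUE_CUTOFF):
    """Return names of motifs whose E-value is lower than the cutoff."""
    return [name for name, evalue in motifs if evalue < cutoff]


# Invented (name, E-value) pairs standing in for parsed MEME results.
motifs = [("motif_1", 1e-120), ("motif_2", 3e-31), ("motif_3", 5e-12)]
retained = significant_motifs(motifs)
```

Motifs that pass this filter would then go on to the InterProScan search and the manual check.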

Results

NAC PROTEINS PRESENT IN ALL MAJOR GROUPS OF LAND PLANTS

We searched for NAC protein coding sequences in the complete genome or genome assemblies of the eudicots A. thaliana, C. papaya, V. vinifera, and P. trichocarpa, the monocots O. sativa, S. bicolor, and Z. mays, the lycophyte S. moellendorffii, the moss P. patens, the chlorophytes C. reinhardtii, O. tauri, V. carteri, and C. vulgaris, and the red alga C. merolae, plus 28,043 and 8755 EST sequences for glaucophytes C. paradoxa and G. nostochinearum, respectively, to determine whether these subfamilies arose earlier in plant evolution and also to understand the deeper evolutionary history of this family in plants.

We obtained hits in all of these genomes except for those of the chlorophytes, the glaucophytes, and the red alga. However, genome sequence data are currently not available for streptophyte algae, which are closely related to the common aquatic ancestor of land plants (Graham 1993). Thus, we could not determine whether NAC proteins are land plant specific or arose much earlier, before the transition from water to land. After removing incomplete or redundant sequences, 837 NAC protein sequences were collected to generate a dataset representing the major evolutionary lineages of land plants (Table S1, Fig. 1).

Figure 1.

A simplified phylogeny of species used in this study. The total number of NAC proteins found in the genome of each species is indicated. Black bars roughly show the number of whole-genome duplication (WGD) events during the evolution of certain species (Oryza sativa: two to three rounds of WGD [Tang et al. 2009]; Arabidopsis thaliana: three to four rounds of WGD [Simillion et al. 2002; Jaillon et al. 2007]; and Populus trichocarpa: two rounds of WGD [Tuskan et al. 2006]).

Over 100 putative NAC proteins are found in each of A. thaliana, P. trichocarpa, O. sativa, Z. mays, and S. bicolor, making NAC one of the largest families of transcription factors in plants (Riechmann et al. 2000; Gutierrez et al. 2004). We also identified 74 and 66 NAC proteins in V. vinifera and C. papaya, respectively. In contrast, there are only 30 or fewer NAC proteins in the early-diverged land plants S. moellendorffii and P. patens (Table S1, Fig. 1). This suggests that an expansion of NAC proteins occurred after the evolution of vascular plants.

KEY AMINO ACID RESIDUES ARE HIGHLY CONSERVED ACROSS LAND PLANTS

The traditional NAC region consists of four subdomains: A, B, C, and D. It has been reported, however, that subdomain E, which flanks subdomain D, is important as a DNA-binding domain in AtNAM, the A. thaliana NAM protein (Duval et al. 2002). Because it was absent from some of our NAC protein sequences, subdomain E was not considered in the following phylogenetic analysis, although it was commonly used in previous studies (Ooka et al. 2003; Fang et al. 2008). To characterize the molecular evolution of NAC proteins, we aligned the retrieved amino acid sequences in the conserved NAC region (Fig. 2). The alignment clearly showed that subdomains A, C, and D were highly conserved whereas subdomain B was variable. This is consistent with the findings of Ooka et al. (2003), who used A. thaliana and O. sativa sequences alone, and indicates the importance of these conserved subdomains during plant evolution.

Figure 2.

Alignment of the NAC domain of representative proteins. A representative of the 21 subfamilies of NAC is shown. Subdomains A–D are shown by bars above the sequences. Red residues are common to all the sequences, whereas blue residues are common to at least half. The shaded box indicates the region rich in conserved hydrophobic amino acids. Arrows mark the positions of the substitutions in the dysfunctional CUC1 proteins (Takada et al. 2001).

NAC proteins have been shown to homo- and heterodimerize, with the help of a short antiparallel β-sheet and two prominent salt bridges. This structure involves arginine (R) and glutamate (E) residues at the highly conserved N-terminal end of the NAC domain (Fig. 2), and when these two residues are changed to alanine (A), the ANAC019 NAC domain is found in its monomeric form (Xie et al. 2000; Olsen et al. 2005). As expected, the R and E residues are present in 84% and 94%, respectively, of all proteins analyzed. A mutation of lysine (K) in CUC1 might affect nuclear localization, DNA binding, or the structural integrity of the NAC domain. Another mutation changes a leucine (L) to a phenylalanine (F), a substitution that is likely to result in conformational stress and instability of the NAC domain (Takada et al. 2001). This implies the existence of some highly conserved sites in subdomain B, although it is relatively variable (residues P and L are present in 100% and 86% of sequences, respectively) (Fig. 2). A region of the NAC domain rich in conserved hydrophobic amino acids was predicted to contain a nuclear export signal (NES) in ANAC019 (Olsen et al. 2005). In our dataset, this hydrophobic segment appears in most of the NAC proteins (Fig. 2), which indicates a conservation of chemical similarity, although there were differences in amino acids at some sites.
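Conservation figures such as the percentages quoted above correspond to the fraction of aligned sequences that carry a given residue at one alignment column. A minimal sketch of that calculation follows; the five-sequence alignment and the column indices are invented for illustration and do not come from the NAC dataset.

```python
def residue_frequency(alignment, column, residue):
    """Fraction of aligned sequences with `residue` at a 0-based column."""
    hits = sum(1 for seq in alignment if seq[column] == residue)
    return hits / len(alignment)


# Toy alignment of equal-length rows ('-' marks a gap); invented data.
alignment = [
    "MPRGF",
    "MPRGF",
    "LPRGY",
    "MPKGF",
    "MPRG-",
]
p_freq = residue_frequency(alignment, 1, "P")   # P in every row
r_freq = residue_frequency(alignment, 2, "R")   # R in four of five rows
```

Applied to the full alignment, the same count over all 837 sequences would give the per-site percentages reported in the text.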

In general, the high degree of sequence similarity within the NAC domains, particularly in DNA-binding regions and in key amino acids involved in structural stability, indicates that the molecular structures and their ability to recognize cis-elements were conserved during evolution.

SEVERAL NAC SUBFAMILIES FOUND IN FLOWERING PLANTS WERE ALREADY PRESENT IN EARLY-DIVERGED LAND PLANTS

To understand the evolutionary relationships between NAC proteins, we used conserved NAC regions of the alignment to compute phylogenetic trees. An ML analysis shows that proteins from different species cluster together in compact clades with high support values (Figs. 3 and S1). The NJ analyses support the existence of most of these clades. Based on the topology of the trees, clade support values, branch lengths, and visual inspection of the NAC amino acid sequences, we defined 21 NAC protein subfamilies involving 785 (94%) of the 837 NAC proteins analyzed, which are largely diversified from one another (Figs. 2, 3, and S1; Table 1). The previous nomenclature of these subfamilies proposed by Ooka et al. (2003) and Nuruzzaman et al. (2010) was mainly derived from the proteins discovered in A. thaliana and O. sativa, which limits its applicability to other species. Therefore, we relabeled these subfamilies with simple Roman numerals, followed by their previous nomenclature (Table 1, Fig. S1).

Figure 3.

Maximum likelihood analysis of 837 NAC protein sequences, shown as an unrooted cladogram. The blue balloons delineate the 21 subfamilies of NAC proteins. Colored dots symbolize the species to which the proteins in each group belong (green: Oryza sativa, Sorghum bicolor, Zea mays [monocot]; pink: Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Carica papaya [eudicot]; yellow: Selaginella moellendorffii [lycophyte]; blue: Physcomitrella patens [moss]).

Table 1.  Classification of all NAC proteins retrieved from plant genomes.
Subfamily      Eudicot                Monocot           Lycophyta   Bryophyta
               At    Vv    Pt    Cp   Os    Sb    Zm    Sm          Pp
Ia/NAM/CUC3    11     7    12     7   17    14    12     3           8
Ib/NAC1         4     2     4     2   10     7     9     –           –
Ic/SND         13     8    16     5   14    12    15     4           8
II/ONAC4        7     5     9     4    8    12    15     1           1
IIIa/TIP        9     6     5     3    5     3     4     –           –
IIIb/NAC2      13     4     9     3    6     5     7     –           –
IIIc/ANAC11     7     3     8     3    4     3     6     1           –
IVa/ANAC1      12     –    17     1    –     –     –     –           –
IVb/ANAC34      3     3     5     1    8     5     9     1           1
IVc/TERN        3     3     5     2    4     3     4     –           –
IVd/–           4     7     7     3    5     6     8     4           –
Va(1)/NAP       3     3     7     2    6     6     5     –           –
Va(2)/NAP       2     2     4     2    1     1     1     –           –
Vb/SNAC         8     4     6     3    6     6    10     5           8
VIa/SENU5       4     3     6     3    –     –     –     –           –
VIb/ONAC1       –     –     –     –    8     3     5     –           –
VIc/ONAC6       1     2     4     2    2     2     2     –           –
VII/–           –     –    17     7    –     –     –     –           –
VIII/ONAC3      4    10     8     5    5     5     –     –           –
IX/–            –     –     –     –   14     6     –     –           –
X/–             –     –     –     –    7     5     3     –           –
Total         110    72   149    58  132   104   115    19          26

At = Arabidopsis thaliana, Vv = Vitis vinifera, Pt = Populus trichocarpa, Cp = Carica papaya, Os = Oryza sativa, Sb = Sorghum bicolor, Zm = Zea mays, Sm = Selaginella moellendorffii, Pp = Physcomitrella patens. A dash in a count column marks a cell left empty in the table.

Our results are mostly consistent with the groups proposed by previous phylogenetic analyses using A. thaliana and O. sativa sequences alone (Ooka et al. 2003; Fang et al. 2008; Nuruzzaman et al. 2010). Two major differences are that the subfamilies ATAF and OsNAC3 were merged into one subgroup (Vb) and that the subgroup NAP was divided into two subfamilies (Va(1) and Va(2)). We also characterized two new subgroups (VII and X), including some Arabidopsis and rice NAC sequences not present in the analysis by Ooka et al. (2003). Of the 837 proteins analyzed, 52 (∼6%) do not clearly fall in any of the 21 subfamilies (Figs. 3 and S1). These proteins often show a high degree of sequence divergence from other NAC proteins, possibly because they are pseudogenes arising from nonfunctionalization.
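The proportions quoted above can be checked against the Total row of Table 1. The short sketch below reads that row as nine per-species totals (At 110, Vv 72, Pt 149, Cp 58, Os 132, Sb 104, Zm 115, Sm 19, Pp 26); the variable names are illustrative.

```python
# Per-species totals of classified NAC proteins, read from the Total row of Table 1.
totals = {
    "At": 110, "Vv": 72, "Pt": 149, "Cp": 58,
    "Os": 132, "Sb": 104, "Zm": 115, "Sm": 19, "Pp": 26,
}

analyzed = 837                                  # all NAC proteins in the dataset
classified = sum(totals.values())               # proteins placed in a subfamily
unclassified = analyzed - classified            # proteins outside all subfamilies

pct_classified = round(100 * classified / analyzed)
pct_unclassified = round(100 * unclassified / analyzed)
```

The totals sum to 785 classified proteins, leaving 52 unclassified, which corresponds to the 94% and roughly 6% given in the text.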

Of the 21 subfamilies, 15 include only angiosperm proteins and six include angiosperm and lycophyte proteins (Fig. 3, Table 1). Considering that the last common ancestor of angiosperms and lycophytes lived sometime in the Upper Silurian period more than 415 Mya (Kenrick and Crane 1997), this could imply that these six NAC subfamilies are at least 415 million years (My) old. Interestingly, three of these subfamilies include not only vascular plant proteins but also moss proteins. Given that the oldest evidence for the existence of vascular plants is trilete spores in Upper Ordovician sediments (Steemans et al. 2009), this suggests that these subfamilies are more than 443 My old.

It is worth noting that a clade composed of P. patens NAC proteins is sister to the proteins in subfamily IIIc (Fig. 3). However, we did not include these moss proteins in subfamily IIIc because this relationship is not strongly supported. Nevertheless, this relationship suggests that subfamily IIIc is phylogenetically closer to moss proteins than to any other land plant proteins.

SOME NAC PROTEINS ACQUIRED DISTINCT PROPERTIES DURING EVOLUTION

We also noticed that, of the 15 angiosperm-protein-only subfamilies, subfamilies IVa, VIa, and VII contain only eudicot members, whereas subfamilies VIb, IX, and X contain only monocot members. More specifically, subfamily VII contains only P. trichocarpa and C. papaya proteins. This possibly implies that these monocot-/eudicot-specific subfamilies represent some unique properties, which evolved separately within monocots and eudicots after their divergence from the last common ancestor about 140–145 Mya (Davies et al. 2004; Anderson et al. 2005).

The absence of Arabidopsis and V. vinifera NAC sequences in subfamily VII suggests that these NAC proteins may have been either lost in Arabidopsis and V. vinifera or evolved in the Populus and C. papaya lineages after their divergence from the last common ancestor. It is widely accepted that in addition to the common mechanisms regulating vascular tissue formation shared by all vascular plants, tree species might have evolved unique regulatory mechanisms controlling wood formation (Nieminen et al. 2004). Considering that herbaceous Arabidopsis and woody poplar shared their last common ancestor over 100 Mya (Tuskan et al. 2006) and that wood formation is crucial for the survival of tree species, it is conceivable that poplar trees evolved more complex functional regulatory switches, which makes the classification of subfamily VII reasonable.

To better understand the interrelationship of these subfamilies, we performed Bayesian analysis using a simplified alignment, which included 21 representative sequences selected from these subfamilies (Fig. 4). Although some branches in the Bayesian phylogenetic tree had low support values, the tree did show a complex divergence of NAC protein families, in which monophyly was accompanied by paraphyly, and confirmed some close relationships between different subfamilies. For example, the monocot-/eudicot-specific subfamilies IVa, VIa, VIb, VII, IX, and X probably evolved later among land plants. Thus, there was plenty of time for them to gain their distinct properties. Pairs of subfamilies such as II/IVa, IIIa/VIa, IVb/IVd, and IVc/IX seem to form monophyletic lineages.

Figure 4.

A Bayesian analysis was performed on an alignment of the NAC sequence of one representative of each of the 21 subfamilies of NAC proteins. The tree is unrooted. The numbers in the clades are posterior probability values; clades with less than 50% support were collapsed.

CONSERVED NON-NAC MOTIFS ARE PRESENT IN MOST NAC SUBFAMILIES

The amino acid sequences outside the NAC region are often called transcriptional activation regions (TARs), which are highly variable, even in closely related proteins from the same species. A common feature of NAC protein TARs is the frequent occurrence of simple amino acid repeats and regions rich in serine and threonine, proline and glutamine, or acidic residues (e.g., Souer et al. 1996; Duval et al. 2002; Hegedus et al. 2003). Nevertheless, it has been reported that short conserved amino acid motifs are often present in the TARs of related NAC proteins (Duval et al. 2002). We therefore reasoned that if our classification of these NAC proteins is correct, there should be conserved motifs within these subfamilies. We investigated all the TARs of NAC proteins in our dataset for amino acid patterns and found 25 conserved motifs (including the previously mentioned subdomain E, which was excluded from our phylogenetic analysis, Table 1). The closely related members in the phylogenetic tree have the same or very similar motif compositions in their TARs (Fig. 5): all motifs are located C-terminal to the NAC domain apart from motif 5; each motif is present only in members of the same subfamily except for motif 11, which is present in both subfamilies Va(1) and Va(2), probably indicating their much closer relationship. None of the 25 conserved motifs corresponds to known domains in the InterPro database (release 27.0); nevertheless, the motif composition of these sequences may provide clues for further functional analysis.

Figure 5.

Non-NAC amino acid motifs are highly conserved in each NAC subfamily. An idealized representation of a typical member of each NAC subfamily is shown, with the NAC domain and other conserved motifs drawn as shaded boxes. The diagrams are not drawn to scale.

We note that some of our NAC protein sequences are incomplete (lacking the start methionine, the stop codon, or both), which might be due to the whole-genome sequencing methodology. Therefore, certain motifs described here are probably not present in those incomplete NAC proteins. However, we speculate that most, if not all, of these sequences contain the distinct motifs belonging to their own subfamilies, which may be confirmed once the sequencing coverage of the corresponding genomes is high enough.

Some motifs are not common within certain subfamilies but are distributed among several groups. For instance, when querying in the plant membrane protein database ARAMEMNON (http://aramemnon.botanik.uni-koeln.de/), 21 A. thaliana and five O. sativa sequences in our dataset, which were classified into subfamilies Ic, IIIa, IIIb, IVa, VIa, and VIc, were predicted to contain a transmembrane segment at their extreme C-termini. Another example was that some of the NAC genes could be posttranscriptionally regulated by a microRNA, miR164, which limits boundary expansion by cleaving the relevant mRNA (Mallory et al. 2004).

The presence of highly conserved motifs among proteins of the same subfamily supports the phylogenetic relationships inferred from the NAC domain sequences alone. The conservation of these extra motifs during plant evolution suggests that they are essential for the function of the NAC proteins within subfamilies. Moreover, the presence of some motifs in a few unrelated NAC proteins, that is, NAC proteins from different subfamilies, also indicates that domain-shuffling processes may have played a relatively small role in NAC evolution.

Discussion

GENE DUPLICATION EVENTS IMPACT GREATLY ON THE EXPANSION OF THE NAC FAMILY

After exhaustive searches of complete genomes and other sequence data, we have identified the full repertoire of NAC proteins found in diverse plant species, improving previous reports of NAC in Arabidopsis and rice. For example, in Arabidopsis and rice, we identified 117 and 144 NAC proteins, versus the 105 and 75 previously reported (Ooka et al. 2003). We also newly identified the relatively complete set of NAC proteins from currently available genome assemblies of poplar, C. papaya, V. vinifera, Z. mays, S. bicolor, S. moellendorffii, and P. patens. The total number of NAC proteins found among plant genomes somewhat reflects the complex patterns of WGDs (Fig. 1) or gene loss that have characterized the evolutionary histories of the species analyzed in the present study. Extensive studies of the Arabidopsis genome sequence unveiled the remnants of three, possibly four, rounds of WGD (Simillion et al. 2002; Jaillon et al. 2007). At least two genome doublings have been proposed in poplar (Tuskan et al. 2006), whereas the rice lineage experienced more than two, perhaps three, rounds of WGD (Tang et al. 2009). Moreover, genes involved in transcription regulation, signal transduction, and development are preferentially retained following gene duplications (Seoighe and Gehring 2004; Maere et al. 2005; Barker et al. 2008), such as tandem duplication, segmental duplication, and transposition events (retroposition and replicative transposition) (Zhang 2003; Li et al. 2009). Thus, it was not surprising to find in our dataset several gene clusters, each containing several genes, tandemly located in the genomes. For example, one cluster contains four genes on O. sativa chromosome 11 (LOC_Os11g31330, LOC_Os11g31340, LOC_Os11g31360, and LOC_Os11g31380). Usually, the sequences of the clustered members are highly similar to each other and fall into the same subfamilies, suggesting that tandem duplication did contribute to the expansion of the NAC protein family.

Nuruzzaman et al. (2010) identified nine segmental duplication events involving 18 genes in the rice genome. Given that the segmental duplication associated with the salicoid duplication event, a WGD event that occurred near the emergence of the Salix and Populus lineages 60–65 Mya, significantly contributed to the amplification of many multigene families (Cannon et al. 2004; Kalluri et al. 2007; Barakat et al. 2009; Wilkins et al. 2009), it will be of great interest to determine the contribution of other patterns of duplication, which needs further investigation. In fact, the duplication of large intragenomic segments accounted for about 60% of the transcription factor gene duplication events in rice (Xiong et al. 2005).
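Candidate tandem clusters such as the four-gene group on rice chromosome 11 mentioned above can be flagged from locus identifiers alone, as in the following sketch. This is only a rough proxy and not the method used in the study: it assumes that the numeric part of a `LOC_OsNNgXXXXX` identifier reflects gene order along the chromosome, and the `max_gap` and `min_size` parameters are arbitrary illustrative choices. A rigorous analysis would use genomic coordinates and sequence similarity.

```python
import re

# Rice locus identifiers follow the pattern LOC_Os<chromosome>g<index>.
LOCUS = re.compile(r"LOC_Os(\d{2})g(\d+)")


def tandem_clusters(loci, max_gap=50, min_size=2):
    """Group loci whose index numbers lie close together on one chromosome.

    max_gap is an assumed proxy for physical adjacency, not a published rule.
    """
    parsed = []
    for locus in loci:
        match = LOCUS.fullmatch(locus)
        if match:
            parsed.append((int(match.group(1)), int(match.group(2)), locus))
    parsed.sort()  # order by chromosome, then by index

    clusters, current = [], []
    for chrom, index, locus in parsed:
        if current:
            prev_chrom, prev_index, _ = current[-1]
            # Close the running group on a chromosome change or a large gap.
            if chrom != prev_chrom or index - prev_index > max_gap:
                if len(current) >= min_size:
                    clusters.append([item[2] for item in current])
                current = []
        current.append((chrom, index, locus))
    if len(current) >= min_size:
        clusters.append([item[2] for item in current])
    return clusters


loci = [
    "LOC_Os11g31330", "LOC_Os11g31340", "LOC_Os11g31360", "LOC_Os11g31380",
    "LOC_Os03g12345",  # arbitrary identifier added for contrast, not from the study
]
clusters = tandem_clusters(loci)
```

With these inputs the four chromosome 11 loci form a single candidate cluster, and the isolated identifier is ignored.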

EVOLUTIONARY HISTORY OF NAC PROTEINS

Phylogenetic analyses provide a powerful approach for classifying NAC proteins and assessing their evolutionary history. Our analysis shows that several of the major subfamilies of NAC transcription factors already existed in early-diverged land plants, including moss. The overall picture that emerges from our analyses is that the NAC protein family has expanded and differentiated in parallel with the differentiation of plants into organisms of increasing complexity. Owing to the lack of genomic data for charophytes, we could not determine whether NAC proteins were land plant specific or arose much earlier, before the divergence of land plants. The 30 oldest NAC proteins analyzed in the present study were derived from P. patens (Fig. 1). Compared to other protein families in plants, which usually expanded from fewer ancestral copies (for example, bHLH and SKP1 evolved from one ancestral sequence [Kong et al. 2007; Pires and Dolan 2010] and glutaredoxins [GRXs] from four [Ziemann et al. 2009]), the number (30) of the oldest NAC proteins in P. patens is quite large. Moreover, some of these proteins fell into the same subfamily and probably functioned redundantly (see discussions below). Therefore, we suggest that these 30 NAC proteins discovered in moss do not represent the original ancestral set.

Since we found no NAC proteins in any of the unicellular organisms investigated in the present study, it is appealing to hypothesize that the first NAC protein(s) originated sometime over 400 Mya, when many novel structures and physiological mechanisms evolved through versatile gene regulatory networks. The history of the family between that origin and the divergence of mosses remains obscure, although the number of NAC proteins already present in moss suggests that another round of gene expansion may have occurred during that period. To test this, genome sequences of charophytes and more primitive land plants, such as liverworts, would be needed. Unfortunately, only a handful of expressed sequence tags are currently available. The vast diversification of NAC proteins appears to accompany the origin of flowering plants, which probably involved WGD. It is likely that gene duplication events in multiple NAC gene subfamilies produced novel lineages that have been maintained through angiosperm history.

THE ROLES OF TARS DURING NAC EVOLUTION

The finding that most of the TARs of the putative NAC proteins were conserved in parallel with NAC domain structures implies that they are phylogenetically informative, as is the case for other plant transcription factors such as WRKY (Eulgem et al. 2000) and MADS box (Zahn et al. 2005; Shan et al. 2007). However, the TARs of NAC proteins seem to have diverged more widely than those of the two families mentioned above; thus, they should be even more informative. The presence of some highly conserved motifs in different NAC subfamilies indicates the complicated evolutionary patterns of NAC proteins and suggests that the partners of molecular interactions are also conserved. A good example is a subset of Arabidopsis NAC mRNAs, including CUC1, CUC2, NAC1, At5g07680, and At5g61430, which were predicted to be targeted by members of the miR164 gene family (Rhoades et al. 2002).

Gene structural diversity is also widely regarded as an important factor in the evolution of multigene families. In particular, sequence divergence via exonization of intronic sequences and pseudoexonization of exonic sequences occurs frequently, resulting in coding-region differences between recently duplicated genes (Xu et al. 2009). In Populus, according to Hu et al. (2010), 11 NAC genes out of the 49 paralogous pairs possessed different exons in their coding regions despite nearly identical gene lengths. Because some NAC TARs are critical for target recognition, the differences caused by exon–intron boundary shifts and/or frameshift mutations might, to some extent, contribute to the subfunctionalization or neofunctionalization of NAC genes. It is therefore plausible that both NAC domains and TAR structures have been involved in determining the functions of NAC proteins during evolution.

NAC GENE FUNCTIONS ARE RELATIVELY CONSERVED WITHIN SUBFAMILIES

The phylogenetic reconstructions presented here provide a first outline of the evolutionary history of the NAC protein family. However, the lack of sequence information and the limited number of studies of NAC family genes prevent a thorough understanding of the details of this long evolutionary process. Some NAC proteins have been intensively studied, but little or nothing is known about the functions of many other NAC family genes and classes. Based on the currently known functions of NAC proteins, we classified them into two categories: (i) those that are involved in relatively conserved processes (such as embryogenesis, cell division, seedling development, floral development, and senescence); and (ii) those that are involved in relatively specific processes (such as biotic and abiotic stress responses). We found that members of the same NAC subfamily frequently regulate similar processes in different species (Table 2), except for subfamilies IVb, IVd, Va(1), and Va(2). Subfamilies Ic and Vb are representative: nearly all of the NAC proteins from the former are involved in seedling development, whereas those from the latter are involved in stress responses (Table 2). Sometimes the functions of these proteins overlap, making them partially or totally redundant (e.g., CUCs and NSTs). This functional redundancy may allow them to constitute a system with high mutational robustness. Interestingly, the poplar PtrWND proteins from subfamily Ic seem to act differently: diverse combinations of PtrWND heterodimers, together with their downstream transcription factors, form a transcriptional network that activates the secondary wall biosynthetic program during wood formation (Zhong et al. 2010).
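
The exceptions named above can be recovered by tallying the category labels that Table 2 attaches to each subfamily. The following is a minimal Python sketch, assuming the labels are transcribed by hand from Table 2; the dictionary and function names are illustrative and are not part of the original analysis.

```python
# Category labels per subfamily, transcribed by hand from Table 2.
# "i"  = relatively conserved processes (development, senescence, etc.)
# "ii" = relatively specific processes (biotic and abiotic stress responses)
# The names below are hypothetical and used only for illustration.
SUBFAMILY_CATEGORIES = {
    "Ia": {"i"},
    "Ib": {"i"},
    "Ic": {"i"},
    "IIIa": {"ii"},
    "IIIb": {"i"},
    "IIIc": {"ii"},
    "IVa": {"i"},
    "IVb": {"i", "ii"},
    "IVc": {"i"},
    "IVd": {"i", "ii"},
    "Va(1)": {"i", "ii"},
    "Va(2)": {"i", "ii"},
    "Vb": {"ii"},
    "VIa": {"i"},
    "VIc": {"i"},
}


def mixed_subfamilies(categories):
    """Return the subfamilies assigned to more than one functional category."""
    return sorted(name for name, cats in categories.items() if len(cats) > 1)


def single_category_subfamilies(categories, label):
    """Return the subfamilies assigned only to the given category label."""
    return sorted(name for name, cats in categories.items() if cats == {label})


if __name__ == "__main__":
    print("Mixed:", mixed_subfamilies(SUBFAMILY_CATEGORIES))
    print("Only ii:", single_category_subfamilies(SUBFAMILY_CATEGORIES, "ii"))
```

Under this transcription, the subfamilies carrying both labels are the four exceptions listed in the text, and Vb appears among the subfamilies labeled only (ii).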

Table 2.  Functionally characterized NAC proteins from different plant species.
| Name | NAC number | Function | Reference | Category² |
|---|---|---|---|---|
| Subfamily Ia | | | | i |
| CUC1 | AtCUC1 | Partially functional redundant in regulating shoot apical meristem formation | Takada et al. (2001) | |
| CUC2 | ANAC098 | | | |
| CUC3 | ANAC031 | | Vroemen et al. (2003) | |
| ORE1 | ANAC092 | Involved in the age-related resistance response | Carviel et al. (2009), Balazadeh et al. (2010) | |
| OsNAC1 | Os02g36880 | | Kikuchi et al. (2000) | |
| OsNAC2 | Os04g38720 | Has potential utility for improving plant structure for higher light-use efficiency and higher yield potential in rice | Mao et al. (2007) | |
| ONAC045 | Os11g03370 | Enhances rice drought and salt tolerance | Zheng et al. (2009) | |
| GRAB2¹ | | Involved in plant growth, development, and senescence | Xie et al. (1999) | |
| CmNACP¹ | | Involved in supracellular regulation | Ruiz-Medrano et al. (1999) | |
| Subfamily Ib | | | | i |
| NAC1 | ANAC021/22 | Function in auxin-induced development of lateral roots | Xie et al. (2000) | |
| Subfamily Ic | | | | i |
| VND1 | ANAC037 | Redundantly regulate metaxylem and protoxylem vessel differentiation | Kubo et al. (2005) | |
| VND2 | AtVND2 | | | |
| VND3 | ANAC105 | | | |
| VND4 | ANAC007 | | | |
| VND5 | ANAC026 | | | |
| VND6 | ANAC101 | | | |
| VND7 | ANAC030 | | Yamaguchi et al. (2008) | |
| SMB | ANAC038 | Negatively regulates FEZ activity, repressing stem cell-like divisions in the root cap daughter cells | Willemsen et al. (2008) | |
| NST1 | ANAC043 | Redundantly regulate secondary wall thickenings and are required for anther dehiscence | Mitsuda et al. (2005) | |
| NST2 | ANAC066 | | | |
| SND1/NST3 | ANAC012 | | Zhong et al. (2007) | |
| OsNAC7 | Os06g33940 | | Kikuchi et al. (2000) | |
| PtrNAC064 | PNAC112 | WNDs together with their downstream transcription factors form a transcriptional network involved in the regulation of wood formation in poplar | Zhong et al. (2010) | |
| PtrWND2A | PNAC084 | | | |
| PtrWND2B | PNAC085 | | | |
| PtrWND4A | PNAC011 | | | |
| PtrWND6A | PNAC058 | | | |
| PtrWND6B | PNAC056 | | | |
| Subfamily IIIa | | | | ii |
| NTL2 | ANAC014a/b | | Kim et al. (2007) | |
| NTL5 | ANAC060 | | Kim et al. (2007) | |
| NTL6 | ANAC062 | Plays a role in cold-induced pathogen resistance response | Seo et al. (2010) | |
| NTL9/CBNAC | AtNAC076a/b | Plays a role in osmotic stress response and leaf senescence | Kim et al. (2008) | |
| TIP | ANAC091 | An essential component in the turnip crinkle virus (TCV) resistance response pathway | Ren et al. (2000) | |
| OsNTL1 | Os06g01230 | | Kim et al. (2010) | |
| OsNTL2 | Os08g06140 | | Kim et al. (2010) | |
| OsNTL3 | Os01g15640 | | Kim et al. (2010) | |
| Subfamily IIIb | | | | i |
| NTL1 | ANAC013 | | Kim et al. (2007) | |
| NTL3 | ANAC016a/b | | Kim et al. (2007) | |
| NTL4 | ANAC053 | | Kim et al. (2007) | |
| NTL7 | ANAC017 | | Kim et al. (2007) | |
| NTL11/NAC2 | ANAC078 | Regulates flavonoid biosynthesis under high light | Morishita et al. (2009) | |
| VNI1 | ANAC082 | | Yamaguchi et al. (2010) | |
| OsNTL6 | Os02g57650 | | Kim et al. (2010) | |
| OsNTL5 | Os08g44820 | | Kim et al. (2010) | |
| OsNTL4 | Os09g32040 | | Kim et al. (2010) | |
| Subfamily IIIc | | | | ii |
| RIMI¹ | | Functions as a host factor that is required for multiplication of RDV (rice dwarf virus) in rice | Yoshii et al. (2010) | |
| Subfamily IVa | | | | i |
| NTL10 | ANAC001 | | Kim et al. (2007) | |
| NTM1 | ANAC068a/b | Involved in cell cycle control | Kim et al. (2006) | |
| NTM2/NTL13 | ANAC069 | | | |
| Subfamily IVb | | | | i, ii |
| LOV1 | ANAC034 | Control of flowering time and cold response | Yoo et al. (2007) | |
| ANAC036 | ANAC036 | Involved in the growth of leaf cells | Kato et al. (2010) | |
| Subfamily IVc | | | | i |
| TERN¹ | | Involved in tobacco elicitor responsive | Suzuki et al. (1999) | |
| Subfamily IVd | | | | i, ii |
| FEZ | ANAC009 | Promotes periclinal, root cap-forming cell divisions | Willemsen et al. (2008) | |
| OsNAC6 | Os08g33910 | Plays an important role in eliciting responses to high-salinity stress; involved in abiotic and biotic stress-responsive gene expression | Yokotani et al. (2009), Nakashima et al. (2007) | |
| Subfamily Va(1) | | | | i, ii |
| AtNAP | ANAC029 | Controlling cell expansion in specific flower organs | Sablowski and Meyerowitz (1998) | |
| OsNAC10 | Os11g03300 | Enhances drought tolerance in rice | Jeong et al. (2010) | |
| NAM-B1¹ | | Regulates senescence and improves grain protein, zinc, and iron content in wheat | Uauy et al. (2006) | |
| CarNAC3¹ | | Involved in drought stress response and various developmental processes | Peng et al. (2009) | |
| Subfamily Va(2) | | | | i, ii |
| AtNAM | ANAC018 | Regulate embryogenesis | Kunieda et al. (2008) | |
| AtNAC2 | AtNAC2 | Involved in salt stress response and lateral root development; regulates embryogenesis | He et al. (2005), Kunieda et al. (2008) | |
| Subfamily Vb | | | | ii |
| ANAC019 | ANAC019 | Regulates JA (jasmonic acid)-induced expression of defense genes | Bu et al. (2008) | |
| AtNAC3 | AtNAC3 | Involved in the age-related resistance response | Carviel et al. (2009) | |
| RD26 | ANAC072a/b | Involved in a novel ABA-dependent stress-signaling pathway | Fujita et al. (2004) | |
| ATAF1 | ANAC002 | Negatively regulates the expression of stress-responsive genes under drought stress | Lu et al. (2007) | |
| ATAF2 | ANAC081 | Represses the expression of pathogenesis-related genes | Delessert et al. (2005) | |
| ANAC102 | ANAC102 | Regulates seed germination under low-oxygen stress | Christianson et al. (2009) | |
| SNAC1 | Os03g60080 | Improves drought and salinity tolerance in rice | Hu et al. (2006) | |
| SNAC2 | Os01g66120 | Improves stress tolerance in rice | Hu et al. (2008) | |
| OsNAC3 | Os07g12340 | | Kikuchi et al. (2000) | |
| OsNAC4 | Os01g60020 | A key positive regulator of plant hypersensitive cell death | Kaneda et al. (2009) | |
| OsNAC5 | Os11g08210 | Senescence-associated ABA-dependent NAC transcription factor | Sperotto et al. (2009) | |
| GRAB1¹ | | Involved in plant growth, development, and senescence | Xie et al. (1999) | |
| StNAC¹ | | In response to infection and to wounding | Collinge and Boller (2001) | |
| TaNAC4¹ | | Involved in defense response against stripe rust pathogen infection and abiotic stresses | Xia et al. (2010) | |
| Subfamily VIa | | | | i |
| SENU5¹ | | Involved in leaf senescence | John et al. (1997) | |
| NTL8 | ANAC040 | Mediates salt stress signaling in flowering time control; regulates gibberellic acid-mediated salt signaling in seed germination | Kim et al. (2007), Kim et al. (2008) | |
| VNI2 | ANAC083 | Regulates xylem cell specification as a transcriptional repressor that interacts with VND proteins | Yamaguchi et al. (2010) | |
| Subfamily VIc | | | | i |
| XND1 | ANAC104 | Negatively regulates xylem vessel differentiation | Zhao et al. (2008) | |

¹ These proteins were not included in our phylogenetic analysis; their classification was considered according to the literature cited.

² Category: (i) proteins that are involved in the relatively conserved processes (embryogenesis, cell division, seedling development, floral development, and senescence, etc.); and (ii) proteins that are involved in relatively specific processes (biotic and abiotic stress responses, etc.).

Conclusions

In this study, we have established the evolutionary backbone of the NAC protein family in plants and taken a first step toward unraveling its intricate evolutionary history. With the currently available data, we could not assemble every piece of the puzzle into a complete account of the past, especially of when and how the very first NAC protein(s) originated. Nevertheless, the results of this study point to several directions for future work. First, NAC proteins of subfamilies that show unique characteristics should be investigated further. Second, the extensive duplications experienced by some NAC gene lineages make the NAC family an excellent system for studying the evolutionary fate of duplicated genes. Third, given the presence of NAC genes in land plants ranging from mosses to eudicots, it will be of great interest to study the NAC genes of relatively “simple” bryophytes to elucidate complex patterns of both conservation and divergence of gene expression, protein interaction, and DNA binding, especially with regard to the evolution of plant development.


Associate Editor: M. Johnston

ACKNOWLEDGMENTS

We are greatly indebted to the editors and the two anonymous reviewers for their constructive comments and corrections. This work was supported by the China National Science Foundation (NSFC) Grant Nos. 31030055 and 30870233, China National Special Program for Development of Transgenic Plant & Animal New Cultivars (Development of transgenic quality wheat germplasm with soft & weak gluten, and Development of transgenic wheat new cultivars with resistance against rust diseases and powdery mildew), Chinese Academy of Sciences under the Important Directional Program of Knowledge Innovation Project Grant No. KSCX2-YW-Z-0722, the CAS Strategic Priority Research Program Grant No. XDA05130403, the “973” National Key Basic Research Program Grant No. 2009CB118300, and the Ancell Teicher Research Foundation for Genetics and Molecular Evolution.
