Keywords:

  • ectomycorrhiza;
  • gene duplications;
  • gene family evolution;
  • Laccaria bicolor;
  • protein kinases;
  • RAS small GTPases

Summary

  • The ectomycorrhizal fungus Laccaria bicolor has the largest genome of all fungi yet sequenced. The large genome size is partly a result of an expansion of gene family sizes. Among the largest gene families are protein kinases and RAS small guanosine triphosphatases (GTPases), which are key components of signal transduction pathways.
  • Comparative genomics and phylogenetic analyses were used to examine the evolution of the two largest families of protein kinases and RAS small GTPases in L. bicolor. Expression levels in various tissues and growth conditions were inferred from microarray data.
  • The two families possessed a large number of young duplicates (paralogs) that had arisen in the Laccaria lineage following the separation from the saprophyte Coprinopsis cinerea. The protein kinase paralogs were dispersed in many small clades and the majority were pseudogenes. By contrast, the RAS paralogs were found in three large groups of RAS1-, RAS2- and RHO1-like GTPases with few pseudogenes.
  • Duplicates of protein kinases and RAS small GTPases have either retained, gained or lost motifs found in the coding regions of their ancestors. Frequent outcomes during evolution were the formation of pseudogenes (nonfunctionalization) or proteins with novel structures and expression patterns (neofunctionalization).

Abbreviations:

ARF, ADP ribosylation factor; Cy3, cyanine-3; D, degenerate; ECM, ectomycorrhizal; epK, eukaryotic protein kinase; EST, expressed sequence tag; FB, fruiting body; FLM, free-living mycelium; GEO, Gene Expression Omnibus; GTPase, guanosine triphosphatase; JGI, Joint Genome Institute; IKS, Ira1 kinase suppressor; MAPK, mitogen-activated protein kinase; ME, minimum evolution; MP, maximum parsimony; N, novel; NF, neofunctionalization; NJ, neighbor joining; PLK, Polo like kinase; R, retained; RAB, rat brain; RAN, RAS-related nuclear; RAS, rat sarcoma; RHO, RAS homolog; SF, subfunctionalization; TC, tentative consensus

Introduction

Ectomycorrhizal (ECM) symbiosis – the union of soil fungi and roots of woody plants – is widespread in boreal and temperate forests (Smith & Read, 1997). During symbiosis, the two organisms exchange carbon and nutrients in a specific tissue that is formed during contact between a compatible fungus and plant. The plant-derived carbon supports the growth of an extensive web of hyphae. The advancing front of this mycelium can efficiently prospect for nutrient resources, often at a considerable distance from the plant root. Behind the front, the extramatrical mycelium can differentiate into discrete tissue, forming rhizomorphs and fruiting bodies (Smith & Read, 1997). The development of these distinct tissues and the regulation of their physiological activities require that the fungus can respond adequately to cues from the environment and the host. Information from such external signals is transferred to the interior of the cells and is converted into specific responses that affect gene expression, cellular processes and differentiation. The molecular mechanisms of the signaling system have not yet been characterized in ECM fungi (Martin et al., 2007).

Notably, the analysis of the genome of Laccaria bicolor, which is the first ECM fungus to be sequenced, has shown a significant expansion in several gene families known to be involved in signal transduction pathways (Martin et al., 2008). Two of these families, Family-2 and Family-6, encode protein kinases and RAS small guanosine triphosphatases (GTPases), respectively (Martin et al., 2008). Protein kinases control numerous cellular processes in eukaryotes, such as morphological changes, cell cycle transitions and stress responses (Manning et al., 2002; Westfall et al., 2004). Members of the RAS GTPase family function as versatile molecular switches in diverse processes, including signal transduction, cell polarity, cytoskeleton regulation and vesicle trafficking (Ridley, 2001).

The most common mechanism for generating the expansion of gene families is the duplication of genes or larger chromosomal regions. The classical model for describing the creation of functional novelties following gene duplication postulates that gene duplication creates a redundant locus that is free to accumulate otherwise deleterious mutations as long as the original copy maintains the ancestral function (Ohno, 1970). The most likely outcome after a period of relaxed selection is that the redundant gene degenerates to become a pseudogene (nonfunctionalization). A less frequent outcome is that the redundant copy obtains a new function by alterations in coding or regulatory sequences (neofunctionalization, NF). Alternatively, the original function of the ancestral protein can be partitioned between the two duplicates, with each daughter gene becoming specialized in a subset of functions of the ancestral gene (subfunctionalization, SF) (Hughes, 1994). The SF model was extended into the so-called duplication–degeneration–complementation model, which also includes mutations in regulatory regions (Force et al., 1999). In this model, the two gene copies acquire complementary loss-of-function mutations to the point at which both copies produce the full complement of the single ancestral gene. More recent studies have suggested that neither NF nor SF alone can adequately explain the divergence of gene duplicates. Instead, many gene duplicates undergo rapid SF, followed by a prolonged and substantial NF (the SNF model) (He & Zhang, 2005).

The extensive expansions of protein kinases and RAS small GTPases in Laccaria compared with saprophytic and parasitic basidiomycetes suggest that these families have important roles in controlling the morphological and physiological changes accompanying the establishment of a functional mycorrhizal association. Martin & Selosse (2009) suggested that the adaptation to a rapidly changing environment during ECM formation involves a cascade of gene networks, including signaling pathways. The purpose of this study was to examine the patterns and mechanisms for expansion and functional divergence of Laccaria protein kinases and RAS small GTPases in Family-2 and Family-6. Phylogenetic trees were constructed for the proteins in these families. Clusters of Laccaria paralogs were identified from the trees that arose and diverged in the Laccaria lineage following the separation from other basidiomycetes. The relationships of the Laccaria paralogs to the well-characterized protein kinases and small GTPases of Saccharomyces cerevisiae were established. Pseudogenes with truncated catalytic regions were also recognized. The genetic changes following the duplication events were examined in more detail in some of the paralog clusters to determine the validity of the various evolutionary models for describing the fates and functional divergence of the duplicated genes. The transcriptional patterns for the Laccaria duplicates were revealed using data from oligonucleotide microarray and expressed sequence tag (EST) analyses (Martin et al., 2008).

Materials and Methods

Genomes used

Sequence data for five basidiomycete genomes were extracted from the databases at the Joint Genome Institute (JGI) (http://www.jgi.doe.gov) and the Broad Institute (http://www.broad.mit.edu/annotation/fgi). The genome sequences used were as follows: the 65 megabase pair (Mbp) genome of Laccaria bicolor S238N-H82 predicted to encode 20 614 proteins (release version 1.0, March 2005; JGI); the genome of Coprinopsis cinerea (Coprinus cinereus) Okayama7#130 containing ∼37 Mbp and encoding 13 544 proteins (release 1, July 2003; Broad); the Phanerochaete chrysosporium RP78 genome containing 35 Mbp and encoding 10 048 proteins (release version 2.0, February 2005; JGI); the Cryptococcus neoformans H99 genome containing 19.5 Mbp and encoding 7302 proteins (Assembly 1, May 2003 & Gene set 3.0, February 2006; Broad); and the Ustilago maydis 521 genome containing 19.7 Mbp and encoding 6522 proteins (Release 2, March 2004; Broad).

Protein family identification

Protein families were identified and analyzed as described previously (Martin et al., 2008). Briefly, all predicted protein sequences (in total 58 030) from the five basidiomycetes were searched against each other using the BLASTP program (Altschul et al., 1990), with a threshold E-value of 1e−5, and used as input for the TRIBE-MCL algorithm, applying default parameters (Enright et al., 2002). The evolutionary change in protein family sizes was calculated using the CAFE program (De Bie et al., 2006). Given a phylogeny and protein family sizes of extant taxa, the CAFE tool infers the most likely gene family sizes at internal nodes and identifies families that have accelerated rates of gene gains or losses.
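
As a rough illustration of the clustering input described above, the sketch below turns tabular BLASTP hits into weighted edges of the kind a graph-clustering tool such as MCL can read. The 12-column tabular layout, the function name `blast_to_abc` and the edge weight (−log10 of the E-value, capped at 200) are assumptions made for illustration; the original analysis may have prepared its TRIBE-MCL input differently.

```python
import math

def blast_to_abc(blast_lines, max_evalue=1e-5, weight_cap=200.0):
    """Convert tabular BLASTP hits into (query, subject, weight) edges.

    Assumes the common 12-column tabular layout, in which the E-value
    is the 11th field. Self-hits and hits above max_evalue are skipped.
    The weight is -log10(E), capped so that E = 0 stays finite.
    """
    edges = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comment lines and malformed records
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject or evalue > max_evalue:
            continue
        if evalue == 0.0:
            weight = weight_cap
        else:
            weight = min(weight_cap, -math.log10(evalue))
        edges.append((query, subject, weight))
    return edges
```

Each returned tuple can be written out as one tab-separated line, which should correspond to the three-column label input that the `mcl` program accepts with its `--abc` option.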

Network analysis of the protein kinase superfamily

In total, 1120 protein sequences were retrieved from 41 families containing sequences with significant homology to the protein kinase Pfam domain PF00069 (E-value threshold of 0.05). An all-against-all similarity search of the 1120 proteins using BLASTP with an E-value threshold of 1e−5 was performed. The sequence similarities (E-values) were used to generate a weighted graph employing the Biolayout graph algorithm (Enright & Ouzounis, 2001). The protein families identified by the TRIBE-MCL algorithm were mapped onto the graph, and the topology for each protein family was inspected manually. Network properties, such as average connectivity, average node degree and maximum connectivity, of the protein kinase superfamily were calculated using Biolayout software.

Phylogenetic analysis

A phylogenetic tree of fungal protein kinases was constructed from the PF00069 domains of 512 proteins of basidiomycete Family-2. Domains matching the Pfam model at an E-value threshold of 0.05 were extracted using Seqret software (EMBOSS, version 4.0.0) (Rice et al., 2000). In addition, the PF00069 domain was extracted from 121 Saccharomyces cerevisiae protein kinases present in the KinBase database (http://kinase.com). Proteins confirmed by the domain search were regarded as putative protein kinases. All PF00069 domains (512 + 121) were aligned using the HMMAlign program (HMMER software, version 2.3.2, http://hmmer.janelia.org) against a new HMM profile, which was constructed using the HMMbuild program (HMMER) from a full alignment of 25 281 protein kinase domains obtained from the Pfam-LS database (version 22.0) (Finn et al., 2006). A neighbor joining (NJ) tree with 100 bootstrap replicates was constructed using Quicktree (Howe et al., 2002). A tree viewer (ATV) (Zmasek & Eddy, 2001) and the Hypertree program version 12 (Bingham & Sudarsanam, 2000) were used to visualize and classify the protein kinase groups. ‘Orthologs in S. cerevisiae’ included subgroups containing sequences of yeast kinases. The subgroups were placed into the following eukaryotic protein kinase (epK) families: ‘AGC’ includes cyclic nucleotide and calcium phospholipid-dependent kinases, ribosomal S6-phosphorylating kinases, G-protein-coupled kinases and all close relatives of these groups; ‘CAMK’ comprises calmodulin-regulated kinases; ‘CMGC’ includes cyclin-dependent kinases and mitogen-activated protein kinases (MAPKs); ‘Other’ consists of kinases not classified into any of the above groups (e.g. IKS (Ira1 kinase suppressor) and PLK (Polo like kinase)); ‘STE’ includes many kinases functioning in the MAPK cascades. (For a description of these subgroups, see http://kinase.com/.)

Using a similar procedure, a phylogenetic tree for RAS small GTPases was reconstructed by extracting the Pfam domain PF00071 from 160 proteins within Family-6. These domains, together with 29 S. cerevisiae PF00071 domains obtained from the Pfam database, were aligned against a new profile HMM constructed from an alignment of 5239 RAS domains present in the Pfam database. The RAS superfamily is broadly classified into five subfamilies (subgroups): RAS (rat sarcoma), RHO (RAS homologs), RAB (rat brain), ARF (ADP ribosylation factors) and RAN (RAS-related nuclear) (Takai et al., 2001).

Trees of the aligned PF00069 and PF00071 domains were also reconstructed using the minimum evolution (ME) and maximum parsimony (MP) methods (100 bootstrap replications) in MEGA 3.1 software (Kumar et al., 2004).

Paralogs and pseudogenes

Paralogs were identified by analyzing the NJ trees. Only paralogs within the lineage-specific branch having a bootstrap support of 50% or higher were included in the analysis. This definition of paralogs corresponds to the term inparalogs used by others (Sonnhammer & Koonin, 2002).

Pseudogenes were recognized as gene models lacking any of the conserved sites in the catalytic domains of well-characterized protein kinases and RAS GTPases. Conserved sites were identified by using information retrieved from the Conserved Domain Database (Marchler-Bauer et al., 2007).

Prediction of ancestral protein sequences

Laccaria protein sequences within the subgroups of IKS1- and PLK-like protein kinases (cf. Fig. 2c,d, see Results) and RAS1-, RAS2- and RHO1-like small GTPases (cf. Fig. 4b,c, see Results) were manually annotated. Ancestral protein sequences were predicted using Gapped Ancestral Sequence Prediction (GASP) software (Edwards & Shields, 2004). The input tree was extracted from the constructed linearized NJ tree, and the multiple alignments of the protein sequences were generated using MUSCLE (version 3.6) (Edgar, 2004).

Figure 2. Evolution of protein kinases from Laccaria bicolor and other basidiomycetes. (a) A weighted graph of proteins belonging to the protein kinase superfamily of Laccaria bicolor, Coprinopsis cinerea, Phanerochaete chrysosporium, Cryptococcus neoformans and Ustilago maydis. The graph was produced using data from an all-against-all similarity search of 1120 proteins matching the protein kinase domain PF00069 (cf. Materials and Methods) and the network was visualized by the Biolayout graph algorithm (Enright & Ouzounis, 2001). Mapped onto the network are the identities of the 12 largest TRIBE protein families (≥ 15 members). (b) An unrooted neighbor joining (NJ) tree of protein sequences containing the PF00069 domain belonging to Family-2. Terminal branches of the 140 L. bicolor sequences are shown in red, 121 Saccharomyces cerevisiae sequences in blue and the remaining 372 basidiomycete sequences in grey. On the basis of this clustering, the protein kinases were classified into a number of subgroups (cf. Table 1). Clades that do not contain yeast protein kinases are indicated (N). The evolution of sequences within the clades ‘Other/IKS’ and ‘Other/PLK’ (indicated by arrows) were analyzed in more detail. (c) Divergence of L. bicolor protein kinases within the ‘Other/IKS’ subgroup. The expression levels (log2 levels as indicated by the scale) of the Laccaria genes are shown in the middle of the panel. ECM, ectomycorrhiza; FB, fruiting body; FLM, free-living mycelia; C, mock inoculated control; T, treated with bacterial inoculation (cf. Materials and Methods). The right panel shows the predicted motif structures of the paralogs and their predicted ancestors (labelled A1 and A2). The alternating grey and yellow shaded background indicates L. bicolor and ancestral sequences with identical motif structure. The N-terminal region corresponding to the 336005 gene model is incomplete (indicated by broken line) because of the terminal location of a scaffold. 
(d) Divergence of L. bicolor protein kinases within the ‘Other/PLK’ subgroup. The panels for expression levels and motif structures of paralogs and ancestors (A1) are explained in the legend to (c). PF00659 is the polo box domain.

Figure 4. Evolution of RAS small guanosine triphosphatases (GTPases) in Laccaria bicolor. (a) An unrooted, neighbor joining (NJ) tree of protein sequences containing the RAS small GTPase domains (PF00071). The terminal branches of the 55 L. bicolor sequences are shown with red lines, the 23 Saccharomyces cerevisiae sequences with blue lines and 105 sequences from the other basidiomycetes with black lines. Based on the clustering of sequences, basidiomycete sequences belonging to the RAS small GTPases were classified into a number of subgroups (cf. Table 2). The evolution of genes within the encircled clades of ‘RAS1/RAS2’ and ‘RHO1’ was analyzed in more detail. (b) Divergence of RAS1- and RAS2-like GTPases in L. bicolor. The NJ tree divided the L. bicolor proteins into two subgroups. The bottom clade (RAS1-like subgroup) contains six L. bicolor gene models, including four paralogs (Laccaria gene IDs labelled with a prefix Lac and in bold type), as well as RAS1 and RAS2 of S. cerevisiae. The middle panel showing the expression patterns and the right panel showing the motif structures of paralogs and ancestors (designated A1–A8) are explained in the legend to Fig. 2c. (c) Divergence of RHO1-like GTPases in L. bicolor. The panels showing the expression levels and motif structures of paralogs and ancestors (A1–A10) are explained in the legend to Fig. 2c.

Protein motif identification

Protein motifs were identified from full-length nucleotide sequences of the L. bicolor paralogs and their ancestors using the MEME tool (version 3.5.7) (Bailey & Elkan, 1994). The default parameters used were an E-value threshold of 1e−10 with the motif length parameter set to a minimum of 5 and a maximum of 50. In principle, three different changes in protein motif structure can be expected following gene duplications (Yang et al., 2006). First, the ancient protein motif structure may be retained (R). Second, one or more protein motifs may degenerate (D) and parts of the original function may be lost. Third, at least one novel (N) protein motif may have evolved compared with the structure of the ancestral gene. After a duplication event, any combination of these three gene fates is possible, thereby giving the possibility of six different outcomes (RR, RD, RN, DD, DN and NN).
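
The six outcomes listed above are simply the unordered pairs that can be drawn, with repetition, from the three fates. A minimal sketch of that enumeration follows; the function name `duplicate_pair_outcomes` is hypothetical and not part of the original analysis.

```python
from itertools import combinations_with_replacement

FATES = ("R", "D", "N")  # retained, degenerate, novel

def duplicate_pair_outcomes(fates=FATES):
    """Return every unordered pair of motif fates for two duplicates."""
    return ["".join(pair) for pair in combinations_with_replacement(fates, 2)]

print(duplicate_pair_outcomes())
# ['RR', 'RD', 'RN', 'DD', 'DN', 'NN']
```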

Analysis of nucleotide substitutions

Pairwise alignments of the L. bicolor paralogous protein sequences were generated using MUSCLE, and the corresponding nucleotide alignments were constructed using Tranalign (EMBOSS). The rates of synonymous substitutions per site (ds) and rates of nonsynonymous substitutions per site (dn), using a sliding window size of 30 and a shift size of 15 codons, were determined for each gene pair using the CRANN program (Creevey & McInerney, 2003).
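
To make the window layout concrete, the sketch below lists the codon ranges covered by a window of 30 codons shifted by 15 codons. It is an illustration only: the function name is hypothetical, and the choice to drop a trailing partial window is an assumption, since the way CRANN treats the end of an alignment is not described here.

```python
def sliding_windows(n_codons, size=30, step=15):
    """Return (start, end) codon ranges, end exclusive, for full windows only."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [(start, start + size)
            for start in range(0, n_codons - size + 1, step)]
```

For an alignment of 90 codons this yields five overlapping windows, the first covering codons 0 to 29 and the last codons 60 to 89; dn and ds would then be estimated separately within each window.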

Analysis of microarray data

The whole-genome L. bicolor (S238N) microarray was constructed by NimbleGen (Madison, WI, USA), representing 20 226 gene models (based on the JGI L. bicolor genome sequence version 1.0) (Martin et al., 2008), and includes eight independent, nonidentical, 60-mer reporters for each gene model. The array design is available at the National Center for Biotechnology Information Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under the accession number GPL6192 (‘INRA Laccaria bicolor whole-genome expression array 45 K version 1’). Before analysis, reporters were discarded that potentially could cross-hybridize, that is reporters expected to hybridize with several transcripts. For this purpose, sequences for all reporters were searched against the L. bicolor genome sequence using BLASTN. Reporters showing more than 85% similarity to genes other than the gene intended were removed. After this filtering, 16 977 gene models remained in the dataset that could be assessed using at least one 60-mer reporter.
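
The reporter filtering step can be sketched as follows. The two input dictionaries and the function name are hypothetical stand-ins for the parsed BLASTN results; only the 85% similarity limit and the rule that a gene model needs at least one surviving reporter come from the text.

```python
def filter_reporters(reporters, best_off_target, max_similarity=85.0):
    """Keep reporters whose best off-target similarity does not exceed the limit.

    reporters: dict mapping reporter ID -> intended gene model ID.
    best_off_target: dict mapping reporter ID -> highest percent similarity
        to any gene other than the intended one (missing means no such hit).
    Returns a dict mapping gene model ID -> list of retained reporter IDs;
    gene models that lose all their reporters are absent from the result.
    """
    retained = {}
    for reporter_id, gene_id in reporters.items():
        if best_off_target.get(reporter_id, 0.0) > max_similarity:
            continue  # potential cross-hybridization, discard
        retained.setdefault(gene_id, []).append(reporter_id)
    return retained
```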

To maximize the number of experiments in this study, we used two sources of microarray data. First, we collected data for a series of eight previously published transcriptional microarray hybridizations (Martin et al., 2008). The complete dataset is publicly available (GEO accession GSE9784) and the data represent samples derived from various tissues, as described previously (Martin et al., 2008). Four samples originated from ECM tissue, two derived from symbiosis with poplar roots (ECM3 and ECM1) (GSM245987 and GSM245992) and two from interactions with Douglas fir (ECM9 combined) (GSM245993 and GSM245994). Another two samples were derived from fruiting bodies (FB1, FB2 combined and FB6) (GSM245995 and GSM245996) and two samples from free-living mycelium (FLM14C and FLM21) (GSM245997 and GSM247052). The two FLM samples (GSM245997 and GSM247052) and the two Douglas fir ECM samples (GSM245993 and GSM245994) represented biological replicates. Secondly, we also added data from another three Laccaria transcriptional microarray hybridizations (A. Deveau et al., unpublished) (GEO accessions GSM292146, GSM292155 and GSM292658) representing one sample of free-living mycelium (FLM14C) (K_94480_532) and two biological replicates of free-living mycelium encountering Pseudomonas fluorescens BBc6R8 bacterial suspension (FLM14T) (E_90754_532 and K2_110226_532). All samples were labeled with cyanine-3 (Cy3).

To estimate the background signal above which a transcript can be declared present, the mean intensity of 30 000 random reporters present on the microarray was calculated (50–100 arbitrary units). Gene models with a signal intensity three-fold higher than this cut-off value were considered to be transcribed. The microarray data were analyzed by applying two sequential mixed-model ANOVAs (Jin et al., 2001; Wolfinger et al., 2001) to the log2-transformed spot measurements. From this procedure, we retrieved the estimated relative expression level (log2) for each gene and each treatment. Student's paired t-test with a two-tailed distribution was used to determine whether the gene expression values (log2) of the recently duplicated paralogs were significantly different (Table 3, see Results).
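
The expression call and the paired comparison can be sketched with the standard library alone. Both function names are hypothetical, and reading "three-fold higher" as a strict inequality is an assumption. The second function returns only the t statistic and its degrees of freedom; the two-tailed P value would additionally require Student's t distribution (for example from a statistics package).

```python
import math
from statistics import mean, stdev

def is_expressed(signal, random_reporter_signals, fold=3.0):
    """True when the signal exceeds `fold` times the mean random-reporter signal."""
    return signal > fold * mean(random_reporter_signals)

def paired_t_statistic(levels_a, levels_b):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length series."""
    if len(levels_a) != len(levels_b) or len(levels_a) < 2:
        raise ValueError("need two series of equal length with at least two samples")
    diffs = [a - b for a, b in zip(levels_a, levels_b)]
    spread = stdev(diffs)  # sample standard deviation of the paired differences
    if spread == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Here `levels_a` and `levels_b` would hold the log2 expression levels of two paralogs across the same set of samples, in the same order.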

Table 3.  Divergence in sequence and expression patterns of paralogs of protein kinase and RAS guanosine triphosphatase (GTPase) families in Laccaria bicolor

Protein                   Compared duplicates     Fate of protein  Rate of            Gene expression   Evolutionary models of
                                                  motifs [2]       substitutions [3]  divergence [4]    functional divergence [5]
Protein kinases
IKS1-like (cf. Fig. 2c)   299595      299584      RD               0–3.5              8 E-07            NF
                          336005      A1          RR               N/A                N/A               Retention
PLK-like (cf. Fig. 2d)    160346 [1]  161489      RR               Truncated          No data           Retention
RAS GTPases
RAS1-like (cf. Fig. 4b)   297047      297059      RR               Identical          Below background  Retention
                          249795      249517      RD               0–1.0              No data           NF
                          A6          A7          NN               N/A                N/A               SNF [6]
RAS2-like (cf. Fig. 4b)   150140      148268      RR               Identical          No data           Retention
                          256615      254501      RR               0–0.9              No data           Retention
                          A4          A1          RN               N/A                N/A               NF
                          A2          291884      RR               N/A                N/A               Retention
                          A3          148636      RR               N/A                N/A               Retention
RHO1-like (cf. Fig. 4c)   154172      148214      RR               0–3.0              No data           Retention
                          169236      168945      RR               Identical          No data           Retention
                          A2          336069      RR               N/A                N/A               Retention
                          A3          248037      RR               N/A                N/A               Retention
                          A4          168862 [7]  RR               N/A                N/A               Retention
                          A1          A5          RN               N/A                N/A               NF
                          163181      163355      NN               0–1.0              1 E-07            SNF
                          A8          333548      RD               N/A                N/A               NF
                          A6          A9          RN               N/A                N/A               NF
                          148377      154221      RD               0.2–1.0            5 E-09            NF

  [1] Gene models in italics were found to be truncated within regions corresponding to catalytic domains, and are thus regarded as pseudogenes.

  [2] Changes in protein motifs following gene duplications. In principle, three different changes in protein motif structure can be expected following gene duplications (Yang et al., 2006). First, the ancient protein motif structure may be retained (R). Second, one or more protein motifs degenerate (D) and parts of the original function may be lost. Third, at least one novel (N) protein motif has evolved compared with the structure of the ancestral gene. After a duplication event, any combination of these three gene fates is possible, thereby giving six possible outcomes (RR, RD, RN, DD, DN and NN).

  [3] The rate of change in nucleotide substitution (dn/ds) when comparing the duplicates using a sliding window analysis (cf. Fig. S2, Supporting Information). The dn/ds ratio was not calculated for paralogs with identical or truncated nucleotide sequences. Duplicates including ancestral sequences were not analyzed (indicated as N/A, not applicable).

  [4] Divergence in gene expression pattern of the compared gene duplicates. Data from microarray experiments including eight different samples from three different tissues: fruiting bodies, mycorrhizal root tips and mycelium. The numbers given are the P values of Student's paired t-tests (see Materials and Methods). Gene models for which the expression pattern could not be concluded, as data had been excluded because of expected cross-hybridization, are indicated (No data). Gene expression levels of the ancestral sequences are labeled not applicable (N/A).

  [5] Evolutionary models proposed for the generation of divergence in protein motif structures and gene expression levels (Yang et al., 2006). Neofunctionalization (‘NF’) matches the protein motif changes of RD and RN. In cases in which both genes exhibit new protein structures (NN), this suggests that the gene duplicates undergo rapid subfunctionalization (‘SF’), followed by a prolonged and substantial NF (the ‘SNF’ model). A divergence in expression pattern indicates that the duplicates have evolved new functions according to either the NF or SF models (the two models cannot be distinguished because the expression patterns for the ancestral sequences are not known). ‘Retention’ matches RR. Evolutionary models in italics indicate seven cases in which the duplication event was not resolved in at least one of the phylogenetic trees based on the minimum evolution (ME) and maximum parsimony (MP) methods (Fig. S3, see Supporting Information).

  [6] Topological differences in ME and MP trees may alter the evolutionary model for the ancestral proteins A6 and A7.

  [7] The 168862 gene model is located at the end of a scaffold and the region corresponding to the N-terminus is unknown. The truncated gene product has the same motifs as the ancestral sequence A4.

Analysis of ESTs

Gene expression was also investigated using 14 312 tentative consensus (TC) sequences assembled from 38 901 EST sequences, which were downloaded from the LaccariaDB website (mycor.nancy.inra.fr/IMGC/LaccariaGenome). The gene model corresponding to each TC sequence was predicted using BLASTN with a cut-off of 100 bp and a minimum of 90% identity. The TC sequences corresponding to genes belonging to protein Family-2 and Family-6 were further investigated, and ambiguous matches (i.e. cases in which the same TC sequence shows a near-perfect match to more than one gene model) were manually inspected and removed.

Results

Strategies to identify expanded gene families

Protein kinases and RAS small GTPases are large families (or superfamilies) that are commonly identified on the basis of the presence of conserved protein domains (Takai et al., 2001; Manning et al., 2002; Wennerberg et al., 2005). The identification of these families based solely on the presence of conserved domains is, however, not feasible because the protein kinase and RAS small GTPase domains occur in a number of different combinations with other domains. The Pfam website (pfam.sanger.ac.uk) reports that the protein kinase domain (Pfam PF00069) is found in 771 different architectures, and the RAS domain (Pfam PF00071) in 35 architectures. In addition, domains in many multiple domain proteins are ‘promiscuous’, that is they are evolutionarily labile and can be found in combination with many other domains (Basu et al., 2008). As a result, the presence of a shared conserved protein domain does not necessarily imply that the proteins share a common evolutionary history. To circumvent these problems, we decided to identify families of protein kinases and RAS small GTPases by clustering full-length sequences using the TRIBE-MCL algorithm (Enright et al., 2002).

The 58 030 protein sequences from the five analyzed basidiomycete genomes were clustered into 7352 TRIBE families. The protein kinase PF00069 domain was found in 41 families containing 1140 proteins. The largest family was Family-2 containing 532 proteins, 150 of which originated from L. bicolor. The RAS PF00071 domain was found in 10 TRIBE families containing 451 proteins. Among these, Family-6 was the largest family with 165 proteins (59 from L. bicolor). An evolutionary analysis showed that Family-2 and Family-6 have been significantly (P < 0.001) expanded in the L. bicolor lineage (Fig. 1).

Figure 1. Evolutionary changes of gene family sizes of protein kinases and RAS small guanosine triphosphatases (GTPases) from various basidiomycete genomes. A linearized neighbor joining (NJ) tree with 1000 bootstrap replicates of 18S ribosomal DNA sequences from Laccaria bicolor, Coprinopsis cinerea, Phanerochaete chrysosporium, Cryptococcus neoformans and Ustilago maydis was constructed. Aspergillus niger was used to root the tree and the divergence time between A. niger and U. maydis (Berbee & Taylor, 2001) was used to date other lineages. The branch lengths are in million years and are indicated by the scale (x-axis). The bootstrap support for each node was < 98 (values not shown). The numbers shown at the nodes are the most likely family sizes of Family-2 (protein kinases) and Family-6 (RAS small GTPases) (indicated as Family-2/Family-6), as estimated using the CAFE tool (De Bie et al., 2006).

A detailed investigation of the expansion of Laccaria proteins in Family-2 and Family-6 was performed using phylogenetic methods. Data were analyzed using three different tree constructing methods, including NJ, ME and MP. In general, the ME and MP methods generated less well-resolved trees than the NJ method. Accordingly, the homologs/paralogs and duplication events identified differed slightly between the three methods. Because clustering in the NJ tree agreed very well with the presence of various catalytic motifs of the small GTPases (Table S3, see Supporting Information), and the NJ method has been used to cluster structurally and functionally related protein kinases in many eukaryotes (Manning et al., 2002), data from the NJ trees are shown when analyzing the patterns of expansion in Family-2 and Family-6.

Expansion of protein kinases in Family-2

To examine the relationship between Family-2 and the other PF00069-containing families identified in the five basidiomycete genomes, a weighted graph of all proteins containing the PF00069 domain was generated (Fig. 2a). Clearly, the large TRIBE families, including Family-2, formed cohesive clusters in the obtained network. Family-2 was located centrally within this network connecting to all other families. The network properties of all protein families showed a power-law degree distribution (data not shown). The average connectivity of the protein kinase superfamily was 238 and the network diameter was 9, showing that, despite the clear division into many families, the protein kinase superfamily is highly conserved. The maximum connectivity (562) was found among the proteins of Family-2.

The PF00069 domain was identified in 140 of 150 Laccaria proteins of Family-2. These proteins, as well as those from the other basidiomycetes, were classified into various subgroups and families (epK families) of protein kinases using a phylogenetic method (Manning et al., 2002) (Fig. 2b). In total, 80 of the 140 Laccaria proteins were affiliated with subgroups that belong to the five epK families of S. cerevisiae (Tables 1, S1, see Supporting Information). The remaining L. bicolor proteins of Family-2 were either found in subgroups lacking S. cerevisiae homologs or located outside any of the clusters. Hence, these proteins did not have any yeast ortholog.

Table 1.  Classification of protein kinases in Family-2¹

                                          Number of proteins
Group                     Subgroups   Lac  Copr  Phan  Cryp   Ust Paralogs²
Orthologs in Saccharomyces cerevisiae
  AGC                             8     9     8     9     8     7
  CAMK                           12    13    12    12    12     8
  CMGC                           13    19    17    18    17    16         2
  STE                             9    15    15    12    13    13
  Other                          14    24    17    20    18    13         5
No orthologs in S. cerevisiae
  Lac                             6    16     0     0     0     0        16
  Copr                            2     0     4     0     0     0
  Phan                            2     0     0     7     0     0
  Lac/Copr                        7     7     7     0     0     0
  Lac/Phan                        2     2     0     2     0     0
  Copr/Phan                       1     0     2     1     0     0
  Lac/Copr/Phan                   4     6     5     4     0     0         3
  Lac/Copr/Ust                    1     2     1     0     0     1
  Copr/Phan/Ust                   1     0     1     1     0     1
  Lac/Copr/Phan/Cryp              3     3     3     3     3     0
  Lac/Copr/Phan/Ust               2     2     2     2     0     2
  Lac/Copr/Phan/Cryp/Ust          8     9     9     9     8     8
Not clustered                          13     6     7     5     3
Total                            95   140   109   107    84    72        26

¹ In total, Family-2 consists of 512 proteins containing the protein kinase domain PF00069. These proteins were classified into a number of subgroups based on a phylogenetic analysis [neighbor joining (NJ) tree] of the PF00069 domain. A subgroup is defined as a clade of sequences having a bootstrap support value ≥ 50 (phylogenetic tree shown in Fig. 2b). ‘Orthologs in S. cerevisiae’ includes subgroups containing sequences of yeast kinases. The classification of these subgroups was made according to KinBase (Materials and Methods). ‘No orthologs in S. cerevisiae’ comprises subgroups that do not contain any yeast kinase homologs. The ‘Not clustered’ protein kinases include those that are located outside any of the clades identified in the NJ tree. Copr, Coprinopsis cinerea; Cryp, Cryptococcus neoformans; Lac, Laccaria bicolor; Phan, Phanerochaete chrysosporium; Ust, Ustilago maydis. Further details including protein ID, catalytic domains, expression and pseudogenes within these subgroups are given in Table S1 (see Supporting Information).

² Number of L. bicolor paralogs identified in each subgroup.

Twenty-six of the 140 Laccaria proteins in Family-2 were identified as paralogs, that is, recent duplicates that have arisen after the split between L. bicolor and C. cinerea. The paralogs were distributed into ten subgroups, each containing two to five paralogs (Fig. 3a). Three of these subgroups contained orthologs from S. cerevisiae: ‘Other/IKS’, ‘Other/PLK’ and ‘MAPK/ERK’. Sixteen of the Laccaria paralogs appeared to be truncated in their catalytic domains and probably represent pseudogenes (Table S1). This number is significantly larger than expected (χ2-test, P < 0.001), as deduced from the overall frequency of pseudogenes within Family-2 (in total 42 pseudogenes).
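The comparison of observed and expected pseudogene counts can be sketched as a one-degree-of-freedom χ² goodness-of-fit test. The exact test design is not detailed in this section, so the set-up below is an assumption: the expected frequency is taken as the overall Family-2 rate (42 of 140), and the function name is hypothetical.

```python
import math


def chi2_two_classes(observed_hits, n, expected_rate):
    """Chi-square goodness-of-fit for two classes (hit / no hit), 1 d.f."""
    expected_hits = n * expected_rate
    expected_miss = n - expected_hits
    observed_miss = n - observed_hits
    chi2 = ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_miss - expected_miss) ** 2 / expected_miss)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the text: 16 pseudogenes among 26 paralogs, 42 among all 140 proteins.
chi2, p_value = chi2_two_classes(16, 26, 42 / 140)
print(f"chi2 = {chi2:.2f}, P < 0.001: {p_value < 0.001}")
```

Under these assumptions about 7.8 pseudogenes would be expected among the 26 paralogs, against the 16 observed.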

Figure 3. Distribution of paralogs in different subgroups of protein kinases and small RAS guanosine triphosphatases (GTPases) of Family-2 and Family-6, respectively. Subgroups indicated by ‘*’ were subjected to evolutionary analysis (Figs 2c,d, 4b,c). Underlined subgroups contain a Saccharomyces cerevisiae ortholog. Pseudogenes (grey bars) were recognized as those having truncated catalytic regions. Nonpseudogenes, black bars. (a) Family-2. In total 26 paralogs were identified in this family. Subgroup 4 is the ‘Other/IKS’, 5 the ‘Other/PLK’ and 6 the ‘MAPK/ERK’ subgroup [belongs to the cyclin-dependent kinases (CMGC)]. (b) Family-6. In total 26 paralogs were identified in this family. Subgroup 1 is the RHO1-like subgroup, 2 the RAS2-like subgroup and 3 the RAS1-like subgroup.

The paralogs of the ‘Other/IKS’ and ‘Other/PLK’ subgroups were examined in more detail. Two of the three Laccaria paralogs (gene model IDs 299584 and 299595) present in the ‘Other/IKS’ subgroup displayed high sequence similarity to the yeast IKS1 ortholog, particularly along residues predicted to be located within the catalytic region (Fig. 2c) (Table S2, see Supporting Information). The gene model for the third Laccaria IKS1-like paralog (ID 336005) was found at the terminus of an assembled scaffold, and hence truncated in a region corresponding to the C-terminus. The 336005 protein displayed 100% sequence identity to the 299584 protein, and the two models could possibly represent the same gene.

The two Laccaria paralogs (IDs 161489 and 160346) that were identified within the ‘Other/PLK’ subgroup were found to encode significantly shorter gene products (232 amino acid residues) than the yeast PLK ortholog (CDC5) (705 amino acid residues) and another PLK-like protein kinase of L. bicolor (ID 191474, 853 amino acid residues) (Fig. 2d) (Table S2). Although several of the conserved sites of the protein kinase domain (PF00069) could be identified in the 161489 and 160346 paralogs, they appeared to lack regions corresponding to the ATP binding pocket, as well as the polo box domain (PF00659). Furthermore, these paralogs were not expressed above the background level. On the basis of these findings, the two Laccaria paralogs of the ‘Other/PLK’ subgroup were considered to be pseudogenes.

A substantial number of Family-2 Laccaria genes were found to be expressed on the basis of the microarray (136 of 140) and EST (67 of 140) data. Within the subset of paralogs, 22 and five genes, respectively, were indicated to be expressed by the microarray and EST analyses (Table S1). According to the EST data, Family-2 contained 15 expressed pseudogenes, 10 of which contained homologs among the five conserved epK families identified in S. cerevisiae (Table S1).

Expansion of RAS small GTPases in Family-6

Family-6 contained 55 Laccaria proteins with the PF00071 RAS domain. On the basis of a phylogenetic analysis, these sequences could be grouped into 29 subgroups (Fig. 4a). Fourteen of these (represented by 20 Laccaria proteins) contained orthologs of the RAS, RHO, RAB and RAN subfamilies of the GTPases of S. cerevisiae. These subfamilies represent four of five main subfamilies of small GTPases so far characterized among eukaryotes (Wennerberg et al., 2005). The remaining 35 Laccaria sequences did not have any yeast ortholog (Tables 2, S3).

Table 2.  Classification of small guanosine triphosphate (GTP)-binding proteins in Family-6¹

                                          Number of proteins
Group                     Subgroups   Lac  Copr  Phan  Cryp   Ust Paralogs²
Orthologs in Saccharomyces cerevisiae
  RAS
    RAS1/RAS2                     1     6     1     1     1     1         4
    RSR1                          1     1     0     1     0     1
    RHEB                          1     2     2     1     1     1
  RHO
    RHO1                          1     1     1     1     1     1
    RHO2                          1     1     1     1     1     1
    RHO3                          1     1     1     1     1     1
    CDC42                         1     1     1     1     2     1
  RAB
    YPT1                          1     1     1     1     1     1
    YPT6                          1     1     1     1     1     1
    YPT7                          1     1     2     1     1     1
    YPT31/YPT32                   1     1     1     1     1     1
    SEC4                          1     1     1     1     1     1
    VPS21                         1     1     1     1     1     1
  RAN
    GSP1/GSP2                     1     1     1     1     1     1
No orthologs in S. cerevisiae
  Lac                             4    16     0     0     0     0        16
  Copr                            1     0     2     0     0     0
  Lac/Copr/Phan/Ust               3     3     3     3     0     3
  Lac/Copr/Phan/Cryp/Ust          7    13     7     7     8     7         6
Not clustered                           3     2     3     2     1
Total                            29    55    29    27    24    25        26

¹ In total, Family-6 consists of 160 proteins containing the RAS domain PF00071. These proteins were classified into a number of subgroups based on a phylogenetic analysis [neighbor joining (NJ) tree]. A subgroup is defined as a clade of sequences having a bootstrap support value ≥ 50 (phylogenetic tree shown in Fig. 4a). ‘Orthologs in S. cerevisiae’ includes subgroups (clades) containing sequences of small GTP-binding proteins from yeast (cf. Materials and Methods). ‘No orthologs in S. cerevisiae’ comprises subgroups that do not contain any yeast small GTP-binding proteins. The ‘Not clustered’ small GTP-binding proteins include those that are located outside any of the clades identified in the NJ tree. Six of the small GTP-binding proteins in yeast (YPT53, YPT10, YPT11, YPT52, RHO4 and RHO5) were found outside these clusters. Copr, Coprinopsis cinerea; Cryp, Cryptococcus neoformans; Lac, Laccaria bicolor; Phan, Phanerochaete chrysosporium; Ust, Ustilago maydis.

² Number of L. bicolor paralogs identified in each subgroup.

Twenty-six of the 55 Laccaria gene models in Family-6 were identified as paralogs. These paralogs clustered into six subgroups (Fig. 3b). The majority were found within three large subgroups encoding RAS1-, RAS2- and RHO1-like GTPases. The RAS1-like subgroup contained six Laccaria gene models including four paralogs. In addition, this clade contained RAS1 and RAS2 of S. cerevisiae, and RAS1 homologs from other basidiomycetes (Fig. 4b, bottom clade). Three of the six Laccaria gene models of this cluster (ID 147993 and the paralogs 297047 and 297059) appeared to be truncated in regions corresponding to catalytic domains and also contained frameshift mutations and premature stop codons, and were classified as pseudogenes. By contrast, two other paralogs (IDs 249517 and 249795), as well as the gene model 190576, were found to encode gene products with complete catalytic sites. In addition, the C-terminus of these polypeptides contained a lipid modification site, ‘CaaX’ (where ‘a’ represents an aliphatic amino acid residue and X any amino acid residue) (Table S5, see Supporting Information). This tetrapeptide motif is found in many small GTPases and functions as a substrate for lipid modifications that increase protein hydrophobicity and facilitate membrane associations (Wennerberg et al., 2005).
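As an illustration of how a C-terminal CaaX motif could be screened for in a protein sequence, the following minimal sketch uses a regular expression. The set of residues counted as aliphatic (A, V, L, I) is an assumption, as broader definitions are sometimes used, and the example sequences are hypothetical fragments, not L. bicolor gene models.

```python
import re

# Assumption: 'a' (aliphatic) is restricted here to A, V, L or I.
CAAX_PATTERN = re.compile(r"C[AVLI]{2}[A-Z]$")


def has_caax(protein_seq):
    """Return True if the sequence ends in a C-a-a-X tetrapeptide."""
    return CAAX_PATTERN.search(protein_seq.strip().upper()) is not None


# Hypothetical C-terminal fragments used only to exercise the pattern.
print(has_caax("GSKKCVLS"))  # ends in C-V-L-S
print(has_caax("GSKKCVL"))   # one residue short of the tetrapeptide
```

Anchoring the pattern at the end of the sequence matters, because the motif acts as a lipid modification site only at the C-terminus.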

The RAS2-like subgroup of Laccaria paralogs formed a sister clade to the RAS1-like subgroup (Fig. 4b). The RAS2-like subgroup contained seven Laccaria gene models, including six paralogs and homologs from C. cinerea, P. chrysosporium, C. neoformans and U. maydis. The close functional relationships between these proteins were indicated by the fact that all contain sequences encoding a catalytic site (cd0414) of the RAS2 subfamily of small GTPases (Table S4, see Supporting Information). The first member of the RAS2 subfamily was characterized in U. maydis (AY149916 in Fig. 4b) (Müller et al., 2003). The RAS2 subfamily is not represented in S. cerevisiae. Two of the six Laccaria paralogs of the RAS2-like subgroup (IDs 150140 and 148268) were found to encode relatively short gene products, with truncated catalytic sites and no CaaX C-terminal motif, and were considered as pseudogenes (Table S4).

The RHO1-like subgroup of Laccaria paralogs formed a sister clade to a cluster containing classical RHO1 proteins, including RHO1 of S. cerevisiae (Fig. 4c). The RHO1-related paralogs had a catalytic region similar to RHO1 GTPases (cd01870), as well as the so-called RHO insertion of 12 amino acid residues located between the G4 and G5 boxes, which distinguish RHO proteins from other small GTPases (Freeman et al., 1996) (Table S6, see Supporting Information). The CaaX C-terminal motif could not be identified in any of the RHO1-like paralogs. Furthermore, several were significantly longer than classical RHO1 proteins as a result of a C-terminal extension beyond the conserved GTPase domain (Wennerberg & Der, 2004). The RHO1-related Laccaria GTPases were clustered into three distinct clades, GI–GIII (Fig. S1, see Supporting Information). The seven paralogs of GI (IDs 154172, 148214, 169236, 168945, 336069, 248037 and 168862) shared a conserved C-terminal extension of 120–121 residues. The sequence identity of the C-terminal regions was in the range 85–100%, compared with a slightly larger range of identity for the catalytic domain (71–100%). No motifs or domains were identified in the conserved C-terminal sequence using the SMART tool (http://smart.embl-heidelberg.de) or the Profile Scan Server (http://hits.isb-sib.ch/cgi-bin/PFSCAN). For the GII clade, the C-terminal region of three paralogs (IDs 163355, 163181 and 333548) did not display any sequence similarity to GI proteins. The GII sequences varied extensively in terms of both length and composition. The two paralogs (IDs 154221 and 148377) of the GIII cluster were classified as pseudogenes.

In total, Family-6 contained 11 pseudogenes, seven of which were paralogs. Hence, pseudogenes were slightly over-represented in the cohort of paralogs when compared with their total abundance within Family-6 (χ2-test, P < 0.001). Microarray and EST data supported the expression of 36 (including eight paralogs) and 25 (including three paralogs) gene models, respectively, within Family-6. Among these, seven were classified as pseudogenes. One of these (ID 250961) contained a RHEB catalytic region and another a RHOA catalytic region (Table S3).

Evolutionary mechanisms

To gain insight into the evolutionary mechanisms generating the divergence of protein kinases and small GTPases, the changes in protein motif structures were examined among paralogs and ancestors within the ‘Other/IKS’ and ‘Other/PLK’ subgroups of Family-2, as well as among the RAS1-, RAS2- and RHO1-like proteins of Family-6. In total, 21 duplication events were identified: 10 recent events, in which the duplicates were located in the terminal branches of the phylogenetic trees, and 11 ancient duplications that incorporated at least one ancestral sequence. On the basis of the gain, loss and retention of the original motifs of the protein-coding regions, seven cases of NF, two of SNF and 12 of ‘Retention’ were identified (Table 3). The analysis of the motif organization did not reveal any cases of SF, that is, cases in which the original motifs had degenerated and been partitioned between the gene copies.

The divergence in expression pattern for paralog pairs formed by the recent duplication events was inferred by the analysis of microarray data. The analyses included data for eight different samples representing three different tissues (FBs, ECM root tips and mycelia). Evidence for expression divergence, possibly caused by alterations in regulatory regions, was found among the paralogs of three pairs (Table 3). These paralogs also showed sequence divergence in motif structures. Changes in the expression patterns for the remaining paralogs could not be examined because of potential cross-hybridizations.

The driving force for the evolutionary changes of the duplicates was investigated by calculating the rates of nucleotide substitutions at nonsynonymous (dn) and synonymous (ds) sites of recent duplicates. A dn/ds ratio larger than unity is indicative of positive selection (Zhang et al., 1998). Using this criterion, positive selection was detected in two pairs of the compared gene duplicates belonging to the IKS1- and RHO1-like subgroups, respectively. In both pairs, the dn/ds ratio was larger in regions corresponding to the C-terminus, and outside regions encoding the conserved catalytic domains (Fig. S2, see Supporting Information).
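The dn/ds criterion can be illustrated with a minimal sketch. It assumes that the numbers of nonsynonymous and synonymous differences and sites have already been counted for a pair of aligned coding sequences, and it applies a Jukes-Cantor correction for multiple substitutions; this is one common way of estimating the rates and is not necessarily the procedure used in this study. The function names and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("ds is zero, so the ratio is undefined")
    return dn / ds


# Hypothetical counts for one pair of duplicates (not data from the study).
ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=300, syn_diffs=5, syn_sites=100)
print("positive selection suggested" if ratio > 1 else "no evidence of positive selection")
```

Computing the ratio separately for different regions of an alignment, for example the catalytic domain and the C-terminus, shows where along the gene the signal is concentrated.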

Because the ME and MP trees were less well-resolved than the NJ tree, five of the 21 duplication events identified in the NJ clades of ‘Other/IKS’, ‘Other/PLK’, RAS1-, RAS2- and RHO1-like proteins were not identified in the ME and MP trees (Table 3) (Fig. S3, see Supporting Information).

Discussion

Family-2 of protein kinases and Family-6 of small GTPases are among the largest gene families in L. bicolor (Martin et al., 2008). The phylogenetic analysis showed that the two families have significantly expanded along the lineage leading to L. bicolor. The number of recent duplicates, that is paralogs, was similar in the two families. However, the pattern of duplication events and the fate of the formed gene copies giving rise to the lineage-specific expansions were notably different in Family-2 and Family-6. The protein kinase paralogs of Family-2 were dispersed in many small clades, and the majority appeared to be pseudogenes. By contrast, the RAS small GTPase paralogs of Family-6 could be clustered into three large clades and the proportion of pseudogenes was smaller than in Family-2.

In L. bicolor, the expanded RAS1 and RAS2 clades contained six and seven gene models, respectively. In comparison, the two clades contained only one RAS1 and one RAS2 gene from each of C. cinerea, P. chrysosporium, U. maydis and C. neoformans. The RAS1 and RAS2 genes of U. maydis and C. neoformans have been well characterized and fulfil different functions in the regulation of growth, morphology, mating and pathogenicity (Alspaugh et al., 2000; Lee & Kronstad, 2002; Waugh et al., 2002; Müller et al., 2003). The only RAS gene that has so far been functionally characterized in L. bicolor is the Lbras gene, which corresponds to a sequence (ID 190576) in the RAS1 clade. The Lbras gene complements the RAS2 function in S. cerevisiae, and its expression has been shown to be dependent on interactions with the plant host (Sundaram et al., 2001). The analysis of microarray data shows that the gene model 190576 is among the most highly expressed RAS genes in L. bicolor when grown in association with a host plant (ECM tissue), but also when grown as a saprophyte or in FB tissue (Fig. 4b). Because of problems with cross-hybridization, microarray data could only be obtained for five of the 12 additional Laccaria genes identified within the RAS1- and RAS2-like subgroups. Three were expressed above background levels (Fig. 4b).

The RHO1-like paralogs in the expanded clade varied extensively in sequence, but all were atypical RHO GTPases. Nonclassical RHO proteins with extended C-terminal regions and devoid of the CaaX motif have been characterized in several eukaryotes (Rivero et al., 2001; Wennerberg & Der, 2004; Boureux et al., 2007). The functions of these atypical RHO GTPases are not well known. Some appear to have signaling properties that overlap with those of classical RHO GTPases, such as RHO1 and Cdc42. Thus, they control signal pathways that affect actin reorganization and regulate cell shape, polarity, adhesion and membrane trafficking. Others have functions distinct from the classical RHO proteins (Wennerberg & Der, 2004). The fact that the RHO1-related paralogs of Laccaria varied extensively in expression patterns and sequence motifs suggests that they have a number of different cellular functions.

The proportion of pseudogenes among the recent duplicates (paralogs) was higher than that among the older duplicates (nonparalogs) within Family-2 and Family-6. Hence, nonfunctionalization appears to be a common fate for the young duplicates of protein kinases and small GTPases. This outcome can be expected according to the classical model of gene duplications (Ohno, 1970). Furthermore, it has been observed that the frequency of pseudogenes is particularly high in large and lineage-specific protein families with a function relating to environmental responses (Harrison & Gerstein, 2002). Pseudogenes within such families are expected to be important for facilitating the adaptation to changing environments by allowing a temporary relaxation from selection pressure. The pseudogene may subsequently be resurrected if a favorable mutation occurs that increases the organism's fitness (Harrison et al., 2002).

Notably, several of the identified protein kinase and small GTPase pseudogenes of L. bicolor were found to be expressed. Expressed protein kinases with truncated catalytic regions have been identified in the genomes of several eukaryotes, including mouse and human (Caenepeel et al., 2004), and pseudogenes of RHO-related proteins have previously been reported in the genome of Dictyostelium discoideum (Rivero et al., 2001). However, to our knowledge, this is the first report on protein kinase and small GTPase pseudogenes in fungi.

NF (seven cases) was found to be a more valid model than SF (not detected) and SNF (two cases) in explaining the evolutionary events leading to prolonged preservation and functional divergence of the Laccaria protein kinases and RAS small GTPases. This is in agreement with results from studies on the evolution of gene duplicates in S. cerevisiae. A significant number of the duplicates in yeast have evolved according to the NF model (Papp et al., 2003; He & Zhang, 2005; Byrne & Wolfe, 2007; MacCarthy & Bergman, 2007). Several of these studies have proposed that the initial mechanisms of preservation of the gene duplicates can be different from those leading to prolonged NF. Such initial mechanisms include the partitioning of the ancestral functions of the gene copies according to the SF model, and the retention of the copies because of advantageous dosage effects (He & Zhang, 2005). A common explanation for the prevalence of NF in yeast is that the degenerative mutations that are assumed within the SF/duplication–degeneration–complementation model are not likely to become fixed by genetic drift in large-sized populations (Lynch et al., 2001). Based on a modeling approach, MacCarthy & Bergman (2007) proposed that NF and redundancy (retention) are more prevalent than SF in proteins that are components of gene networks. Thus, many of the mutations of proteins occurring in such networks are neutral, that is without phenotypic effects. However, the mutations will not be neutral indefinitely. Eventually, such mutations become visible to natural selection in certain environments, and may lead to NF and evolutionary innovations (Wagner, 2005).

Expansion of gene families involved in signal transduction pathways has been observed in several other organisms including animals and plants. In most cases, the expansion has been associated with whole-genome duplications (Blanc & Wolfe, 2004; Blomme et al., 2006; Freeling & Thomas, 2006). The gene balance hypothesis suggests that genes involved in complex functions (e.g. signal transduction and protein complexes), in which many proteins interact, are predominantly maintained after whole-genome duplications (Freeling & Thomas, 2006). The loss of one such gene within a functional module leads to an imbalance in the stoichiometry between the interacting genes and is therefore selected against. Small-scale duplications of single genes involved in such complex functional modules will likewise lead to an imbalance and will thus be removed through purifying selection. Notably, there is no evidence for whole-genome duplications or for larger blocks of segmental duplications in the L. bicolor genome (Martin et al., 2008). Sterck et al. (2007) have suggested that genes involved in interactions between organisms (responses to biotic stimuli) have a higher frequency of retention regardless of the mode of duplication because of the constant need for rapid adaptation. Thus, it is tempting to speculate that the large number of protein kinases and RAS small GTPases have been retained in the L. bicolor genome as a result of positive selection imposed by the symbiotic interactions with plants.

Acknowledgements

This study was supported by grants from the Swedish Research Council (VR). We wish to thank the Joint Genome Institute (US Department of Energy) and the Laccaria Genome Consortium for access to the L. bicolor genome sequence prior to publication. The EST sequencing and transcriptional analysis were funded by the US Department of Energy, INRA ‘AIP Séquençage’, the European network of excellence EVOLTREE and by Région Lorraine grants. We also thank the referees for valuable comments on the manuscript.

References

Supporting Information

Table S1 Classification, paralogs, catalytic domains, expression and pseudogenes of protein kinases within Family-2

Table S2 Conserved sites identified in the catalytic domain of the IKS1-like and PLK-like protein kinases

Table S3 Classification, paralogs, catalytic domains, expression and pseudogenes of small GTPases within Family-6

Table S4 Conserved sites identified in the catalytic domain of the RAS2-like GTPases

Table S5 Conserved sites identified in the catalytic domain of the RAS1-like GTPases

Table S6 Conserved sites identified in the RhoA domain of RHO1-like GTPases

Fig. S1 Sequence alignments of the expanded clade of RHO1-like GTPases.

Fig. S2 Nonsynonymous to synonymous substitution ratio (dn/ds) of recently duplicated Laccaria paralogs of the ‘Other/IKS’ and ‘RHO1-like’ subgroups.

Fig. S3 Phylogenetic reconstruction of protein kinases and RAS small GTPases using the minimum evolution and maximum parsimony methods.

Please note: Wiley–Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

Filename                       Size   Description
NPH_2860_sm_TablesS1-S6.doc    599K   Supporting info item
NPH_2860_sm_FigS1.doc          33K    Supporting info item
NPH_2860_sm_FigS2.ppt          405K   Supporting info item
NPH_2860_sm_FigS3.ppt          250K   Supporting info item