Evolution of nucleotide sequences and expression patterns of hydrophobin genes in the ectomycorrhizal fungus Paxillus involutus


Author for correspondence: Anders Tunlid Tel: +46 46 2223757 Fax: +46 46 2224158 Email: Anders.Tunlid@mbioekol.lu.se


  • Hydrophobins are small, secreted proteins that play important roles in the development of pathogenic and symbiotic fungi. Evolutionary mechanisms generating sequence and expression divergence among members in hydrophobin gene families are largely unknown.
  • Seven hydrophobin (hyd) genes and one hyd pseudogene were isolated from strains of the ectomycorrhizal fungus Paxillus involutus. Sequences were analysed using phylogenetic methods. Expression profiles were inferred from microarray experiments.
  • The hyd genes included both young (recently diverged) and old duplicates. Some young hyd genes exhibited an initial phase of enhanced sequence evolution owing to relaxed or positive selection. There was no significant association between sequence divergence and variation in expression levels. However, three hyd genes displayed a shift in the expression levels or an altered tissue specificity following duplication.
  • The Paxillus hyd genes evolve according to the so-called birth-and-death model in which some duplicates are maintained for a long time, whereas others are inactivated through mutations. The role of subfunctionalization and/or neofunctionalization for preserving the hyd duplicates in the genome is discussed.

Introduction


Duplications of genes or larger chromosome regions, in combination with mutations that cause functional divergence of the duplicates, are considered to be the most important mechanism generating evolutionary novelties, including new gene functions and expression patterns (Ohno, 1970). Such duplications have most likely played a substantial role both in the rapid change in organismal complexity apparent in deep evolutionary splits, and in the adaptation and diversification of more closely related strains or species (Prince & Pickett, 2002; Long et al., 2003). Analyses of genome sequences suggest that gene duplications arise at very high rates. Some of these new genes are preserved, but a majority are silenced within a few million years (Lynch & Conery, 2000). Accordingly, there is a relatively narrow time window for evolutionary exploration before gene inactivation becomes the most likely outcome. The fate of recently duplicated genes and the evolutionary forces that drive their fixation and divergence are, however, not yet clear (Long et al., 2003).

Recently, we have used DNA microarrays to screen for duplicated and rapidly evolving genes that could be associated with symbiotic adaptations in the ectomycorrhizal (ECM) fungus Paxillus involutus (Basidiomycetes; Boletales). Strains of P. involutus with various abilities to form ECM were analysed by comparative genomic hybridizations using a cDNA microarray containing 1076 putative unique genes. Approximately 17% of the gene representatives available on the array were detected as rapidly and presumably nonneutrally evolving within Paxillus (Le Quéré et al., 2006). Among them were several genes encoding hydrophobins. Hydrophobins are small, secreted proteins that can self-assemble and form visible aggregations of protein rodlets on fungal surfaces (Kershaw & Talbot, 1998; Wösten, 2001). Hydrophobins are known to play a role in a range of different processes related to growth and development in fungi. For example, hydrophobins are involved in the formation of aerial structures such as spores and fruiting bodies, they can mediate adhesion of pathogenic fungi to plant host surfaces and they can function as toxins (Kershaw & Talbot, 1998; Wösten, 2001). Hydrophobins have also been shown to be developmentally regulated by ECM-forming fungi during the infection of the host plant (Tagu et al., 1996; Mankel et al., 2002; Duplessis et al., 2005; Le Quéré et al., 2005). Hydrophobin genes are commonly found in multigene families. The members in these families display a large divergence in nucleotide sequences and expression patterns (Wessels et al., 1991; Segers et al., 1999; Duplessis et al., 2001).

In this study we have examined the evolutionary mechanisms that could be responsible for generating sequence and expression divergence among members of the hydrophobin gene family in P. involutus ATCC 200175. Seven hydrophobin (hyd) genes were characterized. Orthologs were isolated from several Paxillus strains and one closely related species (Le Quéré et al., 2004) and compared with available sequences from other fungi using phylogenetic analysis and tests for selection. Gene expression patterns were inferred using data from several cDNA microarray experiments.

Materials and Methods

Fungal cultures and DNA extractions

Five strains of P. involutus (Batsch: Fr.) and one closely related species, Paxillus filamentosus (Table 1), were grown on cellophane-covered agar plates and in liquid medium, and DNA was prepared as previously described (Le Quéré et al., 2002). The obtained DNA extracts were finally treated with RNase A (Promega, SDS Biosciences, Falkenberg, Sweden).

Table 1.  Fungal strains used in this study

| Species/strain | Abbreviation | Site and mycorrhizal host | Origin | References |
|---|---|---|---|---|
| Paxillus involutus | | | | |
| ATCC 200175 | AT | Isolated close to birch trees. Forms ECM with birch, pine, spruce and poplar (in the laboratory) | Scotland | Chalot et al. (1996) |
| Pi01Se | Se | Isolated from a pine forest | Sweden | S. Erland (unpublished) |
| Pi08Be | Be | Forms ECM with pine, spruce and poplar (laboratory) | Belgium | Blaudez et al. (1998) |
| Maj | Mj | Isolated close to poplar trees. Forms ECM with poplar and birch (laboratory) | France | Gafur et al. (2004) |
| Nau | Nu | Isolated close to oak trees. Does not form ECM with poplar or birch but with oak (laboratory) | France | Gafur et al. (2004) |
| Paxillus filamentosus | | | | |
| Pf01De | Pf | Isolated close to alder trees | Germany | Jarosch & Bresinsky (1999) |

  • ECM, ectomycorrhiza.

Isolation of hydA to hydG

Putative hyd genes were identified in a P. involutus ATCC 200175 expressed sequence tag (EST) database. The database contains 19 188 ESTs (assembled into a uniset of 3700 fungal gene representatives) originating from 11 different cDNA libraries and has partly been published (Johansson et al., 2004). The uniset can be estimated to correspond to approx. 50% of the total number of genes in P. involutus, assuming a gene content of 7700 (Le Quéré et al., 2002). The cDNA libraries represent material harvested from different tissues and under different growth conditions including saprophytically growing mycelium and ECM root tips associated with birch (Betula pendula; Johansson et al., 2004), mycelium, cords and ECM root tips growing in soil (D. Wright et al. unpublished) and mycelium growing in liquid cultures with different nutrient composition (M. Caillau et al. unpublished). Homology searches of the EST sequences were performed against public databases using the blastx program (Altschul et al., 1990). Within this database seven contigs displaying a hydrophobin signature motif (Kershaw & Talbot, 1998) were identified.

For the complete sequencing of cDNAs corresponding to these hydrophobin genes (designated hydA to hydG (Table 2), for gene nomenclature refer to the Supplementary Material, Table S3), plasmid clones were retrieved from the collection of EST clones (Johansson et al., 2004). Plasmids were prepared and the inserts were amplified and sequenced using vector-specific or template-specific primers (see the Supplementary Material, Table S1) as previously described (Wright et al., 2005). Overlapping sequences were aligned and trimmed using the sequencher software (v. 3.1.1b4) (Gene Codes Corp., Ann Arbor, MI, USA).

Table 2.  Characterization of hydrophobin genes and their gene products in Paxillus involutus ATCC 200175

| Gene (a) | Length (bp) | Exons (b) (bp) | Introns (bp) | Number of introns | GC (c) (%) | Protein (aa) | ESTs (library) (d) | Mean expression level (log2) (e) | Strains (f) |
|---|---|---|---|---|---|---|---|---|---|
| hydA | 929 | 426 | 169 | 3 | 57 | 141 | 23 (1) | 0.78 | 6 (AT, Be, Se, Mj, Nu, Pf) |
| hydB | 637 | 336 | 113 | 2 | 63 | 111 | 114 (9) | 0.72 | 4 (AT, Be, Se, –, Nu, –) |
| hydC | 779 | 324 | 116 | 2 | 53 | 107 | 21 (5) | 0.54 | 6 (AT, Be, Se, Mj, Nu, Pf) |
| hydD | 607 | 327 | 112 | 2 | 61 | 108 | 5 (1) | −1.15 | 5 (AT, Be, Se, Mj, Nu, –) |
| hydE | 614 | 324 | 107 | 2 | 59 | 107 | 16 (3) | 2.58 | 3 (AT, –, –, Mj, Nu, –) |
| hydF | 602 | 351 | 107 | 2 | 60 | 116 | 1 (1) | −0.59 | 6 (AT, Be, Se, Mj, Nu, Pf) |
| hydG | 596 | 327 | 0 | 0 | 58 | 108 | 2 (2) | NA | 1 (AT, –, –, –, –, –) |

  • EST, expressed sequence tag.
  • a Accession numbers are given in the Supplementary Material, Table S3.
  • b The boundaries of exons and introns were identified by comparing genomic and cDNA sequences.
  • c The GC content in exons (GC) was calculated using the dambe-4.13 program (Xia & Xie, 2001).
  • d Total number of ESTs clustered for each gene. The numbers in the brackets refer to the distribution of ESTs among 11 cDNA libraries.
  • e Means of relative expression levels (log2-transformed) calculated from 25 microarray experiments (Table 3). NA, No data available because the array did not contain reporters for hydG.
  • f Total number and names of Paxillus strains (cf. Table 1) in which orthologs of hydA to hydG were amplified (cf. Figure 2). Accession numbers for all sequences are given in the Supplementary Material, Table S3.

Phylogenetic analyses

Putative homologues for hydA to hydG were identified by querying the GenBank database (http://www.ncbi.nlm.nih.gov) and genome sequences of Coprinopsis cinerea (Coprinus cinereus, Sequencing Project, Broad Institute of MIT and Harvard, http://www.broad.mit.edu) using the blastn search algorithm. Nucleotide sequences (in total 82) with an E-value ≤ 10e-5 were translated into polypeptide sequences, which were then aligned using the muscle (version 3.6) (Edgar, 2004) and bioedit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) software. Sequences that were shorter than expected, lacked identified start and stop codons, did not contain codons for all eight cysteine residues of the hydrophobin signature motif (Kershaw & Talbot, 1998), or were only distantly related to hydA to hydG were excluded from further analysis. A final set of 26 protein sequences remained after filtering. These sequences, together with hydA to hydG and a hydrophobin from Emericella nidulans (outgroup) (cf. Figure 1), were realigned using muscle. The resulting protein alignment was used as a template to align the corresponding nucleotide sequences. In the nucleotide alignment, codons upstream of the first cysteine codon (i.e. upstream of the hydrophobin core motif; Wösten, 2001), along with gaps and a few ambiguous sites, were removed before the phylogenetic analysis. The final alignment, containing 201 bp, was used to construct a phylogenetic tree by the Maximum Likelihood method in paup* software (version 4.0b8) (Swofford, 1998). The software modeltest (Posada & Crandall, 1998) was used to evaluate the appropriate models and parameters.
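Two of the steps above, the cysteine filter and the use of a protein alignment as a template for the nucleotide alignment, can be illustrated with a minimal Python sketch. It is not the procedure run in muscle or bioedit: the function names are hypothetical, the cysteine check is a simple residue count rather than a test of the full signature motif, and the coding sequence is assumed to start in frame at the first aligned residue.

```python
def has_eight_cysteines(protein: str) -> bool:
    """Crude filter (assumption): keep sequences with at least eight Cys residues.

    A real signature-motif check would also test the spacing of the residues.
    """
    return protein.upper().count("C") >= 8


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Lay an ungapped coding sequence onto one row of a gapped protein alignment.

    Every residue consumes the next codon of `cds`; every gap '-' becomes '---',
    so the nucleotide alignment inherits the gaps of the protein alignment.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example: a three-column protein row with one gap.
codon_row = thread_codons("M-C", "ATGTGT")
```

Because gaps are inserted in whole codons, the reading frame is preserved, and columns can later be removed codon by codon.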

Figure 1.

Phylogenetic relationships between hydrophobins from Paxillus involutus ATCC 200175 and those from other basidiomycetes. The maximum likelihood tree shows values at the nodes indicating the bootstrap support in percent for 100 replicates (only values > 50 are shown). Accession numbers for the nucleotide sequences are from top to bottom: AJ319663, Scaffold 6* (292824-293084, 293144-293185), Y10627, X90818, AJ225061, AF217808, M32329, AB126686, Y16881, Y10628, AB079128, AB079129, AB079130, AJ225060, Scaffold 25* (103205-103417, 103474-103515), Scaffold 24* (134686-104898, 104988-135029), Scaffold 18* (76134-75859, 75794-75750), Scaffold 7* (95562-95287, 95225-95184), U29606, AF097516, AY048578, U29605, Y15940, AJ007504, M32330 and M61113 (Emericella nidulans) as an outgroup for rooting the tree. Asterisks (*) indicate sequences obtained from Broad Institute (http://www.broad.mit.edu/annotation/). The shaded boxes indicate the seven hydrophobin genes (hydA to hydG) from P. involutus ATCC 200175 and their accession numbers are given in Table 2 and supplementary material Table S3.

The evolutionary relationships between the seven P. involutus ATCC 200175 hyd genes were analysed further using the SplitsTree method (Huson, 1998; see the Supplementary Material, Fig. S2).

Analysis of hydrophobin gene fragments in Paxillus strains

Based on the hydrophobin cDNA sequences obtained from P. involutus ATCC 200175 (hydA to hydG), new primers were designed (see the Supplementary Material, Table S1) to amplify the corresponding genomic regions of hydA to hydG in various strains of P. involutus and P. filamentosus. The amplicons were ligated into the pGEM-T Easy Vector System I (Promega) and transformed into Escherichia coli DH5α. For each amplicon/primer pair, several clones were randomly picked and in total 98 genomic fragments were sequenced as already described. The sequences of the putative hydA to hydG genes were aligned using muscle and bioedit software. The identity between the aligned sequences was estimated using matgat software (v 2.02) (Campanella et al., 2003). Sequences showing pairwise identities of > 99% to existing sequences and four truncated sequences identified as pseudogenes were removed. The phylogenetic relationships of the remaining 48 gene fragments were analysed by constructing a neighbour-joining (NJ) tree (Galtier et al., 1996). The phylogeny was constructed using a 50%-majority rule consensus of 1000 neighbour-joining bootstrap replicates, with distances corrected using the Kimura 2-parameter model. Accession numbers of the amplified gene fragments are given in the Supplementary Material (Table S3).
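As an illustration of the two quantities used in this step, the following Python sketch computes a percentage identity and a Kimura 2-parameter distance for a pair of aligned sequences. It is a simplified stand-in and not the matgat or NJ implementation used here: the function names are hypothetical, identity is computed only over aligned, unambiguous A/C/G/T sites, and the distance uses the standard formula d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_sites(seq1: str, seq2: str):
    """Return (compared sites, transitions, transversions) for two aligned rows."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, transitions, transversions


def percent_identity(seq1: str, seq2: str) -> float:
    n, ts, tv = compare_sites(seq1, seq2)
    return 100.0 * (n - ts - tv) / n


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance; raises if the sequences are saturated."""
    n, ts, tv = compare_sites(seq1, seq2)
    p, q = ts / n, tv / n
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("distance undefined for this level of divergence")
    return -0.5 * math.log(inner)


def is_redundant(seq1: str, seq2: str, threshold: float = 99.0) -> bool:
    """Mirror of the > 99% identity filter described in the text."""
    return percent_identity(seq1, seq2) > threshold
```

The correction matters mainly for divergent pairs: with one transition in ten sites the uncorrected difference is 0.10, whereas the corrected distance is slightly larger.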

The rates of nonsynonymous (dn) and synonymous (ds) nucleotide substitutions per site for each of the hydA to hydG orthologous groups were estimated separately using the crann software (Creevey & McInerney, 2003). The hydrophobin gene (Sc4) of Schizophyllum commune was used as outgroup.

Analysis of expression data

The expression patterns for six (hydA to hydF) of the seven P. involutus ATCC 200175 hyd genes (hydG was excluded because no reporter was available on the array) were examined by collecting processed relative expression levels from dual-label cDNA-microarray experiments (each including dye swaps and at least three biological replicates) from a number of studies (cf. Table 3) which included 25 different treatments covering a wide range of developmental and physiological conditions (Table 3 and Fig. 5). The microarray design, the sample preparation, the hybridization procedures and the statistical analyses of the array data have previously been described in detail by Johansson et al. (2004) and Le Quéré et al. (2005), and involved normalization and statistical analysis (mixed-model analysis of variance (anova)) as described by Wolfinger et al. (2001). From those analyses we retrieved the estimated relative expression level (log2), for each hyd gene and in each treatment, for the construction of transcriptional profiles (Fig. 5).

Table 3.  Microarray experiments of the Paxillus involutus ATCC 200175

| Treatment | Growth conditions and tissues | Array (a) | References |
|---|---|---|---|
| | P. involutus in association with birch (Betula pendula) in soil microcosms (•): | Print 85 | Wright et al. (2005) |
| COR | Cords (rhizomorphs) | | |
| PCH | Extramatrical mycelium growing in nutrient patch | | |
| TIP* | ECM root tips | | |
| | Mycelium colonizing nutrient patches in soil microcosms. The following nutrients were added to the patches (•): | Print 154 | Wright et al. (unpublished) |
| APO | Ammonium phosphate | | |
| ASU | Ammonium sulphate | | |
| BSA | Bovine serum albumin | | |
| | Mycelium grown on agar with the following nutrient amendments (○): | Print 154 | Caillau et al. (unpublished) |
| B | Ammonia + patch of phosphate | | |
| C | Ammonia + patch of phosphate + birch seedlings | | |
| D | Complete medium | | |
| E | Complete medium + birch seedlings | | |
| F | Modified complete medium + birch seedlings | | |
| G | Phosphate + patch of ammonia | | |
| H | Phosphate + patch of ammonia + birch seedling | | |
| | P. involutus in association with birch (Betula pendula) grown on agar substrate (○): | Print 85 | Le Quéré et al. (2005) |
| D02MYC* | ECM root tips developed after 2, 4, 8, 14 and 21 d, respectively | | |
| D02REF | Mycelium grown axenically for 2, 4, 8, 14 and 21 d, respectively | | |

  • ECM, ectomycorrhiza.
  • a Two different batches of cDNA microarrays (Print 85 and 154) were used in the experiments. Both arrays were printed with reporters obtained from a nonredundant set of expressed sequence tag (EST) clones originating from P. involutus ATCC 200175. Each reporter was replicated at least four times on the array. Print 85 contained reporters for hydA to hydD and Print 154 contained reporters for hydA to hydF. A full description of the Print 85 array design is available from the EMBL-EBI ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/; Accession number A-MEXP-92) whereas the Print 154 array design is in preparation for submission.
  • b Treatments indicated by an asterisk (*) are ECM tissues from associations with birch (Betula pendula).

Figure 5.

Transcriptional profiles for six hyd genes (hydA to hydF) from Paxillus involutus ATCC 200175 based on relative expression levels retrieved from a number of different dual-label cDNA microarray studies as listed in Table 3. Columns represent 25 different treatments (Table 3) and cover a wide range of developmental and physiological conditions. Treatments indicated by an asterisk (*) are ectomycorrhizal (ECM) tissues from associations with birch (Betula pendula). The mycelia grown in soil microcosms (•) and on agar substrates (○) are indicated with respective symbols. The scale shows relative expression levels on a log2 scale (cf. Materials and Methods).

The expression divergences for all hyd gene pairs were estimated by calculating the Euclidean distances of the log2-transformed expression levels (residuals) using data from 13 of the treatments listed in Table 3 (designated A to H, APO, ASU, BSA, CHI and GLN). These distances were related to the corresponding pair-wise nucleotide sequence divergence measured either as ds or as amino acid (aa) sequence distances (d); ds was calculated as described earlier. hydC was excluded from the analysis because of significantly higher substitution rates at the second and third codon positions in comparison with the other P. involutus ATCC 200175 hyd genes. The PROTDIST program from the phylip-3.64 package was used to estimate d from the alignment using the Dayhoff PAM matrix along with default parameters.
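The expression divergence measure described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the three-treatment profiles in the example are invented placeholder values, not data from this study.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Euclidean distance between two equally long expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same treatments")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_expression_divergence(profiles):
    """Return {(gene1, gene2): distance} for every unordered gene pair."""
    return {
        (g1, g2): euclidean_distance(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Placeholder log2 expression levels over three treatments (not real data).
example_profiles = {
    "geneA": [0.0, 1.0, 2.0],
    "geneB": [0.0, 1.0, 2.0],
    "geneC": [3.0, 5.0, 2.0],
}
example_divergence = pairwise_expression_divergence(example_profiles)
```

Genes with identical profiles have a distance of zero, and the distance grows with the size of the difference in any treatment, so a single strongly divergent treatment can dominate the measure.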

The estimated log2 relative expression levels of hydA to hydF were subjected to principal component analysis (PCA) using the Multivariate Statistical Package MVSP (Kovach, 1998).

Results


Characterization of hydA to hydG

Seven putative hydrophobin genes, designated hydA to hydG, were identified within a collection of EST clones originating from P. involutus ATCC 200175. The corresponding cDNA sequences were translated into polypeptide sequences with predicted sizes ranging between 107 and 141 aa residues (Table 2). All polypeptides were predicted to contain an N-terminal signal peptide as well as eight conserved cysteine residues, which are characteristic features of fungal hydrophobins (see the Supplementary Material, Fig. S1).

Three distinct gene structures were observed in the hyd genes. The hydA gene contained three introns, the hydB to hydF genes contained two introns, whereas the hydG gene consisted of a single uninterrupted exon (Table 2). The nucleotide sequences of hydA to hydG varied extensively. The pair hydD and hydG showed the highest pairwise identity (88%), followed by the pair hydB and hydE (76%). The nucleotide sequence of hydC appeared most divergent and displayed low identities (53–56%) against all the other P. involutus ATCC 200175 hyd genes (see the Supplementary Material, Table S2). In addition, hydC had a considerably lower GC content in its exons (53%) compared with that of the other P. involutus ATCC 200175 hyd genes (57–63%) (Table 2).

Phylogeny of hydA to hydG

A phylogenetic analysis showed that the hydA to hydG genes from P. involutus ATCC 200175 were found among genes of the Class I group of hydrophobins found in basidiomycetes (Wessels, 1997) (Fig. 1). The hydD, hydG, hydB and hydE genes clustered into one clade (bootstrap support value 54) with hydD and hydG (bootstrap support 99) as the most recently duplicated genes. The hydA, hydC and hydF genes were dispersed among other basidiomycete sequences. The hydA gene was found in a clade (support value 76) containing the hydPt-1 gene from Pisolithus tinctorius (Tagu et al., 1996). Two other hydrophobin genes (hydPt-2 and hydPt-3) have been isolated from P. tinctorius (Tagu et al., 1996); these were found outside this clade. The orthologs hydA and hydPt-1 encode considerably longer polypeptides (141 and 140 aa) than those encoded by hydB to hydG and hydPt-2 and hydPt-3 (in the range 107–117 aa) (Table 2) (Tagu et al., 1996).

Owing to the rather poor resolution in the maximum likelihood tree, the evolutionary relationships between hydA to hydG of P. involutus ATCC 200175 were also analysed using the split decomposition method, as implemented in SplitsTree (Huson, 1998). The SplitsTree graph supported the finding that hydD and hydG are closely related. In addition, the analysis indicated that hydE is more closely related to hydB than to hydD and hydG (see the Supplementary Material, Fig. S2).

Evolution of hydrophobin genes within the Paxillus clade

Attempts were made to amplify, by polymerase chain reaction (PCR), genomic fragments corresponding to each of the seven hyd genes in five strains of P. involutus and in one closely related species, P. filamentosus (Table 1). Sequence analysis showed that hydA, hydC and hydF orthologs were successfully amplified from all six Paxillus strains/species (Fig. 2) and were found in well-resolved clades supported with high bootstrap values (82, 100 and 91, respectively). By contrast, hydB, hydD, hydE and hydG orthologs were less universal and were amplified in one to five strains/species. The relationships of the clades of hydB and hydE, and hydG and hydD, respectively, support the finding (cf. above) that these genes represent two recently duplicated gene pairs. In addition, with the primers used, hydrophobin gene fragments were amplified that formed two additional clades, designated hydX and hydY. The hydX clade was closely related to the hydD and hydG clades, and the hydY clade was closely related to the clade of hydF.

Figure 2.

Phylogenetic tree showing relationships of hydrophobin genomic fragments that were polymerase chain reaction (PCR)-amplified from different strains of Paxillus. The gene fragments from Paxillus strains were amplified using gene-specific primers based on the hydA to hydG genes (see the Supplementary Material, Table S1). The first letters in the gene name represent the name of the strain (e.g. AT) or species (e.g. Pf), separated by ‘-’, followed by gene names abbreviated as three lower-case italics and an upper-case italic letter (e.g. hydA). The italic number after the locus name refers to a clone number. Accession numbers of all the genes are given in Table S3. The tree was constructed using the neighbour-joining algorithm on 48 hydrophobin sequences from the Paxillus strains along with seven hydrophobin genes from other closely related basidiomycetes Tricholoma terreum, Agaricus bisporus, Pisolithus tinctorius and Schizophyllum commune (cf. Fig. 1). The tree was rooted using a sequence from S. commune (Sc4). Values at nodes (only > 50 are displayed) represent the bootstrap support value in per cent of 1000 replicates. An asterisk (*) at the node indicates well-resolved clades that contain genes considered to have evolved directly from the same ancestral locus (i.e. they represent well-defined orthologs). hydX and hydY represent two clades of sequences with not yet identified homologs in P. involutus ATCC 200175. Alternate hyd gene groups have been shaded.

Analysis of hydrophobin pseudogenes

Among the gene fragments amplified with primers designed against hydE, there were four sequences that appeared to be truncated, containing only part of the conserved hydrophobin signature motif. Two sequences (Mj-hydP1 and Mj-hydP2) were amplified from the strain Maj and another two (Nu-hydP1 and Nu-hydP2) from the closely related strain Nau (Le Quéré et al., 2004). These four sequences were 100% identical and the sequence of Mj-hydP1 was analysed in detail. Comparisons with the other hyd genes revealed that these sequences represent a pseudogene of hydE (Fig. 3; Supplementary Material, Table S3). Exons 2 and 3, which contain five of the eight conserved cysteine residues in the hydrophobin motif, were intact in the pseudogene and this region displayed high sequence identity to hydE from the Maj strain. Exon 1 and Intron 1 of this pseudogene displayed several degenerative mutations leading to the formation of a nonfunctional gene: one mutation introducing a stop codon, insertions of two single nucleotides in the first exon and a deletion of a larger fragment encompassing part of the first exon and the entire first intron.

Figure 3.

Reconstruction of events leading to the formation of a hydrophobin pseudogene in Paxillus involutus. The pseudogene (Mj-hydP1) is compared against hyd genes AT-hydE (P. involutus ATCC 200175 strain), Mj-hydE and Mj-hydX (Maj strain). The pseudogene (Mj-hydP1) was amplified from the strain Maj using primers designed for hydE. (a) Organization of the fully sequenced AT-hydE gene. The gene has three exons (shaded boxes) and two introns (open boxes). The translational start codon (M), the codons for the eight cysteine residues (C) of the hydrophobin signature motif and the stop codon (*) are indicated. In the pseudogene (Mj-hydP1), the region corresponding to Exon 2 and Exon 3 is well conserved whereas a number of degenerative mutations have occurred in the region corresponding to Exon 1 and Intron 1 (underlined). The nucleotide (nt) sequence identity between the region (alignment had 216 sites including gaps and the stop codon) containing Exons 2 and 3 of the pseudogene and AT-hydE is 60%, and between the pseudogene and the Mj-hydE ortholog of Maj is 71%. Exons 2 and 3 of the pseudogene also display high sequence identity (62%) to another amplified hydrophobin gene fragment of Maj (Mj-hydX) that is found among an uncharacterized cluster of hydrophobin genes named hydX (cf. Fig. 2). (b) Enlargement of the region corresponding to Exon 1 and Intron 1. Enclosed in a box are the translated sequences (amino acid (aa) in upper case letters) corresponding to the region of Exon 1 in AT-hydE. Regions corresponding to the primer binding site are excluded (+). Note the presence of a stop codon (*) in the pseudogene. Furthermore, the pseudogene lacks part of the nucleotides encoding Exon 1 and Intron 1 (nt in lower case letters). (c) Enlargement of the boxed region displayed in (b). The nt alignment of Exon 1 has been divided into four segments (1–4). Segment 1 is the primer binding site (+). Segment 2 shows a region where two nt have been inserted (positions 4 and 19) in the pseudogene compared with the Mj-hydE, Mj-hydX and AT-hydE genes. Segment 3 of the pseudogene contains a variable region (enclosed in box) where insertions and deletions of nt are observed. Segment 4 shows the part of Exon 1 that has been deleted in the pseudogene. (d) Translation of the nt sequences in Exon 1 after removing the inserted nt (at positions 4 and 19 indicated in panel (c)) in the pseudogene. In segment 2 of the pseudogene, seven of the first 10 aa residues are identical. The remaining nt sequences of Exon 1 that are absent in the pseudogene correspond to a region containing three of the eight cysteine residues of the hydrophobin motif.

Rate of nucleotide substitutions

The rates and patterns of nucleotide substitutions within the Paxillus hyd gene family were analysed by calculating the dn and ds values for the orthologs of hydA–hydF, and hydX. In total, 143 pair-wise comparisons were performed, of which 14 showed a dn/ds ratio > 1, indicating a brief period of relaxed or positive selection (Fig. 4). Four of these comparisons included hydB, six hydD and the remaining four comparisons involved genes of the hydX group.

Figure 4.

Comparison of rates of nonsynonymous (dn) and synonymous (ds) nucleotide substitutions per coding site in the hydrophobin genes. Each letter represents a pairwise comparison made between the true orthologous hydrophobin genes belonging to hydA to hydX from Paxillus involutus (strains ATCC 200175, Pi08Be, Pi01Se, Maj and Nau) and P. filamentosus (Pf01De) (Fig. 2). The letter A corresponds to pairwise comparisons of hydrophobin genes within the true orthologous group indicated by an asterisk (*) at the hydA node in Fig. 2. The remaining letters, B, C, D, E, F and X, correspond to gene comparisons of orthologous groups hydB, hydC, hydD, hydE, hydF and hydX, respectively (Fig. 2). The diagonal line shows the neutral expectation where dn is equal to ds; if dn/ds > 1, the pairwise comparison falls above the diagonal line.
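The reading of the dn versus ds plot can be expressed as a small Python sketch. It only classifies comparisons once dn and ds are known; it does not estimate them (that was done with crann). The function names are hypothetical and the values in the example are invented placeholders, not results from this study.

```python
def dn_ds_ratio(dn: float, ds: float):
    """Return dn/ds, or None when ds is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def above_diagonal(comparisons):
    """Keep the comparisons with dn > ds, i.e. points above the dn = ds line."""
    return [(label, dn, ds) for label, dn, ds in comparisons if dn > ds]


# Placeholder comparisons as (orthologous group, dn, ds); not real data.
example_comparisons = [
    ("B", 0.04, 0.02),  # dn/ds = 2: above the diagonal
    ("A", 0.01, 0.10),  # dn/ds = 0.1: below the diagonal
    ("X", 0.03, 0.03),  # dn/ds = 1: on the diagonal
]
example_hits = above_diagonal(example_comparisons)
```

Comparisons on or below the diagonal are consistent with neutral evolution or purifying selection, so only those strictly above the line are counted as candidates for relaxed or positive selection.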

Divergence in expression profiles

The expression levels of the P. involutus ATCC 200175 hyd genes varied extensively depending on the growth conditions and the tissues being analysed (Fig. 5). Overall, the expression profiles of hydA, hydB, hydC and hydE were similar, whereas hydD and hydF were expressed at lower levels and showed different patterns of regulation.

To examine whether the divergence in expression levels was related to the divergence in sequence between gene duplicates, the distances in expression level were related to ds. This measure can be used as a proxy for the divergence time between gene duplicates (Gu et al., 2002). Although the expression divergence increased with sequence divergence, the association with ds was not statistically significant (r = 0.29, P = 0.45, n = 9). In addition, we tested the correlation between expression divergence and protein sequence divergence (d) of the hydrophobin gene pairs (Wagner, 2000). This association was weak and not statistically significant (r = –0.33, P = 0.22, n = 15).
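For readers who want to reproduce this kind of test, the sketch below computes a correlation coefficient and the corresponding t statistic in Python. It assumes that r is a Pearson product-moment correlation, which is not stated explicitly in the text; the function names are hypothetical. The P value itself requires the t distribution with n − 2 degrees of freedom and is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# t value implied by the reported r = 0.29 with n = 9 gene pairs.
t_ds = t_statistic(0.29, 9)
```

With only nine gene pairs the test has little power, so a correlation of this size yields a small t value, in line with the non-significant result reported above.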

To characterize the patterns of variation in expression levels more clearly, the data were examined by principal component analysis (PCA). We performed PCA on four hyd genes (hydA to hydD) for which expression data were available for a total of 25 microarray experiments. The first two principal components (PC1 and PC2) together accounted for 86% of the variation (Fig. 6a). PC1 separated samples growing in soils from those growing on defined agar medium. PC2 separated the samples of ECM root tips from those of mycelia and cords. A projection of the hydrophobins on the sample plane showed that the expression levels for hydA, hydB and hydC were positively correlated and explained mainly the variation along PC1 (Fig. 6b). Conversely, hydD was more closely projected to the ECM samples along PC2. Thus hydD appears to be specifically regulated in ECM root tips.
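The proportion of variation explained by each principal component can be illustrated with a minimal pure-Python sketch. It is not the MVSP procedure used here: the function names are hypothetical, the components are extracted from the covariance matrix by power iteration with deflation (a swapped-in technique chosen because it needs no external library), and the example matrix is an invented placeholder with samples in rows and genes in columns.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (samples x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return value, vec


def principal_components(rows, k=2):
    """Return [(fraction of variance explained, loadings)] for the first k PCs."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    if total == 0:
        raise ValueError("data have no variance")
    work = [row[:] for row in cov]
    components = []
    for _ in range(k):
        value, vec = leading_eigenpair(work)
        components.append((value / total, vec))
        # Deflation: remove the component just found before the next search.
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= value * vec[i] * vec[j]
    return components


# Placeholder data (not real): two perfectly correlated variables.
example_components = principal_components([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

In the placeholder data the two variables vary together, so the first component captures essentially all of the variance and the second captures none; with real expression data the fractions would be compared with the 68% and 18% reported for the first two axes.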

Figure 6.

Principal component analysis (PCA) of the patterns of expression levels of the hydrophobins hydA to hydD from 25 different microarray experiments (Table 3, Fig. 5). (a) Sample plots of 25 different microarray experiments. The PCA axis 1 and axis 2 explained 68% and 18%, respectively, of the total variation in the data. Treatments indicated by an asterisk (*) are ectomycorrhizal (ECM) tissues from associations with birch (Betula pendula). The mycelia grown in soil microcosms (•) and on agar substrates (○) are indicated with respective symbols. (b) Loading values of hydA to hydD. PCA axis 1 explains 68% and axis 2 18% of the variation.

Discussion


Seven hyd genes were identified in a collection of EST clones from P. involutus ATCC 200175. Because the complete genome sequence of P. involutus is not available, the isolated genes might not include all the members of the hyd gene family in this fungus. The phylogenetic analyses indicated that four of the hyd genes characterized (hydB, hydD, hydE and hydG) have diverged rather recently, presumably after the separation of the Paxillus clade (suborder Paxillineae) within the Boletales (Binder & Bresinsky, 2002) (Figs 1 and 2 and the Supplementary Material, Fig. S2). Furthermore, hydD/hydG and hydB/hydE represent two recently duplicated gene pairs. These apparently young duplicates were not found in all the Paxillus strains analysed. By contrast, orthologs for three other hydrophobins, hydA, hydC and hydF, were identified from all the Paxillus strains examined (Fig. 2). However, the evolutionary history of these hyd genes differed. hydA was found in a clade containing hydPt-1 from P. tinctorius. The two genes translate into proteins sharing a unique primary structure that is different from those encoded by other hydrophobin genes in P. involutus and P. tinctorius. Considering the fact that P. involutus and P. tinctorius belong to two evolutionarily distant lineages (suborders) within the Boletales, Paxillineae and Sclerodermatineae, respectively (Binder & Bresinsky, 2002), we conclude that hydA is an ancient copy, which has been maintained for a long time in the genome of Paxillus. hydC is found in a branch containing no other hyd genes or orthologs from other species (Fig. 2). The hydC gene showed the lowest GC content in the exons and displayed the lowest pair-wise nucleotide identity towards the other P. involutus hyd genes. By contrast, hydF is found in a clade with another closely related but not yet characterized hyd gene (hydY) (Fig. 2).

In addition to hydA to hydG, we identified a hyd pseudogene (hydP) in the Maj and Nau strains. Phylogenetic analyses of internal transcribed spacer (ITS) sequences have shown that these strains are closely related and comprise a well-resolved clade within P. involutus (Le Quéré et al., 2004). Comparison of the hyd pseudogenes and their cognate ORFs in P. involutus showed that the pseudogene had a truncated hydrophobin motif retaining five of the eight conserved cysteine residues (Fig. 3). Studies of the crystal structure of hydrophobins and analyses of mutants have shown that all eight cysteine residues are needed to produce a functional hydrophobin (Hakanpaa et al., 2004; Kershaw et al., 2005). Several partly degraded pseudogenes have been recognized within gene families of Saccharomyces cerevisiae (Lafontaine et al., 2004). However, such pseudogenes are not expected according to the classical model of gene degradation, which assumes that random mutations accumulate in the absence of selection pressure. This raises the question of whether pseudogenes can remain in the genome because they are involved in homology-dependent gene silencing mechanisms (Cogoni, 2001; Lafontaine et al., 2004).

The above data suggest that the Paxillus hyd gene family evolves according to the birth-and-death model (Nei & Rooney, 2005). This model predicts that new genes are created by gene duplication and that, whereas some duplicates remain functional in the genome for long periods of time, others are inactivated or deleted. Because of the lack of genomic data, the mechanisms by which the Paxillus hyd genes become duplicated are largely unknown. Nevertheless, the lack of introns and the phylogenetic position of the hydG gene suggest that this gene has arisen by retrotransposition of a hydD transcript (Long et al., 2003). This is probably a recent event, since hydG was identified only in the P. involutus ATCC 200175 strain. It is also possible that hydD and hydG correspond to alleles of the same gene. However, since the 5′- and 3′-untranslated regions of hydG are much longer than those of hydD (see the Supplementary Material, Table S3), we still favour the supposition that hydG has arisen by retrotransposition of hydD.

The classical model for the origin of functional novelties following gene duplication postulates that gene duplication creates a redundant locus that is free to accumulate otherwise deleterious mutations as long as the original copy maintains the ancestral function (Ohno, 1970). The most likely outcome of this period of relaxed selection is that the redundant gene degenerates to become a pseudogene (nonfunctionalization). A less frequent outcome is that the redundant copy evolves a new function by a process known as neofunctionalization. In addition to neofunctionalization, it has also been proposed that preservation of gene duplicates can be brought about by the process of subfunctionalization. In this model, the original function of the single-copy ancestral gene is partitioned between the two daughters (Prince & Pickett, 2002). The patterns of selection predicted by these models have been revealed by analysing gene duplicates in eukaryotes with sequenced genomes (Lynch & Conery, 2000). Similar to the results of these studies, we observed a period of relaxed selection in several of the young hyd duplicates (Fig. 4). Assuming that silent substitutions are not subjected to selection and that their number increases linearly with time, hydB and hydD, and a not yet fully characterized hyd gene (hydX), have experienced a phase of accelerated evolution, as shown by dn/ds ratios ≥ 1, which are indicative of relaxed or positive selection (Nei & Kumar, 2000). A decline in the dn/ds ratio reflects a gradual increase in the magnitude of selective constraints (purifying selection). Notably, hydE, which is the closest duplicate of hydB, had a dn/ds ratio < 1. This suggests that the two recently duplicated genes hydB and hydE are diverging at different rates and under different selection pressures, as predicted by the neofunctionalization model.
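The reading of the dn/ds ratio described above can be illustrated with a short sketch. The function name, the tolerance around 1 and the example rates are hypothetical and are not values from this study; in practice, a ratio close to 1 would be assessed with a statistical test rather than a fixed tolerance.

```python
def classify_selection(dn, ds, tol=0.1):
    """Return (dn/ds, label) for one gene pair.

    dn: nonsynonymous substitutions per nonsynonymous site
    ds: synonymous substitutions per synonymous site
    tol: hypothetical tolerance around a ratio of 1 (an assumption,
         not a threshold taken from the study)
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form a ratio")
    ratio = dn / ds
    if ratio > 1 + tol:
        label = "positive selection"
    elif ratio < 1 - tol:
        label = "purifying selection"
    else:
        label = "relaxed selection (approximately neutral)"
    return ratio, label


# Hypothetical rates, for illustration only.
for name, dn, ds in [("pair_1", 0.30, 0.10),
                     ("pair_2", 0.10, 0.10),
                     ("pair_3", 0.02, 0.20)]:
    ratio, label = classify_selection(dn, ds)
    print(f"{name}: dn/ds = {ratio:.2f} -> {label}")
```

A ratio well below 1 corresponds to the purifying selection expected for older duplicates, whereas ratios at or above 1 correspond to the relaxed or positive selection discussed for the young duplicates.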

Changes in gene expression are thought to be a major reason for the functional divergence and retention of duplicated genes (Ohno, 1970; Prince & Pickett, 2002). We asked whether the divergence in expression levels of the P. involutus hyd genes has increased with gene sequence divergence, that is, with evolutionary time. This question has been examined for members of gene families in several eukaryotic model organisms using data from microarray experiments. These studies have provided varying and in some cases contradictory results (Wagner, 2000; Makova & Li, 2003; Blanc & Wolfe, 2004; Haberer et al., 2004). We found that the expression divergence of the P. involutus hyd paralogs increased with time, but its association with protein sequence divergence (d), ds or dn was weak and not statistically significant. Thus, the evolution of coding regions and the evolution of mRNA expression patterns appear to be uncoupled among the P. involutus hyd genes. One reason for not observing such correlations may be that the number of gene pairs compared was low and that the data set contains both old and young duplicates. Studies of gene duplicates in yeast, human and Arabidopsis suggest that the evolution of expression patterns is rapid and correlates with sequence divergence only during a brief period of time after duplication (Gu et al., 2002; Makova & Li, 2003; Blanc & Wolfe, 2004).
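The kind of comparison described above can be sketched as follows. Expression divergence between two paralogs is taken here as 1 − r, where r is the Pearson correlation of their expression profiles across experiments; this is one commonly used measure and is an assumption, not necessarily the measure used in this study. The gene names, expression profiles and ds values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def expression_divergence(profile_a, profile_b):
    """Assumed measure: 1 - Pearson r (0 = identical pattern, 2 = opposite)."""
    return 1.0 - pearson(profile_a, profile_b)


# Hypothetical expression profiles (e.g. log ratios across five experiments).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.2, 1.9, 3.3, 3.8, 5.1],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [2.0, 2.5, 1.0, 4.0, 3.0],
}
# Hypothetical synonymous distances (ds) for each gene pair.
synonymous_distance = {
    ("geneA", "geneB"): 0.05, ("geneA", "geneC"): 0.90,
    ("geneA", "geneD"): 0.60, ("geneB", "geneC"): 0.85,
    ("geneB", "geneD"): 0.55, ("geneC", "geneD"): 0.70,
}

pairs = list(combinations(sorted(profiles), 2))
divergence = [expression_divergence(profiles[a], profiles[b]) for a, b in pairs]
ds_values = [synonymous_distance[pair] for pair in pairs]

r = pearson(ds_values, divergence)
print(f"correlation between ds and expression divergence: r = {r:.2f}")
```

With only a handful of gene pairs, as in the comparison discussed above, such a coefficient has little statistical power; pairwise distances within one gene family are also not independent of each other, so a significance test would need to take this into account.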

Gene duplicates that are evolving according to the neofunctionalization and subfunctionalization models can be expected to accumulate mutations in their promoter sequences that can lead to shifts in their expression levels and tissue specificity (Prince & Pickett, 2002; Duarte et al., 2006). Notably, such shifts were detected in the expression of several P. involutus hyd genes. These shifts involved increased (hydE) and reduced (hydD and hydF) levels of expression, as well as increased tissue specificity (hydD) (Figs 5 and 6; Table 2). According to the so-called DDC (duplication–degeneration–complementation) model of subfunctionalization, degenerative promoter mutations can alter the level of expression of daughter genes to the point where both copies are needed to supply enough protein product (Force et al., 1999). Although the DDC model can explain the shifts in expression levels of the P. involutus hyd genes, it cannot explain the altered tissue specificity of hydD. The subfunctionalization model predicts that the expression pattern of an ancestral gene is partitioned between its daughters. However, the expression pattern of the presumably ancestral hydA gene was not divided between hydD and other P. involutus hyd genes. Moreover, the expression pattern of hydD does not fit the neofunctionalization model either: it is not entirely novel, as other hyd paralogs were also expressed in the ECM root tissue. The above data suggest that the expression patterns of the P. involutus hyd gene family have evolved according to more complex combinations of the neofunctionalization and subfunctionalization models. Mixtures of these models have recently been noted in studies of duplicates in yeast, human, mouse and Arabidopsis (He & Zhang, 2005; Huminiecki & Wolfe, 2004; Duarte et al., 2006).

One concern in microarray experiments is that cross-hybridization between closely related paralogs and reporters may affect the results. We have previously shown, using array hybridization conditions similar to those used in this investigation, that the signals decrease rapidly when the sequence similarity drops below 90–95% (Le Quéré et al., 2006). Considering that the sequence identity between the hyd paralogs analysed was below 88% (see the Supplementary Material, Table S2), we do not consider the expression data for these genes to have been distorted by cross-hybridization.
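The identity check described above can be sketched as follows. The function names and the example sequences are hypothetical, and the 90% default is simply the lower bound of the 90–95% range quoted above; the sketch assumes that the two sequences have already been aligned to the same length, with gaps written as "-".

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def cross_hybridization_risk(seq_a, seq_b, threshold=90.0):
    """True if identity reaches the (assumed) threshold for cross-hybridization."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical aligned fragments, for illustration only.
reporter = "ACGT-ACGT"
paralog = "ACGTTACGA"
identity = percent_identity(reporter, paralog)
print(f"identity = {identity:.1f}%, "
      f"risk = {cross_hybridization_risk(reporter, paralog)}")
```

Pairs falling below the threshold, like the paralogs analysed here, would not be flagged; pairs at or above it would warrant closer inspection of the reporter design.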


This study was supported by grants from the Swedish Research Council. B.R. and P.S. were supported by grants from the Research School in Genomics and Bioinformatics. DNA sequencing was performed at the SWEGENE Center of Genomic Ecology at the Ecology Building in Lund, supported by the Knut and Alice Wallenberg Foundation through the SWEGENE consortium. We thank Eva Friman for help with DNA sequencing.