• Genome assemblies and raw reads presented in this study are deposited in the EMBL and SRA databases under accession numbers CAPO020000001–CAPO020013167, HF970591–HF972568 (N. africana), CAPP020000001–CAPP020022228, HF972687–HF975603 (N. sublineolata), CAPQ020000001–CAPQ020030909, HF975656–HF979206 (N. pannonica), and CAPR020000001–CAPR020040841, HF979207–HF985214 (N. terricola). Raw reads from N. africana RNA sequencing are available at the SRA under accession ERP002224. Sequence alignments are available at the digital repository Dryad under accession number doi:10.5061/dryad.4n9b4.


It is becoming increasingly evident that the adoption of different reproductive strategies, such as sexual selfing and asexuality, greatly affects genome evolution. In this study, we test theoretical predictions of genomic maladaptation in selfing lineages using empirical data from the model fungus Neurospora. We sequenced the genomes of four species representing distinct transitions to selfing within the history of the genus, as well as the transcriptome of one of these, and compared them with available data from three outcrossing species. Our results provide evidence for a relaxation of purifying selection in protein-coding genes and for a reduced efficiency of transposable element silencing by Repeat Induced Point mutation. A reduction in adaptive evolution was also identified in the form of reduced codon usage bias in highly expressed genes of selfing Neurospora, but this result may be confounded by mutational bias. Potentially counteracting these negative effects, both the nucleotide substitution rate and the spread of transposons are reduced in selfing species. We suggest that differences in substitution rate relate to the absence, in selfing Neurospora, of the asexual pathway producing conidia. Our results support the dead-end theory and show that Neurospora genomes bear signatures of both sexual and asexual reproductive modes.

Selfing is a widespread strategy allowing reproductive assurance (Aanen and Hoekstra 2007; Busch and Delph 2011), but comes with long-term evolutionary costs. Population genetic theory predicts that selfing populations will suffer from a reduced effective recombination rate and population size (Pollack 1987; Charlesworth and Wright 2001). Consequently, a reduced efficacy of purifying selection (Hill and Robertson 1966) and a more important role for genetic drift are expected in these populations (Otto and Lenormand 2002), which will show an excess of slightly deleterious mutations and lower frequencies of adaptive alleles (Charlesworth 1992). The long-term negative effects of selfing have been formalized in the “dead-end” hypothesis, which makes two assumptions: first, transitions from outcrossing to selfing are irreversible, and second, the reduced efficacy of selection and the reduced adaptive potential render these species maladapted (Stebbins 1957; Takebayashi and Morrell 2001). The dead-end theory was initially formulated in plants, but holds for any population with a reduced effective recombination rate, the most extreme examples being asexual lineages (Normark et al. 2003).

The irreversibility of transitions from outcrossing to selfing/asexuality is supported by studies of a wide range of systems, for example, plants (Goodwillie 1997; Schoen et al. 1997; Igic et al. 2006) and fungi (Yun et al. 1999; Gioti et al. 2012). The most commonly used criterion to test empirically for maladaptation under less efficient purifying selection is an accelerated rate of protein evolution, that is, a relatively high dN/dS in a majority of coding genes. This criterion has yielded supporting results in asexual animal phyla (Paland and Lynch 2006), but inconclusive results in selfing plant taxa (Wright et al. 2002; Haudry et al. 2008; Escobar et al. 2010) and in Caenorhabditis elegans (Artieri et al. 2008; Cutter et al. 2008). Mating system may also affect the dynamics of transposable elements (TEs), whose spread is predicted to depend on the balance between transposition rate (Wright and Schoen 1999; Morgan 2001) and negative selection (Charlesworth and Langley 1989; Charlesworth and Charlesworth 1995). These predictions have been tested most extensively with simulations in plants (reviewed in Wright et al. 2008), whereas in C. elegans polymorphism data revealed a reduced efficacy of selection for a class of DNA transposons (Dolgin et al. 2008). Finally, maladaptation expressed as a reduction in adaptive evolution can be tested by examining the strength of synonymous codon usage bias, which can result from selective pressure for more efficient and accurate translation (Akashi 1994; Duret and Mouchiroud 1999; Duret 2000; Stoletzki and Eyre-Walker 2007). A correlation between selfing and reduced codon usage bias was shown in Arabidopsis (Wright et al. 2002; Qiu et al. 2011b), whereas mild reductions were observed in selfing Triticeae species (Haudry et al. 2008; Escobar et al. 2010) and C. elegans (Artieri et al. 2008; Cutter et al. 2008).
Overall, previous studies have shown that genomic signatures of maladaptation can be subtle; they are easily obscured by even small degrees of residual outcrossing (Glémin and Galtier 2012) or when selfing arose relatively recently (e.g., Cutter et al. 2008; Escobar et al. 2010).

It is of particular interest to study the dead-end theory in organisms with a predominant haploid life stage, such as Ascomycete fungi, for two reasons: First, haploidy allows one to exclude the confounding effect of purging of deleterious mutations, expected as a result of increased levels of homozygosity in diploid selfers. Second, haploid selfing implies fusion of two mitotic descendants of the same cell and is thus very different from diploid selfing as known in plants and animals (Billiard et al. 2012). Under this mode of selfing, effective recombination rate approaches zero, such that from a genetic point of view, this mating system is equivalent to asexuality (Nauta and Hoekstra 1992a). So far, the question of genomic consequences of selfing has been tackled in the fungal kingdom solely by population genetic models (Nauta and Hoekstra 1992b,a). These models were built on the genus Neurospora, within which a high degree of outcrossing is observed in heterothallic species, such as Neurospora crassa (Powell et al. 2003; Ellison et al. 2011a), and facultative outcrossing is observed in pseudohomothallic species, such as Neurospora tetrasperma (Raju 1992; Powell et al. 2001; Menkis et al. 2009); we will refer to species exhibiting these two mating systems as “outcrossing” in this study. Obligate haploid selfing is assumed in homothallic Neurospora species such as Neurospora africana, because these appear to lack the morphological structures important for outcrossing (Howe and Page 1963; Perkins 1987; Glass et al. 1990). We will refer to these as “selfing” here, although one needs to note that the absence of outcrossing in nature for these species has not been proven so far.

We previously made use of an exhaustive phylogenetic framework available for Neurospora (Nygren et al. 2011) and the structure of the mating-type (mat) locus, to confirm that mating-system shifts in the genus are unidirectional, from outcrossing to selfing. Thereby, we confirmed the first assumption of the dead-end theory on the irreversibility of transitions to selfing (Gioti et al. 2012). The four selfing species that were considered in this study represent independent mating-system transitions, because they belong to distinct phylogenetic clades (Nygren et al. 2011) and showed different mat locus architectures that could be explained by distinct mechanistic models for the transitions (Gioti et al. 2012). In this study, we sequenced the genomes of these four species and contrasted dN/dS, TE spread and silencing, base composition, and codon usage patterns between these and three Neurospora species reported to outcross in nature. Our results overall support the second assumption of the dead-end theory on maladaptation of selfing species.



The isolates used for genome sequencing, N. africana (FGSC 1740), Neurospora sublineolata (FGSC 5508), Neurospora pannonica (FGSC 7221), and Neurospora terricola (FGSC 1889), were obtained from the Fungal Genetics Stock Center (FGSC, University of Missouri, Kansas City, MO). Mycelia for DNA extraction were grown in 18 × 200 mm culture tubes containing minimal medium broth (Vogel 1964) with 1% sucrose. For total RNA isolation, N. africana was cultured under two sets of conditions: (1) in liquid synthetic crossing (SC) medium (Westergaard and Mitchell 1947), supplemented with sucrose (Sigma) to a final concentration of 2% or 0.1% for vegetative and carbon-starved vegetative growth, respectively, with cultures shaken at 200 rpm at 22°C under a 12 h:12 h photoperiod for 4 days; and (2) on solid SC medium plates, kept in darkness until the first protoperithecia were observed under the microscope (6–8 days: early mating condition) or until abundant mature perithecia were observed (12–14 days: late mating condition).


Genomic DNA was extracted from 2-day-old fungal mycelium using the Easy-DNA Kit (Invitrogen, Carlsbad, CA). We extracted total RNA using TRI REAGENT (Molecular Research Center Inc., Cincinnati, OH) following the manufacturer's protocol. Tissues for RNA extraction were homogenized with a Dounce glass grinder, and debris was filtered on Qiashredder columns (Qiagen, Chatsworth, CA). Total RNA was treated with DNase I according to the manufacturer's protocol (Fermentas, Burlington, Canada). RNA quality and quantity were analyzed by electrophoresis with an Agilent Bioanalyzer using the RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA).


Whole-genome sequencing was performed on the Illumina Genome Analyzer (GA) version II platform (Geneservice Source BioScience plc, Nottingham, UK), using genomic libraries sheared and gel-purified to select a mean insert size of 130 bp (adapters excluded). Each library was sequenced for 55 cycles in paired-end mode, in separate lanes of two different flowcells; an additional run of 75 cycles was performed for N. africana. Image analysis, base calling, and filtering were performed with the GA pipeline software (version 1.3, Illumina). For whole-transcriptome sequencing of N. africana, we used a pooled sample of 35.65 μg total RNA: 7.15 μg from vegetative growth and 9.5 μg from each of the carbon-starved vegetative growth, early mating, and late mating conditions. A cDNA library was constructed through full random priming of polyA- mRNA, size-fractionated, normalized, and subsequently sequenced by Eurofins MWG Operon (Germany) on a Roche GS FLX (Titanium chemistry).


Genomic DNA sequencing reads were assembled de novo using Velvet version 0.7.58 (Zerbino and Birney 2008). Quality controls, assembly optimizations, and estimates of error rates for the assembled genomes are described in Supporting Information: Methods. We extended the scaffolds with Mercator (Dewey 2007), using the finished genomes of N. crassa (Galagan et al. 2003), Neurospora discreta (FGSC 8579: …), and N. tetrasperma (Ellison et al. 2011b; FGSC 2508: …). Note that data on extended scaffolds (N. sublineolata, N. pannonica) and corrected misassemblies (N. terricola) for the mat loci are deposited separately (Gioti et al. 2012). The Core Eukaryotic Genes Mapping Approach (CEGMA; Parra et al. 2007) was run against each genome using a set of 248 core eukaryotic genes (CEGs) as queries. Neurospora africana RNA-sequencing reads were assembled by aligning them to the N. africana draft genome with the Program to Assemble Spliced Alignments (PASA) (Haas et al. 2003).


To generate a genomic dataset from an outcrossing species comparable to the datasets from selfing species, we simulated two sets of paired-end reads from the genome of N. crassa (version 10) using the program wgsim (…). A total of 29–30 million reads were generated for each dataset, using the expected base error rate after Illumina GAII quality trimming (0.02, as in Kircher et al. 2009; Luo et al. 2012), library insert sizes of 250 and 150 bp (Table S1), and read lengths set to 55 bp (as in untrimmed data) and to 33–55 bp (as in trimmed data). Furthermore, we downloaded ∼30 million Illumina reads (Sequence Read Archive: SRA026343) from the resequencing of a N. crassa mutant strain (McCluskey et al. 2011) and performed the same analyses as on the sequenced reads from selfing species.
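The logic of read simulation can be illustrated with a toy Python sketch. It is not wgsim: it draws fragments of one fixed insert size uniformly from a genome string, takes one read from each end of the fragment, and applies independent substitution errors at a uniform per-base rate (0.02 by default, matching the value above). The names `simulate_pairs`, `add_errors`, and `revcomp` are hypothetical; wgsim additionally models insert-size variation and introduces mutations (including indels) relative to the reference, none of which is reproduced here.

```python
import random

# Complement table for A/C/G/T (other characters are left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(read, error_rate, rng):
    """Independently replace each base, with probability error_rate, by a different base."""
    bases = list(read)
    for i, base in enumerate(bases):
        if rng.random() < error_rate:
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(bases)


def simulate_pairs(genome, n_pairs, insert_size, read_len, error_rate=0.02, seed=1):
    """Return n_pairs (read1, read2) tuples from fixed-size fragments of genome.

    read1 is the forward start of the fragment; read2 is the reverse complement
    of its end, as in paired-end sequencing. Assumes read_len <= insert_size <= len(genome).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        read1 = add_errors(fragment[:read_len], error_rate, rng)
        read2 = add_errors(revcomp(fragment)[:read_len], error_rate, rng)
        pairs.append((read1, read2))
    return pairs
```

For instance, `simulate_pairs(genome, n_pairs=1000, insert_size=150, read_len=55)` returns 1000 read pairs from 150 bp fragments; a fixed seed makes the output reproducible.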


Assembled genomes were annotated using MAKER version 2.05 (Cantarel et al. 2008). Composite gene models, which were not further curated, were created by combining the following sources of evidence: (1) ab initio gene predictions from AUGUSTUS (Stanke and Waack 2003) trained with N. crassa gene models, SNAP (Korf 2004) trained on N. crassa, and GeneMark-ES version 2.3a (Ter-Hovhannisyan et al. 2008) with a self-trained model for each genome; (2) protein alignments to a set of 41,708 proteins from the closely related species Fusarium graminearum, Magnaporthe grisea, N. discreta, N. tetrasperma, and Sordaria macrospora, clustered using CD-HIT version 4.0 (Li et al. 2001) with a protein identity threshold of 90%; (3) nucleotide alignments of the 27,056 EST transcript sequences obtained from the assembly of N. africana RNA-sequencing data; and (4) nucleotide alignments to the 9908 N. crassa near-full-length transcripts inferred from the genome annotation (…).


A whole-genome seven-way alignment was constructed with Mercator (Dewey 2007), using the genomes of outcrossing species as “finished” genomes and those of selfing species as “draft” genomes. This alignment was combined with gene predictions from MAKER to extract single-copy syntenic orthologs using custom Perl scripts (…). We selected a strict set of orthologous genes showing microsynteny by requiring each gene and its left and right flanking genes in N. crassa to lie in a syntenic block. Genes for which MAKER predicted two open reading frames (ORFs) at positions orthologous to a single N. crassa gene were excluded. The protein sequences of orthologous genes were aligned using MUSCLE version 3.8.31 (Edgar 2004), and the alignments were “back-translated” using the coding sequences and the “” script of BioPerl (Stajich et al. 2002). To exclude low-quality alignments, we kept only those with a proportion of shared residue pairs (equivalent to the sum-of-pairs score) equal to or higher than 99% in a “Heads or Tails” analysis (Landan and Graur 2007). We further excluded alignments for which the estimated tree length, in synonymous substitutions per codon, exceeded five in preliminary analyses with the codeml program in PAML (branch models), because manual inspection revealed cases where the observed dS saturation was caused by misalignment.
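Back-translation threads each coding sequence through its aligned protein, replacing every residue with its codon and every gap with a codon-sized gap, so that the nucleotide alignment stays in frame. The minimal Python sketch below shows the idea; it is not the BioPerl script used in the study. It assumes the CDS may end with a single stop codon absent from the protein, and it only checks that residue and codon counts agree (it does not re-translate codons to confirm that they match the residues). The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def back_translate(aligned_protein, cds):
    """Map one gapped protein alignment row onto its coding sequence.

    Each residue is replaced by the next codon of cds and each '-' by '---'.
    A single terminal stop codon in cds (absent from the protein) is dropped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    n_residues = len(aligned_protein.replace("-", ""))
    if n_residues != len(codons):
        raise ValueError(f"{n_residues} residues but {len(codons)} codons")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)
```

Applying it to every row of a protein alignment yields a codon alignment in which gaps always span whole codons, the form expected by codon models such as those in codeml.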


The programs codeml and baseml, included in the PAML package version 4.4 (Yang 1997, 2007), were used to test models of codon and nucleotide substitution rate constancy across the Neurospora phylogeny. We used the following topology and branch designation for reproductive mode (where specified): (((N. discreta, (N. crassa, N. tetrasperma)), N. africana #1), (N. pannonica #1, N. sublineolata #1), N. terricola #1), where #1 designates branches delineating selfing species. The topology is derived from a MrBayes (Ronquist et al. 2012) analysis of data from previously published phylogenies of the genus (Nygren et al. 2011), and the branch designation is based on the assumption of an outcrossing ancestor of Neurospora (Gioti et al. 2012). The following alternative topology, derived from a MrBayes phylogenetic reconstruction on the concatenated dataset of the 2789 reliable gene alignments identified in this study, was also used in branch models and in baseml analyses: (N. africana#1, (N. discreta, (N. crassa, N. tetrasperma)), ((N. pannonica#1, N. terricola#1)#1, N. sublineolata#1)#1). For baseml analyses, we used a GTR+G model, the closest available substitution model to REV, which best fitted a dataset of four-fold degenerate sites from 500 randomly picked alignments according to a ModelGenerator (Keane et al. 2006) analysis. Tested models and summary calculations of dN/dS are detailed in Supporting Information: Methods. Likelihood ratio tests (LRTs) were performed by comparing twice the difference in log-likelihood values (−2ln ∆) between nested models against a χ2 distribution. Multiple testing was corrected for by controlling the false discovery rate (q < 0.05) with the Q-value package (Storey 2002); the tuning parameter used to estimate the proportion of true null hypotheses was chosen with the bootstrap method.
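The LRT and false-discovery-rate steps can be sketched with the Python standard library. `chi2_sf` evaluates the chi-square upper tail for an integer number of degrees of freedom, which for nested models equals the difference in free parameters (for example, 1 when a two-ratio model is compared with a one-ratio model). `q_values` follows Storey's estimator but with a fixed tuning parameter λ = 0.5; this is a simplification, since the study chose λ by the bootstrap method of the Q-value package, which is not reproduced here. All function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def lrt_pvalue(lnl_null, lnl_alt, df):
    """P-value of 2 * (lnL_alt - lnL_null) against a chi-square with df degrees of freedom."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return chi2_sf(statistic, df)


def q_values(p_values, lam=0.5):
    """Storey-type q-values with a fixed tuning parameter lam (no bootstrap selection)."""
    m = len(p_values)
    # Estimated proportion of true null hypotheses (0 if no p-value exceeds lam).
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q
```

A statistic of 3.841 with 1 degree of freedom gives P ≈ 0.05, the familiar critical value; genes would then be called significant when their q-value falls below 0.05.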


The genomes of all species were scanned for TEs with RepeatMasker (version open-3.3.0) using a library generated by merging 430 fungal TE families available from RepBase release 15.02 (Jurka et al. 2005) with a previously described custom library of de novo-identified TEs (Gioti et al. 2012). Copies of nsubGypsy were retrieved from the genomes of Neurospora species using its nucleotide sequence as query in the web interface of TARGeT (Han et al. 2009), with the minimum matched percentage of the query set at 50%. The copies were aligned using MUSCLE (Edgar 2004), and ambiguous characters and gap positions were trimmed using trimAl version 1.3 (Capella-Gutiérrez et al. 2009) with the “nogaps” option. The phylogenetic tree of nsubGypsy was constructed with PhyML (Guindon et al. 2010), using a GTR model (based on a Modeltest run) and an NNI tree search starting from a BioNJ tree. Branch support was obtained from 1000 bootstrap replicates. Additional analyses aiming to consolidate our findings on TE content are presented in Supporting Information: Methods. We calculated the Composite Repeat-Induced Point mutation (RIP) Index, CRI = (TpA/ApT) − (CpA + TpG)/(ApC + GpT), as defined in Lewis et al. (2009), in nonoverlapping 50 bp windows of TEs and in 500 bp windows of the assemblies, using custom Perl scripts available at …. Plots were generated with R version 2.9 and later (…).
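The composite RIP index can be computed per window as in the Python sketch below. It is not the study's Perl script: it assumes overlapping dinucleotide counts within each window, discards a trailing partial window, and skips windows in which ApT or ApC + GpT is zero, because the index is undefined there. A window is counted as RIP-affected when its index is above 0, the threshold used in Figure 3. Function names are hypothetical.

```python
from collections import Counter


def composite_rip_index(seq):
    """CRI = TpA/ApT - (CpA + TpG)/(ApC + GpT), or None if a denominator is zero."""
    seq = seq.upper()
    c = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # overlapping dinucleotides
    if c["AT"] == 0 or c["AC"] + c["GT"] == 0:
        return None
    return c["TA"] / c["AT"] - (c["CA"] + c["TG"]) / (c["AC"] + c["GT"])


def rip_fraction(seq, window=50):
    """Fraction of non-overlapping, full-length windows with CRI > 0 (None if none scorable)."""
    scores = []
    for start in range(0, len(seq) - window + 1, window):
        cri = composite_rip_index(seq[start:start + window])
        if cri is not None:
            scores.append(cri)
    if not scores:
        return None
    return sum(s > 0 for s in scores) / len(scores)
```

RIP converts CpA to TpA, which raises the first ratio and lowers the second, so RIP-mutated sequence tends toward positive index values.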


To identify the 100 most highly expressed genes (HEGs) of N. crassa and of N. africana, we generated spliced alignments of transcriptome data against the genome assemblies using TopHat (Trapnell et al. 2009). Next, we ranked genes by expression level using the Fragments Per Kilobase of gene per Million reads (FPKM) values computed by Cufflinks (Trapnell et al. 2010). For N. crassa, we used published Illumina transcriptome data (Ellison et al. 2011a), and for N. africana, the 454 transcriptome data generated in this study. Codon usage tables and Frank Wright's Nc statistic for the effective number of codons (Wright 1990) were calculated on the coding sequences of all genes and of HEGs using the programs “cusp” and “chips” from EMBOSS (Rice et al. 2000); both individual (for distribution plots and statistical tests) and total (summed over all gene features) Nc values were computed. To assess the statistical significance of Nc differences, we generated 10,000 randomized datasets of 100 coding genes from the N. africana and N. crassa genomes with custom Python scripts using Biopython version 1.57 modules (Cock et al. 2009); the Nc values of these datasets reflected the codon usage of all genes (53.64 for N. africana and 53.26 for N. crassa random datasets). P-values were calculated as the number of randomized datasets in which Nc was equal to the Nc of HEGs, divided by the number of randomized datasets.
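Wright's (1990) effective number of codons combines, for each amino acid observed n times, the codon homozygosity F = (nΣp² − 1)/(n − 1), averages F within each degeneracy class of the standard genetic code, and sums Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The Python sketch below implements that formula under stated assumptions: amino acids with fewer than two observed codons are skipped, a missing isoleucine (three-fold) class is imputed as the mean of F2 and F4, the result is capped at 61, and `None` is returned when a class cannot be estimated. EMBOSS "chips" may handle these edge cases differently; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, excluding stop codons and single-codon Met/Trp.
FAMILIES = {}
for codon, aa in CODE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds):
    """Wright's (1990) Nc for one coding sequence, or None if it cannot be estimated."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    homozygosities = {}  # degeneracy class -> per-amino-acid F values
    for aa, codons in FAMILIES.items():
        n = sum(counts[c] for c in codons)
        if n < 2:
            continue  # F is undefined with fewer than two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        homozygosities.setdefault(len(codons), []).append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in homozygosities.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's rule for a missing Ile class
    if any(mean_f.get(k, 0) <= 0 for k in CLASS_SIZES):
        return None
    nc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)
```

A gene that uses a single codon per amino acid gives Nc = 20, and perfectly even usage reaches the cap of 61; lower Nc therefore means stronger codon usage bias, which is why HEGs are compared against random gene sets.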



Using solely Illumina paired-end sequencing and an optimized method for de novo assembly, we obtained draft genomes with good coverage (19×–30×) and high accuracy (0.006–0.030% error rate) for four selfing Neurospora species: N. africana, N. pannonica, N. sublineolata, and N. terricola (Table 1). Based on our assemblies, their estimated genome sizes range from 35.7 Mb (N. sublineolata) to 39.3 Mb (N. terricola; Table 1). Between 86% and 95% of CEGs were identified in our assemblies as full-length alignments, and 93–98% as partial alignments (Table 1). 454 RNA sequencing of N. africana yielded 622,631 single (5′) reads with an average length of 340 bp, which we assembled into 27,056 EST sequences, corresponding to roughly seven-fold transcriptome coverage. Between 11,632 and 12,438 genes were predicted in the genomes of selfing Neurospora species (Table 1) using a combination of ab initio predictions, homology searches, and transcript evidence from N. africana. A seven-way genomic alignment of our draft assemblies with three publicly available genomes of outcrossing Neurospora (N. crassa, N. discreta, and N. tetrasperma) identified 4355 single-copy syntenic orthologs. After quality filtering, we obtained 2789 reliable protein-coding gene alignments, which were used in downstream molecular evolution analyses.

Table 1. Assembly parameters and features of selfing Neurospora genomes
Species | Mil. reads | Avg. coverage (fold) | Size (Mb) | Nb scaffolds (a) | N50 (b) | Max (c) | Error rate % (d) | CEGs % full–partial | Predicted protein-coding genes
N. africana | 41 | 23 | 36 | 1978 | 159 | 588 | 0.007 | 95.1–96.7 | 12,438
N. sublineolata | 29.7 | 30 | 35.7 | 2917 | 85 | 619 | 0.022 | 94.3–98.3 | 11,632
N. pannonica | 28.3 | 25 | 38.5 | 3551 | 83 | 522 | 0.006 | 86.3–93.5 | 12,237
N. terricola | 28.5 | 19 | 39.3 | 6008 | 66 | 402 | 0.030 | 91.9–96.3 | 12,399

(a) Number of scaffolds after Mercator analysis, excluding scaffolds smaller than 300 bp. Due to exclusion of short contigs before assembly following EMBL rules for submission, this number is higher in the deposited data.
(b) N50 is a weighted median statistic such that 50% of the entire assembly is contained in scaffolds equal to or larger than this value, here expressed in kb.
(c) Max refers to the size, in kb, of the largest assembled scaffold.
(d) Error rates were calculated based on the number of confident variants identified by GATK analysis and the total number of callable bases (for details, see Supporting Information: Methods).
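The N50 definition in footnote (b) translates directly into code: sort scaffold lengths in decreasing order and return the length at which the running total first reaches half the assembly. The Python sketch below is illustrative; the optional `min_length` filter only shows how a size cutoff like the 300 bp one in footnote (a) could be applied, since the table does not state whether N50 was computed before or after that filter. The function name is hypothetical.

```python
def n50(lengths, min_length=0):
    """Length L such that scaffolds of length >= L hold at least half of the assembly."""
    kept = sorted((x for x in lengths if x >= min_length), reverse=True)
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For scaffold lengths of 2, 3, 4, 5, and 6 kb (20 kb in total), the two largest scaffolds together hold 11 kb, so the N50 is 5 kb.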


We applied branch substitution models (codeml in PAML; Yang 1997, 2007) to the 2789 individual gene alignments to identify genes evolving differently between selfing and outcrossing species. We first tested whether a two-ratio model, which allows dN/dS to differ between selfing and outcrossing lineages, fit the data significantly better than a model with a uniform rate. More than one-third (37%) of the investigated genes were identified by this test as “differentially evolving,” a result also confirmed by analysis of the concatenated dataset (Table 2, Fig. 1A). Of these differentially evolving genes, 83% showed a significantly higher dN/dS in selfing lineages (Wilcoxon signed-rank test, P < 2.2E−16), that is, a positive difference (dN/dS of branches leading to selfing species) − (dN/dS of branches leading to outcrossing species). We termed these genes “selfing rapidly evolving”; they represent 30% of all genes studied (Table 2). To test whether the differences in dN/dS were affected by potential differences in gene structure, we analyzed the concatenated data without treating gaps as missing data and found similar results (Table 2). Moreover, we found no significant differences in gene length between the two groups of species. The significantly higher dN/dS in selfing lineages was also observed when we used the alternative tree topology, derived from phylogenetic reconstruction of the concatenated dataset of 2789 genes, and when we applied to the concatenated dataset a model allowing each branch its own rate of evolution (free-ratio model). The free-ratio model fit the data significantly better than the uniform model (Table 2), and the estimated dN/dS ratios were higher in all terminal branches leading to selfing species than in those leading to outcrossing species (Fig. 1B).
Note that, using the free-ratio model on the concatenated dataset, we found a weak, positive but nonsignificant correlation between dN/dS and dS (Fig. S1A). The increased dN/dS of genes in selfing taxa appears to be due to an increase in dN rather than a decrease in dS, as shown by comparing these values (from the two-ratio model) between outcrossing and selfing species for all genes and for the selfing rapidly evolving genes (Fig. S1B–E).
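The gene categories defined in the Table 2 footnotes can be expressed as a short Python sketch. The `GeneResult` record and its fields are hypothetical placeholders for per-gene codeml output; the thresholds follow the footnotes (differentially evolving: q < 0.05 in the local vs. global LRT; selfing rapidly evolving: differentially evolving genes whose selfing-branch dN/dS minus outcrossing-branch dN/dS is greater than 0), and percentages are expressed relative to all tested genes.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary of codeml branch-model output."""
    name: str
    q_value: float     # q-value of the local (two-ratio) vs. global (one-ratio) LRT
    omega_self: float  # dN/dS on selfing branches, local model
    omega_out: float   # dN/dS on outcrossing branches, local model


def summarize(results, q_threshold=0.05):
    """Percentages of DE and SRE genes relative to all tested genes."""
    total = len(results)
    de = [r for r in results if r.q_value < q_threshold]
    sre = [r for r in de if r.omega_self - r.omega_out > 0]
    return {
        "pct_DE": 100.0 * len(de) / total,
        "pct_SRE": 100.0 * len(sre) / total,
        "SRE_fraction_of_DE": len(sre) / len(de) if de else 0.0,
    }
```

With the values reported for the 2789 genes, 30.47% / 36.78% ≈ 0.83, which is the 83% of differentially evolving genes that evolve faster in selfers.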

Table 2. Rates of codon and nucleotide substitutions for 2,789 Neurospora orthologs
Columns 2–10: codon substitution rates (codeml); columns 11–13: nucleotide substitution rates (baseml).
Dataset | codeml LRT P-value, local vs. global | codeml LRT P-value, free vs. global | dN/dS Self, local model (a) | dN/dS Out, local model (a) | P-value (Wilcoxon signed-rank test) | dN/dS All, global model | % DE (b) | % SRE (c) | % SPS (d) | baseml LRT P-value, local vs. global | baseml LRT P-value, free vs. global | Rself/Rout (e)
Individual genes | na | na | 0.177 | 0.147 | 2.2E−16 | 0.160 | 36.78 | 30.47 | 7.20 | na | na | na
Concatenated with gaps (f) | 0 | 0 | 0.151 | 0.122 | na | 0.135 | na | na | na | na | na | na

(a) dN/dS for individual genes summed over Self (selfing) or Out (outcrossing) branches comes from the following formula: [equation missing], where N is the number of nonsynonymous sites, S is the number of synonymous sites, dN is the number of nonsynonymous substitutions, and dS is the number of synonymous substitutions.
(b) DE (differentially evolving genes) = genes with a q-value (multiple testing correction) < 0.05 in LRTs between the local and global models. Percentages for different categories of genes are expressed relative to the total number of 2789 genes.
(c) SRE (selfing rapidly evolving genes) = DE genes for which [dN/dS Selfing − dN/dS Outcrossing] > 0 in the local model. Percentages are expressed relative to the total number of 2789 genes.
(d) SPS (selfing positively selected genes) = SRE genes showing signs of positive selection. Percentages are expressed relative to the total number of 2789 genes.
(e) Rself/Rout = ratio of relative rates of nucleotide substitution in the branches leading to selfing (self) vs. outcrossing (out) species according to the local model.
(f) The dataset “concatenated with gaps” is identical to the concatenated dataset, except that gaps were not treated as missing data (the “cleandata” parameter was set to the nondefault value 1).
Figure 1.

Protein evolution analysis of 2,789 orthologs in the Neurospora genus. (A) Plot of dN/dS values in selfing (x-axis) and outcrossing (y-axis) branches from the local model. Only the differentially evolving genes are plotted; the diagonal line marks the area occupied by genes that did not show statistically significant differences between the local and the global model. Selfing rapidly evolving genes fall below the diagonal line. (B) Molecular evolution rates for each branch of the Neurospora phylogeny estimated from a free-ratio model on the concatenated dataset. The length of each branch corresponds to dS values, whereas the estimates on top of each branch correspond to dN/dS values. O and S symbols on top of each branch stand for outcrossing and selfing mating systems, respectively, based on the assumption of an outcrossing ancestor for the genus as shown in Gioti et al. (2012).

To investigate whether relaxed selective constraint or positive selection drives the higher dN/dS, we evaluated the proportion of genes that evolve fast in selfers because of positive directional selection. Specifically, we estimated substitution rates under a branch-site model that permits selfing species to have sites evolving under positive selection while outcrossers have only sites evolving under purifying or neutral selection (modified modelA), and under the corresponding null model (modelA-null). From the genes that fit the selection model significantly better than the neutral model, we isolated a strict set of selfing positively selected genes meeting three criteria: (1) at least 5% of sites evolving under positive selection; (2) 1.5 ≤ dN/dS < 10 (values below and above this range indicate weak evidence for positive selection and unreliably high estimates of dN/dS, respectively); and (3) at least three sites with Bayes empirical Bayes (BEB) posterior probabilities above 95% for the positive selection class. The genes identified as positively selected in selfing species account for 7.2% of all genes tested (Table 2).
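The three filters for selfing positively selected genes can be written as a single predicate. In this Python sketch the `BranchSiteFit` record is a hypothetical stand-in for values parsed from codeml branch-site output (significance of the modelA vs. modelA-null comparison, proportion of sites in the positive-selection class, the foreground dN/dS of that class, and per-site BEB posteriors); the default thresholds are those stated above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BranchSiteFit:
    """Hypothetical summary of one gene's branch-site analysis (selfing = foreground)."""
    significant: bool        # modified modelA fits significantly better than modelA-null
    prop_positive: float     # proportion of sites in the positive-selection class
    omega_foreground: float  # dN/dS of that class on selfing branches
    beb_posteriors: List[float] = field(default_factory=list)  # per-site BEB probabilities


def is_strict_positive(fit, min_prop=0.05, omega_min=1.5, omega_max=10.0,
                       min_sites=3, min_posterior=0.95):
    """Apply the significance test plus the three filters described in the text."""
    strong_sites = sum(p > min_posterior for p in fit.beb_posteriors)
    return (fit.significant
            and fit.prop_positive >= min_prop
            and omega_min <= fit.omega_foreground < omega_max
            and strong_sites >= min_sites)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the 7.2% figure would be to each cutoff.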

We compared nucleotide substitution rates between selfing and outcrossing Neurospora by testing the goodness of fit of clock rate models on the four-fold degenerate sites of the 2789 genes (concatenated dataset) using baseml. The estimated clock rate for branches delineating selfing species was less than half (0.43) of that for branches delineating outcrossing species (Table 2; results were identical between the alternative tree topologies).


The TE content of the genomes of selfing species was considerably lower (3.76–4.6%) than that of any of the outcrossing species (9.92–16.3%; Table 3). This pattern was unchanged when searching for TEs using only the library of known transposable elements, or using a library of de novo-identified TEs from the genomes of selfing species (2.34–3.99% in selfers, 4.71–5% in outcrossers), which indicates that our results are not affected by taxonomic bias in the TE libraries. In general, each class of TEs has fewer copies in selfing species. A characteristic example is nsubGypsy (Fig. 2), a novel LTR-Gypsy element recently identified in N. sublineolata (Gioti et al. 2012). The phylogenetic reconstruction of nsubGypsy copies shows many species-specific expansions in the outcrossing species (e.g., clade I; Fig. 2), indicating independent bursts of the element within these genomes. In contrast, each selfing species carried fewer copies in total, and these grouped in clades with copies from other selfing or outcrossing taxa (e.g., clade II; Fig. 2).
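Because hits from different library entries can overlap, a genomic TE percentage should count each masked base only once. The Python sketch below merges overlapping hits per scaffold before summing; it assumes hits have already been converted to 0-based, half-open (start, end) coordinates, and it illustrates the idea rather than reproducing RepeatMasker's own summary calculation. Function names are hypothetical.

```python
def merged_length(intervals):
    """Total length covered by half-open [start, end) intervals, counting overlaps once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the current block
        else:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def te_percent(hits_by_scaffold, genome_size):
    """Percentage of the assembly covered by TE hits, given {scaffold: [(start, end), ...]}."""
    covered = sum(merged_length(hits) for hits in hits_by_scaffold.values())
    return 100.0 * covered / genome_size
```

On a toy 100 bp assembly with hits (0, 10) and (5, 15) on one scaffold and (0, 5) on another, 20 bp are covered and the function returns 20.0; summing hit lengths without merging would give 25.0.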

Table 3. Transposable element, GC and RIP content of Neurospora genomes
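Columns SINE to Total: % of different transposable element (TE) classes; columns CDS to TEs (GC): GC%; last two columns: RIP%. Cells marked … are truncated in the source.
Species—repr. mode (a) | SINE | LINE | LTR | DNA | Unclassified | Small RNA | Simple repeats | LC (c) | Total | GC% CDS | GC% Intron | GC% GC3 4- & 6-fold | GC% TEs | RIP% TEs | RIP% Genome
N. africana—s | 0.02 | 0.93 | 1.53 | 0.00 (b) | 2.… | … | … | … | … | … | … | … | … | … | …
N. sublineolata—s | 0.03 | 0.34 | 1.99 | 0.01 | 1.34 | 0.… | … | … | … | … | … | … | … | … | …
N. pannonica—s | 0.… | … | … | … | … | … | … | … | … | … | … | … | … | … | …
N. terricola—s | 0.02 | 0.11 | 2.51 | 0.03 | 1.54 | 0.… | … | … | … | … | … | … | … | … | …
N. tetrasperma—o | 0.06 | 0.80 | 5.65 | 0.26 | 1.91 | 0.03 | 0.74 | 0.63 | 9.92 | 55.82 | 46.90 | 62.6 | 34.04 | 74.77 | 16.73
N. discreta—o | 0.06 | 0.82 | 8.… | … | … | … | … | … | … | … | … | … | … | … | …
N. crassa—o | 0.06 | 1.55 | 8.90 | 0.63 | 2.77 | 0.81 | 0.89 | 0.79 | 16.30 | 56.02 | 48.13 | 63.1 | 31.20 | 82.95 | 23.7

(a) s = selfing, o = outcrossing.
(b) RepeatMasker estimates are rounded; N. africana has eight degenerate copies of DNA transposons.
(c) LC = low complexity.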
Figure 2.

Phylogenetic tree of the transposable element nsubGypsy in Neurospora. Branches delineating outcrossing species are colored in red shades, whereas branches leading to selfing species are colored in blue shades. Copies of nsubGypsy in each species are numbered. Bootstrap values above 0.7 are shown at each node of the polar layout. Numbered clades I and II are cited in the text.

Note that the genomic proportions of TEs reported here are overestimates, because we searched for de novo-identified repeats that may include false positives, such as TE remnants and low-complexity, high-copy sequences. However, because we used the same library to search for TEs in all species, this bias is unlikely to affect the observed trend. Because TEs tend to be collapsed in de novo assemblies (Treangen and Salzberg 2012), we assessed the contribution of assembly methods to the observed differences. Repeating the assemblies without excluding high-coverage reads (an assembly optimization step) revealed only marginal (0.01–0.04%) differences in TE content. Moreover, the TE content of N. crassa genomes reconstructed from simulated short reads (Table S1) is still higher (11.24–11.78%) than that of any selfing species and within the range observed in outcrossing species. This is also true for an assembly of Illumina reads from the sequencing of a N. crassa mutant strain (McCluskey et al. 2011), whose TE content was estimated at 9.28%. Finally, to obtain an unbiased measure of TE content, independent of assembly procedures, we calculated the percentage of raw reads that map uniquely to our TE library (Table S2): this ranged from 1.34% to 2.12% in selfing species and was again higher in N. crassa (2.84%).

In fungi, one mechanism for TE inactivation is Repeat Induced Point mutation (RIP), a process that mutates multicopy DNA during meiosis, changing CpN to TpN in N. crassa (Selker 1990). The patterns of TE silencing by RIP, as measured by the composite RIP index (Lewis et al. 2009), were strikingly different between selfing and outcrossing species (Fig. 3, Table 3). Whereas 74–83% of the TEs in outcrossing species bear a RIP signature, this percentage drops to only 34–65% in selfing species. Similar differences are observed when comparing the percentage of the total genome showing evidence of RIP (Table 3).
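The composite RIP index (CRI) of Lewis et al. (2009) combines a RIP product index, TpA/ApT, and a RIP substrate index, (CpA + TpG)/(ApC + GpT), as CRI = TpA/ApT − (CpA + TpG)/(ApC + GpT); positive values indicate RIP. The Python sketch below computes it in sliding windows. The window size (500 bp) and step (50 bp) are illustrative assumptions, not the settings used for Fig. 3, and windows with a zero denominator are simply skipped.

```python
from typing import Dict, List, Optional


def dinucleotide_counts(seq: str) -> Dict[str, int]:
    """Count overlapping dinucleotides in a sequence."""
    counts: Dict[str, int] = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def composite_rip_index(seq: str) -> Optional[float]:
    """CRI = TpA/ApT - (CpA + TpG)/(ApC + GpT); None if a denominator is zero."""
    c = dinucleotide_counts(seq.upper())
    at = c.get("AT", 0)
    ac_gt = c.get("AC", 0) + c.get("GT", 0)
    if at == 0 or ac_gt == 0:
        return None
    product_index = c.get("TA", 0) / at
    substrate_index = (c.get("CA", 0) + c.get("TG", 0)) / ac_gt
    return product_index - substrate_index


def sliding_cri(seq: str, window: int = 500, step: int = 50) -> List[float]:
    """CRI of each window along `seq` (window and step are illustrative defaults)."""
    values = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        cri = composite_rip_index(seq[start:start + window])
        if cri is not None:
            values.append(cri)
    return values


print(composite_rip_index("TATACAGT"))  # 1.5
```

A TE could then be scored as RIP-affected if its windows exceed the threshold of 0 shown in Fig. 3; how windows were summarized per TE is not specified here.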

Figure 3.

Silencing of transposable elements by RIP in Neurospora genomes. Histograms of the frequencies of the composite RIP index, calculated with a sliding window approach (the formula is given on the x-axis), for transposable elements of selfing species (upper panel) and outcrossing species (lower panel). Thick vertical lines show the threshold (0) above which composite RIP index values indicate RIP. Percentages of TEs mutated by RIP are shown in each graph.


We asked whether the predicted differences in effective recombination rate between selfing and outcrossing species would affect patterns of guanine and cytosine (GC)-biased gene conversion (gBGC), a segregation distortion directly linked to recombination (reviewed in Duret and Galtier 2009). For this purpose, we inspected the base composition of protein-coding sequences (CDS), introns, third positions of four- and six-fold degenerate codons, and TEs (Table 3; per-gene estimates and Wilcoxon rank sum test P values: Fig. S2). GC content of CDS is slightly higher in selfing than in outcrossing species (56–56.3% in selfing, 55.8–56% in outcrossing), and the differences are significant in all pairwise comparisons of selfing to outcrossing species, with the exception of N. africana and the N. pannonica–N. crassa pair (Fig. S2). The differences were more pronounced when focusing on GC of variant positions (described in Supporting Information: Methods) for all 4355 orthologous coding sequences (Fig. S3). The median GC was 56.3% for outcrossing species, in contrast to 59.1% for selfing species (P = 2.2E−16). Among sites assumed to evolve neutrally, introns showed a significant increase in GC in selfing species (49–51% vs. 46–48% in outcrossing), and third positions of four- and six-fold degenerate codons had slightly higher GC in selfing species (63.3–63.9% vs. 62.6–63.1% in outcrossing). However, when inspecting the per-gene distributions of GC3 at degenerate codons, a few pairwise comparisons were not significant, and the selfing N. africana showed significantly lower GC3 than N. discreta and N. crassa (Fig. S2). For TEs, both the content and the distribution of GC% were markedly and significantly higher in genomes of selfing (40–48%) than of outcrossing species (31–34%).
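Per-gene GC3 at degenerate codons can be sketched as follows. The sketch assumes that "third positions of four- and six-fold degenerate codons" means third positions of codons whose first two bases define a four-fold degenerate box. Under that assumption, the two-fold TTR (Leu), AGR (Arg), and AGY (Ser) codons of the six-fold families are excluded. The exact rule used for Fig. S2 may differ, and the function name is hypothetical.

```python
# First two bases of codon boxes whose third position is four-fold
# degenerate: Ala (GC), Gly (GG), Pro (CC), Thr (AC), Val (GT), plus the
# four-fold boxes of the six-fold families Leu (CT), Arg (CG) and Ser (TC).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}


def gc3_degenerate(cds: str) -> float:
    """Percent G+C at third positions of four-fold degenerate codons in one CDS."""
    cds = cds.upper()
    gc = total = 0
    for i in range(0, len(cds) - 2, 3):  # in-frame codons only
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    if total == 0:
        raise ValueError("sequence contains no four-fold degenerate codons")
    return 100.0 * gc / total


# GCC and CTG end in G/C, GGA and ACT do not, ATG is not four-fold degenerate.
print(gc3_degenerate("GCCGGAATGCTGACT"))  # 50.0
```

Applying such a function to every gene of two species yields the per-gene distributions that the Wilcoxon rank sum tests compare.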

To estimate a potential contribution from sequencing technology to the increased GC content of all genomic features in selfing species, we used Illumina-sequencing data from a study of a N. crassa mutant strain (McCluskey et al. 2011). The GC content of these data was higher than the GC content of the finished genome of N. crassa, sequenced using Sanger technology (50% vs. 48%).


We compared patterns of adaptive evolution between selfing and outcrossing species of Neurospora by inspecting codon usage bias. In whole-genome analyses, the most frequently used codons are identical between the two groups and end in G or C (Table S3), as observed in many other species (e.g., Duret 2002). We next focused on optimal codons, defined as those most frequently used in HEGs, because selection for translational accuracy and/or efficiency is expected to be prominent in these genes (Duret and Mouchiroud 1999). We compared two species with available RNA-seq data, the selfing N. africana (this study) and the outcrossing N. crassa (Ellison et al. 2011a). The lists of the top 100 HEGs from each species comprise genes with similar functions that are commonly expressed at high levels, although only a small portion of these (11%) are shared orthologs in the two species. Gene length was not significantly different between the two datasets (median N. crassa: 889 bp, N. africana: 748 bp, P = 0.02). Although optimal codons are identical between N. crassa and N. africana (except for glycine), their relative frequencies are markedly elevated in N. crassa, with the exception of the cysteine codon (Table S4). This elevation contrasts with the frequencies of the most frequent codons, which overall do not differ between the two species (Table S3).
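The definition of optimal codons used here (the most frequent synonymous codon in HEGs) can be sketched in Python as below. The sketch assumes the standard genetic code and in-frame CDS strings of HEGs as input. For each amino acid, it reports the most used codon and its frequency among synonymous codons, which is one plausible way to express the relative frequencies compared in Table S4. Function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def optimal_codons(heg_cds: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Most frequent codon per amino acid in a set of highly expressed genes,
    with its frequency relative to all synonymous codons in that set."""
    codon_counts: Counter = Counter()
    for cds in heg_cds:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            aa = GENETIC_CODE.get(cds[i:i + 3])
            # Skip ambiguous codons, stops and single-codon amino acids (Met, Trp).
            if aa is not None and aa not in "*MW":
                codon_counts[cds[i:i + 3]] += 1
    aa_totals: Counter = Counter()
    best: Dict[str, str] = {}
    for codon, n in codon_counts.items():
        aa = GENETIC_CODE[codon]
        aa_totals[aa] += n
        if aa not in best or n > codon_counts[best[aa]]:
            best[aa] = codon  # ties keep the first codon encountered
    return {aa: (codon, codon_counts[codon] / aa_totals[aa]) for aa, codon in best.items()}


result = optimal_codons(["GCCGCCGCA", "AAGAAGAAA"])
print(result["A"])  # ('GCC', 0.6666666666666666)
```

Running the same function on the HEGs of each species, and comparing the returned frequencies codon by codon, mirrors the N. crassa vs. N. africana comparison described above.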

We next looked at the effective number of codons (Nc), which ranges from 20, when only one codon is used for each amino acid in a gene, to 61, when all synonymous codons are used equally (Wright 1990); Nc decreases with increasing codon usage bias and is therefore expected to be reduced in HEGs. The Nc of all coding genes was very similar in the two species (N. crassa-all = 53.31, N. africana-all = 53.69), and this is also true for orthologs (N. crassa = 50.76, N. africana = 50.06). The top 100 expressed genes of both species showed a significantly lower Nc than all genes (N. africana-high = 44.27, P = 0.00400 by randomization; N. crassa-high = 31.69, P = 0.0044), but in N. africana the Nc was only slightly reduced relative to that of all genes and exhibited a higher degree of variance (Fig. 4). The Nc of N. africana-high genes is significantly higher than that of N. crassa-high genes (Wilcoxon rank sum test P = 2.2E−16). Because the sets of HEGs were not orthologous, we also compared the Nc of orthologous ribosomal protein genes, which are expected to be highly expressed and thus under strong codon usage bias in both species. The Nc of these genes in N. africana (N = 77 single-copy orthologs) was higher than in N. crassa (N. africana-ribo = 42.4, N. crassa-ribo = 38.7).
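Nc can be computed from codon counts with Wright's (1990) homozygosity estimator. For an amino acid observed n times, F = (n Σp_i² − 1)/(n − 1). F is averaged within each degeneracy class and the averages are combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The Python sketch below assumes the standard genetic code and applies Wright's rule for a missing isoleucine (three-fold) class. It is an illustration and not necessarily the software used to produce Fig. 4.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families of the standard code (stop codons excluded).
SYNONYMS: Dict[str, List[str]] = defaultdict(list)
for _codon, _aa in GENETIC_CODE.items():
    if _aa != "*":
        SYNONYMS[_aa].append(_codon)

# Number of amino acids per degeneracy class: {2: 9, 3: 1, 4: 5, 6: 3}.
CLASS_SIZES = Counter(len(c) for c in SYNONYMS.values() if len(c) > 1)


def effective_number_of_codons(cds: str) -> float:
    """Wright's (1990) effective number of codons for one in-frame CDS."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    homozygosity: Dict[int, List[float]] = defaultdict(list)
    for codons in SYNONYMS.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few codons to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        homozygosity[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    # Wright's rule when Ile (the only 3-fold amino acid) is missing.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(mean_f.get(k, 0.0) <= 0.0 for k in CLASS_SIZES):
        raise ValueError("too few codons to estimate F for every degeneracy class")
    nc = 2 + sum(CLASS_SIZES[k] / mean_f[k] for k in CLASS_SIZES)
    return min(nc, 61.0)  # Nc is bounded at 61 by definition
```

A gene using a single codon per amino acid yields Nc = 20, and a gene using all synonymous codons equally yields the maximum of 61, matching the range given above.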

Figure 4.

Effective number of codons in Neurospora africana and Neurospora crassa. Boxplot distributions of the effective number of codons (Nc) used in the top 100 highly expressed genes (HEGs) in N. crassa and in N. africana (N. crassa-high, N. africana-high), vs. all genes (N. crassa-all, N. africana-all). Outliers (Nc < 20) were excluded for this plot. For each dataset, base composition (GC%) at third positions of four- and six-fold degenerate codons (GC3 4f, 6f) is also presented. Note that HEGs were excluded from the datasets N. africana-all and N. crassa-all so as to calculate GC3 4f, 6f without their influence.

To disentangle the effects of selection efficacy on codon usage bias from those of mutational bias, we calculated the GC3 of HEGs and compared it with their GCI (GC% of introns). In both species, GC3 is significantly higher than GCI (P = 2.2E−16; Fig. S4). Although GCI does not differ significantly between the two species (P = 0.294), the GC3 of the outcrossing N. crassa is significantly higher than that of N. africana (P = 1.7E−10). The same trend is significant for GC3 at degenerate codons (Fig. 4), both in HEGs (P = 1.162E−05) and in all genes except HEGs (P = 2.2E−16).
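The comparisons above rely on the Wilcoxon rank sum test. A minimal Python version is sketched below for reproducing such comparisons outside a statistics package. It uses the normal approximation, with midranks for ties, a tie-corrected variance, and a continuity correction. For small samples an exact test, as offered by standard statistical software, is preferable. The function name is hypothetical and the example values are made up.

```python
import math
from typing import List, Sequence, Tuple


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Returns (U statistic for x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks: List[float] = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j + 2) / 2  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values tied
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)  # continuity correction
    return u, math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
```

In practice, x and y would be per-gene values, for example GC3 of N. crassa HEGs and of N. africana HEGs.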



This study presents whole-genome sequences from four practically unstudied species of selfing Neurospora, thereby establishing a basis for comparative genomic studies in this model genus. Even though our assemblies are not resolved to chromosomes, CEGMA analysis revealed a high level of completeness (93–96%, cf. 95–97% for the N. crassa finished genome). As the goal of this study was to evaluate differences in molecular evolution between selfing and outcrossing species, we focused on annotation of these genomes. Near-complete proteome sets were generated with the annotation pipeline MAKER supplemented with transcript evidence from multiple growth conditions of one species, N. africana. Our approach predicted as many as 12,000 genes in each species (Table 1), close to the roughly 10,000 genes found in N. crassa and consistent with typical overestimation (up to 20%) of gene counts seen in microbial genome studies (Ussery and Hallin 2004). Future work incorporating additional transcriptome data from each selfing species is needed to further improve our current annotations.

Comparing sequences of species with different reproductive modes allowed us to test empirically, and confirm, predictions of the dead-end theory for the first time in a fungal system. Future studies of this question would ideally include outcrossing species that belong to an independent clade of the Neurospora phylogeny. This would allow us to confirm unequivocally that the observed differences in genomic features correlate with reproductive mode rather than with phylogenetic signal. Still, our results highlight several independent signatures of genomic maladaptation in selfing Neurospora.


Comparing models of protein evolution revealed an elevated dN/dS ratio in branches leading to selfing Neurospora species (Fig. 1, Table 2) for a considerable fraction of the genome (one third of the 2789 studied orthologs). This is notable given that, for the sake of our comparisons, we grouped together selfing species from distant phylogenetic groups that represent distinct transitions to selfing (Nygren et al. 2011; Gioti et al. 2012). That phylogenetic signal, namely the use of outcrossing species from a single clade, is unlikely to explain the observed dN/dS differences is indicated by a previous study (Nygren et al. 2011) that included outcrossing species from a distinct clade and also reported a significantly elevated rate of evolution for seven genes in selfing lineages.
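Nested codon models of this kind are commonly compared with a likelihood ratio test. Suppose the comparison is between a model with a single dN/dS for all branches and one with a separate dN/dS for branches leading to selfing species, so that the models differ by one free parameter. The P value can then be sketched as below. This setup is an assumption for illustration, the log-likelihood values are invented, and the function is not part of any phylogenetics package.

```python
import math
from typing import Tuple


def lrt_df1(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by one free parameter.

    Returns (2 * delta lnL, P value from a chi-square distribution with 1 df).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against optimizer noise
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for a one-ratio (null) model and a two-ratio
# model with a separate dN/dS for branches leading to selfing species.
stat, p = lrt_df1(lnl_null=-10234.56, lnl_alt=-10230.12)
print(f"2dlnL = {stat:.2f}, P = {p:.4f}")
```

Using the closed form for one degree of freedom avoids a dependency on a statistics library; tests with more free parameters would need the general chi-square survival function.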

It is important to consider how sequencing errors and uncertainties related to the dN/dS metric may have contributed to these results. Our pipeline for assembly, whole-genome alignment, and identification and selection of orthologs for dN/dS comparisons ensured that regions of the Illumina-sequenced genomes with a potentially higher sequencing error rate were excluded from consideration. The sequencing error rate for the 2789 orthologous genes ranges between only 0.002% and 0.016%. We confirmed that the set of 2789 genes does not show the sampling variance that would render the relationship between dN/dS and dS negative, and thus dN/dS estimates unreliable (Wolf et al. 2009). The relationship between dN/dS and dS in our data is weakly positive (Fig. S1A). A positive relationship between dN/dS and dS has been observed previously, for example, in mammals (Stoletzki and Eyre-Walker 2011), and was proposed to correlate with mutation rate (Wyckoff et al. 2005). However, this correlation is not significant in our data, and it is outcrossing, not selfing, species that show higher mutation rates (Table 2). Therefore, we argue that mutation rate cannot explain the elevated dN/dS of selfing Neurospora.

Despite relying on realistic assumptions (Nachman 1998), elevated dN/dS ratios are only a proxy for relaxed selection, as, to our knowledge, no formal models exist for assessing signatures of neutrality. Based on the strict test of positive selection, we estimate that for 7% of the tested genes (Table 2), elevated dN/dS can be explained primarily by positive selection. This percentage is likely overestimated, as the branch-site model cannot distinguish relaxed selective constraint from positive selection. Excluding these genes, we posit that the remaining genes with accelerated rates of protein evolution are evolving under neutrality. We interpret all of the above results as signs of a reduced efficiency of purifying selection in removing slightly deleterious mutations in selfing Neurospora.


Theory on TE dynamics predicts opposing trends in genomes of selfing species. Fewer TE copies are expected because selfing lineages lack outcrossing, which promotes the spread of transposons, whereas more copies are expected because negative selection, which removes TE insertions from the population, is relaxed (Charlesworth and Langley 1989). Our data indicate a reduced spread of TEs in genomes of selfing species (Table 3, Fig. 2), although the exact TE content and degree of reduction should be regarded as approximate, given the nature of the compared data and the limited number of curated species-specific repeats available. We interpret these results as reflecting reduced transposition in all classes of TEs, which is consistent with the predicted long-term effects of selfing (Wright et al. 2008).

A more unexpected finding was that TEs in selfing species are less silenced by RIP (Table 3, Fig. 3). Because the percentages of TEs subjected to RIP were calculated per genome, reduced RIP in selfing species indicates a potentially lower efficiency of this mechanism rather than a reduction due to a smaller amount of target substrate sequence. In N. crassa, RIP has profoundly shaped the genomic landscape, with all TEs appearing inactivated in the OR74A reference genome (Galagan et al. 2003; Galagan and Selker 2004). Alternatively, our result might reflect TE bursts and/or an increase in RIP efficiency specific to the outcrossing species studied here; genome data from an outcrossing species belonging to a distinct phylogenetic clade would be needed to explore these hypotheses. Because genome defense mechanisms such as RIP can protect against the deleterious effects of selfish DNA, we propose that less efficient RIP in selfing species could reflect reduced purifying selection.


One way to address the adaptive ability of a species at the genome level is to investigate the level of synonymous codon usage bias in highly expressed genes. Codon usage bias correlates with tRNA abundance (Ikemura 1982, 1985; Duret 2000) and with gene expression in a wide range of taxonomic groups (Sharp and Li 1987; Duret and Mouchiroud 1999; Cutter et al. 2006; Ingvarsson 2008; Qiu et al. 2011a). A correlation between codon usage bias and gene expression is also observed in Neurospora, given the significant differences in Nc between HEGs and all genes in both the outcrossing N. crassa and the selfing N. africana (Fig. 4). We observe near-complete identity of the most frequent and optimal codons between the two species (Table S3), but find that the effective number of codons of N. africana HEGs (Fig. 4) and ribosomal genes is significantly higher than in N. crassa, and that the relative frequencies of optimal codons are lower (Table S4). Weak codon usage bias in N. africana is consistent with both a reduced adaptive ability and relaxed negative selection against nonoptimal codons; data on the codon usage bias of the ancestor would be needed to distinguish between these possibilities.

Biases in synonymous codon usage may also result from mutational pressure (Osawa et al. 1988; Sharp et al. 1995), either biased mutation or gBGC, the latter mimicking selection (Marais 2003; Haudry et al. 2008). As in previous studies (Whittle et al. 2011 and references therein), the higher GC3% of N. crassa and N. africana HEGs relative to the corresponding GCI values (Fig. S4) argues against this and implies that GC3 does not evolve neutrally in Neurospora. Because optimal codons end in G or C (Table S4), the significantly higher GC3% of N. crassa HEGs compared with N. africana (Figs. 4, S4) would argue for stronger selection on codon usage in the outcrossing species. However, this pattern seems independent of expression level, as GC3 at degenerate codons is also higher in N. crassa when all genes except HEGs are considered (Fig. 4), suggesting that codon usage bias might be confounded by another factor driving nucleotide composition. Taken together, our data provide weak support for reduced selection on codon usage in the selfing N. africana compared with N. crassa. Because this was a pairwise comparison, it will be interesting to test the generality of our results once transcriptome data from more selfing and outcrossing species become available.


Besides being a confounding factor in molecular evolution studies (Marais 2003; Berglund et al. 2009), gBGC can also reflect differences in mating system. Simulations in A. thaliana showed that gBGC is inefficient in highly inbred species (Marais et al. 2004), but no significant correlation between gBGC and mating system was found in Triticeae plants (Escobar et al. 2010). The significantly higher GC% of selfing compared with outcrossing Neurospora species (Table 3, Figs. 2, 3) is surprising, because it is the opposite of what one would expect from the effect of reduced effective recombination on gBGC. Cases where GC content is higher in outcrossers, as expected under gBGC, are found only at third positions (comparisons of N. africana with N. crassa and N. discreta; Fig. S2), where selection on codon usage may also interfere. In agreement with a role for gBGC, GC3 of degenerate codons is lower in N. africana than in N. crassa for both HEGs and all genes (Fig. 4). Other neutrally evolving features, such as introns and TEs, show either no difference (GCI of HEGs; Fig. S4) or higher GC% in selfing taxa (GCI of all genes and GC of TEs; Fig. S2). Because gBGC in theory affects all gene features, the present data make it unlikely that gBGC strongly affects GC content in Neurospora.

One explanation for our results is that gBGC may not strongly influence genomic base composition in fungi. In yeast, gBGC was proposed to be a relatively weak force genome-wide (Harrison and Charlesworth 2011), whereas in Cryptococcus neoformans, no strong evidence for a correlation of gBGC with recombination was found (Pessia et al. 2012). A technical factor potentially related to the higher GC% in selfing species is the Illumina technology used to sequence their genomes, as opposed to Sanger for N. crassa and 454 for N. discreta and N. tetrasperma. Evidence for this artifact comes from the higher GC% of N. crassa Illumina reads compared with the finished genome; note, however, that the strains compared here are not identical. We therefore interpret the possible effect of this technical difference on our data with caution. Despite reports of a positive correlation between Illumina sequencing coverage and GC% (Dohm et al. 2008; Minoche et al. 2011), we are not aware of a study showing that this GC bias occurs preferentially with Illumina rather than with Sanger and 454 technologies. On the other hand, the fact that differences in GC% affect both coding and noncoding genomic elements (Fig. S2) indicates that a neutral process primarily drives base composition in Neurospora. We propose below that this process is mutation.


In contrast to protein evolution rates, the substitution rate is 2–3 times higher in outcrossing Neurospora species (Table 2). This finding could explain their lower GC% if mutation in Neurospora is AT-biased: an AT bias among uncorrected mutations has been reported in yeast (Lynch et al. 2008), and many common types of DNA damage cause AT-biased mutations (Wernegreen and Funk 2004 and references therein). This hypothesis should be explored further by estimating the equilibrium GC content in Neurospora, for example, using the methods described by Dutheil et al. (2012). RIP is an additional process that is elevated in outcrossing species (Fig. 3) and contributes to differences in the base composition of TEs (Pearson's product–moment correlation between GC% of TEs and RIP: r = −0.931, P = 0.001148), because RIP introduces C:G to T:A transition mutations (Selker 1990). Therefore, the high fraction of RIP-inactivated TEs in outcrossing species can explain their lower GC% (Table 3) and the considerable variation in TE base composition in these genomes (Fig. S2).
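One simple way to express this hypothesis is through the equilibrium GC content (GC*) of a two-state mutation model. In this model, AT→GC changes occur at rate u per A/T site and GC→AT changes at rate v per G/C site, so GC* = u/(u + v). The Python sketch below applies this textbook approximation to made-up substitution counts. It is not the method of Dutheil et al. (2012), which models substitution processes along phylogenies in much more detail. Under this model, an AT-biased process (v > u) pushes GC* below 50%, so a higher v in outcrossing species would lower their GC.

```python
def equilibrium_gc(at_to_gc: int, at_sites: int, gc_to_at: int, gc_sites: int) -> float:
    """Equilibrium GC fraction under a two-rate model: GC* = u / (u + v).

    u: AT->GC substitutions per A/T site; v: GC->AT substitutions per G/C site.
    """
    u = at_to_gc / at_sites
    v = gc_to_at / gc_sites
    if u + v == 0:
        raise ValueError("no AT<->GC substitutions observed")
    return u / (u + v)


# Hypothetical counts: 10 AT->GC over 1,000 A/T sites, 30 GC->AT over 1,000 G/C sites.
print(round(equilibrium_gc(10, 1000, 30, 1000), 3))  # 0.25
```

Comparing GC* with the observed GC of a lineage indicates whether its base composition is at, above, or below the equilibrium expected from its substitution pattern.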

Assuming little or no selection at silent sites, nucleotide substitution rates approximate the genomic mutation rate. Why, then, is the mutation rate higher in outcrossing Neurospora? The most obvious explanation is that these species, in contrast to selfing Neurospora, have an asexual reproduction pathway, which involves the formation of structures called conidia through mitotic divisions (Springer 1993). Mutations arise when base misincorporations or insertions/deletions persist after proofreading by the replicating DNA polymerase. Therefore, it is plausible that a higher number of mitotic divisions, and thus of replication errors, during conidiation contributes to the higher substitution rate of outcrossing species. Mitotic recombination can further cause point mutations, as in yeast (Strathern et al. 1995; Hicks et al. 2010). Alternatively, one could consider the contribution of higher effective meiotic recombination in outcrossing species. Indel-associated mutations depend on the level of heterozygosity, such that heterozygous indels could increase the point mutation rate at nearby nucleotides through errors during meiosis (Tian et al. 2008). The observed differences in nucleotide substitution rates could thus reflect a reduced mutation rate in genomes of selfing Neurospora species as a consequence of low heterozygosity. However, this hypothesis assumes a relatively high occurrence of indels, and whether meiotic recombination is mutagenic remains controversial (Webster and Hurst 2012). A final hypothesis is that the intrinsic mutation rate is lowered in selfing species, as it has been proposed that selective processes can modify the genomic mutation rate (e.g., Kondrashov 1995; Dawson 1998).


Our study provides several lines of evidence for a reduced strength of selection (elevated dN/dS, reduced silencing of TEs by RIP, relaxed codon usage bias), in line with the predicted maladaptation of selfing lineages. Together with a previous study on unidirectional shifts to self-fertility (Gioti et al. 2012), our results confirm both postulates of the dead-end theory in a selfing fungus. Are selfing Neurospora species then reaching an evolutionary impasse? The diversification of selfing lineages in Neurospora indicates that this mating system did not originate recently in the genus (Nygren et al. 2011). Ancient asexual lineages exist in the fungal and other kingdoms, providing examples of mechanisms that promote genetic diversity and thus long-term persistence (Kuhn et al. 2001; Pouchkina-Stantcheva et al. 2007; Rice and Friberg 2007; Gladyshev et al. 2008; Boschetti et al. 2012). Our findings suggest two factors that potentially counteract the negative effects of selfing in Neurospora. One is the limited spread of TEs, which can protect the genomes of selfing species from their deleterious effects. The second is the absence of the conidiation pathway, which may offer genome-wide protection from mutational load.

Estimating the theoretical potential for extinction of species relies on parameters that are currently unknown for selfing Neurospora, such as the effective population size. Differences in the ecology of Neurospora species may further contribute to understanding why obligate haploid selfing was favored and has persisted in this genus. Our study implies that both sexual and asexual reproduction pathways affect genome evolution in a filamentous fungus. Therefore, assessing the fitness and survival of meiotic and mitotic spores may prove very useful in understanding the history of reproductive systems (Nauta and Hoekstra 1992b). For example, in yeast, it was proposed that heterozygosity among lineages correlates with a life-history trade-off involving how readily the species switch from asexual to sexual reproduction when faced with nutrient stress (Magwene et al. 2011). Future population genomic studies are expected to shed light on the benefits and costs of adopting a selfing reproductive mode.


The authors thank B. Nabholz, J. Wolf, and S. Glémin for helpful discussions, M. Karlsson for help with the Bioanalyzer apparatus, S. Robb for sharing scripts on TE analyses, and D. Vanderpool for advice on phylogenetic reconstructions. Two anonymous reviewers are thanked for useful proposals on consolidation of our results. This work was supported by Carl Tryggers Stiftelse and Nilsson-Ehle foundations (to AG) and the Swedish Research Council (to HJ). Computational resources, including access to the Bioinformatics Core at UCR, were made available through initial complement funds to JES. The authors declare no conflict of interest.