Limited genomic consequences of mixed mating in the recently derived sister species pair, Collinsia concolor and Collinsia parryi



Highly selfing species often show reduced effective population sizes and reduced selection efficacy. Whether mixed mating species, which produce both self and outcross progeny, show similar patterns of diversity and selection remains less clear. Examining patterns of molecular evolution and levels of diversity in species with mixed mating systems can be particularly useful for investigating the relative importance of linked selection and demographic effects on diversity and the efficacy of selection. The effects of linked selection should be minimal in mixed mating populations, although severe bottlenecks tied to founder events could still be frequent. To begin to address this gap, we assembled and analysed the transcriptomes of individuals from a recently diverged mixed mating sister species pair in the self-compatible genus Collinsia. The de novo assemblies of the 38 and 52 Mbp C. concolor and C. parryi transcriptomes contained ~40 000 and ~55 000 contigs, respectively, both with a mean contig length of ~945 bp. We observed a high ratio of shared polymorphisms to fixed differences in the species pair. We also found minimal differences between species in the ratio of replacement to synonymous substitutions or in codon usage bias, implying comparable effective population sizes throughout species divergence. Our results suggest that differences in effective population size and selection efficacy in mixed mating taxa shortly after their divergence may be minimal and are likely influenced by fluctuating mating systems and population sizes.


Understanding mating system transitions from outcrossing to selfing has been a long-standing goal in evolutionary biology, as mating systems affect key evolutionary processes including the distribution of genetic variation, rate of evolutionary change, and consequently speciation and extinction dynamics (Takebayashi & Morrell, 2001; Igic & Busch, 2013). Mating system shifts to high levels of selfing may decrease species' effective population size, Ne, promote the accumulation of deleterious mutations and hinder adaptation (Charlesworth & Wright, 2001; Takebayashi & Morrell, 2001; Charlesworth, 2003; Paland & Schmid, 2003; Glémin, 2007; Wright et al., 2013). However, the relative importance of different population genetic processes for reducing Ne and selection efficacy in highly selfing populations (hereafter selfers) remains unclear (Barrett et al., 2014).

Several processes can contribute to reductions in Ne and selection efficacy in highly selfing populations. First, high selfing rates alone can reduce Ne by up to half and raise linkage disequilibrium throughout the genome (Nordborg, 2000; Charlesworth, 2003, 2009; Marais et al., 2004; Glémin et al., 2006; Ness et al., 2010; Koelling et al., 2011). Increased linkage disequilibrium strengthens background selection, selective sweeps and Muller's ratchet, further diminishing diversity and Ne (Liu et al., 1999; Charlesworth, 2003, 2009, 2012; Cutter, 2006; Haddrill et al., 2007). Demographic processes experienced by selfers, such as bottlenecks, can also contribute to extinction risk through reductions in diversity and Ne. These bottlenecks can occur immediately following the transition to selfing, but their severity will depend on both the rate of the transition and the evolutionary processes driving it. Rapid shifts to selfing, often driven by selection for reproductive assurance (Foxe et al., 2009, 2010; Guo et al., 2009; Pettengill & Moeller, 2012), can reduce the adaptive potential of populations through severe genetic bottlenecks associated with founding events (Baker, 1955; Schoen et al., 1996; Busch & Delph, 2011; Herman et al., 2012). In contrast, if selfing variants spread slowly, possibly through automatic selection or in response to selection that prevents heterospecific pollination, selfing populations may retain more ancestral diversity (Schoen et al., 1996; Matallana et al., 2010; Busch et al., 2011). Following the shift to selfing, frequent colonization and extinction events resulting in high population turnover can further decrease Ne (Charlesworth & Pannell, 2001; Ingvarsson, 2002). Such turnover can also make small selfing populations more vulnerable to extinction through demographic stochasticity (Lande, 1993; Wright et al., 2013).
Demographic factors are therefore likely to affect extinction risk in selfers, but their importance relative to that of linked selection is not yet clear (Wright & Barrett, 2010; Wright et al., 2013). Background selection alone can severely reduce Ne in highly selfing populations even on short timescales. As a result, disentangling the relative roles of founder events, recurrent population bottlenecks and linked selection in reducing Ne remains an ongoing challenge (Brandvain et al., 2013; Barrett et al., 2014).

Genomic studies of mixed mating species may provide new insights into whether reduced diversity and selection efficacy result mainly from demographic factors accompanying the transition from outcrossing to higher rates of selfing or from the effects of linked selection (Wright et al., 2013). The intermediate outcrossing rates expressed by mixed maters should maintain sufficient recombination to prevent significant reductions in Ne and diversity through linked selection (Charlesworth et al., 1993; Charlesworth, 2012; Glémin & Galtier, 2012; Glémin & Ronfort, 2013; Wright et al., 2013). Even so, the evolution of higher selfing rates in mixed mating species may still somewhat reduce Ne and selection efficacy if speciation and selfing rate transitions involve strong species-wide bottlenecks or lead to frequent extinction and recolonization events from small numbers of founders (Glémin & Ronfort, 2013). Other demographic processes may also influence Ne in mixed mating species. Outcrossing rates often fluctuate across years and among individuals in mixed mating populations (Elle & Carney, 2003; Kalisz et al., 2004; Lankinen et al., 2007; Elle et al., 2010; Ruan et al., 2011; Carrió & Güemes, 2013), so mixed maters could maintain diversity and Ne through periods of high outcrossing or moderate bottlenecks (Barrière & Félix, 2005). Like selfing, mixed mating may also increase census population sizes and population isolation, and consequently among-deme differentiation and species-wide diversity (Charlesworth & Wright, 2001; Charlesworth, 2003; Pannell & Dorken, 2006; Wright & Andolfatto, 2008; Wright et al., 2013). Examining the population genomics of recently derived mixed mating species and comparing them with highly selfing and outcrossing species may help to characterize the relative roles of demographic processes and linked selection during mating system evolution.

There are few studies addressing the impact of mixed mating on Ne. Although self-compatible taxa in the Solanaceae show elevated net extinction rates consistent with reductions in Ne (Igic et al., 2008; Goldberg et al., 2010), this signal may be driven primarily by highly selfing taxa (Wright & Barrett, 2010; Wright et al., 2013). Interestingly, a recent review that explicitly considered mixed mating species found no studies that clearly show mixed mating lineages evolving from selfing lineages or mixed mating lineages enduring longer than stochastic expectations (Igic & Busch, 2013). In contrast, the few empirical studies that examine relevant population genetic parameters in mixed maters fail to consistently demonstrate reduced Ne and diversity in these species, suggesting that mixed mating species may not often experience severe diversity loss and reduced selection efficiency due to higher selfing rates (Fenster & Ritland, 1992; Ranker, 1992; Hamrick & Godt, 1996; Sun, 1997; Ness et al., 2010; Barrett et al., 2014). However, many of these studies only examine a few genes and may not have sufficient power to detect subtler changes in selection efficacy and diversity throughout the genome. Population genomic analyses of mixed mating species are thus important to better understand the evolutionary dynamics and consequences of mating system evolution (Igic & Busch, 2013; Wright et al., 2013).

The mixed mating species of the annual genus Collinsia are an ideal system in which to study how mixed mating affects diversity and selection efficacy. The genus includes multiple recent speciation events involving transitions to higher selfing rates (Baldwin et al., 2011) as well as diverse outcrossing rates (Kalisz et al., 2004, 2012; Randle et al., 2009). Together, these features provide a powerful system to examine the context and consequences of mixed mating. Collinsia species are bee-pollinated and are native throughout North America. Because all species are self-compatible, selfing rates are mainly dictated by quantitative floral traits, resulting in continuous variation in outcrossing rates (Armbruster et al., 2002; Kalisz et al., 2012). Collinsia species with high selfing rates show earlier stigmatic receptivity and are classified as prior selfing (sensu Lloyd, 1992). Species with low selfing rates exhibit later stigmatic receptivity that avoids autonomous selfing while outcrossing is possible; these latter species are classified as delayed selfing (sensu Lloyd, 1992; Armbruster et al., 2002; Lankinen et al., 2007; Kalisz et al., 2012). All Collinsia species have high rates of autonomous selfing in the absence of pollinators, and all tested species express low inbreeding depression (Kalisz, 1989; Mayer et al., 1996; Kalisz et al., 2004, 2012; Kennedy & Elle, 2008b). Hazzouri et al. (2013) recently found evidence of relaxed selection in the highly selfing C. rattanii compared with its mixed mating sister species, C. linearis, from which it diverged approximately 1.45 MYA (Baldwin et al., 2011). Here, we begin to investigate whether a very recently diverged (< 0.2 MYA) mixed mating species pair shows differences in diversity and selection comparable to those found by Hazzouri et al. (2013). We compare the transcriptomes of Collinsia parryi (outcrossing rate estimate 1.0, late stigmatic receptivity; Kalisz et al., 2012) and C. concolor (outcrossing rate estimate 0.51, early stigmatic receptivity; Kalisz et al., 2012). These estimates imply a high rate of outcrossing in C. parryi and mixed mating in C. concolor. However, they are based on a single population in a single year, so it is unclear whether they reflect long-term mating system evolution, particularly in species with a flexible mating system (Kalisz et al., 2012). Furthermore, the flowers of C. concolor are relatively large, in fact larger than those of C. parryi (Fig. 1), so this system may reflect a particularly recent and dynamic mating system shift (Baldwin et al., 2011). In the absence of strong speciation bottlenecks, we expect this more recently diverged mixed mating species pair to show more similar Ne than C. linearis and C. rattanii (Hazzouri et al., 2013). Mixed mating and recent divergence likely preclude strong differences in background selection between the two species. Strongly reduced diversity and selection efficacy in either species would therefore suggest that demographic bottlenecks also occur following speciation in species with dynamic, mixed mating systems. This applies particularly to C. concolor, which may have experienced a very recent increase in selfing rates.

Figure 1.

Floral morphology of C. parryi and C. concolor.

To begin to investigate polymorphism and selection efficacy in this mixed mating species pair, we sequenced the transcriptome of one individual of each species, C. concolor and C. parryi. We use divergence, diversity and codon bias measures to infer whether selection efficacy and Ne decrease with expected differences in selfing rate in these Collinsia species. Recent speciation in mixed mating lineages may be associated with severe bottlenecks and a strong reduction in Ne. If so, reduced purifying selection efficacy will allow the bottlenecked lineage to fix more weakly deleterious substitutions, increasing its ratio of replacement to synonymous substitutions (dN/dS, ω) (Charlesworth & Wright, 2001). Similarly, bottlenecked populations may show lower levels of codon bias. Codon bias arises when particular codons are translated more accurately or efficiently than their synonymous alternatives and are therefore subject to weak positive and negative selection (Akashi, 2001; Cutter & Charlesworth, 2006; Hershberg & Petrov, 2008). Finally, species experiencing severe bottlenecks following speciation would show lower overall levels of silent and total diversity. However, we find little support for these predictions, suggesting that mixed mating may allow Collinsia species to maintain high diversity and selection efficacy.

Materials and methods

Assembly and ortholog identification

Seeds of C. parryi and C. concolor were collected in the field in 2006 (C. parryi: San Bernardino Co., CA; N34.32440, W117.42252, Elev = 3760 m; C. concolor: Riverside Co., CA; N33.61325, W116.93394). In 2012, the seeds were germinated in controlled environment chambers and raised to flowering in the University of Pittsburgh greenhouse complex. Fresh leaf tissue (~150 mg/individual) was collected prior to flowering and stored on ice until extraction. RNA was extracted from a single individual per species using the Qiagen mini RNA extraction kit. Samples were sent to McGill University for Illumina HiSeq sequencing, with one lane per sample, to produce 100-bp paired-end reads. We assembled the transcriptomes de novo using the Velvet–Oases assembler (Zerbino & Birney, 2008) with k-mer lengths of 57 and 65. As the k = 57 assembly was of higher quality, we used it for all subsequent analyses. We used EVOPIPES transpipes (Barker et al., 2010) to generate in-frame transcripts from the assembled contigs. Transpipes conducts BLASTX searches for each transcript against the nonredundant protein database and aligns each transcript to its best-hit protein using the Genewise HMM algorithm to generate in-frame transcripts (Barker et al., 2010). We conducted reciprocal BLAST searches between the in-frame C. concolor and C. parryi transcriptomes to identify orthologous genes as reciprocal best hits, which we parsed using a custom-made Perl script.
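The reciprocal-best-hit step was carried out with a custom Perl script that is not described further. The Python below is a minimal sketch of the same idea, not the original script. It assumes BLAST tabular output (`-outfmt 6`, bit score in the 12th column) and ranks hits by bit score alone. The function names and file layout are illustrative assumptions.

```python
"""Sketch: reciprocal best BLAST hits from two tabular (-outfmt 6) result sets."""
import csv
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Map each query to its best subject, ranked by bit score (12th column)."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        # keep the first hit seen when bit scores tie
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[List[str]],
                         b_vs_a: Iterable[List[str]]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def read_tabular(path: str) -> List[List[str]]:
    """Read a tab-separated BLAST results file, skipping blank lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]
```

Usage would look like `reciprocal_best_hits(read_tabular("concolor_vs_parryi.tsv"), read_tabular("parryi_vs_concolor.tsv"))`, where both file names are hypothetical. A stricter version might also break bit-score ties by E-value or alignment coverage.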

Substitution analysis

We analysed the in-frame orthologs for evidence of relaxed selection in C. concolor or C. parryi. To do so, we included a second, more distantly related outcrossing and selfing Collinsia species pair (C. linearis and C. rattanii) in our substitution rate analysis. This allowed us to estimate the lineage-specific amount and direction of nucleotide changes in C. concolor and C. parryi. We obtained in-frame, assembled C. linearis and C. rattanii transcriptomes from Hazzouri et al. (2013). We identified C. linearis and C. rattanii transcripts orthologous to C. concolor as reciprocal best BLAST hits between the in-frame C. concolor transcripts that had C. parryi orthologs and the in-frame C. linearis and C. rattanii transcriptomes.

We ran PAML CODEML (Yang, 2007) on alignments of the orthologous sequences of each transcript from the four species, generated using EMBOSS TranAlign (Rice, 2000), together with a simple unrooted tree of the four species. As C. concolor transcripts were often shorter than transcripts from other species, we included sites with gaps in the CODEML analysis. We compared substitution ratios and log-likelihoods for all transcripts under three models:
(i) a model assuming a single ω for all branches;
(ii) a model assuming an independent ω for each branch;
(iii) a model that assigned a uniform ω to all branches except those leading to C. concolor and C. parryi.
We parsed CODEML results with a custom-made Perl script and compared models 1 and 2, and models 1 and 3, using log-likelihood ratio tests with the PAML chisq function.
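Each model comparison is a standard likelihood ratio test. Twice the log-likelihood difference between nested models is compared with a χ2 distribution whose degrees of freedom equal the number of extra ω parameters. The sketch below rests on assumptions not stated in the text. First, an unrooted four-taxon tree has five branches, so model 2 adds four ω parameters over model 1. Second, model 3 gives C. concolor and C. parryi one ω each, adding two. To stay dependency-free, the sketch uses the closed-form χ2 tail, which is exact only for even degrees of freedom. It illustrates the test and is not PAML's chisq implementation.

```python
"""Sketch: likelihood ratio test between nested CODEML models."""
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x), exact closed form for even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """P-value for 2 * (lnL_alt - lnL_null); negative statistics are clipped to 0."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return chi2_sf_even_df(stat, df)
```

Under these assumptions, `lrt_pvalue(lnl_model1, lnl_model2, df=4)` would give the per-transcript P-value that is then passed to the multiple-testing correction.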

To correct for multiple testing, we used the qvalue package in R (Storey & Tibshirani, 2003) to convert the log-likelihood test P-values to q-values. From the P-value distribution, we conservatively estimated the proportion of transcripts for which the null hypothesis of no difference in model fit is true (π0), using the bootstrap estimation method. We then counted the transcripts that passed a 5% FDR cut-off and showed a higher ω in C. concolor or in C. parryi, and tested whether these counts differed using a sign test. We repeated this analysis using only the 5% highest and lowest expressed genes, the 20% longest transcripts and the genes that were differentially expressed between C. concolor and C. parryi. When comparing dN/dS among species, we filtered out contigs with fewer than 50 synonymous substitutions or with dS > 0.5 to eliminate extreme values likely due to problematic alignments. We also used the nonparametric Kruskal–Wallis test to test for differences in ω ratios among species. For both significantly up-regulated and down-regulated genes, we compared dN/dS between species using 1000 bootstrap replicates to generate 95% confidence intervals and P-values based on t-statistics. We also divided nondifferentially expressed transcripts into 10 equal expression-level categories based on their average expression. Within each expression level, we tested for differences in substitution ratios between species using paired Wilcoxon tests with a Bonferroni correction. Within each species, we tested for differences among expression levels using a Kruskal–Wallis test.
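The FDR and sign-test steps can be sketched as below. This is a simplified stand-in for the R qvalue package. It estimates π0 with a single fixed λ = 0.5 rather than the bootstrap choice of λ used in the analysis, then applies Storey's monotone q-value transform. The sign test is an exact two-sided binomial test on how many significant transcripts had the higher ω in each species. All function names are hypothetical.

```python
"""Sketch: Storey-style q-values (fixed lambda) and an exact two-sided sign test."""
import math
from typing import List, Sequence


def estimate_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """pi0 = #{p > lambda} / (m * (1 - lambda)), capped at 1."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def qvalues(pvalues: Sequence[float], lam: float = 0.5) -> List[float]:
    """q_i = min over ranks >= rank(i) of pi0 * m * p / rank."""
    m = len(pvalues)
    if m == 0:
        return []
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down so q-values stay monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def sign_test(n_first: int, n_second: int) -> float:
    """Two-sided exact binomial P-value for n_first vs n_second under p = 0.5."""
    n = n_first + n_second
    k = min(n_first, n_second)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

In this sketch, transcripts with `q < 0.05` would be split by which species had the higher ω, and the two counts passed to `sign_test`.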

We used the program MAPP to estimate the effects of substitutions based on how they differ from a set of homologous residues (Stone & Sidow, 2005). MAPP uses the amino acids at orthologous sites to compute principal components from the physicochemical properties of the amino acids at each site. It then assigns each amino acid at each site a score based on its distance from the principal component origin. This score indicates the extent to which a residue's properties differ from those of the other residues at that site. MAPP also classifies each residue as either ‘good’ or ‘bad’ based on its probability of being deleterious (Stone & Sidow, 2005). We used the ten best hits from BLAST searches of all C. concolor and C. parryi orthologs against eudicot sequences to generate amino acid alignments with MUSCLE (Edgar, 2004). We constructed a tree from each alignment using SEMPHY (Friedman et al., 2002). We then used the alignment and corresponding tree to compute MAPP scores for each amino acid substitution between orthologous C. concolor and C. parryi transcripts. When assessing substitution effects in C. concolor and C. parryi, we included the orthologous C. linearis sequence in the alignment as a possible proxy for the ancestral state in Collinsia. We then summed all ‘good’ and ‘bad’ substitutions for each transcript from each species using a custom-made Perl script. We compared the total proportions of good and bad substitutions between species using a z-test of proportions. Additionally, we summed substitution MAPP scores for each transcript in each species with another custom Perl script. We compared the MAPP score distributions by computing bootstrap confidence intervals around the species means and a P-value based on t-statistics, using 1000 bootstrap replicates for each species pair. This test and all subsequent analyses were carried out with SAS 9.4.
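The comparison of ‘good’ versus ‘bad’ MAPP calls between species is a two-proportion z-test. A minimal pooled-variance version is sketched below. The inputs would be each species' count of ‘good’ substitutions and its total number of substitutions. The function name is hypothetical.

```python
"""Sketch: pooled two-proportion z-test, e.g. for the share of 'good' MAPP calls."""
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (Z, two-sided P) for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

The standard error shrinks as the total number of substitutions grows. As a result, a very small difference in proportions, such as 0.79 versus 0.80, can produce a large |Z| in a transcriptome-scale data set.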

Read mapping, SNP calling and expression

We estimated nucleotide diversity within our C. concolor and C. parryi individuals. We mapped reads from both species to the assembled C. parryi transcriptome using BWA, Stampy and Picard tools (Li & Durbin, 2010; Lunter & Goodson, 2011) and then called SNPs with GATK. We also needed to remove contigs where paralogous genes had mapped as variants. To do this, we aligned reads from C. rattanii (highly selfing) and C. concolor (mixed mating) to the assembled C. concolor transcriptome using the same pipeline. We then removed from the C. parryi–C. concolor alignment any contigs with heterozygous sites shared by C. rattanii and C. concolor. As shared polymorphisms between selfers should be extremely rare, they most likely reflect incorrectly mapped paralogs (as in Ness et al., 2011). We further filtered out sites with a per-sample depth below 20, a quality score below 30 or a ratio of quality to nonreference allele depth below two. We then counted SNPs, fixed differences and sites with shared polymorphisms using custom Perl scripts. We estimated heterozygosity for each species as the total number of unique SNPs relative to the total number of sites for that species. Because our diversity estimates come from single individuals, they combine the effects of each sample's inbreeding history with population-level diversity. They should therefore be considered minimum estimates of within-population diversity.
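The site classes counted here (SNPs unique to one individual, shared polymorphisms and fixed differences) follow directly from the two diploid genotypes at each filtered site. Below is a sketch of that bookkeeping in Python. It assumes genotypes have already been called and filtered and are represented as allele pairs. The class labels and function names are illustrative, and tri-allelic configurations are simply set aside as "other".

```python
"""Sketch: classify aligned sites from one diploid individual per species."""
from collections import Counter
from typing import Dict, Iterable, Tuple

Genotype = Tuple[str, str]  # diploid call as an allele pair, e.g. ("A", "G")


def classify_site(g1: Genotype, g2: Genotype) -> str:
    """Classify one site from the species-1 and species-2 genotypes."""
    het1, het2 = g1[0] != g1[1], g2[0] != g2[1]
    if het1 and het2:
        # both heterozygous for the same two alleles: shared polymorphism
        return "shared" if set(g1) == set(g2) else "other"
    if het1:
        # SNP unique to species 1 if species 2 carries one of its alleles
        return "unique_sp1" if g2[0] in g1 else "other"
    if het2:
        return "unique_sp2" if g1[0] in g2 else "other"
    # both homozygous: identical, or a fixed difference
    return "monomorphic" if g1[0] == g2[0] else "fixed"


def site_summary(sites: Iterable[Tuple[Genotype, Genotype]]) -> Dict[str, float]:
    """Fraction of sites per class; 'unique_spX' matches the text's heterozygosity
    definition (unique SNPs / all sites) for species X."""
    counts = Counter(classify_site(g1, g2) for g1, g2 in sites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```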

We also used the mapped reads to estimate expression level of each transcript and identify differentially expressed genes in each species using the Cufflinks package (Trapnell et al., 2012). We estimated expression in each species with Cufflinks and Cuffmerge using the read alignments generated above and used Cuffdiff to identify significantly differentially expressed transcripts between the two species. We used a general linear model to test for associations between log-transformed expression levels and MAPP score, ω, frequency of optimal codon use, GC content and transcript length.

Transcript properties

We estimated codon bias, GC content and transcript length using the program CodonW (Peden, 1997). CodonW calculates the frequency of optimal codons (FOP) as the number of occurrences of preferred codons in a transcript relative to the total number of synonymous codons in the sequence. It also outputs GC content, GC content at third codon positions (GC3 content) and transcript length (Peden, 1997). We used the set of optimal codons identified by Hazzouri et al. (2013) based on tRNA frequencies in Mimulus guttatus, as codon preferences do not vary drastically between species (Wright et al., 2004).
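FOP itself is a simple ratio. The sketch below computes it for one coding sequence given a set of optimal codons. The one-codon set in the usage note is a placeholder, not the M. guttatus-derived set. Excluding only Met, Trp and stop codons is a simplifying assumption that may differ in detail from CodonW's bookkeeping.

```python
"""Sketch: frequency of optimal codons (Fop) for one coding sequence."""
from typing import Set

# Assumed exclusions: codons without synonyms (Met, Trp) and stop codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(cds: str, optimal: Set[str]) -> float:
    """Optimal codons / counted synonymous codons; NaN if nothing is counted."""
    seq = cds.upper().replace("U", "T")
    # split into complete codons, ignoring any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counted = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not counted:
        return float("nan")
    return sum(c in optimal for c in counted) / len(counted)
```

For example, `fop("CTGCTTAAATAA", {"CTG"})` returns 1/3 (about 0.333). The stop codon is skipped, which leaves CTG, CTT and AAA, and only CTG is in the placeholder optimal set.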

We tested for differences in FOP between species using several approaches. First, we used 1000 bootstrap replicates to estimate confidence intervals for FOP in each species and calculate a P-value based on t-statistics. We repeated this analysis for GC and GC3 content. We also tested associations between FOP and species, expression level, GC content, GC3 content and transcript length with a generalized linear model based on a binomial distribution. Lastly, we tested for significant Spearman correlations among expression level, ω, MAPP score, FOP, transcript length, GC and GC3 content.
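The bootstrap comparisons used here and for MAPP scores resample transcripts with replacement. Below is a minimal percentile-interval sketch for a difference in species means. It treats the two species' values as independent samples and reports a percentile interval rather than the t-statistic-based P-values described above, so it illustrates only the resampling idea. The seed and function name are arbitrary.

```python
"""Sketch: percentile bootstrap CI for a difference in means between species."""
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_diff_ci(a: Sequence[float], b: Sequence[float], reps: int = 1000,
                      alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Resample each species' values with replacement and return the
    (alpha/2, 1 - alpha/2) percentiles of mean(a*) - mean(b*)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(reps)
    )
    lower = diffs[int(round(reps * alpha / 2))]
    upper = diffs[int(round(reps * (1 - alpha / 2))) - 1]
    return lower, upper
```

Because the same orthologs are scored in both species, resampling orthologs as pairs would be a reasonable alternative design.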

Results

Assembly, ortholog identification and substitution analysis

Both C. concolor and C. parryi produced high-quality transcriptome assemblies (Fig. 2). The k = 57 C. parryi assembly has an N50 of 1614 bp with 54 883 contigs, whereas the C. concolor assembly has an N50 of 1540 bp with 39 991 contigs. Mean contig length was 954.2 bp for C. parryi and 943.0 bp for C. concolor, with total transcriptome sizes of 52.4 and 37.7 Mb, respectively. Each transcriptome had fewer than 1030 ambiguous bases, and the shortest contig in both assemblies was 107 bp. Through reciprocal best BLAST hits of in-frame coding sequences, we identified 14 843 orthologs between the C. parryi and C. concolor transcriptomes. We also identified 9391 orthologs between C. concolor and C. linearis and 9380 between C. concolor and C. rattanii.
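N50 summarises assembly contiguity. It is the contig length at which contigs of that length or longer together contain at least half of the assembled bases. A short sketch, given a list of contig lengths:

```python
"""Sketch: N50 of an assembly from its contig lengths."""
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two 8-bp contigs already hold 16 of the 32 bases.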

Figure 2.

Contig length histogram for each transcriptome assembled with Velvet and Oases using a maximum k-mer length of 57.

We included orthologous transcripts of a related selfing–outcrossing Collinsia species pair (C. linearis and C. rattanii; Hazzouri et al., 2013) in our analysis to improve substitution rate estimates; most genes showed similar substitution ratios in all four species. As C. linearis and C. rattanii belong to the ancestral Collinsia species clade in north-western California, they also help establish the direction of substitutions in C. concolor and C. parryi (Baldwin et al., 2011). Both species pairs shared a similar distribution of synonymous substitutions. However, higher synonymous substitution rates appeared more frequently in the earlier diverged C. linearis–C. rattanii pair than in C. concolor and C. parryi. The highest synonymous substitution rates occurred on the branch separating the two species pairs (Fig. S1, Table 1). We made a rough estimate of divergence time between the two species pairs based on the number of synonymous substitutions, assuming one generation per year and a mutation rate of 7 × 10⁻⁹ mutations per site per generation (Ossowski et al., 2010). This estimate suggests that the species pairs diverged about 8.57 MYA, consistent with Baldwin et al.'s (2011) maximum age estimate of approximately 11.71 MYA.
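The rough dating can be written out as a molecular-clock calculation. The formula is not given in the text, so the following is an assumption. It takes the synonymous divergence d_S between the pairs to have accumulated along both descendant lineages at the neutral mutation rate μ, with one generation per year:

$$
T = \frac{d_S}{2\mu}, \qquad \mu = 7 \times 10^{-9}\ \text{per site per generation}
$$

$$
d_S = 2\mu T \approx 2 \times \left(7 \times 10^{-9}\right) \times \left(8.57 \times 10^{6}\right) \approx 0.12
$$

Under this assumed relation, the 8.57 MYA estimate corresponds to roughly 0.12 synonymous substitutions per synonymous site between the two species pairs.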

Table 1. Median ω (dN/dS), mean dS and standard deviation of dN/dS for each species, estimated with PAML assuming an independent ω for each branch.
Species        Median dN/dS    Mean dS    SD of dN/dS
C. concolor    0.13            0.028      261.96
C. parryi      0.14            0.028      261.56
C. linearis    0.14            0.022      171.30
C. rattanii    0.11            0.024      136.79

Comparing nonsynonymous and synonymous substitution ratios among all species, we found little evidence of differences in the strength of purifying selection between C. concolor and C. parryi. Species showed similar median dN/dS ratios (Table 1), although dN/dS ranks differed significantly among species (Kruskal–Wallis χ2 = 61.2, P < 0.0001). However, dN/dS distributions for all four species were somewhat bimodal, despite filtering out short and misaligned contigs (fewer than 50 synonymous sites, dS > 0.5). A high proportion of loci (~7% for C. parryi and C. concolor, ~2% for C. linearis and C. rattanii) showed extreme dN/dS ratios on certain branches (up to 999.00). These extreme values substantially increase the variance in dN/dS, especially in C. concolor and C. parryi, so comparisons of these distributions must be interpreted with caution. Such extreme ratios can result from loci with few or no synonymous substitutions but some replacement substitutions, and often occur among very closely related species (Nickel et al., 2008). Log-likelihood tests contrasting models with a shared dN/dS ratio against models with distinct ratios among branches are therefore likely more informative.

Overall, log-likelihood tests failed to show that individual substitution ratios for each species fit the data better than a uniform ω ratio (Fig. 3a). After correcting for multiple testing by estimating q-values, 74% of all loci were consistent with the null hypothesis of no species-specific differences in ω compared with the free-ratio model. Additionally, 60% of the 20% longest contigs were consistent with the null hypothesis (Fig. 3b). Of these longest contigs, only 172 differed significantly from the null model at FDR < 5%. These tended to show significantly elevated dN/dS ratios in C. concolor compared with C. parryi (Kruskal–Wallis χ2 = 27.13, P < 0.0001; Fig. S2). Similarly, all of the 5% most highly expressed genes failed to reject the null hypothesis. Although dN/dS ratios differed by expression level within each species (C. concolor: Kruskal–Wallis χ2 = 20.89, P = 0.013; C. parryi: Kruskal–Wallis χ2 = 17.18, P = 0.046), they did not differ between species within any expression level (Fig. S3). Only 14% of all genes and 22% of the longest contigs supported a model that varied substitution ratios only in C. concolor and C. parryi. This implies that most differences in dN/dS occurred between C. linearis and C. rattanii and between the two species pairs. In the comparison of model 1 (single ratio) with model 2 (free ratio), similar numbers of transcripts passing a 5% FDR cut-off had a higher dN/dS ratio in C. concolor and in C. parryi. We also compared model 1 with model 3 (different ratios only in C. concolor and C. parryi). Across all transcripts, a marginally significant excess of transcripts had a higher ω in C. concolor than in C. parryi, but the difference was not significant for the 20% longest contigs (Table S1).

Figure 3.

Distribution of P-values from log-likelihood comparisons of PAML CODEML models assuming a single dN/dS or an independent dN/dS for each branch. (a) P-values for all filtered contigs. (b) P-values for the 20% longest contigs.

MAPP substitution effect estimation

MAPP assessment did not indicate a clear, biologically meaningful difference in substitution effects between the recently derived sister species. Collinsia concolor showed a slightly lower proportion of ‘good’ substitutions than C. parryi (C. concolor 0.79, C. parryi 0.80, Z = −5.99, P < 0.0001). However, this difference is likely too small to have real fitness consequences for either species, and its significance mostly reflects the very high statistical power of this data set. Gene-wide MAPP scores are expected to reflect the overall deleterious effect of mutations. Conversely, these scores were significantly higher for C. parryi than for C. concolor (C. concolor = 4.04 ± 0.12, C. parryi = 4.54 ± 0.13, bootstrap 95% CI; t(14873) = −5.25, P < 0.001).

Single nucleotide polymorphism analysis and heterozygosity

Reads from C. concolor and C. parryi were mapped to the C. parryi reference transcriptome, with 92.85% of reads mapping successfully. After filtering, this yielded 7 739 579 single nucleotide polymorphisms (SNPs), but the resulting heterozygosity estimates contrasted with published mating system estimates. After removing contigs where paralogs could be mismapped (50.16% of all sites), we found that 0.39% of sites were SNPs unique to the C. concolor individual. In contrast, only 0.24% of sites were SNPs unique to the putatively more highly outcrossing C. parryi. Additionally, the two species shared polymorphisms at 0.71% of sites and had fixed differences at 0.50% of sites (Fig. 4a). Collinsia concolor also showed nearly twice as many synonymous polymorphisms as C. parryi. Both species showed a high number of shared synonymous polymorphisms after removing loci with > 5% per-site polymorphism in either species (Fig. 4b).

Figure 4.

(a) The total percentage of each class of polymorphic sites for each species after filtering and paralog removal. (b) Synonymous polymorphism counts for unique and shared SNPs and fixed differences between species.

Expression and codon usage

Expression analysis with Cufflinks identified only a small proportion of genes with significantly different expression between species. However, a larger proportion of these genes than of all genes showed distinct substitution ratios between species. Only 294 genes had significantly different expression levels after correction for multiple testing at an FDR of 5%. Of these, 216 transcripts were up-regulated in C. concolor, whereas 78 were up-regulated in C. parryi. Additionally, for 64% of loci with significantly different expression levels between C. concolor and C. parryi, a model with distinct ω ratios fit the data significantly better. Genes significantly up-regulated in C. concolor had a slightly, but not significantly, higher dN/dS ratio in C. parryi than in C. concolor (t(71.8) = −1.42, P = 0.17 after filtering short contigs and extreme dS values; Fig. S4). This pattern possibly reflects slightly stronger purifying selection in C. concolor relative to C. parryi in these genes. Likewise, we found no significant difference in dN/dS among the genes up-regulated in C. parryi, although only 16 such genes remained after filtering and the test had limited power (t(15.3) = 1.80, P = 0.19). Among all transcripts, expression was also significantly associated with frequency of optimal codons (FOP), dN/dS ratio, length and GC content in both species (Table 2).

Table 2. ANOVA results for log-transformed expression values from a general linear model.
Source            Degrees of freedom    Type III SS    F-value    Estimate    P-value
MAPP score/site   1                     7.91           3.30       −0.026      0.0007
GC content        1                     1363.60        568.82     0.57        <0.0001

The frequency of optimal codons did not differ significantly between species (t(14954) = 0.47, bootstrap P = 0.254). Further analysis confirmed that FOP is not associated with species but is associated with GC content, GC3 content, expression and length (Table 3). However, when considering only the 10% most highly expressed genes, C. parryi had a significantly higher proportion of optimal codons than C. concolor (t(2217) = −3.06, P = 0.009). Additionally, GC and GC3 content did not differ significantly between species (t(14956) = 0.63, P = 0.85 and t(14954) = 0.33, P = 0.88, respectively).

Table 3. Results of the generalized linear model on optimal codon use. These likelihood ratio statistics estimate the contribution of each factor to the model.
Source | Degrees of freedom | F-value | χ2 value | P-value
--- | --- | --- | --- | ---
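FOP is the fraction of codons that are "optimal" among amino acids for which an optimal codon has been designated. The following is a minimal sketch of that calculation under the standard genetic code. The optimal codon set passed in is hypothetical, since the set identified for Collinsia is not listed in this section.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def frequency_of_optimal_codons(cds, optimal_codons):
    """Fop = optimal / (optimal + non-optimal) codons, counting only codons
    whose amino acid has at least one designated optimal codon."""
    informative = {CODON_TABLE[c] for c in optimal_codons}
    n_optimal = n_total = 0
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3].upper()
        # Codons with ambiguous bases are absent from the table and skipped.
        aa = CODON_TABLE.get(codon)
        if aa not in informative:
            continue
        n_total += 1
        if codon in optimal_codons:
            n_optimal += 1
    return n_optimal / n_total if n_total else float("nan")


# Hypothetical optimal codons for alanine (GCC) and lysine (AAG).
print(frequency_of_optimal_codons("GCCGCAAAGAAATGG", {"GCC", "AAG"}))
```

Here the tryptophan codon TGG is ignored because no optimal codon is designated for tryptophan, so Fop is computed over the four alanine and lysine codons.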


In contrast with predictions based on outcrossing rate estimates, polymorphism and MAPP analyses suggest that C. parryi experienced slightly lower selection efficacy than C. concolor. Diversity in C. concolor significantly exceeded diversity in C. parryi, both across all polymorphic sites and at synonymous sites alone. Analysis of substitution effects also supports relaxed selection in C. parryi compared with C. concolor. Although most substitutions had low scores, and the two species had similar proportions of good and bad substitutions and similar distributions of substitution effects, substitutions in C. parryi were slightly but significantly more deleterious than those in C. concolor (Fig. 3). Conversely, C. parryi showed a slight but significant elevation of optimal codon usage in highly expressed genes and, in some analyses, slightly and marginally significantly lower ratios of nonsynonymous to synonymous substitutions. The conflicting results among analyses and the small differences between species suggest that, in the absence of strong demographic bottlenecks, the labile mating systems of mixed maters may produce variable signatures of reduced Ne in parameters that reflect different timescales.

Our finding of lower diversity and more deleterious amino acid changes (as measured by MAPP) in C. parryi was unexpected given the differences in outcrossing rate estimates, but is in line with differences in floral morphology. Discrepancies between outcrossing rates and floral morphology are likely common in this genus, as Collinsia's capacity for autonomous selfing and ample genetic variance in mating system traits may allow selfing rates to vary extensively with the pollinator environment and potentially to evolve rapidly through shifts in dichogamy (Kalisz & Vogler, 2003; Kalisz et al., 2004, 2012; Lankinen et al., 2007). For example, field self-pollination rates in the large-flowered species Collinsia verna, C. grandiflora and C. linearis closely track pollinator abundance or population size (Kalisz & Vogler, 2003; Kalisz et al., 2004; Elle et al., 2010; A.M. Randle & S. Kalisz, unpublished data), so that population estimates range from highly outcrossing to mixed mating across years. A history of variable selfing rates, with episodes of high selfing or mixed mating, is likely partly responsible for the low inbreeding depression observed in selfed relative to outcrossed progeny of three large-flowered taxa, C. heterophylla, C. verna and C. corymbosa (Kalisz, 1989; Mayer et al., 1996; Kalisz et al., 2004; S. Kalisz, unpublished data). Finally, Hazzouri et al. (2013) also found evidence of an ongoing mating system shift in northern C. linearis and C. rattanii, where higher outcrossing rates may have evolved in the primarily selfing C. rattanii. Their data emphasize that Collinsia's flexible mating system can produce substantial changes in selfing rates over short timescales and may even allow reversals from selfing to outcrossing. Taken together, our findings suggest that the smaller-flowered C. parryi may have experienced elevated selfing rates in the past but recently evolved later stigmatic receptivity, delayed selfing and higher outcrossing rates.

Other factors could also explain these patterns, and the observed outcrossing rate estimates may still reflect overall mating trends in this pair. First, the slight differences in MAPP scores may not be biologically relevant. MAPP scores do not account for beneficial substitutions, and most substitutions in both species had low scores. For example, Stone and Sidow (2005) found that MAPP scores below eight usually coincided with beneficial or intermediate fitness effects in mutagenesis studies of four genes, suggesting that substitutions in both species are generally mild. Furthermore, MAPP score per locus is only weakly correlated with gene expression (Spearman ρ(14956) = −0.043, P < 0.001), further implying that high MAPP scores do not reflect severely degraded genes. Enhanced purging through historical selfing might also have prevented the accumulation of more severely deleterious substitutions in C. parryi relative to C. concolor, even if C. parryi is currently primarily outcrossing (Charlesworth, 1992; Glémin, 2007). Additionally, our heterozygosity estimates are based on a single individual from each species and are subject to sampling error. Recent episodes of selfing or outcrossing in our samples' native populations could further confound our results by severely reducing or increasing polymorphism, respectively. Inaccuracies in our heterozygosity estimates could also arise from assembly, read mapping and SNP calling errors. Estimates of heterozygosity, diversity, selfing rate and effective population size from larger, transcriptome-wide samples spanning both species' ranges would help validate our results.

Mixed mating limits the reduction in Ne

Although C. parryi may have suffered reduced Ne through higher historical selfing rates or other demographic factors, polymorphism analysis suggests that the reduction in diversity and Ne was mild. The loss of silent-site diversity in C. parryi relative to C. concolor, although significant, was smaller than that in C. rattanii relative to C. linearis: per-site diversity in the less diverse species, as a proportion of that in the more diverse species, is 0.65 in our pair and 0.36 in the C. rattanii and C. linearis pair (Hazzouri et al., 2013). These results, together with the high proportion of shared polymorphisms relative to fixed differences between the species, imply that the recently derived species pair maintained similar Ne throughout divergence. High heterozygosity in both species also suggests that the pair likely experienced a gradual mating system transition that limited the loss of diversity, and that mixed mating maintains considerable heterozygosity even within individuals.
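The comparison of shared polymorphisms with fixed differences can be made concrete by classifying aligned sites by the alleles observed in each species. The sketch below is hypothetical: it assumes biallelic SNPs and, as with our single-individual samples, treats a heterozygous genotype as a polymorphism. The genotypes shown are invented, and the category names are illustrative.

```python
from collections import Counter


def classify_site(alleles_a, alleles_b):
    """Classify one aligned site from the alleles seen in species A and B."""
    a, b = set(alleles_a), set(alleles_b)
    poly_a, poly_b = len(a) > 1, len(b) > 1
    if not poly_a and not poly_b:
        # Both monomorphic: a fixed difference only if the alleles differ.
        return "fixed" if a != b else "monomorphic"
    if poly_a and poly_b:
        # Polymorphic in both; "shared" only if the same alleles segregate.
        return "shared" if a == b else "other"
    return "exclusive_a" if poly_a else "exclusive_b"


# Hypothetical diploid genotypes (species A, species B) at five sites.
sites = [("AG", "AG"), ("AA", "GG"), ("AG", "AA"), ("CC", "CT"), ("TT", "TT")]
counts = Counter(classify_site(a, b) for a, b in sites)
shared_to_fixed = counts["shared"] / counts["fixed"]
print(dict(counts), shared_to_fixed)
```

A high ratio of shared polymorphisms to fixed differences is what one would expect if the species diverged recently and neither lost much ancestral variation through a severe bottleneck.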

High Ne throughout divergence contrasts with the expectation of a strong bottleneck when variants conferring higher selfing spread quickly, often driven by reproductive assurance (Lloyd, 1992; Schoen et al., 1996; Charlesworth, 2003; Busch & Delph, 2011; Pettengill & Moeller, 2012). However, these results are consistent with previous studies that found similar levels of diversity in mixed mating and outcrossing species, and suggest that mixed mating species may not often experience severe population bottlenecks as they transition to higher selfing levels (Fenster & Ritland, 1992; Busch et al., 2010; Foxe et al., 2010; Ness et al., 2010). Although higher selfing rates in Collinsia are favoured under ecological conditions that favour reproductive assurance (Kalisz & Vogler, 2003; Kalisz et al., 2004; Kennedy & Elle, 2008b), our results suggest that highly selfing variants may spread slowly, allowing both species to largely maintain diversity and Ne. Additionally, the automatic transmission advantage of selfing variants may have contributed to their spread without lowering effective population size beyond the decrease expected from the selfing rate alone (Schoen et al., 1996). Given the lack of strong diversity reduction in recently derived mixed mating species, linked selection, reduced gene flow and/or subsequent population bottlenecks, rather than severe founder events during their origin, may be the primary drivers of diversity reduction in highly selfing species (Barrett et al., 2014).
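The decrease in Ne expected from the selfing rate alone can be quantified with the standard neutral result for a partially selfing population at equilibrium. With selfing rate s, the inbreeding coefficient is F = s/(2 − s), and Ne = N/(1 + F) = N(2 − s)/2. The sketch below assumes this simple neutral model and ignores linked selection, population structure and demographic change. It therefore gives only a baseline reduction, not a prediction for Collinsia.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def relative_ne(selfing_rate):
    """Ne / N = 1 / (1 + F) under partial selfing (neutral model only)."""
    return 1.0 / (1.0 + inbreeding_coefficient(selfing_rate))


for s in (0.0, 0.5, 0.9, 1.0):
    print(f"s = {s:.1f}: Ne/N = {relative_ne(s):.3f}")
```

Under this model, even complete selfing at most halves Ne. Much larger reductions in diversity in highly selfing species therefore point to additional processes such as linked selection or bottlenecks.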

Several ecological circumstances could lead to the gradual evolution of higher selfing levels in Collinsia through reproductive assurance. Intermediate levels of selfing can evolve when populations encounter increasingly variable pollination success due to factors such as low pollinator numbers combined with high population densities, variable climate or ephemeral habitats (Schoen & Brown, 1991; Barrett, 2002; Elle & Carney, 2003; Goodwillie et al., 2005; Morgan & Wilson, 2005; Elle et al., 2010). The transition to higher selfing rates in C. concolor in a variable pollination environment may have avoided strong population bottlenecks because variants promoting higher selfing rates would not be consistently advantageous (e.g. not in years with abundant pollinators, or when strong resource limitation diminishes fitness regardless of selfing ability). Higher genetic variance could have persisted throughout the evolution of earlier selfing in C. concolor because Collinsia species have substantial variation in floral traits influencing the timing of selfing (Kalisz & Vogler, 2003; Lankinen et al., 2007), and the ancestral outcrossing species can maintain high fitness through autonomous selfing in the absence of pollinators. Many individuals in a population may therefore have reproduced successfully through selfing despite unreliable pollination. A slow transition to selfing, driven either by reproductive assurance in a variable pollinator environment or by the automatic transmission advantage, may explain the high levels of diversity and shared polymorphism in C. concolor and C. parryi.

Substitution comparisons between the C. concolor and C. parryi transcriptomes also suggest that Ne and selection efficacy are similar enough in these species to prevent an excess of deleterious alleles from fixing in either species regardless of expression level, although the pair's recent divergence strongly reduces the power of this test. Consistent with many previous studies, our substitution ratio analysis suggests that ω does not differ between C. concolor and C. parryi throughout most of the transcriptome (Wright et al., 2002; Haudry et al., 2008; Wright & Andolfatto, 2008; Escobar et al., 2010). Whereas a proportion of contigs support a free ratio model over a uniform ratio model, only 14% of contigs favour different ω values specifically for C. concolor and C. parryi. Few significant differences in ω between species are expected if the species are too recently derived for their fixed substitutions to reflect relaxed purifying selection, especially if many variants are under weak selection (Charlesworth & Wright, 2001; Wright et al., 2002, 2013; Glémin, 2007; Escobar et al., 2010). Tests based on divergence ratios were therefore likely not powerful enough to detect changes in selection efficacy in our recently diverged species pair, especially if these changes were subtle. Larger-scale polymorphism analysis would enable a more direct quantification of the contemporary strength of selection in the two taxa.
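Asking whether separate ω values for C. concolor and C. parryi fit a contig better than a single shared ω is a likelihood ratio test between nested models. The minimal sketch below assumes the two models differ by one free parameter, so the statistic 2ΔlnL is compared with a χ² distribution with one degree of freedom. The log-likelihoods would come from a codon model fit (for example, PAML's codeml); the values shown are hypothetical.

```python
import math


def omega_lrt(lnl_one_ratio, lnl_two_ratio):
    """LRT of one shared omega (null) vs separate omegas (alternative),
    assuming the models differ by one parameter. Returns (statistic, p)."""
    # Clamp at zero in case optimizer noise makes the richer model look worse.
    stat = max(0.0, 2.0 * (lnl_two_ratio - lnl_one_ratio))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one contig.
stat, p = omega_lrt(-2451.30, -2448.10)
print(f"2*dlnL = {stat:.2f}, P = {p:.4f}")
```

Across thousands of contigs, the resulting p-values would themselves need a multiple-testing correction before counting how many contigs favour distinct ω.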

Positive selection may also fix replacement substitutions more effectively in outcrossers than selfers and obscure signals of relaxed purifying selection in selfers from substitution rate comparisons (Glémin, 2007). However, whether partial selfers fix beneficial mutations less effectively than outcrossers will depend on the dominance coefficients of new beneficial mutations, which are not well known (Charlesworth, 1992; Haudry et al., 2008; Slotte et al., 2010; Glémin & Ronfort, 2013; Wright et al., 2013). Additionally, ω did not differ between species even when considering only the most highly expressed genes that show lower substitution ratios and should be under the strongest evolutionary constraint (Akashi, 2001; Drummond et al., 2005). The effect of enhanced positive selection in our comparisons is probably subtle and our results likely reflect true similarities in ω across the C. parryi and C. concolor transcriptomes caused by comparable effective population sizes and recent divergence.

Optimal codon use, which is often sensitive to slight reductions in selection efficacy even after recent divergence, also indicates similar Ne in C. parryi and C. concolor. Although recent divergence can lead to inconsistent differences in codon bias between partial selfers and outcrossers (Akashi, 2001; Wright et al., 2002; Haudry et al., 2008; Hershberg & Petrov, 2008; Ness et al., 2012), it cannot fully explain our findings. Reduced codon bias was apparent in the highly selfing Capsella rubella relative to the outcrossing Capsella grandiflora despite their extremely recent divergence (Guo et al., 2009; Qiu et al., 2011), as well as among recently diverged selfing and outcrossing populations of Eichhornia paniculata (Ness et al., 2012), suggesting that slight changes in selection efficacy are detectable despite recent divergence. Additionally, although stronger GC-biased gene conversion in outcrossing than in partially selfing species can confound measures of optimal codon use, C. concolor and C. parryi did not differ significantly in GC or GC3 content. Changes in selection efficacy, if present, were therefore likely not substantial enough to produce differences in fixation rates between the species.
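The GC and GC3 comparisons used to rule out biased gene conversion rest on simple base counts. The sketch below uses a deliberately simple definition: GC over all unambiguous bases, and GC3 over every third codon position of an in-frame coding sequence. Many studies instead exclude start and stop codons or restrict GC3 to fourfold degenerate sites, so treat this as an illustrative assumption rather than the definition used in our analyses.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else float("nan")


def gc3_content(cds):
    """GC at third codon positions; assumes cds begins at a codon's first base."""
    return gc_content(cds[2::3])


cds = "ATGGCCAAGTAA"
print(gc_content(cds), gc3_content(cds))
```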

A history of mixed mating in both species might produce similar Ne in C. concolor and C. parryi, and thus similar selection efficacy. Past selfing events could lower a species' overall Ne and contribute to similar Ne in both species (Charlesworth, 2003, 2009; Busch et al., 2010). Episodes of high outcrossing could also contribute to similar selection efficacy, because under models of linked selection Ne diminishes substantially only at extreme selfing rates, and even low levels of recombination can greatly reduce the genome-wide linkage disequilibrium that contributes to mutation accumulation (Charlesworth et al., 1993; Glémin & Ronfort, 2013). Dynamic selfing rates in C. concolor and C. parryi are therefore probably important in explaining why we found fainter signals of reduced selection efficacy and diversity in C. parryi relative to C. concolor than those reported for C. rattanii (outcrossing rate = 0.12) relative to C. linearis (outcrossing rate = 0.57) (Hazzouri et al., 2013). Further, selection on floral size and developmental traits may be occurring in C. parryi but be unrelated to its mating system (reviewed in Strauss & Whittall, 2006). For example, smaller flowers and rapid floral development can be favoured in abiotically stressful environments (e.g. Galen, 2000; Mazer et al., 2010), but would be expected to have little effect on Ne when mating system is not the target of selection. Large, widespread population samples yielding estimates of linkage disequilibrium, Ne and population history for each species could further clarify the dynamics of mating system change in space and time, and how these factors influence selection efficacy in C. concolor and C. parryi. Similar comparisons of Ne in other Collinsia species pairs with different selfing rates would also help clarify more generally how Ne and selection efficacy change with selfing rate.
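The argument that recombination erodes linkage disequilibrium, and that selfing slows this erosion, can be illustrated with the deterministic decay of LD, D_t = D_0(1 − r)^t. This sketch uses the common approximation that selfing reduces the effective recombination rate to r(1 − F), with F = s/(2 − s). It applies this to the outcrossing rates quoted for C. linearis (0.57) and C. rattanii (0.12), taking the selfing rate as one minus the outcrossing rate. The starting D, r and number of generations are arbitrary illustrative choices, and drift and selection are ignored.

```python
def ld_after(d0, r, generations, selfing_rate=0.0):
    """Deterministic LD after t generations, using the approximation that
    selfing scales the recombination rate by (1 - F), F = s / (2 - s)."""
    f = selfing_rate / (2.0 - selfing_rate)  # equilibrium inbreeding coefficient
    r_eff = r * (1.0 - f)                    # approximate effective recombination rate
    return d0 * (1.0 - r_eff) ** generations


# Illustrative parameters: D0 = 0.25, r = 0.01, 100 generations.
for name, outcrossing in (("C. linearis", 0.57), ("C. rattanii", 0.12)):
    d = ld_after(0.25, 0.01, 100, selfing_rate=1.0 - outcrossing)
    print(f"{name}: D after 100 generations = {d:.3f}")
```

With complete selfing, r_eff falls to zero and D does not decay at all, whereas even modest outcrossing restores much of the effective recombination.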

Comparing differences in selection efficacy between C. concolor and C. parryi with those between C. rattanii and C. linearis highlights the importance of demographic processes and mating system lability in determining Ne. In contrast to our inference of similar selection efficacy in C. concolor and C. parryi, C. rattanii shows a significantly elevated ratio of replacement to synonymous polymorphisms relative to C. linearis, indicating reduced selection efficacy (Hazzouri et al., 2013). However, we found similar dN/dS ratios throughout most of the transcriptome even in this latter pair, and a higher dN/dS ratio in C. linearis than in C. rattanii for some contigs (Fig. S2). As C. rattanii and C. linearis also show the genus' typical mating system flexibility (Hazzouri et al., 2013), polymorphism ratios are likely more powerful than substitution ratios and codon usage for detecting recent reductions in selection efficacy, because variable selfing rates may weaken the signal of reduced effective population size over longer timescales. Unlike C. concolor and C. parryi, C. rattanii may have experienced a recent bottleneck, which could explain why polymorphism analysis reveals reduced selection efficacy in this species (Hazzouri et al., 2013). Further polymorphism-based analyses should help clarify the relative impact of recent demographic processes on selection efficacy in mixed maters of varying age.
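The ratio of replacement to synonymous polymorphisms is usually expressed per site, as pN/pS, so that differences in the numbers of nonsynonymous and synonymous sites do not bias the comparison. This is a minimal sketch that assumes the site counts have already been estimated (for example, with a Nei–Gojobori-style site-counting method). The counts shown are hypothetical.

```python
def pn_ps(n_poly_nonsyn, nonsyn_sites, n_poly_syn, syn_sites):
    """Per-site ratio of nonsynonymous to synonymous polymorphism."""
    if n_poly_syn == 0:
        raise ValueError("pN/pS is undefined without synonymous polymorphisms")
    pn = n_poly_nonsyn / nonsyn_sites
    ps = n_poly_syn / syn_sites
    return pn / ps


# Hypothetical counts: 10 replacement SNPs over 1000 nonsynonymous sites,
# 20 synonymous SNPs over 250 synonymous sites.
print(pn_ps(10, 1000, 20, 250))
```

When purifying selection weakens, for instance after a reduction in Ne, more slightly deleterious nonsynonymous variants segregate, raising pN/pS relative to a species with larger Ne.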


Although patterns of diversity and substitution effects imply a reduction in selection efficacy in C. parryi, analyses of fixed substitutions and codon bias suggest only a mild loss of diversity and Ne. A history of mixed mating therefore probably has not substantially reduced effective population size and selection efficacy in either species. Severe founder events may be necessary to reduce Ne enough to increase extinction risk in mixed mating species; otherwise, a meaningful drop in Ne may not occur shortly after divergence. These results emphasize the importance of demographic events in reducing selection efficacy when selfing rates and linked selection remain moderate. Understanding how prevalent such events are will require larger comparative data sets of species spanning a range of outcrossing rates.


We would like to thank Wei Wang for support with transcriptome assembly and analysis. The work was supported by a Natural Sciences and Engineering Research Council (NSERC) grant to SIW, and National Science Foundation awards NSF DEB 0709638 and 0324764 to SK.