Evolution under tight linkage to mating type


Author for correspondence: Marcy K. Uyenoyama Tel: +1 919 660 7350 Fax: +1 919 660 7293 Email: marcy@duke.edu


Recent large-scale sequencing studies of mating type loci in a number of organisms offer insight into the origin and evolution of these genomic regions. Extensive tracts containing genes with a wide diversity of functions typically cosegregate with mating type. Cases in which mating type determination entails complementarity between distinct transcription units may descend from systems in which close physical linkage facilitated the coordinated expression and cosegregation of the interacting genes. In response to the particular selection pressures associated with the maintenance of more than one mating type, this nucleus of low recombination may expand over evolutionary time, engulfing neighboring tracts bearing genes with no direct role in reproduction. This scenario is consistent with the present-day structure of some mating type loci, including regulators of homomorphic self-incompatibility in angiosperms (S-loci). Recombination suppression and enforced S-locus heterozygosity accelerate the accumulation of genetic load and promote genetic associations between S-alleles and degenerating genes in cosegregating tracts. This S-allele-specific load may influence the evolution of self-incompatibility systems.


In many organisms, sexual reproduction occurs only between distinct mating types. Dimorphic sex chromosomes define two mating types in most mammals, whereas dozens to hundreds of alleles determine hundreds to thousands of mating types in some plants and fungi. Recent information on the genomic structure of mating type regions, notably the human Y chromosome, has contributed to an emerging paradigm of mating type evolution. Here I address the origin and evolution of mating type regions, with particular consideration of the determinant of mating type in angiosperm self-incompatibility (SI) systems (S-locus). The various processes involved in the evolution of mating type proceed in parallel among the multiple mating types segregating in SI systems.

Complementary interactions between determinants of mating type

In a number of systems, mating compatibility or determination of mating type requires complementarity between distinct genes. Close physical linkage between the key determinants may be essential to their cosegregation and epistatic interaction, especially at the inception of the system, before the evolution of more refined means of regulation. This section illustrates the theme of physical linkage between distinct transcription units that display complementary interactions in the determination of mating type.

Sex determination in Silene

Flower development in most angiosperms reflects the ancestral and still predominant state of hermaphroditism. Dioecy derives from the coordinated developmental arrest or degeneration of the characteristics of one sex and expression of those of the other sex (Dellaporta & Calderon-Urrea, 1993). In dioecious forms of Silene latifolia, dimorphic chromosomes determine sex. Radiation-induced deletions of Y-linked factors with stamen-promoting function (SPF) give rise to asexual flowers, while deletions of those with gynoecium-suppressing function (GSF) give rise to hermaphroditic flowers (Farbos et al., 1999; Lardon et al., 1999).

Expression of GSF loci in females (XX) would impair fertility. This selective pressure would favor genetic modifiers throughout the genome that enhance linkage of GSF loci to male-determining factors on the Y chromosome (Nei, 1969). Alternatively, only those systems in which GSF factors initially arose in close physical proximity to male determinants may have given rise to sex chromosomes (Charlesworth, 1991). Most known GSF factors do in fact show Y-linkage, although at least one maps to an autosome (Lardon et al., 1999). Whether evolved or ancestral, tightly linked GSF and male-determining factors may have served as the core of the proto-Y chromosome (Charlesworth & Guttman, 1999).

Model systems of SI

Self-incompatible flowering plants exclude from fertilization pollen that expresses mating specificities in common with the pistil. In sporophytic SI (SSI) systems, pollen specificity is determined sporophytically by the S-locus genotype of the pollen parent. In gametophytic SI (GSI) systems, pollen specificity is determined gametophytically by the S-allele borne by the individual pollen grain or tube itself. In model systems of SSI and GSI, distinct genes determine mating specificity in pollen and pistil, and these genes show complete linkage to one another.
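The two rejection rules can be stated compactly. The following is a minimal sketch, assuming codominant expression of both S-alleles in the pistil and, for SSI, in the pollen parent; the dominance interactions known from real SSI systems are ignored. The function names and allele labels are hypothetical.

```python
def gsi_compatible(pollen_allele, pistil_genotype):
    """Gametophytic SI: a pollen grain is rejected if the S-allele it
    carries matches either S-allele of the pistil."""
    return pollen_allele not in pistil_genotype


def ssi_compatible(pollen_parent_genotype, pistil_genotype):
    """Sporophytic SI (codominance assumed): every pollen grain expresses
    both specificities of its parent, so any shared allele causes rejection."""
    return not (set(pollen_parent_genotype) & set(pistil_genotype))


pistil = ("S1", "S3")
pollen_parent = ("S1", "S2")

# GSI: S1 pollen is rejected, S2 pollen is accepted.
print([gsi_compatible(a, pistil) for a in pollen_parent])  # [False, True]

# SSI: all pollen from the S1S2 parent is rejected because S1 is shared.
print(ssi_compatible(pollen_parent, pistil))  # False
```

Under these assumptions, the same cross is half-compatible under GSI but fully incompatible under SSI.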

In the SSI system expressed in Brassica and close relatives, the determinant of pistil specificity SRK (Takasaki et al., 2000) encodes a serine/threonine protein kinase that spans the plasma membrane of papillar cells at the stigmatic surface. The extracellular receptor domain of SRK interacts with the anther-expressed determinant of pollen specificity SP11/SCR, a cysteine-rich protein predicted to reside in the pollen coat (Schopfer et al., 1999; Takayama et al., 2000). S-allele-specific recognition between SRK and SP11/SCR at the stigmatic surface may induce a nonspecific interaction between the intracellular kinase domain of SRK and MLPK, a receptor-like cytoplasmic kinase encoded by a gene unlinked to the S-locus (Goring & Walker, 2004; Murase et al., 2004). These interactions may mediate SI rejection by preventing hydration and germination of incompatible pollen grains (Kemp & Doughty, 2003).

The S-RNase-based systems of GSI expressed in three distantly related plant families, including the Solanaceae, appear to descend from a common origin (Igic & Kohn, 2001; Steinbachs & Holsinger, 2002). S-RNase, the determinant of pistil specificity, encodes a functional ribonuclease (Lee et al., 1994; Murfett et al., 1994). In the stylar transmitting tract, S-RNases enter both compatible and incompatible pollen tubes, but inhibit the growth of only incompatible pollen tubes (Luu et al., 2000). Sijacic et al. (2004) demonstrated that PiSLF determines pollen specificity in Petunia inflata. Roalson & McCubbin (2003) have proposed that, within pollen tubes, the F-box protein product of PiSLF might target for degradation through the ubiquitin ligase pathway all S-RNases, irrespective of SI specificity, but that highly specific interactions preserve activity of incompatible S-RNases alone.

In both model systems, complementary expression of pollen and pistil specificity is essential to self-incompatibility. Each functional S-allele must comprise genes that determine the same (incompatible) pollen and pistil specificities. Because all zygotes formed under complete expression of SI are heterozygous at both loci, a crossover event would permit self-fertilization by generating an S-allele that determines pistil rejection of a pollen specificity different from the one it encodes. Further, the generation of recombinant S-alleles would destabilize SI systems by permitting nonreciprocal pollen exchanges in which an advantage in pollen transmission accrues to the less accepting partner (Uyenoyama et al., 2001).
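The consequence of a crossover between the pollen and pistil determinants can be illustrated with a small sketch. It assumes a GSI-style rejection rule and represents each S-haplotype as a pair of specificities; the class and function names are hypothetical.

```python
from typing import NamedTuple


class Haplotype(NamedTuple):
    pollen: str  # specificity encoded by the pollen determinant
    pistil: str  # specificity encoded by the pistil determinant


def crossover(h1, h2):
    """Exchange the pistil determinants of two haplotypes, as a single
    crossover between the two genes would."""
    return Haplotype(h1.pollen, h2.pistil), Haplotype(h2.pollen, h1.pistil)


def selfing_pollen(plant):
    """Haplotypes whose pollen escapes rejection by the plant's own pistil
    (GSI rule: pollen is rejected if its specificity is expressed in the pistil)."""
    pistil_specs = {h.pistil for h in plant}
    return [h for h in plant if h.pollen not in pistil_specs]


s1, s2, s3 = Haplotype("1", "1"), Haplotype("2", "2"), Haplotype("3", "3")

# An intact heterozygote rejects all of its own pollen.
print(selfing_pollen((s1, s3)))  # []

# A recombinant haplotype pairs pollen specificity 1 with pistil specificity 2.
recombinant, _ = crossover(s1, s2)
print(selfing_pollen((recombinant, s3)))  # [Haplotype(pollen='1', pistil='2')]
```

In the second case the pistil rejects specificities 2 and 3 only, so pollen carrying the recombinant haplotype can fertilize the plant that produced it.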

Fungal mating types

In the mushrooms Coprinus cinereus and Schizophyllum commune, sexual reproduction requires highly specific complementary interactions between mates (reviewed by Kües, 2000; Brown & Casselton, 2001). Locus A comprises one or more sets of divergently transcribed gene pairs encoding homeodomain proteins HD1 and HD2. HD1/HD2 heterodimers act as a transcription factor that regulates dikaryon-specific cellular processes. Locus B also comprises multiple subloci, each of which encodes a pheromone receptor and one or more pheromones. Recognition between a pheromone and its receptor regulates stages of dikaryon formation different from those affected by locus A (Kües, 2000).

For both A and B, mating requires interactions between proteins produced by different partners. For example, the HD1 and HD2 genes within a given set encode the same specificity, with active heterodimers forming only between proteins of different specificities. Mating compatibility requires formation of at least one HD1/HD2 heterodimer and recognition between at least one pheromone and its pheromone receptor. As is the case in the model systems of SSI and GSI in flowering plants, recombination is suppressed between genes within a given subcluster of locus A or B. Crossing over within a subcluster that conferred compatibility would presumably permit the inappropriate initiation of mating-specific cell processes in the recombinant.
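The compatibility rule can be sketched as follows. This is a simplified illustration, assuming that each locus is summarized by the specificities at its subloci and that a difference in specificity at a sublocus stands in for formation of an active HD1/HD2 heterodimer (locus A) or for pheromone-receptor recognition (locus B). The data layout and function names are hypothetical.

```python
def locus_compatible(specs1, specs2):
    """Two mates are compatible at a locus if they differ in specificity
    at one or more of its subloci."""
    return any(a != b for a, b in zip(specs1, specs2))


def can_mate(mate1, mate2):
    """Full compatibility requires compatibility at both A and B."""
    return (locus_compatible(mate1["A"], mate2["A"])
            and locus_compatible(mate1["B"], mate2["B"]))


m1 = {"A": ("a1", "b1"), "B": ("p1", "q1")}
m2 = {"A": ("a1", "b2"), "B": ("p2", "q1")}  # differs from m1 at both loci
m3 = {"A": ("a2", "b1"), "B": ("p1", "q1")}  # differs from m1 at A only

print(can_mate(m1, m2))  # True
print(can_mate(m1, m3))  # False: compatible at A but not at B
```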

In tetrapolar species, including the corn smut Ustilago maydis, A and B freely recombine. The closely related barley smut Ustilago hordei has a bipolar mating system in which the single mating type locus spans a half-megabase region containing both the homeodomain and the pheromone and pheromone receptor gene sets (Lee et al., 1999). In serologically defined forms of the bipolar yeast Cryptococcus neoformans, an agent of serious respiratory disease in humans, the large (>100 kb) mating locus encodes pheromones, pheromone receptors, and one homeodomain protein (Lengeler et al., 2002).

Maintenance of complementarity among alleles of A and B entails suppression of recombination within subclusters, but not necessarily between loci. Even so, close linkage between A and B may be advantageous. For example, physical proximity of related loci may promote their coordinated expression (Bakkeren & Kronstad, 1994). Kües (2000) noted the formation of barrages, slow-growth zones indicative of antagonistic interactions, between C. cinereus mycelia that were compatible at A but not B. Just as low fitness of intersexes promotes the evolutionary tightening of linkage between sex determinants (Nei, 1969), fusion of A and B into a single mating type locus may be adaptive under conditions in which semicompatibility is disadvantageous.

Other systems

Whether the origins of the mammalian Y chromosome lie in a tightly linked set of interacting loci remains speculative. By altering chromatin structure, the primary male-determinant SRY influences the transcription of numerous Y-linked genes (reviewed by Brennan & Capel, 2004). However, whether an ancestral SRY regulated the expression of linked core constituents of the proto-Y is unclear. For example, SRY may represent an evolutionary refinement, favored as a more effective means of coordinating expression of interacting genes than physical proximity alone. In particular, the mutual regulation of SRY and SOX9 and the ability of SOX9 to replace all functions of SRY (Chaboissier et al., 2004) suggest that SRY may not necessarily have served as the male determinant throughout the history of the Y chromosome.

An exception to the theme of interaction between linked mating type loci is the sex determination system of the honey bee (Apis mellifera), in which heterozygosity at a single transcriptional unit (csd) determines female development, and haploidy or homozygosity determines male development (Beye et al., 2003). Although CSD shows sequence homology with the SR protein family, whose members regulate mRNA splicing, CSD lacks an RNA-binding domain. Beye et al. (2003) proposed that in heterozygotes, CSD heterodimers induce female development upon binding with another SR family protein that does contain an RNA-binding domain, with male development occurring in the absence of heterodimer formation, in haploids or homozygotes. Because the sex-determining interaction occurs between alleles of a single transcription unit, recombination need not be suppressed to ensure complementarity. Accordingly, recombination in the vicinity of the sex-determination locus occurs at a rate (44 kb cM⁻¹) which, although unusually high among eukaryotes, is comparable to that in the rest of the honey bee genome (Beye et al., 1999).
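The csd rule reduces to a test for heterozygosity at a single locus. A minimal sketch, with a hypothetical function name and allele labels:

```python
def csd_sex(csd_alleles):
    """Sex from the csd alleles an individual carries: one allele for a
    haploid, two for a diploid. Only heterozygotes develop as females."""
    return "female" if len(set(csd_alleles)) > 1 else "male"


print(csd_sex(("c1",)))        # haploid: male
print(csd_sex(("c1", "c1")))   # homozygous diploid: male
print(csd_sex(("c1", "c2")))   # heterozygous diploid: female
```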

Mating type regions grow over evolutionary time

Large genomic blocks typically cosegregate with mating type (Table 1; Ferris et al., 1997; Hiscock & Kües, 1999). In this section I propose that such mating type regions may have expanded over evolutionary time from a tightly linked cluster of a few interacting loci, perhaps through transposon-induced rearrangements.

Table 1.  Characteristics of mating type regions

Organism                    Region               Cosegregating region (mb)   Gene density (per 100 kb)
Human                       Euchromatic MSY      23                          <1 [a]
Papaya                      MSY                  4–5                         8.6 [b]
Chlamydomonas reinhardtii   mt+/mt−              1                           3–4 [c]
Cryptococcus neoformans     Mata/Matα            0.1                         20 [d]
Brassica                    SRK-SP11/SCR SSI     ≈0.3 [e]                    15 [f]
Petunia                     S-RNase-based GSI    >4 [g]                      1 [h]

Cosegregation of many loci with diverse functions

Comprehensive determinations of the number, expression pattern, and possible function of genes embedded in or closely linked to the mating type region are available for a number of systems. Many expressed genes of diverse functions, including essential metabolism and others not clearly related to reproduction, cosegregate with mating type.

Skaletsky et al. (2003) detected 156 transcription units within the 23-mb euchromatic component of the human male-specific Y (MSY) region. Over half of the 27 distinct proteins or protein families encoded within the MSY derive from X-degenerate genes, those that have their closest homologues on the X chromosome. Most of the remaining protein families derive from ampliconic segments, including large palindromes on the Y; their closest homologues occur throughout the genome. While all ampliconic genes show testis-specific expression, only one member of the X-degenerate class (the male determinant SRY itself) is expressed predominantly in testis. Table 1 indicates a low gene density within the human euchromatic MSY region compared with other mating type regions, perhaps reflecting a greater degree of silencing on the human Y, which began its divergence from X perhaps 300 million yr (MY) ago (Lahn & Page, 1999).

In the unicellular green alga Chlamydomonas reinhardtii, multiple chromosomal rearrangements have occurred in the domain in which the determinants of the alternative mating types mt+ and mt− reside. A comprehensive analysis of transcription units across the region of recombination suppression has revealed genes expressed in vegetative as well as reproductive parts of the life cycle, including some (e.g. the TCA cycle enzyme pyruvate dehydrogenase kinase) which encode proteins essential to basic metabolism (Ferris et al., 2002).

Lengeler et al. (2002) defined the MAT locus of the pathogenic yeast C. neoformans as the chromosomal segment that shows considerable structural differences between mating types. In addition to essential determinants of mating type (encoding a homeodomain protein, pheromones and pheromone receptors), this region contains genes with no obvious role in sexual reproduction. About 20 expressed genes appear absolutely linked to MAT, with a roughly comparable density in immediately flanking colinear regions.

Any crossing over that might occur within the structurally diverse region of the S-locus would probably generate only unbalanced recombinants. Even so, some form of genetic exchange, perhaps mediated by gene conversion, appears to continue among pistil specificities (Kusaba et al., 1997; Awadalla & Charlesworth, 1999; Wang et al., 2001; Sato et al., 2002; Vieira et al., 2003). The S-locus of Brassica rapa appears to be gene-rich (Table 1; Suzuki et al., 1999; Fukai et al., 2003). As is the case for other mating type regions, transcription units under apparent absolute linkage to SRK and SP11/SCR show homology to known genes of diverse function, including catalysis or regulation of transcription or translation (Fukai et al., 2003).

Ongoing processes of chromosomal restructuring

A comparison of synonymous differences between X- and Y-linked homologues revealed multiple strata within the human MSY, suggesting the progressive cessation of recombination (Lahn & Page, 1999; Skaletsky et al., 2003). Iwase et al. (2003) invoked at least 10 inversions in the Y chromosome to account for differences between the sex chromosomes in the relative order of 11 subregions, extending from the short arm of the X from ZFX to one boundary of the pseudoautosomal region. Their analysis of amelogenin (AMEL) sequences from five primates and two other mammals revealed differences in phylogenetic history of two parts of AMEL, indicating that the pseudoautosomal boundary once lay within it. Marais & Galtier (2003) detected perhaps five distinct phylogenetic histories within AMEL, suggesting even more differentiation events within the region. These analyses indicate that multiple chromosomal restructuring events, arising independently in the various mammalian lineages in the 27–70 MY since their divergence from one another, have mediated the progressive suppression of recombination across the human Y chromosome.

In plants, the Y chromosome of papaya (Carica papaya) comprises a mosaic of subregions showing varying levels of sequence similarity to homologues on the X (Liu et al., 2004; Ma et al., 2004). In S. latifolia, silent divergence between the X- and Y-linked homologues differed between two loci, suggesting their linkage to the male-determining core in separate events (Atanassov et al., 2001).

Structural differences distinguish the alternative mating types Mata and Matα within serologically defined forms of the pathogenic yeast C. neoformans. Further, just as rearrangements of the Y have proceeded independently among mammalian lineages, gene order within a single mating type differs between serotypes A and D (varieties grubii and neoformans, respectively), which diverged perhaps 18.5 MY ago (Xu et al., 2000).

Model systems for both SRK-SP11/SCR SSI and S-RNase-based GSI show evidence of much chromosomal restructuring among S-alleles. Every functionally distinct S-haplotype that has been described has been found to be structurally distinct as well (Nasrallah, 2000; Fukai et al., 2003; Wang et al., 2003). Even S-haplotypes that determine the same specificity or very similar specificities in Brassica oleracea and B. rapa, close relatives which diverged perhaps 5 MY ago, show rearrangements (Kimura et al., 2002).

Accumulation of repetitive elements

The high density of Y-linked repetitive sequences, from long palindromes to short transposons, presented a major challenge to the construction of the high-resolution map of the human MSY (Skaletsky et al., 2003). Interspersed repeats constitute a lower percentage of euchromatic regions throughout the genome than in the euchromatic MSY, especially in X-transposed genes. Similarly, the mating type regions of Chlamydomonas (Ferris et al., 2002) and C. neoformans (Lengeler et al., 2002) exhibit many repetitive elements, transposons and remnants of transposons.

In Drosophila, in which the absence of recombination in males ensures complete cosegregation of any region physically linked to the male determinant, transposable element (TE) insertions and various deleterious changes accumulate more rapidly in neo-Y than in neo-X regions (Bachtrog, 2003).

Compared with a sample of the papaya genome, the papaya Y chromosome showed a higher density of repetitive elements, including a nearly threefold increase in inverted repeats (Liu et al., 2004). In Petunia, which expresses S-RNase-based GSI, Wang et al. (2004) determined the complete sequence of a 328-kb region surrounding S-RNase. They reported a fivefold increase in density of highly repetitive sequences in this S-locus region compared with the tomato genome.

Transposable elements tend to accumulate in regions of suppressed recombination (Charlesworth & Langley, 1989). If excision events only rarely permit recovery (after loss by genetic drift) of nonrecombining haplotypes bearing the fewest TEs, neutral or nearly neutral elements may increase in frequency or become fixed through the operation of Muller's ratchet. Further, interference between selective pressures on advantageous and deleterious factors within the cosegregating region increases the probability of fixation of deleterious TEs (Hill & Robertson, 1966; McVean & Charlesworth, 2000). Sheltering by X-linked homologues of deleterious Y-linked factors promotes their accumulation. However, both mechanisms of accumulation can operate even in the absence of sheltering, in haploids as well as diploids (Gordo & Charlesworth, 2000).

Steinemann et al. (1993) demonstrated that TEs can silence adjacent protein-encoding genes, even without disrupting the coding or promoter regions. They proposed that silencing by TEs represents a major or even necessary stage in the degeneration of Y chromosomes. However, if silencing of essential genes incurs substantial penalties in fitness, the maintenance or fixation of such Y haplotypes would be unlikely.

Alternatively, a major consequence of the accumulation of neutral or nearly neutral repetitive elements within mating type regions may be the expansion of the region under recombination suppression through transposon-mediated chromosomal rearrangements. Transposons can induce simple to very complex structural changes (Lim & Simmons, 1994; Gray, 2000). Casals et al. (2003) enumerated a number of studies establishing an association between TEs and chromosomal breakpoints in Drosophila. Because crossing over is already suppressed within the mating type region, rearrangements there are unlikely to impair fertility through the production of unbalanced recombinants in structural heterozygotes. Those that avoid major disruptions of gene expression may accumulate within the mating type locus. Large structural differences between homologous chromosomes may interfere with synapsis and crossing over, not only in the rearranged segment, but across colinear flanking regions as well. In C. reinhardtii, structural differences in a central 200-kb domain distinguish mt+ and mt− mating types, but recombination suppression extends over a tract nearly a megabase in length, including colinear flanking regions (Ferris & Goodenough, 1994; Ferris et al., 2002). Through the accumulation of nearly neutral TEs and transposon-mediated rearrangements, the genomic region under the thrall of the mating type locus may expand over the course of evolutionary time.

Evolutionary consequences of linkage to the S-locus

Comparative studies of various mammals have revealed the independent restructuring of Y chromosomes (see review and supporting material in Iwase et al., 2003). Chromosomal rearrangements and degenerative processes also proceed in parallel in the numerous S-alleles that typically segregate in SI systems. Segregating S-allele lineages appear to have diverged from one another in excess of 30 MY ago in the solanaceous S-RNase-based GSI system (Ioerger et al., 1990), and on a comparable timescale in the Brassica SSI system (Uyenoyama, 1995). Genomic regions flanking different S-alleles may undergo considerable degeneration nearly independently of one another. The consequent S-allele-specific load may bear implications for mating-system evolution.

Degenerative processes

Low effective population size promotes the operation of degenerative processes. Cosegregation with mating type alleles held in polymorphism by balancing selection in effect subdivides the population of linked sites into multiple backgrounds (Barton, 1995; Barton & Navarro, 2002). For Y-linked genes this subdivision reduces the effective population size fourfold relative to autosomal genes; for the S-locus the segregation of dozens or even hundreds of mating type specificities (Lawrence, 2000) intensifies the effect.
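The scale of this subdivision can be illustrated by counting gene copies per background. The sketch below assumes an equal sex ratio, equally frequent S-alleles, and that copy number serves as a rough proxy for effective population size; these are simplifications, and the function name and parameter values are hypothetical.

```python
def copies_per_background(n_individuals, n_backgrounds, copies_per_individual=2.0):
    """Average number of gene copies associated with one genetic background."""
    return copies_per_individual * n_individuals / n_backgrounds


N = 1000  # hypothetical number of breeding individuals

autosomal = copies_per_background(N, 1)        # two copies per individual
y_linked = copies_per_background(N, 1, 0.5)    # one Y per male, half the population male
s_linked = copies_per_background(N, 50)        # 50 equally frequent S-alleles (assumed)

print(autosomal / y_linked)  # 4.0
print(autosomal / s_linked)  # 50.0
```

Under these assumptions, sites linked to any one of 50 S-alleles occur in a population of copies 50-fold smaller than that of unlinked sites, compared with the fourfold reduction for Y-linked sites.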

Suppression of recombination accelerates the accumulation of genetic load by reducing the effectiveness of selection opposing deleterious mutations and favoring advantageous mutations (Hill & Robertson, 1966; McVean & Charlesworth, 2000; Glémin et al., 2001). Further, negative linkage disequilibrium evolves among deleterious mutations in populations of finite size (Hill & Robertson, 1968). Under the enforcement by SI expression of S-locus heterozygosity, negative linkage disequilibrium suggests S-allele-specific load: the association of S-alleles with distinct arrays of deleterious factors.

S-allele-specific load

Deterministic analyses indicate that SI expression affects linkage disequilibrium with the S-locus or heterozygosity at linked sites only under exceedingly tight linkage (Strobeck, 1980, 1983; Leach et al., 1986). Recent empirical studies have revealed the virtually complete cosegregation with the S-locus of numerous expressed genes (Table 1).

Wang et al. (2003) suggested that genomic tracts comprising at least 4.4 mb cosegregate with the S-locus in the S-RNase-based GSI system of Petunia. Takebayashi et al. (2004) obtained a maximum-likelihood estimate of perhaps 10⁻⁴ cM between S-RNase and a marker (48A) for which examination of hundreds of meiotic products had failed to detect any recombinants in Nicotiana alata (Li et al., 2000). This estimate reflects a >17-fold deficiency of segregating neutral variation at 48A relative to S-RNase (Takebayashi et al., 2003). Using the figure, suggested by Wang et al. (2003) for Petunia, of at least 17.6 mb cM⁻¹ in the S-locus region (20-fold higher than the genomic average), our estimate would suggest a separation between 48A and S-RNase of at least 1.8 kb (and possibly much more). While the function of 48A is unknown, Takebayashi et al. (2003) speculated that its unusual base composition, reflecting a high frequency of charged amino acids, and restriction of its expression to developing pollen, suggest a role in the hydration of pollen grains.
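The conversion from genetic to physical distance quoted above is a single multiplication, reproduced here for clarity. The variable and function names are hypothetical; the numerical values are those given in the text.

```python
def physical_separation_kb(genetic_distance_cM, mb_per_cM):
    """Physical distance in kb implied by a genetic distance (in centimorgans)
    and a local ratio of physical to genetic distance (in megabases per centimorgan)."""
    return genetic_distance_cM * mb_per_cM * 1000.0  # 1 mb = 1000 kb


separation = physical_separation_kb(1e-4, 17.6)
print(round(separation, 2))  # 1.76, i.e. about 1.8 kb
```

Because 17.6 mb per centimorgan is itself a lower bound for the S-locus region, the resulting separation is a minimum.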

Glémin et al. (2001) reported unpublished results of numerical finite-population simulations which indicated that the accumulation of deleterious factors at sites incompletely linked to the S-locus had negligible effect on the number and frequency of S-alleles. However, their published results suggest that the tightest level of linkage they considered exceeded the estimate for 48A by perhaps two orders of magnitude. Elucidation of the nature and magnitude of S-allele-specific load and the conditions required for its generation remains an open area of research.

Empirical studies have documented pleiotropic effects of poppy (Papaver rhoeas) S-alleles on seed dormancy and segregation ratios at linked loci (Lawrence & Franklin-Tong, 1994; Lane & Lawrence, 1995). In Solanum carolinense, which expresses S-RNase-based GSI, Stone (2004) found that for some S-alleles, S-locus homozygosity accounts for much of the inbreeding depression expressed in incompatible crosses achieved by bud pollination, but detected no effect for other S-alleles. Bechsgaard et al. (2004) reported distorted segregation ratios causing up to fourfold differences in rates of transmission of S-alleles in Arabidopsis lyrata, which expresses the brassicaceous form of SSI.

To explore the implications of S-allele-specific load, I extended Wright's (1939) classic diffusion approximation to accommodate declines in zygote viability with the time since divergence of the constituent S-alleles (Uyenoyama, 2003). Relative to expectations from the Wright model, this form of S-allele-specific load reduces the number of S-alleles maintained at approximate steady state for a given effective population size and rate of generation of new S-alleles. Further, deleterious interactions between a newly arisen specificity and its immediate parent specificity tend to extend divergence times among segregating S-allele lineages.

Together with the additional assumption of the progressive intensification of S-allele-specific load over evolutionary time, these findings appear consistent with observations of significantly long terminal branches in genealogies of S-RNase-based GSI alleles (Uyenoyama, 1997; Richman & Kohn, 1999). Similar genealogical distortions have been detected for other mating type regions (Brassica SSI S-locus: Schierup et al., 1998; mushroom homeodomain locus: May et al., 1999) and targets of other forms of balancing selection (ADH in Drosophila melanogaster: Hudson & Kaplan, 1986; major histocompatibility complex class II locus Ab in mouse: Richman & Kohn, 1999). Tajima's D tends toward negative values under conditions conducive to the operation of Muller's ratchet (Gordo et al., 2002). An excess of rare variants would suggest long terminal branches, consistent with the proposal that the observed distortions in S-allele genealogies may reflect the accumulation of deleterious mutations.
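The connection between rare variants and negative values of Tajima's D can be made concrete with a short calculation. The sketch below implements the standard formula for D from aligned sequences, assuming no missing data; the function name and the toy alignments are hypothetical and are not drawn from any of the studies cited.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences without missing data.
    Returns NaN when fewer than four sequences or no segregating sites exist."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or S == 0:
        return float("nan")
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)
    # Constants of the standard variance approximation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Every variant is a singleton (long terminal branches): D is negative.
singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
# Every variant is at intermediate frequency: D is positive.
intermediate = ["AAA", "AAA", "TTT", "TTT"]

print(tajimas_d(singletons) < 0)    # True
print(tajimas_d(intermediate) > 0)  # True
```

D contrasts two estimates of diversity: the mean number of pairwise differences, which is weighted toward intermediate-frequency variants, and the number of segregating sites, which counts rare variants equally. An excess of rare variants therefore drives D below zero.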


I thank Teh-hui Kao, who graciously provided papers and information in advance of publication, and anonymous reviewers for constructive comments. US Public Health Service grant GM 37841 provided support for this study.