Incompatibility recognition systems preventing self-fertilization have evolved several times in independent lineages of angiosperms, and three main model systems are well characterized at the molecular level [the gametophytic self-incompatibility (SI) systems of Solanaceae, Rosaceae and Antirrhinum, the very different system of poppy, and the system in Brassicaceae with sporophytic control of pollen SI reactions]. In two of these systems, the genes encoding both components of pollen-pistil recognition are now known, showing clearly that these two proteins are distinct, that is, SI is a lock-and-key mechanism. Here, we review recent findings in the three well-studied systems in the light of these results and analyse their implications for understanding polymorphism and coevolution of the two SI genes, in the context of a tightly linked genome region.
The evolution of functionally complex systems is an important question. Self-incompatibility (SI) pollen-pistil recognition systems in flowering plants are particularly fascinating, but understanding their evolution is more difficult than for other recognition systems because the component genes are highly polymorphic within species. It is easy to imagine the evolution of recognition systems involved in developmental signalling: changes in one component may not immediately abolish the old function; an allele encoding a protein with an additional useful developmental function, even if imperfect, can thus replace the species’ pre-existing gene, and the new function can later improve by fixation of alleles with further changes in either component (Löhr et al., 2001). In the evolution of S-loci, however, gaining a new specificity requires losing the old one (to preserve the unique specificities of alleles). Moreover, changes in specificity of just the pollen or the pistil should lead to full or partial self-compatibility; thus, both recognition components must change, and both changes must occur in the same haplotype.
We are, however, gaining a better understanding of SI evolution as a result of the combination of molecular genetic and molecular evolutionary work. SI systems have evolved at least three times independently, providing opportunities for identifying common features in the different systems and thus for testing evolutionary hypotheses. Studies of SI should lead to a general understanding of how variation in these genes is maintained and how selection, including coadaptation between the pollen and pistil components, affects the evolution of genome regions around the incompatibility loci. In turn, an understanding of such genomic effects in well-characterized model systems is necessary to rigorously evaluate evidence on where, in genomes, selection is maintaining variability.
Knowing the components of polymorphic recognition systems does not necessarily indicate what maintains the variants. There is still uncertainty for vertebrate major histocompatibility complex (MHC) systems, and the maintenance of variability in incompatibility systems in fungi (e.g. Casselton, 1998) and some sessile animals (e.g. Grosberg, 1988) is also not yet understood. Plant SI systems are excellent models for understanding such variability because we know that the chemical recognition of pollen by pistils prevents self-fertilization, creating an advantage for any allele that is rare in a population. This frequency-dependent selection (a form of balancing selection) leads to the long-term maintenance of many alleles with different incompatibility types (Vekemans & Slatkin, 1994), and alleles thus become widely dispersed throughout a species’ different populations (Muirhead, 2001).
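The rare-allele advantage can be illustrated with a minimal sketch. It assumes a gametophytic rejection rule (pollen is rejected when its single S allele matches either allele of the pistil), and the function name, allele labels and genotype frequencies are hypothetical values chosen only for illustration, not data from any study.

```python
def accepting_fraction(genotype_freqs):
    """For each S allele, return the fraction of plants whose pistils
    would accept pollen carrying that allele.

    Assumption: gametophytic rule, i.e. pollen is rejected whenever its
    allele matches either allele in the pistil genotype.
    """
    total = sum(genotype_freqs.values())
    alleles = sorted({allele for genotype in genotype_freqs for allele in genotype})
    result = {}
    for allele in alleles:
        # Plants carrying this allele reject pollen that carries it.
        rejecting = sum(freq for genotype, freq in genotype_freqs.items()
                        if allele in genotype)
        result[allele] = 1 - rejecting / total
    return result


# Hypothetical population: S1 is common, S4 is rare.
population = {
    ("S1", "S2"): 0.45,
    ("S1", "S3"): 0.45,
    ("S2", "S3"): 0.05,
    ("S1", "S4"): 0.05,
}

fractions = accepting_fraction(population)
for allele, fraction in fractions.items():
    print(f"{allele}: accepted by {fraction:.2f} of plants")
```

In this toy population, pollen carrying the rare allele S4 is accepted by about 95% of plants, whereas pollen carrying the common allele S1 is accepted by only about 5%, which is the advantage to rarity described above.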
General features of different plant SI systems
Although plant SI systems all prevent self-fertilization through recognition and rejection of pollen by pistils expressing ‘cognate’ allelic specificity (described later), very different cellular mechanisms are involved in the three systems where the genes have been characterized. The genetics and mechanisms of these systems have recently been reviewed in detail (Nasrallah, 2002; Kao & Tsukamoto, 2004). Molecular data are so far available only for systems in which a single polymorphic ‘S-locus region’ controls the different incompatibility specificities. Pistil SI proteins are often soluble and can be studied in pistil extracts. Thus, the first molecular information, from Brassica and Nicotiana, came through the electrophoresis of major pistil proteins, combined with cDNA analysis. Other self-incompatible species in these two plant families were subsequently shown to have homologous S loci. Petunia species (and other Solanaceae) and Rosaceae have stylar S-RNases, and the important genus for molecular research, Arabidopsis, has a system similar to that of Brassica. A. lyrata is self-incompatible, with a system like that in Brassica, while its close relative, A. thaliana, is autogamous and highly self-fertilizing in nature (Abbott & Gomes, 1988; Nasrallah et al., 2004) with nonfunctional S-locus homologues (Kusaba et al., 2001).
It has long been thought that the S-locus region contains separate pistil and pollen protein genes (i.e. that recognition in SI is a ‘lock-and-key’ system, with components as different as in other ligand–receptor systems), and this has now been confirmed. Evidence for a distinct pollen determinant was, until recently, indirect, based mainly on mutagenesis studies, in which pollen and pistil incompatibility can be affected independently (Lewis, 1947; Kao & Tsukamoto, 2004). Potential ‘pollen-S’ genes have now been identified in laborious map-based searches of large genomic regions around the pistil S genes of Brassicaceae, Solanaceae, Antirrhinum and Rosaceae, and tested in transgenic experiments (described later). For these species (but not yet in Papaver, with a system quite different from either the Brassicaceae or the S-RNase system), the lock-and-key model is thus at last confirmed, and the alternative (recognition through expression of the same proteins in pistil and pollen) is excluded.
We have explained in the previous section that S genes should have higher polymorphism than other genes. To test this, diversity must be quantified in samples of alleles from natural populations. Genomic gel blotting (restriction fragment length polymorphism analysis) can provide some data, but shows merely that variants exist, without adequate quantification. Allele numbers depend on recombination rates and so are an unsatisfactory measure of diversity (they can be enormous for sequences with several polymorphic sites that recombine, such as MHC genes). Diversity should thus be quantified as ‘nucleotide diversity’, using measures such as the mean fraction of sites that differ between two randomly sampled alleles at a locus, π. Amino acid and nonsynonymous site diversity (πn) will be particularly high in codons directly affected by balancing selection, and regions with high πn or πn > πs (synonymous site diversity) can often help to identify these (Table 1; Richman et al., 1996). As a basis for comparison, πs in maize has been shown to range up to 3.6% across a sample of 21 loci (Tenaillon et al., 2002).
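As a concrete illustration of π as defined above, the following minimal sketch computes the mean proportion of differing sites over all pairs of aligned sequences in a sample. It is an uncorrected estimate (no correction for saturation is applied), the function names are our own, and the toy sequences are invented for illustration.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences.

    Sites with an alignment gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def nucleotide_diversity(alleles):
    """Uncorrected pi: mean pairwise difference over all pairs in the sample."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Invented toy alignment of four alleles, eight sites each.
toy_alleles = ["ATGCCGTA", "ATGACGTA", "ATGACGTT", "TTGCCGTA"]
pi = nucleotide_diversity(toy_alleles)
print(f"pi = {pi:.4f}")
```

Computing πs and πn separately would additionally require classifying sites as synonymous or nonsynonymous, which this sketch does not attempt.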
Table 1. Diversity per nucleotide of different regions of genes of several recognition systems
Values shown are π or similar estimates. The different methods used to correct for saturation are not specified here, as all values are approximate when diversity is high. (Note that values > 100% are possible after such correction.)
The table shows the high diversity in regions likely to be under balancing selection, and also in other regions of the S-gene sequences [comparing either the S-domain and kinase domains of SRK, or positively selected vs conservatively evolving sites, or sites within hypervariable (HV) regions vs other regions]. Comparable data for a major histocompatibility complex (MHC) gene (HLA-DRB1) are shown to illustrate the lower synonymous site diversity in regions not under balancing selection, in contrast to S loci.
It is now understood that variability will also often be high at unselected sites close to sites where frequency-dependent selection maintains many alleles for long time-periods (Schierup et al., 2000; Takebayashi et al., 2003). If functionally different S alleles rarely recombine, their sequences will evolve like alleles in isolated populations with low ‘effective population sizes’, and hence diversity among different instances of the same functional allele will be low. Between different alleles, however, synonymous and even nonsynonymous differences can accumulate in the absence of recombination (Vekemans & Slatkin, 1994; Charlesworth et al., 2003). Given estimated plant recombination values of approx. 1 Mb per centiMorgan, the predicted diversity peaks should span only a few hundred nucleotides, which would nevertheless greatly hinder precise identification of the sites within S proteins that are involved in recognition (Charlesworth et al., 2003), a chief reason for obtaining sequences. In the MHC systems, recombination occurs and diversity is indeed highest mainly at sites near known peptide-binding amino acids (Table 1).
In SI systems, maintaining coadapted combinations of pollen and pistil alleles should result in the evolution of low-recombination frequencies between the S loci. The S-locus genome region is physically too small (Fig. 1) to test this by genetic mapping with feasible family sizes, but population genetic approaches may be able to test for low recombination through detecting haplotypes of only certain pistil and pollen S-allele combinations (linkage disequilibrium). If recombination is rare, diversity may be elevated over large genomic tracts, possibly including other genes and sequences located within the S-locus region.
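The kind of haplotype association such population genetic approaches look for can be sketched as follows. This is a minimal illustration, assuming phased haplotypes are available as (pistil allele, pollen allele) pairs; the function name and allele labels are hypothetical, and the coefficient shown is the simple difference between observed and expected haplotype frequencies.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return D = observed - expected frequency for every combination of
    pistil and pollen allele, where the expected frequency is the product
    of the two allele frequencies (random association)."""
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    pistil_counts = Counter(pistil for pistil, _ in haplotypes)
    pollen_counts = Counter(pollen for _, pollen in haplotypes)
    d_values = {}
    for pistil, pistil_n in pistil_counts.items():
        for pollen, pollen_n in pollen_counts.items():
            observed = pair_counts[(pistil, pollen)] / n
            expected = (pistil_n / n) * (pollen_n / n)
            d_values[(pistil, pollen)] = observed - expected
    return d_values


# Hypothetical sample with complete association: each pistil allele is
# only ever found with one pollen allele.
sample = [("pistil_a", "pollen_a")] * 5 + [("pistil_b", "pollen_b")] * 5
d_values = linkage_disequilibrium(sample)
for combination, d in sorted(d_values.items()):
    print(combination, round(d, 3))
```

With complete association, the matched combinations have positive D and the mismatched (recombinant) combinations have negative D because they are never observed; with free recombination all values would be close to zero.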
High diversity is a necessary condition for accepting a gene as an S locus, and candidate genes can sometimes be eliminated because of low diversity (Casselman et al., 2000; Takebayashi et al., 2003). If the S loci lie within a nonrecombining region, however, high diversity may hinder even identifying the functional S genes. Moreover, diversity data from SI plants are scarce for non-S loci, so one often cannot test whether candidate S loci have significantly elevated diversity. Transgenic experiments are thus also essential to show that candidate genes change the incompatibility reaction appropriately.
In the Brassicaceae, the pistil protein is a receptor kinase, SRK. Initial work in Brassica identified an S-domain protein, the highly polymorphic SLG (reviewed in Sato et al., 2002), but the evidence from polymorphism was misleading: SLG has no major role in recognition, but seems merely to enhance the SI response – its polymorphism probably results from close linkage to the true pistil-recognition gene, SRK (Nasrallah, 2002). SRK is anchored to the stigma membrane, presenting its extracellular S domain to ligand molecules on the pollen surface (Nasrallah, 2002). Both the S domain and the kinase domain are extraordinarily polymorphic (Table 1), although the S domain is presumably the sole region subject to balancing selection. This suggests that the S locus may lie within a region of unusually low recombination frequency. Physical mapping shows that allelic S-locus regions are organized into haplotypes with the genes in different orders and orientations (Fig. 1; Boyes et al., 1997). Moreover, high synonymous and also nonsynonymous site diversity in the A. lyrata SRK kinase domain (Table 1) are difficult to explain unless recombination with the S domain is infrequent. These findings are all consistent with the S-locus region containing at least two coadapted genes, the most plausible evolutionary reason for a low recombination rate. The alternative, that the region is located in a genome region, such as a centromeric region, that rarely recombines, seems not to apply in A. lyrata (Kusaba et al., 2001), whose gene locations within chromosome arms are probably similar to those in A. thaliana (Kuittinen et al., 2004).
A breakthrough in establishing the existence of separate pollen S genes was the identification of the Brassica and Arabidopsis pollen SI determinant. The candidate gene for the ligand in Brassicaceae is a small, cysteine-rich pollen-coat protein, SCR (also called SP11; Schopfer et al., 1999; Takayama et al., 2000). Its sequence diversity in Brassica is high (Watanabe et al., 2000; Shiba et al., 2002); however, a full picture of variability is not yet available because alleles in dominant haplotypes differ so greatly that not all genetically detectable alleles can be sequenced. In A. lyrata, with only two alleles compared to date, many differences were found in this short protein sequence (Kusaba et al., 2001).
Direct evidence from transgenic experiments shows that SCR(SP11) is indeed the pollen S gene. Transfer of SRK and SCR alleles of a particular A. lyrata haplotype causes SI in A. thaliana buds, though only transiently in many strains (Nasrallah et al., 2004), and transferring Raphanus SCR alleles into B. campestris (rapa) caused the recipients’ pollen to be rejected by Brassica plants with a particular incompatibility type (Sato et al., 2004), showing that SCR determines the pollen incompatibility type and also that certain alleles in the two species have the same incompatibility type (despite several sequence differences).
Even with this detailed knowledge from Brassicaceae, unexpected properties of S genes continue to emerge, notably expression differences of SCR(SP11) alleles. In heterozygotes with alleles of differing dominance, only the more dominant allele is expressed (Kusaba et al., 2002; Shiba et al., 2002). This mono-allelic expression is like that in other chemoreceptor systems, including mammalian odorant receptors (Goldmit & Bergman, 2004), and ensures that each pollen grain has a single SI type. Unlike other receptor systems, expression choice is deterministic: dominant alleles are expressed in both the anther tapetum and, gametophytically, in pollen, whereas recessive alleles are not expressed in pollen. Expression is presumably controlled by different upstream sequences, which suggests that the haplotype structure of S-locus regions may embrace those regions also. New alleles may thus be constrained to have the same dominance as their progenitors. This has not yet been incorporated into evolutionary models, but it may explain the clustering of Brassica allele sequences according to their dominance.
Solanaceae, Antirrhinum and Rosaceae
Unlike the SI of Brassicaceae, where inhibition occurs on the stigma surface, in these species incompatible pollen tubes are inhibited in the stylar transmitting tract. Consistent with this, the pistil S-RNase protein is expressed in style tissue. It is taken up by growing pollen tubes, degrading RNAs within incompatible pollen tubes and causing subsequent arrest of growth in the style (Kao & Tsukamoto, 2004). Like SRK, S-RNases are highly polymorphic (Table 1), although most sequence data come from the reverse transcription–polymerase chain reaction amplification of pistil cDNA. Synonymous and nonsynonymous diversity are both high (e.g. Richman et al., 1996). Only rarely are segregation tests carried out to verify that sequences are allelic and that different incompatibility classes in families correspond with different sequences. Natural population studies generally simply assume that different sequences represent different alleles (presumably treating cases with few differences as sequencing errors).
The pollen determinant of SI has now been identified in species of all three families with S-RNase systems (Solanaceae, Rosaceae and Antirrhinum), and the findings suggest a mechanism for these systems and may explain some of their odd features. Genes encoding F-box proteins have been candidates for the gene controlling pollen recognition since the discovery of S-linked F-box (SFB) genes in A. hispanicum (Lai et al., 2002) and Prunus mume (Rosaceae; Entani et al., 2003). However, formal proof was needed that an SFB gene determines pollen specificity, as F-box genes form a very large gene family (694 estimated genes in A. thaliana; Gagne et al., 2002), and several such genes have been found in the S-locus region of some species (Entani et al., 2003; Kao & Tsukamoto, 2004). Transformation experiments now confirm F-box genes as the pollen S of both P. inflata (Sijacic et al., 2004) and A. hispanicum (Qiao et al., 2004).
The Petunia study made elegant use of the ‘competition interaction’ phenomenon, which creates difficulties for transgenic experiments with pollen S genes, but provides a characteristic specific for the male SI determinant. In many known gametophytic systems, pollen grains heterozygous for S alleles may be compatible with plants carrying both alleles (Lewis, 1947). Thus, self-compatible plants in species with S-RNase systems commonly arise by tetraploidy or duplication of the S locus, making some of the pollen effectively heterozygous for the pollen S gene. This does not occur in the Brassicaceae: tetraploids maintain normal incompatibility reactions (Mable et al., 2004). The pollen SFB protein belongs to a protein family involved in ubiquitin-mediated protein degradation. F-box proteins often play crucial roles, delivering appropriate targets to the ubiquitin–protein ligase complex (Gagne et al., 2002). The ‘inhibitor model’ for SI proposes that interactions of pollen-recognition proteins with nonself S-RNases do not allow RNase activity, whereas activity does occur after interaction with the cognate S-RNase (Entani et al., 2003; Kao & Tsukamoto, 2004). Unexpectedly, the new results offer a simpler model and can explain the competition phenomenon. Competition suggests active destruction of RNase activity by any pollen S-protein not recognized as the cognate one, for example compatibility resulting from degradation of all nonself S-RNase proteins (Fig. 2).
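The logic of the degradation model, and how it accounts for competition, can be written as a small sketch. It assumes the simplest reading of the model: each pollen S protein destroys every S-RNase except its cognate one, and a pollen tube survives only if every S-RNase in the pistil is destroyed. The function name and allele labels are hypothetical.

```python
def pollen_is_compatible(pollen_alleles, pistil_alleles):
    """Degradation-model sketch (an assumed simplification).

    Each pollen S protein is assumed to destroy all S-RNases except the
    cognate one. Pollen is accepted only if every pistil S-RNase is
    destroyed by at least one of the pollen's S proteins.
    """
    return all(
        any(pollen_allele != rnase for pollen_allele in pollen_alleles)
        for rnase in pistil_alleles
    )


pistil = ["S1", "S2"]

# Haploid pollen carrying a pistil allele: its cognate S-RNase survives.
self_pollen = pollen_is_compatible(["S1"], pistil)

# Haploid pollen carrying an unrelated allele: both S-RNases are destroyed.
cross_pollen = pollen_is_compatible(["S3"], pistil)

# Diploid pollen carrying two different alleles (competition): each
# pollen S protein destroys the other allele's S-RNase.
heteroallelic_pollen = pollen_is_compatible(["S1", "S2"], pistil)

# Diploid pollen carrying two copies of the same allele.
homoallelic_pollen = pollen_is_compatible(["S1", "S1"], pistil)

print(self_pollen, cross_pollen, heteroallelic_pollen, homoallelic_pollen)
```

Under these assumptions only pollen carrying two different S alleles escapes rejection by a pistil sharing both alleles, which matches the observation that tetraploidy or S-locus duplication commonly produces self-compatibility in S-RNase systems.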
Surprisingly, expression of the P. inflata PiSLF gene peaks in immature bicellular pollen in the anthers, not during pollen tube growth. Diversity data and duplications in self-compatible strains (Tsukamoto et al., 2005), however, support the conclusion that the true pollen S has been found, although further data are needed. An approximate quantification for PiSLF, using a few alleles, suggests extremely high amino acid diversity (Sijacic et al., 2004), consistent with the extremely high S-RNase πs and πn values compared with other plant loci (Table 1). However, in some Prunus and Petunia species with S-RNase systems, the S loci may lie close to centromeric regions (with low recombination). Until linkage has been quantified, and diversity estimated for nearby loci, evidence from diversity of candidate genes is incomplete.
In Papaveraceae, only the pistil S protein has been characterized to date. In Papaver rhoeas, it is the pistil S protein that is a small extracellular signal molecule (see Thomas & Franklin-Tong, 2004), presumably interacting with a receptor on pollen tube surfaces. Despite the pollen determinant still being hypothetical, the cellular mechanism of SI is known in considerable detail. Self-stigma S proteins elicit a rapid increase in Ca2+ within pollen tubes growing in vitro, suggesting the involvement of programmed cell death (PCD) processes in the SI response. Indeed, inhibiting caspase-3-protease, a key PCD enzyme, abolishes endonuclease activity and prevents DNA fragmentation and pollen-tube growth inhibition that occur in normal incompatibility (Thomas & Franklin-Tong, 2004).
Downstream biochemical pathways involved in SI, and evolution of self-compatibility
Downstream cellular pathways leading to self-pollen rejection are also becoming better known for the Brassica and S-RNase systems (Vanoosthuyse et al., 2003; O’Brien et al., 2004). This should illuminate how secondary self-compatibility arises, a common evolutionary change in plants (Goodwillie, 1999; Nasrallah et al., 2004). SI can be lost by mutations in either the S locus or the downstream pathway. Many components of the Brassica SI process are now known, including the ARC1 protein that interacts with SRK (Stone et al., 2003), and the MLPK gene, recently characterized in a self-compatible B. rapa cultivar, which is unlinked to the S locus but encodes a SRK-like membrane-anchored cytoplasmic protein kinase (Murase et al., 2004).
Incompatibility breakdown in cases of natural loss of SI is less well understood. In A. thaliana there is evidence for several potential mutations, and the timescale of SI loss is uncertain. In the Columbia strain, the SRK and SCR orthologues are pseudogenes (Kusaba et al., 2001), suggesting loss of SI through mutations in an S locus. Transgenic plants expressing SRK and SCR from A. lyrata indeed show some incompatibility, and strain C24 can be made strongly incompatible in such experiments, suggesting that this strain's downstream pathway is functional (Nasrallah et al., 2004). However, other strains expressing the same transgenes remain compatible, indicating variability for mutations in downstream pathway genes. SRK sequences in A. thaliana are also surprisingly polymorphic (Shimizu et al., 2004; Nasrallah et al., 2004). If compatibility evolved by selection of a mutant S gene, a ‘selective sweep’ should have occurred, and this region's diversity would be lower (or at least no higher) than that of other genes, unless recombination is extremely frequent so that the selective sweep could affect just the mutated S gene, which seems improbable. Low diversity was indeed found for one SCR(SP11) pseudogene (Shimizu et al., 2004). The high diversity of SRK and other flanking loci (Nasrallah et al., 2004; Shimizu et al., 2004) suggests that there has been no rapid fixation in this species of any particular S haplotype.
How do new specificities evolve?
The origin of new specificities is a major outstanding puzzle. Both well-understood SI systems are two-component (lock-and-key) systems. The origin of new haplotypes thus requires at least two mutations, yet a change in the specificity of one component will lead to self-compatibility. One possibility is thus that such a mutation causing self-compatibility spreads in the population, but another suitable mutation in the mutant haplotype restores incompatibility before all S alleles are lost from the population (Uyenoyama et al., 2001). Targeted mutagenesis and domain-swapping experiments with pistil-recognition genes show that chimaeric ‘dual-specificity’ proteins can be formed which reject two different pollen types (Kao & Tsukamoto, 2004), suggesting another way that new specificities might evolve. Experiments in Brassica show that the SCR-binding affinity for SRK can be altered without losing the ability of the resulting complex to elicit incompatibility responses in vivo (Chookajorn et al., 2003). SCR and SRK proteins of each specificity might thus form ‘clouds’ of sequences that may subsequently split under disruptive selection, optimizing recognition.
If several slightly different SRK sequences have the same specificity, a selective advantage accrues to an allele encoding a pollen protein variant that escapes rejection by some of these, providing it with more compatible mating opportunities than other variants of this specificity (Uyenoyama et al., 2001). However, unless the advantage of this subspecificity diminishes as it spreads among its allele class, it would merely replace the initial allele, not add a new one (Uyenoyama & Newbigin, 2000). Occasionally, however, the pollen gene of another variant in the same allele class may acquire a second change that leads to rejection exclusively by a different pistil variant, replacing one initial incompatibility haplotype by two new functionally different ones with similar sequences (Fig. 3).
Is the necessary polymorphism present among sequences of pistil and pollen genes with the same specificities? Such sequence differences are expected to be rare because of the small effective population sizes of S alleles. The few data on diversity within S-allele types in P. rhoeas (O'Donnell et al., 1993), Brassica (Kusaba et al., 2000) and A. lyrata (Charlesworth et al., 2003) suggest occasional differences, including some amino acid differences, between sequences of the same S alleles; no data are yet available for S-RNase systems.
A question for future study is how SI systems evolved in the first place. The evolution of these systems, with two components involved in recognition, seems difficult unless a previously existing cellular recognition system changed function to become a self-pollen recognition system. New knowledge about the detailed mechanisms of SI should illuminate the origins of SI systems. The three cases reviewed here each appear to have co-opted different cellular systems. All S-RNase systems may have a common origin (Steinbachs & Holsinger, 2002), while the Brassica and poppy systems are different (though ubiquitination may also be involved in the Brassica signalling process, Stone et al., 2003). With plant genome sequences, relationships between different SI systems may become evident, and suggestions of similarities to functions such as pathogen-recognition systems (Thomas & Franklin-Tong, 2004) will become testable.
Many other self-incompatible plants are known, but molecular details are so far lacking. Another important task will be to learn about these incompatibility systems, including the important self-incompatible monocotyledon (see Hackauf & Wehling, 2005) and legume species, as well as the classic SI species, Oenothera organensis. The approaches and understanding developed in the three currently well-studied systems should be helpful for studying the others. Other systems may reveal common features, giving mechanistic and evolutionary understanding, or new systems, exposing new plant cellular-recognition processes.
We thank the following for funding: NERC (DC); The Royal Society (SG); and CNRS-Life Science Department, FEDER fund from the EU and ARCIR grant from the Région Nord-Pas de Calais (XV and VC). We also thank the authors of the many interesting papers we were unable to cite in this short review for their contributions towards understanding SI systems.