The use of quantitative disease resistance (QDR) is a promising strategy for promoting durable resistance to plant pathogens, but genes involved in QDR are largely unknown. To identify genetic components and accelerate improvement of QDR in legumes to the root pathogen Aphanomyces euteiches, we took advantage of both the recently generated massive genomic data for Medicago truncatula and natural variation of this model legume.
A high-density (≈5.1 million single nucleotide polymorphisms (SNPs)) genome-wide association study (GWAS) was performed with both in vitro and glasshouse phenotyping data collected for 179 lines.
GWAS identified several candidate genes and pinpointed two independent major loci on the top of chromosome 3 that were detected with both phenotyping methods. Candidate SNPs in the most significant locus (explaining 23% of the additive genetic variance) were in the promoter and coding regions of an F-box protein coding gene. Subsequent qRT-PCR and bioinformatic analyses performed on 20 lines demonstrated that resistance is associated with mutations directly affecting the interaction domain of the F-box protein rather than gene expression.
These results refine the position of previously identified QTL to specific candidate genes, suggest potential molecular mechanisms, and identify new loci explaining QDR against A. euteiches.
Plant diseases are major yield-limiting factors and identification of durable and broad-spectrum resistance traits is therefore crucial to improving crop productivity (Poland et al., 2009; Kou & Wang, 2010). Most breeding strategies used to select resistant cultivars have been based on so-called qualitative resistance conferred by single resistance (R) genes (Van der Biezen & Jones, 1998; Jones & Dangl, 2006). R gene-mediated disease resistance is generally highly efficient in controlling a given pathogenic strain but often is overcome rapidly by new pathogenic variants (Young, 1996; Jones & Dangl, 2006; Lannou, 2012). A promising strategy for obtaining durable resistance would be to exploit quantitative resistance (Palloix et al., 2009). Unlike qualitative resistance, quantitative resistance is generally controlled by multiple genetic loci and leads to the reduction and/or delay of disease development. Quantitative resistance is also called partial resistance, in contrast with the hypersensitive response typical of qualitative resistance.
Quantitative resistance loci (QRL) have generally been identified by bi-parental QTL mapping. An alternative strategy to identify the genetic basis of quantitative disease resistance (QDR) is to conduct high-density genome-wide association studies (GWAS) with the objective of discovering most of the components involved in the genetic architecture of complex phenotypic traits (Rafalski, 2010; Ingvarsson & Street, 2011). GWAS in plants was first applied to the model species Arabidopsis thaliana using polymorphism data captured by array technologies and re-sequenced data concerning candidate genomic regions (Aranzana et al., 2005; Atwell et al., 2010). Such studies not only detected loci previously identified using linkage mapping, such as the flowering time gene FRI (Johanson et al., 2000) or disease resistance genes such as Rpm1, Rps5 and Rps2 (Stahl et al., 1999; Tian et al., 2002; Mauricio et al., 2003), but also identified causal genes involved in other agronomic or developmental traits (Chao et al., 2012; Filiault & Maloof, 2012). Recent GWAS studies have focused on plant species of agricultural relevance such as rice (Huang et al., 2010, 2012), barley (Shu et al., 2012), oat (Newell et al., 2012), maize (Wang et al., 2012), wheat (Kollers et al., 2013) and sorghum (Morris et al., 2013). The current trend in GWAS is to analyse large SNP datasets generated by next generation sequencing (NGS). However, for most studies in plants, low to medium genomic coverage has generally restricted the fine mapping of causal mutations and the discovery of putative additional genomic regions contributing to the heritability of phenotypic traits (Stanton-Geddes et al., 2013).
Medicago truncatula is an ideal model plant for studying the evolution and genetic architecture of biotic interactions because this species engages in root symbioses with both nodulating N-fixing bacteria such as Sinorhizobium meliloti (Jones et al., 2007) and arbuscular mycorrhizal fungi (Parniske, 2008). Medicago truncatula is also a natural host for various crop legume pathogens (Tivoli et al., 2006; Samac & Graham, 2007; Ameline-Torregrosa et al., 2008). Additionally, this species can be easily manipulated and many genetic resources are available, including mutants and core collections, a high-quality reference genome (Young et al., 2011), and high-density SNP mapping of more than 200 genotypes. The first GWAS on M. truncatula was performed recently using high-resolution NGS SNP data developed in the Medicago truncatula HapMap Project (www.medicagohapmap.org/), which involved sequencing 288 Medicago accessions by Illumina technology (Stanton-Geddes et al., 2013). This study highlighted the advantages of using high-resolution SNP data, rather than array-based reduced genomic representations, for studying the genetic architecture of complex traits.
In the present work, we performed GWAS on M. truncatula quantitative genetic resistance to the soil-borne root pathogen Aphanomyces euteiches. This oomycete naturally infects M. truncatula as well as several economically important legume crops including pea (Pisum sativum), the most widely cultivated legume in Europe, and alfalfa (M. sativa), the second most widely cultivated legume in USA (Gaulin et al., 2007; Moussart et al., 2008). Previous QTL mapping in M. truncatula identified one major QTL located on the top of chromosome 3 (Djébali et al., 2009; Pilet-Nayel et al., 2009; Hamon et al., 2010). This locus confers broad-spectrum resistance as it has been detected in two partially resistant accessions (A17 and DZA45.5) against various pea and alfalfa isolates of A. euteiches differing in virulence and aggressiveness (Djébali et al., 2009; Pilet-Nayel et al., 2009; Hamon et al., 2010). Here we investigated the natural genetic variation of resistance to a pea strain of A. euteiches in a core collection of 179 M. truncatula accessions sampled from the Mediterranean basin, which is the natural range of this species (Ronfort et al., 2006). To characterize phenotypic variation in disease development and minimize technical phenotypic variation, we conducted complementary and standardized in vitro and climatic chamber inoculation assays. To finely dissect the genetic architecture of quantitative resistance to A. euteiches we used these phenotypic data in an association analysis with high-density SNP data generated by the Medicago truncatula HapMap Project. The extensive genomic coverage of SNP data allowed us to identify candidate genes and determine their precise positions in the M. truncatula genome. The two SNPs most highly associated with variation in resistance are located in the promoter and coding region of an F-box protein encoding gene. 
Analyses of gene expression revealed no evidence that the expression of this gene differed between resistant and susceptible lines. By contrast, analyses of coding region sequences revealed that alleles with a nonfunctional F-box are associated with resistance, suggesting that this protein may act as a negative regulator of plant resistance to A. euteiches.
Materials and Methods
Plant material and Aphanomyces euteiches inoculation
A set of 179 Medicago truncatula (Gaertn.) accessions (Supporting Information Table S1, extracted from www.medicagohapmap.org/hapmap/germplasm) belonging to the core collection CC192 (Ronfort et al., 2006) generated by INRA Montpellier was used for phenotyping experiments and GWAS. Seeds of M. truncatula genotypes were obtained from the INRA Medicago truncatula Stock Center (Montpellier, France; www1.montpellier.inra.fr/BRC-MTR/) and SNP data were generated by Illumina sequencing through the Medicago truncatula HapMap project (Stanton-Geddes et al., 2013).
Two inoculation protocols – in vitro inoculations and climatic chamber tests, described by Djébali et al. (2009) and by Pilet-Nayel et al. (2009), respectively – were used to phenotype the M. truncatula core collection. All inoculation assays were conducted with Aphanomyces euteiches Drechs, strain ATCC 201684, which belongs to the main French pathotype (Wicker et al., 2003). For both inoculation assays, zoospores of the A. euteiches strain ATCC 201684 were produced using a well-established protocol (Badreddine et al., 2008). The two assays differ primarily in the way plants were inoculated and the age of plants at inoculation; for the in vitro assays, zoospores were placed directly on the roots 1 d after planting, whereas the climatic chamber assay plants were inoculated by flooding roots with a zoospore suspension 12 d post germination (additional details given in Methods S1).
For in vitro assays several parameters described in Djébali et al. (2009) as relevant indicators of M. truncatula resistance to A. euteiches were recorded at 15 and/or 21 d post inoculation (dpi) (Fig. 1a). These include the proportion of brown symptomatic tissues on roots and stem for each plant (‘brown_15dpi’, ‘brown_21dpi’), the amount of cotyledon yellowing (‘yellow_15dpi’, ‘yellow_21dpi’) and the proportion of dead plants (‘dead_21dpi’). In addition, the number of secondary roots (‘RII_15dpi’, ‘RII_21dpi’) and plant FW (weight_21dpi) were measured for evaluating plant development. The ratios ‘inoculated vs noninoculated’ for RII_ratio_15dpi, RII_ratio_21dpi, and weight_ratio_21dpi were also calculated. A total of 6669 plants were phenotyped for these eleven parameters. Percentages of brown tissues from the bottom of the primary root to the top of the stem were calculated by image analysis software (Image Pro-Plus, Media Cybernetics, Silver Spring, MD, USA), following the scan of all the in vitro plates that contained inoculated plants. All of the other values were obtained following visual observations.
For the climatic chamber assay, plants were uprooted 14 dpi and disease severity was scored on each plant using a 0–5 disease scoring scale described in Pilet-Nayel et al. (2009) and illustrated in Fig. 1(b). A total of 3356 inoculated plants in 716 pots were scored.
For association analyses we calculated the per-accession mean value for each of the phenotypic parameters. For the in vitro assay, mean values were calculated as least square means (LSmeans) for each accession after accounting for mean differences among experiments (both the climatic and in vitro assays were done across multiple temporal blocks) or boxes (spatial blocks within experiments) using the linear model $y_{ijkl} = \mu_{tot} + \mathrm{accession}_i + \mathrm{experiment}_j + \mathrm{box}_k + \varepsilon_{ijkl}$ ($y_{ijkl}$, the phenotypic parameter value for the $l$th plant located in the $k$th box of the $j$th experiment for the $i$th accession; $\varepsilon_{ijkl}$, the residual). Association analyses were then conducted on the mean value of the individual measures of resistance and growth (yellow_15dpi, brown_15dpi, weight_ratio_21dpi and dead_21dpi) as well as the first principal component (PC1) from a principal component analysis (PCA) that was used to reduce the dimensionality of the resistance and growth traits. PC1, which captured 79% of the variance, combines multiple phenotypic parameters and therefore has the advantage of integrating across multiple effects of inoculation. For climatic chamber data, a Root Rot Index (RRI) was calculated as the LSmean of the disease score index for each M. truncatula accession, taking into account the randomized block experimental design, using the linear model $y_{ijk} = \mu_{tot} + \mathrm{accession}_i + \mathrm{block}_j + \varepsilon_{ijk}$ ($y_{ijk}$, the disease score index for the $k$th plant of the $j$th block of the $i$th accession; $\varepsilon_{ijk}$, the residual). These statistical analyses were performed using the R software package (R Development Core Team, 2013).
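The block-adjusted accession means described above can be illustrated with a minimal sketch. It fits the two-factor model used for the climatic chamber data (accession plus block) by alternating updates of the two sets of effects (backfitting), which is a technique swapped in here to keep the example dependency-free; it is not the R routine used in the study. The function name, the record layout and the toy data are all hypothetical.

```python
from collections import defaultdict

def adjusted_accession_means(records, n_iter=200):
    """records: list of (accession, block, value) tuples, one per plant.

    Fits value = mu + accession effect + block effect by alternating
    least-squares updates, then returns for each accession the fitted
    value averaged over all blocks with equal weight (an LS-mean).
    """
    mu = sum(val for _, _, val in records) / len(records)
    acc_eff = defaultdict(float)
    blk_eff = defaultdict(float)
    for _ in range(n_iter):
        # Update accession effects with block effects held fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for acc, blk, val in records:
            sums[acc] += val - mu - blk_eff[blk]
            counts[acc] += 1
        for acc in sums:
            acc_eff[acc] = sums[acc] / counts[acc]
        # Update block effects with accession effects held fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for acc, blk, val in records:
            sums[blk] += val - mu - acc_eff[acc]
            counts[blk] += 1
        for blk in sums:
            blk_eff[blk] = sums[blk] / counts[blk]
    mean_block = sum(blk_eff.values()) / len(blk_eff)
    return {acc: mu + eff + mean_block for acc, eff in acc_eff.items()}

# Invented, unbalanced toy data: block 2 scores one unit higher than block 1,
# and accession "B" is over-represented in block 2.
records = [("A", 1, 2.0), ("A", 1, 2.0), ("A", 2, 3.0),
           ("B", 1, 4.0), ("B", 2, 5.0), ("B", 2, 5.0), ("B", 2, 5.0)]
ls_means = adjusted_accession_means(records)
print(ls_means)
```

In this toy data set the raw means of the two accessions differ by about 2.4 because of the unequal block representation, whereas the adjusted means differ by 2.0, the difference built into the data.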
We used SNP markers from the M. truncatula HapMap project (Stanton-Geddes et al., 2013) identified by aligning Illumina 90-bp sequence reads from 288 M. truncatula accessions to the M. truncatula A17 reference genome assembly v.3.5. As described in greater detail in Stanton-Geddes et al. (2013), each accession was self-fertilized for a minimum of three generations before DNA extraction from c. 30-d-old dark-grown seedlings. Illumina sequencing of paired 90 mer or 151 mer reads (trimmed to 90 mers for analysis) was performed according to standard methods. The Illumina image analysis pipeline with default parameters was used for base-calling, quality filtering, and removal of adapter and PhiX contamination. Reads that passed initial quality control filtering were aligned to the M. truncatula reference genome v.3.5 (Young et al., 2011; www.medicagohapmap.org) using GSNAP (Wu & Nacu, 2010). Reads with < 91% identity to a genomic region, that aligned to ≥ 5 locations, or that aligned to locations where > 500 reads aligned, were excluded before variant calling. We called SNPs when: (1) a position was covered by ≥ 2 unique reads for the 26 accessions sequenced to c. 15× mapped coverage (Branca et al., 2011) or ≥ 1 unique read for other accessions; and (2) reads that called a nonreference allele had a quality score ≥ 10 and variant nucleotides were called by > 70% of reads. Requiring > 70% of reads to call a variant means that no heterozygous sites were identified within individuals. This should have minor effects given high selfing rates in natural populations (> 95%; Bonnin et al., 2001; Siol et al., 2008) and ≥ 3 generations of selfing before DNA extraction.
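As an illustration of the two calling criteria listed above, the following sketch applies them to a single site in a single accession. It is one reading of the rule, not the HapMap pipeline itself: the function name and arguments are hypothetical, and the quality of the nonreference reads is treated as one summary value, which is an assumption.

```python
def passes_snp_filter(ref_reads, alt_reads, alt_quality, deep_coverage):
    """Apply the two calling criteria quoted in the text to one site.

    ref_reads / alt_reads: counts of unique reads supporting each allele.
    alt_quality: quality score of the reads calling the nonreference allele
                 (a single summary value, an assumption of this sketch).
    deep_coverage: True for the 26 accessions sequenced to c. 15x coverage.
    """
    total = ref_reads + alt_reads
    min_reads = 2 if deep_coverage else 1   # criterion (1): minimum coverage
    if total < min_reads or alt_reads == 0:
        return False
    if alt_quality < 10:                    # criterion (2): quality score
        return False
    return alt_reads / total > 0.70         # criterion (2): read fraction

# A single good nonreference read is enough for a low-coverage accession
# but not for one of the 26 deeply sequenced accessions.
print(passes_snp_filter(0, 1, 20, deep_coverage=False))
print(passes_snp_filter(0, 1, 20, deep_coverage=True))
```

Because the read-fraction cutoff is strict, a site at which reads are split between the two alleles is never called as a variant, which is why no heterozygous calls arise under this rule.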
The raw dataset of 16 515 723 SNPs was imputed with the genotypic information for 288 M. truncatula accessions using the software Tassel (Bradbury et al., 2007). Details of this procedure are given in Methods S1. An SNP marker was used in GWAS if the Minor Allele Frequency (MAF) was ≥ 0.05 in the sample of 179 accessions studied. A total of 5 107 697 of these imputed SNPs satisfied this criterion (448 933, 624 845, 710 016, 728 638, 943 556, 456 398, 629 761 and 565 550 on chromosomes 1–8, respectively). We did not independently validate imputed or called SNPs in the entire 288-accession dataset. However, calls made in a subset of 26 accessions, for which we required two reads rather than a single read in order to call a variant, had a false positive rate of 3.5%, based on c. 45 kbp of Sanger sequence data collected from each of 16 accessions (Branca et al., 2011; De Mita et al., 2011). For at least two reasons, erroneously called SNPs are not expected to have strong effects on the GWAS. First, GWAS was conducted only with SNPs that were present at a MAF ≥ 0.05, and if falsely called SNPs are randomly located, the probability of the same error being called in 9 (i.e. 0.05 × 179 lines) or more lines is quite small. Second, falsely called SNPs would be expected to be random with respect to phenotype and thus would not be identified as potentially causative variants.
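The MAF criterion and the "9 or more lines" argument above can be checked with a short sketch. It assumes fully imputed, homozygous 0/1 allele codes (one per accession); the function names are hypothetical.

```python
import math

def minor_allele_frequency(alleles):
    """alleles: one 0/1 code per accession (lines are treated as homozygous)."""
    freq_alt = sum(alleles) / len(alleles)
    return min(freq_alt, 1.0 - freq_alt)

def keep_for_gwas(alleles, maf_min=0.05):
    """True when the SNP satisfies the MAF criterion used for the GWAS."""
    return minor_allele_frequency(alleles) >= maf_min

n_lines = 179
# Smallest number of lines that must carry the minor allele: 0.05 x 179 = 8.95.
min_carriers = math.ceil(0.05 * n_lines)
print(min_carriers)

# A SNP carried by 9 of 179 lines is kept; one carried by 8 lines is dropped.
print(keep_for_gwas([1] * 9 + [0] * 170))
print(keep_for_gwas([1] * 8 + [0] * 171))
```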
The version of the A17 reference genome assembly used for SNP calling (Mt v3.5) contains a gap of c. 0.9 Mb on the top of chromosome 3 that is included inside the confidence interval of AER1, one of the major QTL of resistance to A. euteiches (Hamon et al., 2010). To fill in this gap, we used a sequence scaffold from a whole genome shotgun sequence assembly (presented in Notes S1) and identified 1138 additional suitable SNPs in this region (Table S2). Virtual genomic positions were assigned to these additional SNPs (c3_2745773 to c3_2916323) in order to facilitate the visualization of GWAS results. This scaffold and the SNPs identified within this region are available from the authors and will soon be available through the Medicago Hapmap website (www.medicagohapmap.org).
GWAS was performed on the RRI data from the climatic chamber assays and PC1 and individual measures of resistance and growth from the in vitro assays using the compressed mixed linear model (MLM) approach (Yu et al., 2006; Kang et al., 2008; Zhang et al., 2010) implemented in Tassel (Bradbury et al., 2007) which uses the EMMA (Kang et al., 2010) and P3D (Zhang et al., 2010) algorithms to reduce computing time. We used the Q + K method as a MLM function (the statistical model for association mapping is described in Methods S1), which first requires estimating population structure (Q) and kinship (K) matrices. The Q matrix, which contains assignment probability of each accession to a given genetic group, was estimated using two methods: (1) a model-based Bayesian clustering assignment algorithm implemented in the software Structure v2.3.3 (Pritchard et al., 2000); and (2) a discriminant analysis of principal components – Dapc – a multivariate method which employs PCA to reduce the number of correlated variables (SNP markers) to be analysed using a discriminant analysis (Jombart et al., 2010) implemented in the R package Adegenet. A set of 968 unlinked SNPs was used for Structure analysis and 34 550 intergenic SNPs were used for Dapc analysis. The K matrix was estimated using Tassel with 14 564 intergenic SNPs (details on population structure and kinship matrix estimation are given in Methods S1). We set a nominal 5% genome-wide significance threshold with two sorts of corrections: a Bonferroni correction for the number of tests and a Bonferroni correction for the number of informative SNPs estimated by the number of LD blocks in the genome (accounting for correlations between SNPs because of LD). Details on LD and number of LD blocks estimation are given in Methods S1.
RNA extraction and quantitative RT-PCR
Total RNA was isolated from the roots of control or inoculated plants, reverse-transcribed to cDNA as described by Rey et al. (2013), and used in a quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR primers and conditions in Methods S1). Each qRT-PCR reaction was conducted in triplicate for three biological replicates. Two M. truncatula genes – Medtr4g097170.1/Mtr.16911.1.S1_s_at, encoding a histone-3-like protein (Rey et al., 2013), and AC233140_53.1/Mtr.42993.1.S1_at, encoding an unknown protein with a constant and weak level of expression in roots (similar to expression detected for the candidate F-box gene) – were used to normalize plant gene expression. Five time points (8 h, 1, 3, 6 and 13 dpi) were selected to compare gene expression from very early to late stages of the infection.
Sequencing of genomic DNA and cDNA
Genomic DNA was isolated from M. truncatula leaves using the Qiagen DNeasy plant kit. Three overlapping PCR products (primers and conditions in Methods S1) covering the Medtr3g011020 gene region were sequenced for 20 M. truncatula HapMap accessions (13 resistant accessions with PC1 values between 3.5 and 6.2: HM000, HM002, HM004, HM028, HM036, HM067, HM072, HM080, HM083, HM096, HM097, HM163 and HM189; and 7 susceptible accessions with PC1 values between −3 and 0: HM006, HM010, HM015, HM019, HM027, HM114 and HM126). The sequences were aligned and nucleic acid polymorphisms were identified in a ClustalW2 (www.ebi.ac.uk/Tools/msa/clustalw2/) multiple sequence alignment. A phylogenetic tree was constructed using Mega4 (Tamura et al., 2007) and protein secondary structure was predicted using GOR software (www.expasy.org/proteomics).
Natural variation of Medicago truncatula quantitative resistance to Aphanomyces euteiches
Two complementary phenotyping approaches were performed to assess the resistance level of the 179 M. truncatula accessions. The first approach was an in vitro inoculation assay which makes it possible to record 11 parameters describing symptom extent or plant development (Fig. 1a) that are considered to be reliable indicators of plant resistance (Djébali et al., 2009). The second approach, the climatic chamber assay, was used to estimate an RRI (Fig. 1b) that relies on final root rot symptoms 14 dpi on plants grown in vermiculite in a growth chamber (Pilet-Nayel et al., 2009). Detailed measurements for in vitro and climatic chamber parameters are shown in Table S3.
All phenotypic parameters recorded during in vitro inoculation assays were moderately to highly correlated (Table S4). The proportions of brown tissues, cotyledon yellowing and weight ratio were all highly correlated with one another (0.85 < |r| < 0.98), indicating that they are all robust indicators of plant resistance/susceptibility. Measures of plant development such as secondary root number and root ratio between infected vs control plants, previously shown to be associated with the A17 partial resistance (Djébali et al., 2009), showed significant but weaker correlations (0.53 < |r| < 0.84) with the former indicators of plant resistance. A Principal Component Analysis (PCA) of the 11 in vitro parameters indicated that the first principal component (PC1) captured almost 80% of the variation among genotype correlations and discriminated two major plant phenotypes, suggesting that PC1 provides an integrative measure of plant resistance (Fig. 2a). A graphical representation of PC1 among all accessions compared to the ones obtained with browning tissues, yellowing of cotyledons and proportion of dead plants illustrates that PC1 captures a greater range of plant resistance level than any individual parameter (Fig. S1). The frequencies of PC1 classes were distributed bimodally, indicating a clear discrimination between susceptible (low PC1 values) and resistant (high PC1 values) accessions, which might suggest oligogenic control of resistance to A. euteiches (Fig. 2b). RRI calculated from plants grown in climatic chamber also showed a bimodal distribution, although this was less pronounced than the PC1 distribution (Fig. 2c).
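To make the role of PC1 concrete, the sketch below computes the first principal component of a few standardized traits and the share of variance it captures. It uses power iteration on the correlation matrix to stay within the standard library, which is a substitute for whatever PCA routine was actually used; the trait values are invented and the function names are hypothetical.

```python
import math

def standardize(column):
    """Center a trait to mean 0 and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def pc1_summary(traits, n_iter=500):
    """traits: list of columns, each holding one per-accession mean per line.

    Returns (loadings, scores, explained_fraction) for the first component
    of a correlation-matrix PCA, found by power iteration.
    """
    z = [standardize(col) for col in traits]
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]
    # Non-uniform start vector; the iteration would stall only if this
    # vector were exactly orthogonal to the leading eigenvector.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(n_iter):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j]
                     for i in range(p) for j in range(p))
    scores = [sum(vec[t] * z[t][k] for t in range(p)) for k in range(n)]
    # The trace of a correlation matrix equals the number of traits.
    return vec, scores, eigenvalue / p

# Invented per-accession means for three traits (not data from the study).
brown = [10.0, 35.0, 60.0, 80.0, 95.0]
yellow = [5.0, 30.0, 50.0, 85.0, 90.0]
weight_ratio = [0.9, 0.7, 0.5, 0.3, 0.2]
loadings, scores, explained = pc1_summary([brown, yellow, weight_ratio])
print(f"PC1 explains {explained:.0%} of the standardized variance")
```

The sign of a principal component is arbitrary, so which end of the axis corresponds to resistance has to be read from the loadings; in the study, high PC1 values correspond to resistant accessions.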
Estimates of PC1 and RRI narrow-sense (sensu stricto) heritability (h²ss) from genome-wide SNP data, 0.54 and 0.47, respectively, indicated strong genetic control of resistance/susceptibility of M. truncatula to A. euteiches. The difference in h²ss between PC1 and RRI reflects a higher residual variance for RRI, probably due to a more variable environment and a lower number of plants scored in the climatic chamber than in the in vitro assay. Finally, a strong negative correlation (r = −0.82) between PC1 and RRI suggested that both phenotypic approaches captured consistent resistance (PC1) or susceptibility (RRI) variation among genetically diverse accessions of M. truncatula (Fig. 2d). While PC1 and RRI behaved similarly, we performed subsequent GWAS using both parameters independently in order to highlight stable – that is, independent of development stage and phenotyping approach – candidate loci with large effects, and to identify specific QTLs more strongly associated with the plant development stage and/or the experimental process of each assay.
Population structure of Medicago truncatula HapMap accessions panel
Two complementary methods, Structure and Dapc, were used to assess the population genetic structure of the 179 M. truncatula accessions. Both analyses uncovered two major genetic groups: a ‘Far West group’ comprising accessions sampled mostly in the extreme west of the Mediterranean basin (Spain, Portugal, Morocco, west Algeria), and a ‘Circum group’ consisting of accessions sampled from locations widely distributed throughout the Mediterranean basin with only a few occurrences in the extreme west of the Mediterranean basin (Fig. S2). These results are in agreement with previous analyses of M. truncatula population structure using microsatellite (Ronfort et al., 2006) and SNP (Paape et al., 2013) data. From these results an assignment probability matrix (Q matrix) was generated for subsequent association analyses.
Genome-wide association study
We focused the GWAS on PC1 and RRI using a panel of 5 107 697 imputed SNPs. We set a nominal 5% genome-wide significance threshold with Bonferroni correction (P < 9.8 × 10⁻⁹), or Bonferroni correction linked to the number of LD blocks in the genome (P < 10⁻⁶, ‘Bonferroni block’, see the 'Materials and Methods' section). PC1 and RRI genome scans identified a high number of candidate SNPs associated with M. truncatula resistance to A. euteiches (36 and 13 SNPs, respectively, with P-values < 10⁻⁶). Both scans identified a locus on the top of chromosome 3 that contained SNPs highly associated with M. truncatula resistance to A. euteiches (Fig. 3a). For PC1, two SNPs in this locus were highly associated with resistance variability: c3_2612996 and c3_2614037 (P-value = 1.17 × 10⁻⁹ and 1.61 × 10⁻⁹, respectively, Table 1). For RRI, the most highly significant SNP in this region was c3_2773973 (P-value = 5.53 × 10⁻⁹, Table 1). A detailed scan of this 1.5-Mb region revealed two distinct peaks, with a more significant contribution of the left-most peak to the GWAS signal (Fig. 3b).
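The two significance cutoffs follow from dividing the nominal 5% level by the number of tests. The sketch below reproduces the per-SNP cutoff from the SNP count given in the text. The number of LD blocks is not stated in this section, so the value used here (50 000) is an assumption back-calculated from the reported cutoff of about 10⁻⁶; the actual estimate is in Methods S1. Function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant(p_value, cutoff):
    return p_value < cutoff

n_snps = 5_107_697                      # SNPs tested, from the text
snp_cutoff = bonferroni_threshold(0.05, n_snps)
print(f"per-SNP cutoff: {snp_cutoff:.2e}")   # prints 9.79e-09

# Assumption: roughly 50 000 independent LD blocks, inferred from the
# 1e-6 'Bonferroni block' cutoff; not a figure taken from the study.
assumed_blocks = 50_000
block_cutoff = bonferroni_threshold(0.05, assumed_blocks)

# The three lead SNPs pass the stricter cutoff; a SNP with P = 5e-7 would
# pass only the block-based one.
for p in (1.17e-9, 1.61e-9, 5.53e-9, 5e-7):
    print(p, significant(p, snp_cutoff), significant(p, block_cutoff))
```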
Table 1. Genomic location of candidate SNPs and annotated genes associated with Medicago truncatula resistance to the root oomycete pathogen Aphanomyces euteiches
Entries surviving from the table body (annotation column, with two allele-effect entries; SNP positions, P-values and other columns are missing): Protein of unknown function (DUF303, putative acetylesterase); C (non syn); TE (unknown protein); C (non syn); Unknown protein (genomic DNA chromosome 5 P1 clone MXC9); TE (Gag-pol polyprotein); TE (Gag-pol polyprotein); Calcium-dependent protein kinase 25, EF-hand type; TE (Heat shock 22 kDa protein); TE (HAT family dimerisation domain containing protein).
The two most significant SNPs, c3_2614037 and c3_2612996, are located in the 5′ noncoding region and in the coding sequence of an F-box protein coding gene (gene model Medtr3g011020), respectively. The SNP c3_2612996 is predicted to be a nonsynonymous substitution close to the F-box domain (glutamic acid to lysine polymorphism), according to genomic information available in the Medicago HapMap project. The neighboring region covering the other highly significant SNP, c3_2773973, is in an unassembled region of the present Medicago v3.5 reference genome. Manual annotation of the region encompassing this SNP revealed that it is located in the noncoding region between genes encoding an adenylate isopentenyltransferase (IPT) and a gibberellin and abscisic acid-regulated MYB (GAMYB) transcription factor (TF). Two of these three candidate SNPs (c3_2612996 and c3_2614037) are in nearly complete linkage disequilibrium (LD; D′ = 1, r² = 0.88), but these two SNPs show very low LD with c3_2773973 (0.41 < D′ < 0.43 and 0.13 < r² < 0.17). It should be noted that the IPT/GAMYB region is not well covered by the NGS data and it is thus difficult to obtain precise estimates of LD. However, c3_2612996 and c3_2614037 are located on a well-defined haplotype block (22 kb, positions 2599–2621 kb) showing a clear LD pattern (Fig. 4). More precisely, c3_2612996 and c3_2614037 are flanked by two recombination segments that likely reduce the size of this candidate region for resistance to a 12-kb genome fragment (2605–2617 kb) that includes the F-box protein coding gene and two transposable elements. A large proportion of the additive genetic variance was explained by c3_2612996/c3_2614037 and c3_2773973 (R² = 0.23 and 0.18, respectively, for PC1; Table S5).
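The LD measures quoted above, D′ and r², can both be computed from allele and haplotype frequencies, and a pair of SNPs can have D′ = 1 while r² stays below 1 whenever their allele frequencies differ. The sketch below uses the standard definitions and assumes inbred lines, so that each accession contributes one haplotype; the function name and the toy allele vectors are hypothetical.

```python
def ld_statistics(locus_a, locus_b):
    """Return (D_prime, r_squared) for two biallelic loci.

    locus_a / locus_b: 0/1 allele codes, one per inbred accession, so each
    accession contributes a single haplotype (an assumption of this sketch).
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared

# Toy example: only three of the four possible haplotypes occur, so D' = 1,
# but the two loci have different allele frequencies, so r^2 < 1.
snp_1 = [1, 1, 1, 0, 0, 0, 0, 0]
snp_2 = [1, 1, 0, 0, 0, 0, 0, 0]
d_prime, r_squared = ld_statistics(snp_1, snp_2)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

In this toy case D′ equals 1 while r² is about 0.56, the same qualitative pattern as the D′ = 1, r² = 0.88 reported for c3_2612996 and c3_2614037.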
With the less stringent 10⁻⁶ significance threshold based on the number of linkage blocks, 36 candidate SNPs were detected for PC1, 12 of which could be assigned to the 22-kb haplotype block on the top of chromosome 3 (Fig. 4), and one of which (c3_2773973) was located between the IPT and GAMYB genes (Tables 1, S5). RRI GWAS identified 13 candidate SNPs, with 5 SNPs located in the two major candidate regions on chromosome 3 (Tables 1, S5). Four SNPs, three inside the 12-kb haplotype containing the F-box protein coding gene and its promoter region (c3_2612996, c3_2613932, c3_2614037), and c3_2773973 between the IPT and GAMYB genes, were common to PC1 and RRI, suggesting that these two major genomic regions are stable QTL with regard to plant development stage and environmental conditions. A noteworthy difference between PC1 and RRI GWAS was that PC1 GWAS identified more significant SNPs in the F-box protein coding gene region, reflecting a stronger and more defined association signal (Fig. 4).
Other candidate SNPs (P-value < 10⁻⁶) in the M. truncatula genome were found in or near annotated genes, most with annotated functions related to plant response to biotic stress (see the 'Discussion' section, Tables 1, S5). PC1 GWAS revealed: (1) two SNPs close to an ethylene-responsive transcription factor (Medtr3g095190); (2) one SNP within an intron of a calcium-dependent protein kinase gene (Medtr7g054260); (3) three SNPs associated with the two adjacent genes ATP-dependent RNA helicase HAS1 and molybdopterin synthase catalytic subunit; and (4) 11 SNPs associated with eight transposable elements. RRI GWAS with imputed SNP data revealed: four SNPs associated with four transposable elements, among which one SNP was shared with PC1 (c2_17311572); one SNP in the coding region of a serine/threonine protein kinase gene (Medtr3g092390); and one SNP close to a leucine-rich repeat family gene (Medtr3g092350). Finally, 13 SNPs were associated with genes of unknown functions on chromosomes 2–6 in both PC1 and RRI GWAS.
For completeness, GWAS also was performed with individual symptom data, namely the proportion of brown tissues at 15 dpi, yellow cotyledons at 15 dpi, weight ratio at 21 dpi and proportion of dead plants at 21 dpi. Nearly 25% (10 of 41) of candidate loci detected by GWAS with single parameters were common with SNP candidates detected by GWAS with PC1 values, among which were SNPs located in or near the IPT/GAMYB, F-box protein, ATP-dependent RNA helicase HAS1 and the calcium-dependent protein kinase coding genes (Table S6). These results suggest that the main genetic components of M. truncatula resistance to A. euteiches identified by combining results from GWAS with individual symptom data are captured by GWAS with PC1 alone, hence supporting the use of PC1 as a synthetic parameter. Analyses of the individual symptom data also identified several SNPs that were not identified in the analyses of PC1. These SNPs may be associated with specific aspects of infection or resistance and suggest that synthetic (i.e. PC1) and individual parameters can give complementary results in a GWAS analysis.
Bioinformatic and gene expression analyses of the F-box protein coding gene
Several significant SNPs were located in the promoter region of the F-box protein coding gene (Medtr3g011020, Fig. 4) and hence we tested the hypothesis that differences in the level of gene expression could explain the observed differences in resistance level. We performed qRT-PCR experiments on two resistant (HM000 and HM097) and two susceptible accessions (HM006 and HM010) that were selected based on their PC1 values. Time-point kinetics analyses of the gene expression were performed in all four lines before or after inoculation with A. euteiches. Time-points were selected according to the different stages of the infection, known in susceptible or resistant plants (Djébali et al., 2009; Djebali et al., 2011) from very early (8 h post inoculation – zoospore encystment and germination) to late (13 dpi – stele colonization by mycelium in susceptible lines) stages of infection. The qRT-PCR results showed that the gene was expressed at a low level in all conditions and no expression polymorphism was observed between resistant and susceptible accessions (Fig. S3).
Additional sequence data (Sanger method) were then collected to investigate the potential causative role of the candidate SNP c3_2612996, located in the coding region of the F-box gene. The complete DNA sequence of the F-box protein coding gene was obtained from 20 HapMap accessions with either resistant (PC1 > 3.5, 13 accessions) or susceptible (PC1 < 0, 7 accessions) phenotypes. cDNA sequences of the four accessions used for the qRT-PCR experiment were also obtained. This sequencing identified 64 genic SNPs that were not detected or retained by GWAS for technical reasons (see the alignment in Notes S2). In addition, we found that the cDNA sequence had the same length as the genomic coding sequence, indicating the absence of introns. A phylogenetic tree constructed from the protein coding sequences showed that: (1) resistant and susceptible accessions cluster into two separate groups supported by high bootstrap values (90 and 94, respectively), with the exception of the HM004 sequence, which is more similar to sequences of susceptible accessions; and (2) accessions with the most diverged sequences (HM036 and HM072) are resistant, suggesting that the ancestral sequence of the M. truncatula F-box protein coding gene may have conferred resistance (Fig. 5). More interestingly, based on the cDNA length, susceptible lines (for instance HM006/F83005.5) express a complete and putatively functional F-box protein containing one F-box domain and one interaction domain, whereas most resistant accessions (for instance HM000/A17) exhibit a stop codon, resulting in a truncated protein with only the F-box domain. Two other forms of the protein associated with the resistance phenotype were also detected. In HM004 (DZA45.5), the F-box protein shows a complete structure, highly similar to that of HM006, but an amino-acid change is predicted to modify the protein secondary structure (an α-helix is replaced by a coil structure). In addition, the protein encoded by the most divergent sequences (HM036 and HM072) has a shorter interaction domain than those of HM004 and HM006 (Fig. 5). From these results, we suggest that the candidate SNP c3_2612996 is not a causal mutation, but rather is tightly linked to putative causal mutations that can trigger protein truncation (c3_2612765) or alteration of secondary structures (c3_2612632, c3_2612264).
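The way a single coding SNP can produce a truncated protein can be illustrated with a minimal sketch that translates a coding sequence up to its first in-frame stop codon. The two toy sequences below are invented for illustration and are not the Medtr3g011020 alleles; the function name `translate_until_stop` is likewise hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TABLE.get(codon, "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy alleles: they differ by one G-to-A change that turns the
# third codon (TGG, Trp) into a premature stop codon (TGA).
full_length_allele = "ATGGCTTGGAAACTGTAA"
truncated_allele = "ATGGCTTGAAAACTGTAA"

print(translate_until_stop(full_length_allele))  # MAWKL
print(translate_until_stop(truncated_allele))    # MA
```

Comparing the lengths of the two translated products is enough to flag a premature stop; locating it relative to the F-box and interaction domains would additionally require domain coordinates, which are not given here.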
Although plant QDR to pathogens has been observed in numerous crops, the underlying molecular mechanisms remain largely unknown (St Clair, 2010). To investigate the natural genetic diversity of QDR to A. euteiches and identify associated molecular actors, we performed a high-density GWAS exploiting a total of 5 107 697 experimental and imputed SNPs and two complementary phenotyping techniques (in vitro and climatic chamber inoculation) in a collection of 179 M. truncatula accessions. The analyses identified 44 significant SNPs associated with 32 candidate loci, revealing the quantitative nature of M. truncatula partial resistance to A. euteiches. Twenty loci were detected only with the in vitro technique, seven only with the climatic chamber assay and five with both methods. This suggests the presence of stable QTL and also reveals possible new resistance loci that have not been described by classical bi-parental mapping (Hamon et al., 2010).
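The partition of candidate loci by phenotyping method can be sketched with simple set operations. The locus identifiers below are placeholders invented for illustration; only the counts (20, 7, 5 and 32) come from the text.

```python
# Placeholder locus identifiers (hypothetical); only the counts are from the text.
in_vitro = {f"locus_{i:02d}" for i in range(1, 26)}          # 20 specific + 5 shared
climatic_chamber = {f"locus_{i:02d}" for i in range(21, 33)}  # 5 shared + 7 specific

common = in_vitro & climatic_chamber          # detected with both methods
in_vitro_only = in_vitro - climatic_chamber
chamber_only = climatic_chamber - in_vitro
all_loci = in_vitro | climatic_chamber

print(len(in_vitro_only), len(chamber_only), len(common), len(all_loci))
# 20 7 5 32
```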
The major candidate resistance gene codes for an F-box protein
The GWAS identified two peaks of highly significant SNPs delimiting two loci separated by c. 150 kb that explained a large part of the observed resistance. The most highly associated SNPs (c3_2612996 and c3_2614037) account for 23% of the additive genetic variance of the resistance to A. euteiches. They are located in the middle of the 135-kb prAe1 locus (Djébali et al., 2009), in the 5′ noncoding and coding regions of the Medtr3g011020 gene, which encodes an F-box protein. F-box proteins are one component of the ubiquitin ligase complex leading to the ubiquitination of target proteins and their ultimate degradation by the proteasome. By allowing the degradation of positive or negative regulators, F-box proteins have been found to be involved in the regulation of various processes associated with plant development and adaptation to environmental conditions including biotic stress (Lechner et al., 2006). Many hormonal pathways are regulated by F-box proteins and this may explain how an F-box protein could impact both plant development and plant resistance to a pathogen (Kim & Delaney, 2002; Navarro et al., 2006). Such a mechanism might occur in resistance to A. euteiches, which is very often accompanied by an increase in lateral root formation (Djébali et al., 2009, 2013).
In order to understand the mode of action of this F-box protein, we examined both the expression of the gene and the similarity of its coding sequences among resistant and susceptible accessions. Because several significant SNPs were identified in the promoter region of the Medtr3g011020 gene, qRT-PCR was first performed to search for gene expression differences between resistant and susceptible M. truncatula accessions. No clear signal of expression polymorphism was detected. Gene sequences obtained from 20 accessions revealed additional SNPs in the coding region that were not used in GWAS for quality reasons but which are in strong LD with the significantly associated SNPs. The sequence analysis revealed that sequences associated with susceptible phenotypes code for a complete and putatively functional protein with intact F-box and protein interaction domains, involved in ubiquitination of, and specific interaction with, the targeted protein, respectively (Kipreos & Pagano, 2000). By contrast, two major forms of the protein are found in resistant accessions, and both are predicted to be impaired in the interaction domain or in secondary structures. The first, predominant form is found in resistant accessions carrying a SNP that introduces a stop codon after the F-box domain. The second resistant form is a complete protein but with changes in secondary structure that might directly affect either the interaction domain or the general three-dimensional protein structure.
Because a nonfunctional F-box protein is associated with resistant accessions, this protein may act as a negative regulator of plant resistance to A. euteiches. Identification of its target will be a major challenge for further analyses aimed at a better understanding of QDR to A. euteiches. If this F-box protein is involved in the partial resistance to A. euteiches of both A17 (HM000) and DZA45.5 (HM004), an intriguing issue will be to discover the mechanisms underlying the recessivity (A17) or the dominance (DZA45.5) of resistance (Djébali et al., 2009; Pilet-Nayel et al., 2009). From the results shown in Fig. 5, we can hypothesize that the stop codon leading to the loss of the interaction domain needs to be present in both gene copies to promote A17 resistance, whereas in DZA45.5, a single copy of the gene encoding the F-box protein with an altered interaction domain is sufficient to activate the observed resistance, through a dominant negative effect of this protein.
Identification of new loci involved in QDR to Aphanomyces euteiches
The GWAS refined the position of previously identified QTL (Djébali et al., 2009; Pilet-Nayel et al., 2009; Hamon et al., 2010) to a single F-box protein coding gene, but also identified a large number (c. 30) of new loci outside the prAe1 and AER1 regions. Notably, they include a second locus on the top of chromosome 3, close to prAe1, that accounts for 18% of the additive genetic variance and involves an adenylate isopentenyltransferase (IPT) protein, which catalyzes the first steps of cytokinin biosynthesis (Kakimoto, 2003), and/or a gibberellin and abscisic acid-regulated MYB (GAMYB) transcription factor. Both genes are interesting candidates. IPT might be associated with an increase in cytokinin concentrations, which are well known to modulate root development (as observed in most resistant accessions) and also to directly activate plant immunity mechanisms (Robert-Seilaniantz et al., 2011). The GAMYB transcription factor could also have a direct effect on plant resistance activation, because MYB factors have previously been reported to play a key role in plant–pathogen interactions (Vailleau et al., 2002; Canonne et al., 2011; Marino et al., 2013). However, the GAMYB gene family has until now mainly been associated with plant development (Woodger et al., 2003).
Other candidate genes were identified elsewhere in the genome. Among genes of known function that were associated with one or several significant SNPs, annotation indicates that three could be associated with plant resistance signaling. One of these genes codes for an ethylene-responsive transcription factor, ERF 110 (Medtr3g095190). In general, members of this TF family are activated by the presence of ethylene, a hormone whose biosynthesis genes were shown to be activated by A. euteiches infection (Rey et al., 2013). Some ERF genes are also involved in plant immunity by coordinating plant defenses in response to chitin treatment or inoculation with various pathogens (Gutterson & Reuber, 2004; Son et al., 2012). Additionally, a calcium-dependent protein kinase (CDPK) (Medtr7g054260) and a Leucine Rich Repeat Receptor-like Kinase (LRR-RLK) (Medtr3g092350), members of protein families that are potentially involved in early signaling events of plant–pathogen interactions, were also detected. Recent data on a member of the CDPK family in A. thaliana showed that CDPKs regulate ROS production, defense gene induction, cell death and amounts of phytohormones such as ethylene or jasmonic acid (Ludwig et al., 2005; Dubiella et al., 2013).
Significant SNPs associated with QDR to A. euteiches have also been detected in two other closely linked annotated genes encoding an ATP-dependent helicase containing a DEAD/DEAH box (Medtr3g041730) and a molybdopterin synthase catalytic subunit (Medtr3g041720). A rice homolog (OsBIRH1) of the former gene was shown to play a key role in plant response to biotic and abiotic stress, notably by enhancing oxidative stress tolerance (Li et al., 2008). Molybdopterin synthase coding genes have not been reported to have a direct role in plant immunity. However, two oomycete pathogens, Hyaloperonospora arabidopsidis and Albugo laibachii, have independently lost molybdenum-cofactor-requiring enzymes, probably as a result of reduced selection for maintenance of this biosynthetic pathway (Kemen et al., 2011). On this basis, one can hypothesize that a modulation of the expression of this gene in M. truncatula could hamper or delay A. euteiches growth and promote resistance of M. truncatula to this pathogen.
Significant SNPs also were detected inside genes with unknown functions. In these cases further work is needed to clarify how the candidate genes could modify M. truncatula resistance to A. euteiches. Intriguingly, a high proportion of transposable elements (up to 40% of the detected loci) were detected, two of which were found with both in vitro and climatic chamber assays. The enrichment of transposable elements in genome-wide candidate regions either suggests that they have been integrated in genes that play a role in plant immunity, or reveals an unsuspected role for these elements in M. truncatula resistance to pathogens, as recently described in Arabidopsis (Yu et al., 2013).
Evolutionary insight into the F-box candidate gene
Our GWAS analysis allowed for fine-structure localization of a 12-kb haplotype block containing the F-box candidate gene surrounded by two recombination segments. The length of this haplotype block is in accordance with the observed pattern of LD in M. truncatula, which indicated that LD significantly decays beyond 10 kb on average (Branca et al., 2011). This result confirms the prediction that whole-genome recombination and LD pattern in M. truncatula are expected to positively affect mapping resolution using high-resolution SNP data (Branca et al., 2011; Stanton-Geddes et al., 2013). It may be noteworthy that the two recombination segments associated with the 12-kb haplotype block are separated by two transposable elements (Medtr3te011010 and Medtr3te011040) oriented in opposite directions. Transposable elements (TEs) are known to affect plant genome evolution and to alter gene expression and function (Lisch, 2012). Some gene families, such as R genes in Arabidopsis, are prone to duplication and transposition and are often associated with TEs (Freeling et al., 2008; Malacarne et al., 2012). Similarly, the F-box candidate gene region is part of a larger region (135 kb) enriched in F-box genes and other TEs. One could hypothesize that, among other F-box protein coding genes, the F-box candidate gene (Medtr3g011020) was mobilized by TEs before M. truncatula speciation, because all accessions of the core collection display the F-box candidate gene region structure. One such event might be even more ancient, because accessions of the more distant M. murex (Yoder et al., 2013) also share this local genomic structure.
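The LD statistic behind such haplotype blocks can be illustrated with a minimal sketch of the pairwise r² between two biallelic SNPs. It assumes that each accession can be treated as a single haplotype coded 0/1 (a simplification for inbred lines) and is not necessarily the procedure used in the studies cited above. The function name `r_squared` and the toy genotype vectors are invented for illustration.

```python
def r_squared(snp_a, snp_b):
    """Pairwise LD (r^2) between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(snp_a)
    p_a = sum(snp_a) / n                                    # allele frequency at SNP A
    p_b = sum(snp_b) / n                                    # allele frequency at SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                    # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # r^2 is undefined when either SNP is monomorphic
    return d * d / denom


# Invented toy genotypes for four accessions.
snp_1 = [0, 0, 1, 1]
snp_2 = [0, 0, 1, 1]  # same pattern as snp_1: complete LD
snp_3 = [0, 1, 0, 1]  # independent of snp_1: no LD

print(r_squared(snp_1, snp_2))  # 1.0
print(r_squared(snp_1, snp_3))  # 0.0
```

Plotting r² against the physical distance between SNP pairs gives the LD decay curve; a block such as the 12-kb one described above would correspond to a run of adjacent SNPs with consistently high pairwise r².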
The phylogenetic tree constructed from the F-box gene sequences suggests that the ancestral M. truncatula F-box gene sequence may have conferred resistance, and that susceptibility might have evolved more recently (Fig. 5). This observation seems counter-intuitive, as one could expect A. euteiches to have exerted an intense selective pressure over time on M. truncatula populations, leading to fixation of the resistant form of the F-box protein in these populations. To address this paradox, two hypotheses can be formulated. The first is that host–pathogen co-evolution has led to a mix of resistant and susceptible phenotypes, along with more or less aggressive phenotypes in the co-occurring pathogen populations. From the plant's point of view, a gene involved in such an evolutionary arms race should exhibit a polymorphism pattern driven by balancing selection. We calculated Tajima's D (Tajima, 1989) based on the genomic sequence of the F-box protein coding gene. Tajima's D was −0.57, a value among the highest 12% in the entire genome, according to a dataset of 20 000 coding sequences with a mean D value of −1.5 (Paape et al., 2013). Based on these genome-wide empirical data, one cannot exclude balancing selection acting on the F-box protein coding gene. A second hypothesis is potentially more likely. The geographical distribution areas of M. truncatula and A. euteiches show little overlap: M. truncatula occurs throughout the Mediterranean basin (including Southern Europe, North Africa and the Middle East), whereas A. euteiches has been predominantly reported in Northern Europe (northern France, UK, Norway, southern Sweden and Denmark; Papavizas & Ayers, 1974; Gaulin et al., 2007). Based on these observations (which need further investigation), one could speculate that M. truncatula resistance to this soil-borne pathogen might correlate with other ecological variables such as rhizosphere microbial community variation and/or physico-chemical soil properties. A correlation between water deficit and plant resistance to A. euteiches was proposed in a recent study performed on Tunisian M. truncatula accessions (Djébali et al., 2013). It is thus possible that the observed resistance to A. euteiches in an M. truncatula accession is a by-product of some other ecologically relevant biological process in which this F-box protein could also have a central role. Such a process could involve the modification of root architecture, which would allow for adaptation to both abiotic and biotic stresses.
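For readers unfamiliar with the statistic discussed above, the following is a minimal sketch of Tajima's D (Tajima, 1989) computed from a gap-free alignment of equal-length sequences. It is an illustration only, not the pipeline used for the value reported here: the function name `tajimas_d` and the toy alignments are invented, and a multi-allelic column is counted as a single segregating site.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    # S: number of segregating (polymorphic) alignment columns
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        return None  # variance term is zero or D is undefined in these cases
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts pi with Watterson's estimator S / a1
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Invented toy alignments.
rare_variants = ["AAA", "TAA", "ATA", "AAT"]  # only singletons: D is negative
two_haplotypes = ["AAA", "AAA", "TTT", "TTT"]  # intermediate frequencies: D is positive

print(tajimas_d(rare_variants))
print(tajimas_d(two_haplotypes))
```

An excess of intermediate-frequency variants, as expected under balancing selection, pushes D upwards, whereas an excess of rare variants pushes it downwards. This is why the gene's value is interpreted relative to the genome-wide distribution rather than in absolute terms.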
The authors thank the bioinformatics platform Toulouse Midi-Pyrenees (Genotoul) and Yves Martinez for his help with figure layout. This work was funded by CNRS, Université Paul Sabatier, INRA and the French Agence Nationale de la Recherche (ANR-10-GENM-0007, ‘Immunit-Ae’) and was performed in the LRSV (Toulouse, France), part of the ‘Laboratoire d'Excellence’ (LABEX) entitled TULIP (ANR-10-LABX-41), and in the IGEPP research unit of INRA, Rennes, France. Additional support came from the US National Science Foundation Plant Genome Program project IOS-1237993 to Minnesota and NGCR.