• Open Access

Multiplex single nucleotide polymorphism (SNP)-based genotyping in allohexaploid wheat using padlock probes


* Correspondence (fax +44 117925 7374; e-mail K.J.Edwards@Bristol.ac.uk)


Single nucleotide polymorphisms are the most common polymorphism in plant and animal genomes and, as such, are the logical choice for marker-assisted selection. However, many plants are also polyploid, and marker-assisted selection can be complicated by the presence of highly similar, but non-allelic, homoeologous sequences. Despite this, there is practical and academic demand for high-throughput genotyping in several polyploid crop species, such as allohexaploid wheat. In this paper, we present such a system, which utilizes public single nucleotide polymorphisms previously identified in both agronomically important genes and in randomly selected, mapped, expressed sequence tags developed by the wheat community. To achieve relatively high levels of multiplexing, we used non-amplified genomic DNA and padlock probe pairs, together with high annealing temperatures, to differentiate between similar sequences in the wheat genome. Our results suggest that padlock probes are capable of discriminating between homoeologous sequences and hence can be used to efficiently genotype wheat varieties.


Current evidence suggests that bread wheat arose through the hybridization of diploid Aegilops tauschii (D genome) with the related tetraploid Triticum turgidum (A and B genomes) to produce allohexaploid Triticum aestivum (AABBDD). In common with many other polyploid crop species, T. aestivum exhibits a higher yield potential and a greater ability to grow over a wider geographical range than its progenitors (Hancock, 2004). The ability of polyploids to grow under conditions not tolerated by their progenitors is probably a result of the increased number of related genes (alleles, homoeologues and paralogues), which provides a selective advantage because of the increased level of genetic diversity (Hegarty and Hiscock, 2008).

Bread wheat is the dominant arable crop in the UK and, together with maize and rice, is one of the three most important crops in world agriculture (Aquino et al., 1999). The increasing world population, issues of food security and the use of wheat as a possible source of bioethanol emphasize the need to grow increased amounts of wheat in the future. However, a new generation of higher yielding, more disease-resistant wheat lines capable of growing in changing conditions will require efficient breeding practices, including the use of marker-assisted selection (McLauchlan et al., 2001). The use of molecular markers in wheat breeding and diversity studies is increasing, but the allohexaploid nature of bread wheat has limited the development of high-throughput genotyping to relatively few gene-based markers linked to agronomic traits. Work by various groups is slowly resulting in an increase in the number of useful markers, such that there are now a large number of microsatellites and single nucleotide polymorphisms (SNPs) available for general use (Quarrie et al., 2005). However, when the situation in allohexaploid wheat is compared with that in diploid barley, where several thousand SNP-based markers are available, the considerable disadvantages of working with a polyploid species can immediately be seen (Rostoks et al., 2005).

As SNPs are the most common genetic variation between individuals of the same species, it is not surprising that the development of the technology for SNP-based genotyping has been the subject of intense activity, with the result that several different platforms now exist (Kim and Misra, 2007). These genotyping platforms have invariably been developed for either human genotyping or for species with defined diploid genomes for which numerous SNPs have previously been developed, for example barley (Rostoks et al., 2005). However, such platforms are of limited use for the genotyping of polyploids because the presence of highly related, but non-allelic homoeologous sequences complicates the genotyping process. Most systems are incapable of discriminating between highly related, but distinct sequences, and so are unable to differentiate between inter- or intragenomic allele variation. Therefore, in polyploid species, it is difficult to determine: (i) the allelic content of many of the individual homoeologues within the genome; and (ii) the contribution made by a specific homoeologous allele to the breeding material.

Work by our group has suggested that, in allohexaploid wheat, considerably more SNPs exist between the three homoeologous genomes than exist between alleles of the same homoeologous genome within different genotypes (Poole et al., 2007; Edwards et al., 2008). One way to overcome the constraints posed by homoeologous sequences on genotyping is to identify and use SNP-containing single-copy sequences. These sequences, by definition, will not contain homoeologous copies and will therefore not suffer from the interference usually encountered. Several such sequences have already been identified and used to genotype wheat: for instance, the puroindoline gene Pinb-D1 on chromosome 5D is a single-copy gene of considerable importance to wheat breeders because of its effect on the determination of grain hardness (Huang and Roder, 2005). However, although the number of described single-copy sequences is increasing, the number with useful SNPs is still surprisingly small and, as yet, insufficient for most genotyping purposes.

The recently developed Diversity Array Technology (DArT) uses a multiplexed hybridization-based approach to identify random single-copy sequences in the genome under investigation (Jaccoud et al., 2001). As DArT relies on a screening approach to identify single-copy sequences, it can be used for both diploid and polyploid species, and has recently led to many hundreds of anonymous markers being generated in bread wheat (Akbari et al., 2006). However, a drawback of DArT is its inability to target individual genes, specifically those involved in agronomic traits, such as grain hardness or bread-making quality. In major crops, such as wheat, where selection is often based on a relatively small number of agronomic traits, such as flour quality, it is essential that a technology is available that can screen both whole genomes (as is the case with DArT) and individual genes of agronomic importance.

Several technologies now exist that can be used to screen numerous SNPs in known genes. One of these is based on the use of padlock probes, which are oligonucleotides that become circularized by DNA ligation in the presence of the appropriate target sequence (Nilsson et al., 1994). Padlock probes are highly specific because of: (i) their requirement for two target-complementary sequences, one for each end of the probe; and (ii) the inability of thermostable DNA ligase to ligate DNA at a mismatched sequence located at the ligation junction (Antson et al., 2000). The use of padlock probes to carry out multilocus genotyping in humans has been reported previously (Baner et al., 2003). In their study, Baner et al. (2003) used padlock probe pairs to successfully genotype 13 SNPs within the human Wilson's disease-related ATP7B gene. A modified padlock probe procedure has also been used to simultaneously genotype 1500 human SNPs (Hardenbol et al., 2003). The application of padlock probe technology to genotype human DNA samples suggested that the requirement for two highly complementary sequences and the presence of the correct terminal base might make padlock probes suitable for use in polyploid genomes, where homoeologous sequences can be found which contain both inter- and intragenomic SNPs. In this article, we describe the development and application of a system, based on padlock probes, that uses both randomly selected expressed sequence tags (ESTs) and genes of agronomic interest. Based on the results obtained, this assay appears to be capable of simultaneously genotyping numerous wheat lines with several hundred gene-based SNP markers.

Results and discussion

Establishing the padlock probe assay in wheat

Our initial padlock probe experiments focused on reproducing the protocols of Baner and Hardenbol in bread wheat (Baner et al., 2003; Hardenbol et al., 2003). Although we had some success with the protocol employed by Baner et al. (2003), we were unsuccessful in our attempts to reproduce the procedure of Hardenbol et al. (2003) (data not shown). We hypothesized that our inability to repeat the procedure of Hardenbol et al. in wheat was a result of the large size of the wheat genome and the presence of multiple similar (homoeologous) copies (Poole et al., 2007). We further hypothesized that, because the protocol of Hardenbol et al. relies on the ligation of the circularized padlock probe via a single cycle, the procedure is less sensitive than that adopted by Baner et al., which employs a ligase amplification step consisting of up to 20 cycles. However, we found it impossible to use the procedure of Baner et al. to genotype more than 10 loci simultaneously (data not shown). Only by modifying the protocol, as described in Experimental procedures, was it possible to genotype several hundred loci as described below and as shown in Figure 1.

Figure 1.

Padlock probe procedure developed to genotype allohexaploid wheat. (a) Paired padlock probes were designed to include two target complementary sequences at the 5′ and 3′ ends (grey line); the allele-specific nucleotide was positioned at the 3′ end. As described in Baner et al. (2003), the target-specific ends were connected to 20 bases (in 5′ to 3′ order) to one sequence common for all probes used for amplification (black line), either of two allele-specific tags (red or green lines) and a tag sequence unique for each locus (blue line). Only probe arms (here shown as allele 2) which correctly hybridized to the target sequence were joined by the thermostable ligase and hence protected from subsequent digestion by exonucleases. (b) Circularized probes were amplified with the appropriate allele-specific primers (here the allele 2 primer), together with the common forward primer. Following amplification, the products were denatured and hybridized to microscope slides containing an array of spotted oligonucleotides complementary to the locus-specific tags. To aid detection, the hybridization mix also contained two oligonucleotides, one labelled with cyanine-3 (Cy3), complementary to allele tag 1, and one labelled with cyanine-5 (Cy5), complementary to allele tag 2.
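
The probe layout described in this caption can be illustrated with a short sketch. The Python below assembles a pair of allele-specific padlock probes from their functional parts, in the 5′ to 3′ order given above. It is an illustration only: every sequence, the segment lengths and the function name `build_probe_pair` are hypothetical placeholders and are not the probes used in this study.

```python
# Minimal sketch of assembling a padlock probe pair. Every sequence below is
# an invented placeholder, not a probe from this study.

COMMON_PRIMER_SITE = "ACGTACGTACGTACGTACGT"   # hypothetical common amplification site
ALLELE_TAGS = {1: "TTGACCGATCCGATTAGCAA",     # hypothetical allele tag 1 (Cy3 detection)
               2: "GGCATTCAGGCTAAGTCCAT"}     # hypothetical allele tag 2 (Cy5 detection)


def build_probe_pair(arm_5p, arm_3p_without_snp, snp_alleles, locus_tag):
    """Return {allele_number: probe_sequence} for a biallelic SNP.

    arm_5p             -- target-complementary 5' arm
    arm_3p_without_snp -- target-complementary 3' arm, lacking its final base
    snp_alleles        -- two bases; each becomes the 3'-terminal base of one probe
    locus_tag          -- tag unique to the locus (used for array hybridization)
    """
    if len(snp_alleles) != 2:
        raise ValueError("expected exactly two alleles for a probe pair")
    probes = {}
    for number, base in enumerate(snp_alleles, start=1):
        # 5' arm, common site, allele tag, locus tag, 3' arm ending in the SNP base
        probes[number] = (arm_5p + COMMON_PRIMER_SITE + ALLELE_TAGS[number]
                          + locus_tag + arm_3p_without_snp + base)
    return probes


pair = build_probe_pair("GCTAGCTAGGATCCA", "TTGCAGGCATCGAT", "CT",
                        "CAGTCAGTCAGTCAGTCAGT")
for number, seq in pair.items():
    print(number, len(seq), seq[-1])
```

The two probes of a pair differ only in the allele tag and in the 3′-terminal base, which is the property that allows the ligase to discriminate between alleles and the array to report them in separate dye channels.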

To confirm that padlock probes could be employed to genotype allohexaploid wheat, and to show that they were capable of discriminating between closely related homoeologous sequences, we used the Rht-B1 and Rht-D1 loci for our preliminary analysis. These two loci, on chromosomes 4B and 4D, respectively, were selected because they code for important proteins involved in height determination and yield potential (Peng et al., 1999). The ‘wild-type’ alleles Rht-B1a and Rht-D1a lead to relatively tall plants, whereas the dominant Rht-B1b and Rht-D1b ‘mutant’ alleles result in semi-dwarf plants. In both cases, sequence analysis of the Rht-(B/D)1b alleles has shown that they each differ from the respective Rht-(B/D)1a alleles by a single base, although this base change is at a linked, but different location in the two homoeologues (Ellis et al., 2002; Figure 2a). Sequence analysis has also shown that the two Rht-B1a and Rht-D1a homoeologues differ from each other by a single base a short distance from the base changes differentiating the Rht-(B/D)1a/b alleles. Hence, this scenario is typical of the situation described for many homoeologous genes. We therefore used the available sequences for the Rht loci to design padlock probes, employing the principles described by Baner et al. (2003). The sequences of the region of interest within the various Rht alleles are shown in Figure 2a, together with the padlock probes used (Figure 2b). To carry out the padlock probe procedure for the Rht-B1 and Rht-D1 loci, we combined the four padlock probes shown in Figure 2 and used the pooled probes to screen 12 wheat varieties. The 12 varieties were selected to represent the possible Rht-B1a/b and Rht-D1a/b genotypes, but three varieties – Harnesk, Holdfast and ‘Synthetic’ – had no prior Rht gene assignment. For all the varieties, duplicate genotyping experiments were carried out using two separately prepared genomic DNA samples from the same seed stock (biological replicates).
The results were expressed as log10 ratios and the two replicates were compared (Figure 2c,d; Supplementary Data S1, see Supporting Information). As each locus had two padlock probes, one specific for each of the two alternate SNPs, the results of genotyping were expressed as the log10 ratio of the cyanine-5 (Cy5) to cyanine-3 (Cy3) hybridization intensities. Figure 2c,d shows the comparison of the log10 ratios of the two replicates for Rht-B1 and Rht-D1 genotyping. For each locus, the 12 varieties clustered into two significantly different groups, representing the Rht-B1a and Rht-D1a tall and Rht-B1b and Rht-D1b semi-dwarf genotypes (Figure 2c and 2d, respectively). In all cases, the individual biological replicates were highly correlated (R² = 0.9654 for Rht-B1 and R² = 0.9707 for Rht-D1). Table 1 shows the results obtained from the Rht-B/D1 padlock probe-based genotyping in comparison with (where known) the expected genotypes. In all cases, the results obtained corresponded (where known) to the expected genotype, as determined using the ‘perfect’ markers developed by Ellis et al. (2002). In cases in which the Rht genotype was previously unknown (for the varieties Harnesk, Holdfast and a synthetic line), we confirmed the padlock probe results by amplification and direct sequence analysis using the primers described by Ellis et al. (2002) (data not shown).
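
The scoring described above can be sketched as follows. The Python computes the log10 Cy5/Cy3 ratio for each probe pair and the squared correlation between two replicates. The intensity values are invented for illustration, and the function names are hypothetical; the sketch does not reproduce the data or software used in this study.

```python
import math


def log_ratio(cy5, cy3):
    """log10 of the Cy5/Cy3 hybridization intensity ratio for one probe pair."""
    return math.log10(cy5 / cy3)


def r_squared(xs, ys):
    """Squared Pearson correlation between two replicate series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Invented (Cy5, Cy3) intensities for four varieties, two replicates each.
replicate_1 = [(52000, 480), (49000, 510), (450, 61000), (530, 58000)]
replicate_2 = [(50500, 455), (47000, 560), (480, 60000), (500, 56500)]

ratios_1 = [log_ratio(*pair) for pair in replicate_1]
ratios_2 = [log_ratio(*pair) for pair in replicate_2]
print([round(r, 2) for r in ratios_1])
print(round(r_squared(ratios_1, ratios_2), 4))
```

With values of this kind, the varieties fall into two groups of opposite sign, one per allele, and plotting one replicate against the other gives the type of scatter shown in Figure 2c,d.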

Figure 2.

Partial sequences of the Rht-B1 and Rht-D1 alleles and their respective padlock probes. (a) Comparison of the two Rht homoeologous sequences from chromosomes 4B and 4D shows that, in addition to the single nucleotide polymorphism (SNP) responsible for the tall/dwarf phenotypes, the two homoeologous loci differed by a single SNP (G in Rht-B1a + b and C in Rht-D1a + b). SNPs defining both the tall/dwarf Rht genotypes and the difference between the homoeologous Rht-B and Rht-D loci are highlighted. (b) Padlock probes designed from the genomic sequences of Rht-B1a + b and Rht-D1a + b. Probes were designed on the basis of the principles highlighted in Figure 1. Rht-B1a + b padlock probes were 103 bases in length, whereas Rht-D1a + b padlock probes were 104 bases in length. For clarity, the padlock probes are shown with a single space between each functional part of the synthetic oligonucleotide; this space does not exist in the actual oligonucleotide. All padlock probes were synthesized 5′ phosphorylated as denoted by the 5′ ‘P’. The predicted melting temperature of the padlock probes was 74.5 °C. (c, d) Scatter plots of duplicate genotyping of the Rht-B1 and Rht-D1 loci. (c) Log10 scatter plots of 12 varieties genotyped in duplicate using the Rht-B1 padlock probe pairs. (d) Log10 scatter plots of 12 varieties genotyped in duplicate using the Rht-D1 padlock probe pairs. In each case, one of each duplicate genotype has been plotted along the x and y axis.

Table 1. Comparison of the genotyping of 12 wheat varieties at the Rht-B1/D1 loci employing padlock probes with their previously determined genotypes using allele-specific polymerase chain reaction (PCR) assays (Ellis et al., 2002)
Wheat variety | Padlock probe result | Reported Rht genotype
Chinese Spring | Rht-B1a/Rht-D1a | Rht-B1a/Rht-D1a

The results using the two Rht loci indicated that padlock probes were capable of discriminating between similar sequences when employed to genotype wheat genomic DNA. We next examined whether multiple padlock probes were capable of discriminating between several possible SNP variants at a single locus. For this analysis, we selected the single-copy puroindoline b (Pinb-D1) gene located on chromosome 5D. Seven Pinb-D1 alleles (Pinb-D1a–Pinb-D1g) have been identified and characterized in bread wheat. The allele Pinb-D1a is considered to be the ‘wild-type’ allele, resulting in a soft-textured endosperm, whereas the Pinb-D1b–Pinb-D1g alleles encode defective proteins and result in a hard-textured endosperm (Huang and Roder, 2005). In all cases, the Pinb-D1a–Pinb-D1g alleles differ from one another at a single base. Using the published sequences, padlock probe pairs were designed for each of the six (Pinb-D1b–Pinb-D1g) alleles (i.e. 12 probes in total). In all cases, one probe of each pair was designed to detect the ‘wild-type’ Pinb-D1a allele, whereas the second probe of each pair was designed to detect one of the Pinb-D1b–Pinb-D1g alleles. Therefore, if the variety under investigation carried the Pinb-D1a allele, all six probe pairs should give the same result. To ensure that all seven alleles were present within the varieties screened, we genotyped at least one line for each of the known Pinb-D1 alleles; hence, in this experiment, we included the varieties Andrew (Pinb-D1g), Gehun (Pinb-D1e) and UTAC (Pinb-D1f). The sequences of the seven Pinb-D1 alleles (a–g), together with the padlock probe sequences and the results obtained from the screening of 14 wheat varieties, are presented in Figure 3. Screening of the 14 varieties indicated that Hobbit and Robigus carried the Pinb-D1a allele, whereas Andrew carried the Pinb-D1g allele, Gehun the Pinb-D1e allele and UTAC the Pinb-D1f allele (Figure 3c).
In addition, Scorpion25 and Xi19 were shown to contain the Pinb-D1c allele (Figure 3c). Although the assay could easily distinguish between the Pinb-D1a, Pinb-D1c, Pinb-D1e, Pinb-D1f and Pinb-D1g alleles, the results for the Pinb-D1b and Pinb-D1d alleles were more complicated. For example, for varieties carrying either the Pinb-D1b or Pinb-D1d allele, the padlock probe data suggested that the same varieties might also carry the Pinb-D1f allele (Figure 3c). We believe that this interesting result for the Pinb-D1f allele, in the presence of either the Pinb-D1b or Pinb-D1d allele, is probably caused by the relative position of the three SNPs leading to the ‘b’, ‘d’ and ‘f’ alleles. These SNPs all occur within a six-base region, with the SNP leading to the ‘f’ allele being located two bases away from the ‘d’ allele SNP and four bases away from the ‘b’ allele SNP (Figure 3a). Hence, our results suggest that, when used to screen varieties carrying the ‘b’ or ‘d’ allele, the presence, in the two ‘f’ allele probes, of ‘a’-specific flanking sequences results in a mismatch at the position of the ‘b’ or ‘d’ SNP, which significantly reduces the intensity of the resulting hybridization event. This leads to lower signal levels (by as much as 1000-fold) and, consequently, unpredictable variations in the Cy5/Cy3 ratios. Such a result confirms that padlock probes only hybridize efficiently when the two flanking sequences show homology. Interestingly, no such problems occur with the ‘b’ and ‘d’ call when the ‘f’ allele is present (Figure 3c). These results show that padlock probes can be used to detect lines carrying the ‘b’ and ‘d’ allele, as long as the status of the ‘f’ allele is taken into consideration, whereas lines actually carrying the ‘f’ allele are only positive for this allele.
Given this scenario, the padlock probe assay suggests that the lines Harnesk, Holdfast, Recital and Thatcher carry the Pinb-D1b allele, whereas Malacca and Mercia carry the Pinb-D1d allele. As above, in all cases, our padlock probe genotyping results confirmed the previously published Pinb-D1 genotype or matched our own data generated using polymerase chain reaction (PCR) markers (Kane et al., 2004) and near-infrared determination of endosperm texture.
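
The scoring rule implied by this discussion can be written down as a small sketch. The Python below takes the six probe-pair log10 ratios (one per mutant allele, ‘b’ to ‘g’), reads a positive ratio as the wild-type ‘a’ state and a negative ratio as the mutant state, and disregards the ‘f’ signal whenever the ‘b’ or ‘d’ SNP is detected. The function name, the zero threshold and the example values are assumptions made for illustration; the scoring in this study was carried out manually.

```python
def call_pinb_d1(log_ratios, threshold=0.0):
    """Call a Pinb-D1 allele from six probe-pair log10 ratios keyed 'b'..'g'.

    A ratio above `threshold` is read as wild-type 'a' at that SNP and a ratio
    below it as the mutant allele. The threshold is a hypothetical choice.
    """
    mutant = {allele for allele, ratio in log_ratios.items() if ratio < threshold}
    # The 'f' probe pair is unreliable when the nearby 'b' or 'd' SNP is
    # present, so an 'f' signal is disregarded in that case.
    if mutant & {"b", "d"}:
        mutant.discard("f")
    if not mutant:
        return "Pinb-D1a"
    if len(mutant) == 1:
        return "Pinb-D1" + mutant.pop()
    return "ambiguous"


# Invented ratios: a 'b' line in which the 'f' pair also gives a mutant-like signal.
example = {"b": -1.2, "c": 1.0, "d": 0.9, "e": 1.1, "f": -0.4, "g": 1.3}
print(call_pinb_d1(example))
```

A line that gives a mutant-like signal only for the ‘f’ pair is still called as Pinb-D1f, which mirrors the observation that lines carrying the ‘f’ allele are positive for this allele alone.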

Figure 3.

Partial sequences of the Pinb-D1 alleles used to generate the six padlock probe pairs, the resulting padlock probes and the genotyping results from the screening of 14 wheat varieties. (a) Comparison of the seven Pinb-D1 alleles (a–g) highlighting the allelic SNPs. (b) Padlock probe pairs designed from the genomic sequences of Pinb-D1a–g alleles. The Pinb-D1-specific padlock probes were between 109 and 116 bases in length and had a melting temperature of 74–75 °C.

(c) Results of the screening of 14 wheat varieties with the combined Pinb-D1 padlock probes. The results for each variety are expressed as log10 ratios; positive log ratios indicate Pinb-D1a allele characteristics, whereas negative log ratios indicate a ‘mutant’ Pinb-D1 allele. Andrew (Pinb-D1g); Gehun (Pinb-D1e); Harnesk (Pinb-D1b); Hobbit (Pinb-D1a); Holdfast (Pinb-D1b); Malacca (Pinb-D1d); Marquis (Pinb-D1b); Mercia (Pinb-D1d); Recital (Pinb-D1b); Robigus (Pinb-D1a); Scorpion25 (Pinb-D1c); Thatcher (Pinb-D1b); UTAC (Pinb-D1f); Xi19 (Pinb-D1c).

Large-scale genotyping of wheat with random ESTs

We next investigated the use of large numbers of pooled padlock probes to genotype 57 lines representing 52 wheat varieties. To do this, we designed 336 pairs of padlock probes representing SNPs at 284 loci. The 284 loci were selected either because they represented regions of interest to the wheat breeding community or because they represented ‘random’ ESTs for which SNPs had been previously described or for which the location of an SNP had been suggested. However, it should be noted that, in the majority of cases, these SNPs had only been reported in non-UK material (Caldwell et al., 2004). In many cases, the random ESTs had been mapped to specific deletion bins (Qi et al., 2003). ESTs were only included if both the sequence and location of the SNP were considered to be in the public domain. The 336 padlock probe pairs included the previously used Rht-B/D and Pinb-D1 probes, as well as 23 high-molecular-weight (HMW)-glutenin probe pairs, six low-molecular-weight (LMW)-glutenin probe pairs, three probe pairs for sequences associated with the vernalization requirement, eight probe pairs associated with disease resistance, two probe pairs associated with alien chromosome introgressions, 16 probe pairs for genes of known or suspected function and 268 ‘random’ ESTs (D’Ovidio and Anderson, 1994; Smith and Schweder, 1994; Ellis et al., 2002; Butow et al., 2003; Ma et al., 2003; Mochida et al., 2003; Zhang et al., 2003, 2004; Sherman et al., 2004; Yan et al., 2004; Huang and Roder, 2005). Loci were selected only when it was possible to design homoeologous-specific padlock probes (i.e. when the public data suggested that an intervarietal SNP was present and a polymorphism existed between the homoeologues in the flanking region covered by the padlock probe). However, because both the wheat genome sequence and the EST collections are incomplete, it is probable that some of the probes designed were not truly homoeologue specific. 
In designing the padlock probe pairs, we took no account of the sequences surrounding the SNP being genotyped; however, as the maximum size of a padlock probe that could be synthesized efficiently was approximately 120 bases, we avoided sequences that had an AT content greater than 60%. The padlock probe sequences, together with their locus and allele-specific tags, are provided as Supporting Information (Supplementary Data S2).
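
The two design constraints mentioned here, an AT content of no more than 60% and a probe length of approximately 120 bases or fewer, can be expressed as a simple filter. The sketch below is a hypothetical illustration; the function names and example sequences are invented, and the actual probe design also involved melting temperature and homoeologue specificity, which are not modelled.

```python
def at_content(seq):
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def passes_design_filter(target_seq, probe_seq, max_at=0.60, max_len=120):
    """True if the target is not too AT rich and the probe is not too long.

    max_at and max_len mirror the limits quoted in the text; treating them
    as hard cut-offs is an assumption of this sketch.
    """
    return at_content(target_seq) <= max_at and len(probe_seq) <= max_len


# Invented sequences for illustration.
balanced_target = "ATGCGCTAGCATGCCGTA"
at_rich_target = "ATATTTAAATTAGCATAT"
print(at_content(balanced_target), passes_design_filter(balanced_target, "N" * 104))
print(at_content(at_rich_target), passes_design_filter(at_rich_target, "N" * 104))
```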

To assess how many of the 336 padlock probe pairs were capable of hybridizing to wheat genomic DNA in a consistent and reproducible manner, we carried out duplicate hybridizations to two different Scorpion25 genomic DNA samples prepared from the same plant (Figure 4a). Replicate experiments indicated that, of the 336 padlock probe pairs, 125 (37%) showed either no or poor hybridization and were therefore removed from the subsequent analysis. The remaining 211 padlock probe pairs (63%) were used to screen 57 wheat lines, selected to represent both past and present UK germplasm, as well as foreign lines that might be expected to show increased levels of diversity compared with elite UK material. The 57 lines also included the two genomic DNA samples prepared from the same Scorpion25 plant, three ‘different’ lines of the variety ‘Mercia’ and two ‘different’ lines of the variety ‘Thatcher’. Following hybridization, scanning and data capture, as described in Experimental procedures, the Cy5/Cy3 log ratios were collated in an Excel spreadsheet (provided as Supplementary Data S3, see Supporting Information). To assess whether the performance of an individual padlock probe was influenced by the presence of numerous other padlock probes, we examined the genotype of the 57 lines for the Rht-B/D and Pinb-D1 probe sets. The ‘pooled’ results were the same as the ‘single’ locus results, suggesting that the presence of large numbers of padlock probes did not reduce the specificity of individual padlock probe pairs. As described for the individual Pinb-D1 padlock probe pairs, the Pinb-D1f padlock probe pairs again had a propensity to suggest that the Pinb-D1f mutant allele was present when the ‘b’ or ‘d’ mutant was actually present; however, this result was easily compensated for during the scoring of the dataset.

Figure 4.

Examples of the use of padlock probes to screen: (a) replicate Scorpion25 genomic DNA with 211 padlock probe pairs; and (b) the allele content of the starch synthase locus on chromosome 7A in 40 wheat lines. In (a), the log10 cyanine-5/cyanine-3 (Cy5/Cy3) ratios of the 211 padlock probe pairs for two independently prepared genomic DNAs, obtained from the same plant, were compared. (b) Log10 Cy5/Cy3 ratio for the starch synthase padlock probe (probe number PS00073) when used to screen 40 wheat lines. For the starch synthase gene padlock probe pairs, a range of at least 2 log10 units separated the two allele scores.

Scoring of the remaining padlock probe pairs was carried out in a similar manner to that described for the Rht and Pinb-D1 probes, and resulted in 12 027 data points (211 probes × 57 lines). An example of a padlock probe pair score, for the gene encoding starch synthase, is presented in Figure 4b. Examination of the overall results showed that 49 of the 211 padlock probe pairs (~23%) were monomorphic across the different lines. These monomorphic markers appeared to be randomly distributed across all seven homoeologous groups. Although the number of monomorphic probes appears to be high, it is considerably lower than the 90.6% of monomorphic probes reported by Akbari et al. (2006) for randomly selected wheat DArT fragments screened against 62 lines. The remaining 162 polymorphic padlock probes (77%) comprised 41 pairs (80% of the original 51) specific to loci of agronomic importance and 124 (43.6% of the original 284) designed to randomly selected genes and ESTs. Polymorphism information content (PIC) values for the 162 polymorphic padlock probe pairs ranged from 0.035 to 0.50, with an average value of 0.27 (Supplementary Data S3, see Supporting Information). Although PIC values for biallelic SNPs cannot be greater than 0.5, the value of 0.27 is, as expected, lower than that previously determined for wheat microsatellite markers (0.39–0.886; Roder et al., 2002). However, using SNP markers in wheat, in combination with homoeologous-specific PCR, Somers et al. (2003) reported an average PIC value of 0.27 (range, 0.04–0.49) for 20 SNP markers used to screen a diverse set of 12 varieties from Brazil, Canada, China and Mexico. Our study also included the 20 SNP markers reported by Somers et al. (2003); however, of the original 20, only 13 markers generated a usable and reproducible score that could be analysed. In the wheat lines used here, the 13 markers also had an average PIC value of 0.27 (range, 0.07–0.49).
In a similar study, Chao et al. (2009) reported that a set of 359 wheat SNPs derived from introns had a mean PIC value of 0.18 when used to screen 20 US cultivars, and a mean PIC value of 0.23 when used to screen a more diverse germplasm set. Examination of the PIC value of the markers mapped to the various 21 homoeologous chromosomes suggested that, although there were differences between chromosomes, with markers on chromosomes 2B, 2D, 4B, 5B and 5D having lower than average PIC values (ranging between 0.18 and 0.22), the differences were not significant. Although the PIC values for the various padlock probes were comparable with those previously obtained by Somers et al. (2003) and Chao et al. (2009), we noticed that a relatively large number of padlock probes had low PIC values, suggesting that these probes were only capable of discriminating between a small number of lines. For instance, 65 padlock probes (39.6% of the polymorphic probes) had PIC values of less than 0.2 (i.e. approximately 90% of the lines had the same allele), whereas for 18 padlock probes (11%) only one line was identified as having the alternative allele. The occurrence of relatively large numbers of padlock probes with low PIC values was not unexpected, as many of the SNPs/flanking regions used to design the probes were derived from sequences generated from either diploid or tetraploid progenitors of wheat, or hexaploid material genetically distinct from UK material. This was confirmed by examining those lines that appeared to carry relatively large numbers of rare alleles. For instance, the variety CM-82036 (a CIMMYT shuttle line) carried 15 rare alleles, Norin 10 (a Japanese line) carried 13, Chinese Spring (Chinese) and Recital (French) carried 12, and Lynx (UK), Monopol (German), Badger (UK), the two synthetic lines, Arina (German), Brevor (USA/Canadian), Hedgehog (UK) and Piko (German) each carried seven to nine rare alleles.
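
The PIC values discussed above can be illustrated with a short calculation. The text does not state the formula used, so the sketch below assumes PIC = 1 − Σp², where p is the frequency of each allele; this assumption is consistent with the stated upper limit of 0.5 for a biallelic SNP, with a value of about 0.18 when 90% of the lines share one allele, and with a value of about 0.035 when a single line among 57 carries the alternative allele. The function name and the allele calls are hypothetical.

```python
from collections import Counter


def pic(allele_calls):
    """PIC of one marker, assumed here to be 1 - sum(p_i ** 2).

    allele_calls holds one call per line; None marks a missing score.
    """
    calls = [call for call in allele_calls if call is not None]
    counts = Counter(calls)
    n = len(calls)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Invented call sets for 57 lines.
one_rare_line = ["A"] * 56 + ["B"]
roughly_90_10 = ["A"] * 51 + ["B"] * 6
balanced = ["A"] * 29 + ["B"] * 28

for name, calls in [("one rare line", one_rare_line),
                    ("about 90:10", roughly_90_10),
                    ("balanced", balanced)]:
    print(name, round(pic(calls), 3))
```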

The data derived from the informative padlock probes were analysed using xlstat™ to generate the dendrogram shown in Figure 5. Although this dendrogram serves only as an example of how padlock probe data can be employed, it allows a number of general comments to be made about the relationships suggested by padlock probe data. Firstly, the two Scorpion25 DNAs prepared from the same plant showed less than 1% dissimilarity from each other, and hence set a lower limit for the dissimilarity value for any two lines. As expected, the two synthetic lines clustered together and, along with Chinese Spring and Norin 10, represented three outlying groups. In addition, the two Thatcher lines clustered together (~13% dissimilarity), as did the four Mercia lines, including the Mercia lines carrying the Rht-B1 locus (2.5% and 10% dissimilarity). The non-UK lines Thatcher (North American), Marquis (North American), Bezostaya (Russian), Cheyenne (North American) and Brevor (North American) all clustered together, as did the second ‘spring’ wheat grouping Scorpion25, Cadenza, Xi19 and Warlock24. Of interest, this latter grouping also contained the US spring wheat Andrew and the Indian spring wheat Gehun. The Germanic lines Monopol and Arina also showed some evidence of clustering, and these two lines showed some similarity to the CIMMYT line CM-82036. The largest grouping, comprising 26 lines, contained the majority of the UK winter wheat lines genotyped by the padlock probes. These lines were selected to represent material used in UK breeding during the 1930s (Holdfast), 1950s (Cappelle Desprez), 1970s (Maris Huntsman, Hobbit, Mercia), 1980s (Norman, Galahad and Riband), 1990s (Cadenza) and 2000s (Scorpion25, Smuggler, Warlock24 and Xi19). Of interest, and somewhat surprisingly, was the close relationship seen between the lines Malacca, which is a current Nabim Class 1 UK Recommended List variety, and Hobbit, a variety released in 1977. 
The padlock probe data suggest that these two lines differ from one another by ~12% (19 of 162) of the polymorphic markers used.
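
The dissimilarity values quoted in this section can be illustrated with a simple mismatch count. The text does not specify the dissimilarity measure applied in xlstat™, so the sketch below assumes the proportion of markers at which two lines carry different allele calls; the function name and the marker data are invented for illustration.

```python
def dissimilarity(calls_a, calls_b):
    """Fraction of markers scored in both lines at which the calls differ."""
    scored = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not scored:
        raise ValueError("no markers scored in both lines")
    return sum(a != b for a, b in scored) / len(scored)


# Two invented lines that differ at 19 of 162 polymorphic markers.
line_1 = [0] * 162
line_2 = [1] * 19 + [0] * 143
print(round(100 * dissimilarity(line_1, line_2), 1))
```

Under this assumption, a difference at 19 of 162 markers corresponds to about 11.7% dissimilarity. A matrix of such pairwise values is the usual input for average-linkage clustering of the kind used to draw a dendrogram.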

Figure 5.

Genetic relationships between a group of 57 wheat lines screened with 211 padlock probes. The program ‘xlstat™’ was employed to generate the dendrogram using the unweighted pair group method.

Genotyping HMW-glutenins

One of the main traits of interest to wheat breeders is the bread-making quality of the flour. Payne et al. (1987) suggested that between 47% and 60% of the variation in bread-making quality can be attributed to variation in a specific class of seed storage protein, the HMW-glutenin subunits. The HMW-glutenin loci Glu-A1, Glu-B1 and Glu-D1 are located on the long arms of homoeologous group 1 chromosomes 1A, 1B and 1D, respectively. Each HMW-glutenin locus can contain up to two glutenin genes, referred to as the x and y types. Although protein gel-based assays for the x and y types on the three homoeologous chromosomes are routine in many laboratories (Butow et al., 2003; Ma et al., 2003), DNA-based screens for the specific HMW-glutenins can be problematic as a result of the sequence similarities between the various HMW-glutenins and the presence of multiple repeat sequences within each subunit (Butow et al., 2003). We therefore assessed whether padlock probes might be a useful method to determine the complete HMW-glutenin allele content by including 23 padlock probe pairs in the 211-probe mix. Of these 23 padlock probes, 13 were polymorphic in the lines used. The polymorphic probes included three specific for the Glu-A1 locus, seven for the Glu-B1 locus and three for the Glu-D1 locus. Clustering the various lines using only these 13 probes resulted in 15 generic groups (Table 2; Figure 6). To assess whether padlock probes were capable of identifying the HMW-glutenin content of the various wheat lines, we compared the results obtained here with those of the on-line HMW-glutenin database (http://www.aaccnet.org/grainbin/gluten_gliadin.asp). Comparison of these two sets of data revealed a close correlation. For example, Padlock Cluster Group 1 (Table 2) consists of lines with a null A-genome glutenin allele, a 7 or 7 + 8 B-genome glutenin content and a 2 + 12 D-genome glutenin (null, 7/7 + 8, 2 + 12), whereas Group 2 lines (Table 2) appear to have a 1, 7 + 9, 2 + 12 HMW-glutenin content. Hence, in nearly all cases, lines with a known HMW-glutenin content grouped with lines containing either identical or very similar glutenin alleles.

Table 2. Published high-molecular-weight (HMW)-glutenin genotypes compared with the padlock probe clusters
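| Line | Recorded A-genome subunit | Recorded B-genome x/y subunits | Recorded D-genome x/y subunits | Padlock Cluster Group |
|---|---|---|---|---|
| Cap. Desprez | Null | 7 | 2 + 12 | 1 |
| Ch. Spring | Unknown | 7 + 8 | 2 + 12 | 1 |
| Galahad | 1 | 7 | 2 + 12 | 1 |
| Hobbit | Null | 7 + 8 | 3 + 12 | 1 |
| Malacca | Null | 17 + 18 | 2 + 12 | 1 |
| Maris Huntsman | Null | 6 + 8 | 3 + 12 | 1 |
| Richmond | Null | 17 + 18 | 2 + 12 | 1 |
| Robigus | Null | 7 + 9 | 2 + 12 | 1 |
| Wasp | Null | 7 + 9 | 2 + 12 | 1 |
| Revel J. | Unknown | Unknown | Unknown | 2 |
| Svilena | 1 | 7 + 9 | 2 + 12 | 2 |
| Synthetic (Advanta) | Unknown | Unknown | Unknown | 2 |
| Synthetic (JIC) | Unknown | Unknown | Unknown | 3 |
| Mercia | Null | 6 + 8 | 5 + 10 | 9/4 |
| Cheyenne | 2 | 7 + 9 | 5 + 10 | 5 |
| Bezostaya | 2 | 7 + 9 | 5 + 10 | 5 |
| Thatcher | 2 | 7 + 9 | 5 + 10 | 5 |
| Drake | Null | 6 + 8 | 2 + 12 | 6 |
| Griffin | Null | 6 + 8 | 2 + 12 | 6 |
| Hedgehog | Null | 6 + 8 | 2 + 12 | 6 |
| Norman | Null | 6 + 8 | 2 + 12 | 6 |
| Parade | Null | 6 + 8 | 2 + 12 | 6 |
| Piko | 1 | 6 + 8 | 2 + 12 | 6 |
| Mithras | Null | 6 + 8 | 2 + 12 | 6 |
| Riband | Null | 6 + 8 | 2 + 12 | 6 |
| Vector | Null | 6 + 8 | 2 + 12 | 6 |
| Vivant | Null | 6 + 8 | 2 + 12 | 6 |
| Wasmo | Null | 6 + 8 | 2 + 12 | 6 |
| Rendezvous | 1 | 6 + 8 | 3 + 12 | 7 |
| Lynx | Null | 6 + 8 | 3 + 12 | 7 |
| Holdfast | 1 | 7 + 8 | 5 + 10 | 8 |
| Marquis | 1 | 7 + 9 | 5 + 10 | 8 |
| Norin 10 | 1 | 7 + 8 | 2 + 12 | 8 |
| Smuggler | 1 | 17 + 18 | 5 + 10 | 8 |
| Mercia | Null | 6 + 8 | 5 + 10 | 9/4 |
| Oberon | Null | 6 + 8 | 5 + 10 | 9 |
| Recital | 2 | 6 + 8 | 5 + 10 | 9 |
| Badger | Null | 14 + 15 | 5 + 10 | 10 |
| Cadenza | Null | 14 + 15 | 5 + 10 | 10 |
| Scorpion25 | Null | 14 + 15 | 5 + 10 | 10 |
| Xi19 | Null | 14 + 15 | 5 + 10 | 10 |
| Warlock24 | Null | 14 + 15 | 5 + 10 | 11 |
| Arina | Null | 6 + 8 | 3 + 12 | 12 |
| Tellus | Null | 6 + 8 | 3 + 12 | 12 |
| Tonic | Null | 7 + 8 | 5 + 10 | 12 |
| Brevor | 2 | 7 + 9 | 2 + 12 | 13 |
| UTAC | Null | 7 + 8 | 2 + 12 | 13 |
| Monopol | 1 | 6 + 8 | 5 + 10 | 15 |
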
Figure 6.

Genetic relationships between 57 wheat lines screened with high-molecular-weight (HMW)-glutenin-specific padlock probes. The program ‘xlstat™’ was employed to generate the dendrogram using the unweighted pair group method. The HMW-glutenin groups, described in Table 2, are indicated in the left-hand column.

Overall, the use of padlock probes to genotype the HMW-glutenin alleles present in the 57 lines generated results in keeping with the recorded HMW-glutenin genotype. However, there were some differences between lines for which no previous difference in the HMW-glutenin allele content had been recorded. For example, the recorded HMW-glutenin profile of Warlock24 is the same as that for Cadenza, Scorpion25 and Xi19, whereas the padlock probe data suggested that they were similar, but not identical. We believe that there are two possible reasons for this observation. First, it is possible, although unlikely, that padlock probes are variable in their ability to call particular HMW-glutenin alleles. However, as our results were consistent over several repeat experiments, we believe that a more probable explanation is that HMW-glutenin alleles are more variable than previously reported. For instance, most HMW-glutenin alleles have been scored using either protein-based polyacrylamide gel electrophoresis (Payne et al., 1987) or PCR assays across indels (Liu et al., 2008). We suggest that our padlock probe data show that both of these technologies probably underestimate the amount of DNA sequence variation present within the HMW-glutenin loci, and hence we believe that padlock probes may be a valuable tool to discriminate between closely related, but distinct, HMW-glutenin alleles.
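The comparison between recorded genotypes and padlock cluster groups can be automated. The sketch below takes a subset of the rows of Table 2 and reports any recorded HMW-glutenin genotype whose lines fall into more than one padlock cluster group. The tuple layout and the function name are hypothetical choices made for this illustration.

```python
from collections import defaultdict

# (line, A-genome, B-genome, D-genome, padlock cluster group);
# a subset of the rows of Table 2.
TABLE_2_SUBSET = [
    ("Badger", "Null", "14 + 15", "5 + 10", "10"),
    ("Cadenza", "Null", "14 + 15", "5 + 10", "10"),
    ("Scorpion25", "Null", "14 + 15", "5 + 10", "10"),
    ("Xi19", "Null", "14 + 15", "5 + 10", "10"),
    ("Warlock24", "Null", "14 + 15", "5 + 10", "11"),
    ("Cheyenne", "2", "7 + 9", "5 + 10", "5"),
    ("Bezostaya", "2", "7 + 9", "5 + 10", "5"),
    ("Thatcher", "2", "7 + 9", "5 + 10", "5"),
]


def split_genotypes(rows):
    """Return recorded genotypes whose lines occupy more than one padlock group."""
    groups_by_genotype = defaultdict(lambda: defaultdict(list))
    for line, a, b, d, group in rows:
        groups_by_genotype[(a, b, d)][group].append(line)
    return {genotype: dict(by_group)
            for genotype, by_group in groups_by_genotype.items()
            if len(by_group) > 1}


result = split_genotypes(TABLE_2_SUBSET)
for genotype, by_group in result.items():
    print(genotype, by_group)
```

For this subset, only the (Null, 14 + 15, 5 + 10) genotype is split: Badger, Cadenza, Scorpion25 and Xi19 fall in Group 10, whereas Warlock24 falls in Group 11, which is the discrepancy discussed above.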


The results presented here demonstrate that padlock probes can be used in relatively large numbers to characterize the wheat genome. The two main characteristics of this system, namely the requirement for both the correct SNP and perfectly homologous sequences 25 bases either side of the SNP, are useful attributes in discriminating between closely related homoeologues. Although technologies for quantitative SNP analysis, such as matrix-assisted laser desorption ionization-time of flight (MALDI-TOF)-based genotyping, are being developed, currently the only alternative to using padlock probes to screen for alleles of genes of known agronomic importance is to first design homoeologous-specific primers to amplify homoeologous-specific PCR products and then to use either these products in conjunction with a method for allele discrimination, such as TaqMan, or single-strand conformation polymorphism (SSCP) in conjunction with capillary electrophoresis (Martins-Lopes et al., 2001). Both of these technologies have their uses but, in each case, it is relatively difficult to multiplex the assay. DArT technology offers an alternative multiplex genotyping platform; however, in the case of DArT, the markers are anonymous and have relatively low levels of polymorphism. Hence, one has to screen large numbers to identify markers that tag genes or regions of importance. Padlock probes, however, can be designed for any region known to be linked to the trait of interest. Given the results described here, it would be logical to design several probes to allow for poor probe performance and low levels of polymorphism, but, as described here, this is not a technical problem.
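The dual requirement described above (the correct SNP base plus perfectly matching sequence 25 bases either side) can be illustrated with a simplified string-matching model. This is only a sketch: real hybridization and ligation are not all-or-nothing, and the sequences, function name and homoeologue labels below are invented for the example.

```python
ARM_LENGTH = 25  # bases required to match on each side of the SNP


def probe_matches(target, snp_index, left_arm, right_arm, allele):
    """True if the target carries the expected SNP allele and both
    flanking arms match perfectly (simplified exact-match model)."""
    start = snp_index - ARM_LENGTH
    end = snp_index + 1 + ARM_LENGTH
    if start < 0 or end > len(target):
        return False
    return (target[start:snp_index] == left_arm
            and target[snp_index] == allele
            and target[snp_index + 1:end] == right_arm)


# Invented 25-base flanks (not sequences from the paper).
left = "ACGTT" * 5
right = "GGCAT" * 5

copy_allele_a = left + "A" + right                    # target homoeologue, allele A
copy_allele_g = left + "G" + right                    # same homoeologue, allele G
other_homoeologue = left[:10] + "C" + left[11:] + "A" + right  # one flank mismatch

print(probe_matches(copy_allele_a, 25, left, right, "A"))      # True
print(probe_matches(copy_allele_g, 25, left, right, "A"))      # False
print(probe_matches(other_homoeologue, 25, left, right, "A"))  # False
```

In this model, a homoeologous copy that shares the SNP allele but differs at a single flanking base is rejected, which is the property that allows closely related homoeologues to be distinguished.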

One of the main areas of interest for our group is the classification of the extent of genetic diversity in UK breeding material. Bread wheat is classified as an inbreeding species and, as such, in a commercial variety virtually all loci tend to be present in a homozygous state. Although we believe that padlock probes are more suited to screening species that have a high proportion of homozygous loci, preliminary results suggest that, with careful experimental design and the use of various controls, padlock probes are able to discriminate between homozygous and heterozygous states. However, until such experiments are completed, we recommend that padlock probes be used to screen varieties rather than segregating breeding material.

Currently, we believe that there are three significant problems associated with padlock probes: first, there is a lack of verified SNP markers for individual homoeologues; second, padlock probes require a hybridization step which can be problematic when scaled up to include several hundred lines; finally, because of their size, padlock probes are expensive to synthesize. However, we estimate that a single padlock probe synthesis is sufficient for at least one million assays and therefore, based on an assay utilizing 211 padlock probe pairs, even with the high costs associated with the synthesis and purification of large oligonucleotides, the cost per assay should be significantly less than that of similar assays designed to examine a single locus.
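The cost argument can be made concrete with a short calculation. The sketch below covers oligonucleotide synthesis only (not ligase, arrays or labour) and uses a placeholder price per oligonucleotide, which is an assumption and not a figure from this work. The 211 probe pairs and the one million assays per synthesis are taken from the text.

```python
PROBE_PAIRS = 211                  # probe pairs in the assay (from the text)
ASSAYS_PER_SYNTHESIS = 1_000_000   # lower bound estimated in the text
COST_PER_OLIGO = 100.0             # ASSUMED placeholder price, arbitrary currency


def oligo_cost_per_assay(pairs, cost_per_oligo, assays_per_synthesis):
    """Synthesis cost attributable to one multiplex assay (two oligos per pair)."""
    return 2 * pairs * cost_per_oligo / assays_per_synthesis


per_assay = oligo_cost_per_assay(PROBE_PAIRS, COST_PER_OLIGO, ASSAYS_PER_SYNTHESIS)
per_locus = per_assay / PROBE_PAIRS

print(f"per assay: {per_assay:.4f}")  # per assay: 0.0422
print(f"per locus: {per_locus:.4f}")  # per locus: 0.0002
```

Whatever the true synthesis price, dividing it by the number of assays supported by one synthesis makes the oligonucleotide contribution per assay small.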

Experimental procedures

Preparation of wheat genomic DNA

Wheat genomic DNA was prepared from 14-day-old seedlings, as described by Poole et al. (2007). One hundred nanograms of genomic DNA were digested with 1 unit of the restriction enzymes BstUI and SspI (New England Biolabs, Ipswich, Massachusetts, USA) for 30 min at 37 °C, as described by the manufacturer, and purified by ethanol precipitation. Digested genomic DNA was resuspended in sterile distilled water at a concentration of 5 ng/µL.


Padlock probes were designed from sequences derived from a variety of public sources (Supplementary Data S2, see Supporting Information); 5′-phosphorylated padlock probe oligonucleotides were synthesized by SigmaGenosys (Haverhill, Dorset, UK) and purified by polyacrylamide gel electrophoresis (PAGE).

Padlock probe assay

To carry out the padlock probe assay, we generally followed the procedure adopted by Baner et al. (2003) with the following modifications: 25 ng of digested genomic DNA was mixed with 1 µL of padlock probe oligonucleotide mix, consisting of 336 pairs of padlock probe oligonucleotides (Supplementary Data S3, see Supporting Information), each at a concentration of 10 fmol; the DNA–oligonucleotide mix was cycled in a Perkin-Elmer (Waltham, Massachusetts, USA) 9700 PCR machine with 1 unit of thermostable ligase (Epicentre, Madison, Wisconsin, USA) in a total volume of 30 µL using the following cycling conditions: 95 °C for 2 min and 72 °C for 20 min for 10 cycles. The entire reaction mix was mixed with 30 µL of a mixture containing 1 unit of Exonuclease I (New England Biolabs) and 0.025 units of Exonuclease III (GE Healthcare, Chalfont St Giles, Buckinghamshire, UK) in 1 × Exonuclease III buffer supplemented with 50 mM KCl. The reaction was allowed to proceed for 2 h at 37 °C before being terminated by incubation at 80 °C for 20 min. For PCR amplification, 5 µL of the Exonuclease reaction was combined with 24 µL of a HotStart Pfu PCR mix made up according to the manufacturer's recommendations (Stratagene), containing both the common forward primer (5′-ATCATGCTGCTAACGGTCGTC) and the two allele-specific primers (allele 1, 5′-AGAGCGCATGAATCCGTACA; allele 2, 5′-CCGAGAATGTACCGCTATCCA). Reactions were cycled 20 times between 98 °C for 20 s and 55 °C for 2 s.

Array hybridization and data analysis

The product of the padlock probe PCR was incubated at 95 °C for 5 min and immediately mixed with 50 ng of Cy3 detection probe (5′-Cy3TGGATAGCGGTACATCTCGG) and 50 ng of Cy5 detection probe (5′-Cy5TGTACGGATTCATGCGCTCT) in 30 µL of 2 × Nexterion hybridization buffer (Nexterion, Harrogate, North Yorkshire, UK), and applied to a single array slide printed with reverse oligonucleotide tags, as described in Supplementary Data S3 (see Supporting Information). Following the application of a cover slip, hybridization was allowed to proceed at 50 °C for 3 h. After hybridization, slides were successively washed in 2 × saline sodium citrate (SSC), 0.1% sodium dodecylsulphate (w/v) at 42 °C (2 × 5 min), 0.2 × SSC at room temperature (1 min) and 0.1 × SSC at room temperature (1 min). Slides were dried in a swing-out plate rotor by centrifugation (400 g) and scanned. Signal intensities were recorded using an Axon Instruments GenePix 4000B dual laser scanner, and data were collected using genepix™ pro 4.0 software (Axon Instruments, Union City, CA, USA). The data were sorted, and all features flagged as ‘bad’ by the genepix™ pro 4.0 software were excluded from further analysis. The median signal from the negative control values was subtracted from all data, and the information from the replicate spots on each array was pooled and averaged. The Cy5/Cy3 log10 ratios were then calculated for each probe pair and the data were exported to an Excel spreadsheet (Supplementary Data S3, see Supporting Information). Only padlock probe pairs which had a log10 ratio score of greater than unity between the two possible alleles were considered to be polymorphic (161 of 211 probes). Padlock probes which did not allow for a clear discrimination between the two alleles were considered to be monomorphic.
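The data-processing steps described above (background subtraction, averaging of replicate spots, calculation of the Cy5/Cy3 log10 ratio and the polymorphism call) can be sketched as follows. This is one interpretation and not the original analysis: the per-channel background subtraction, the floor applied to corrected signals, the reading of "greater than unity" as a spread of more than 1 log10 unit across lines, and all signal values are assumptions.

```python
import math
from statistics import mean, median

FLOOR = 1.0  # ASSUMED floor that keeps background-corrected signals positive


def log10_ratio(cy5_spots, cy3_spots, neg_cy5, neg_cy3):
    """Subtract the median negative-control signal, average the replicate
    spots and return log10(Cy5/Cy3) for one probe pair in one line."""
    background_cy5 = median(neg_cy5)
    background_cy3 = median(neg_cy3)
    cy5 = mean(s - background_cy5 for s in cy5_spots)
    cy3 = mean(s - background_cy3 for s in cy3_spots)
    return math.log10(max(cy5, FLOOR) / max(cy3, FLOOR))


def is_polymorphic(ratios_by_line, threshold=1.0):
    """Call a probe pair polymorphic if its log10 ratios across the
    lines span more than `threshold` log10 units."""
    values = list(ratios_by_line.values())
    return max(values) - min(values) > threshold


# Invented signal intensities for two hypothetical lines.
negatives = [90, 100, 110]
ratios = {
    "line_X": log10_ratio([10100, 10100], [200, 200], negatives, negatives),
    "line_Y": log10_ratio([200, 200], [10100, 10100], negatives, negatives),
}
print(ratios)
print(is_polymorphic(ratios))  # True
```

A probe pair for which every line gives a similar ratio would return `False` and would be treated as monomorphic under this reading.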


We are grateful to the Biotechnology and Biological Sciences Research Council, UK (BBSRC Agri-Food) for providing the main funding for this work (award BBS/B/01839). We would also like to thank the entire wheat community for providing various sequences prior to publication, and Dr Baner for providing the 25-mer tag sequences prior to publication.