• Open Access

Gene flow among wild and domesticated almond species: insights from chloroplast and nuclear markers


Malou Delplancke, Université Montpellier 2, Centre d’Ecologie Fonctionnelle et Evolutive UMR 5175, 1919 Route de Mende, 34293 Montpellier cedex 5, France.
Tel.: +33/0 4 67 61 22 58;
fax: +33/0 4 67 41 21 38;
e-mail: malou.delplancke@gmail.com


Hybridization has played a central role in the evolutionary history of domesticated plants. Notably, several breeding programs relying on gene introgression from the wild compartment have been performed in fruit tree species within the genus Prunus, but few studies have investigated spontaneous gene flow among wild and domesticated Prunus species. Consequently, a comprehensive understanding of genetic relationships and levels of gene flow between domesticated and wild Prunus species is needed. Combining nuclear and chloroplastic microsatellites, we investigated gene flow and hybridization between two key almond tree species, the cultivated Prunus dulcis and one of its most widespread wild relatives, Prunus orientalis, in the Fertile Crescent. We detected high genetic diversity levels in both species along with substantial and symmetric gene flow between the domesticated P. dulcis and the wild P. orientalis. These results are discussed in light of the cultivated species diversity, by outlining the frequent spontaneous genetic contributions of wild species to the domesticated compartment. In addition, crop-to-wild gene flow suggests that ad hoc transgene containment strategies would be required if genetically modified cultivars were introduced in the northwestern Mediterranean.


Gene flow, the movement of genes among lineages, plays an important role in the evolution of organisms by shuffling the genetic diversity within species (Rieseberg 1997; Petit and Excoffier 2009). Gene flow is quantitatively a major source of genetic variation within populations, thus acting as a primary force to balance the detrimental effects of genetic drift and maintain high effective population sizes (Lynch 2010). Because cultivated species have generally suffered strong bottlenecks through domestication (Doebley et al. 2006), gene flow involving wild species and their domesticated counterparts is valuable in the enrichment of their effective population sizes. Such genetic exchanges have long been reported and exploited by humans (Ellstrand et al. 1999). Wild species have been historically used as a source of genetic variation for crop improvement programs, resulting in important applications for plant breeding (Papa 2005). For example, based on genetic evidence on grapevine, Myles et al. (2011) demonstrated that Western European Vitis vinifera cultivars experienced introgression from local Western European Vitis sylvestris. In addition, crop-to-wild gene flow has received growing attention in the last decade (Felber et al. 2007; Arrigo et al. 2011). The phenomenon has important evolutionary consequences for local relatives because it may promote the origin of highly competitive genotypes, resulting in the exclusion of vulnerable wild species (Ellstrand et al. 1999) or in the development of aggressive weeds (Trucco et al. 2009).

The Rosaceae family provides an excellent model for exploring gene flow between domesticated and wild species. Indeed, hybridization has played a central role in the evolutionary history of the family (Coart et al. 2006), resulting in several large reticulated species complexes. Here, we focus on the almond tree, Prunus dulcis (Mill.) D.A. Webb (syn Amygdalus communis L. and Prunus communis Archang.), an economically important Rosaceae cultivated as a nut crop. The annual world production of almonds exceeds 1.83 million tons (FAO 2008), with half of the production located in California and the other half in Mediterranean Europe.

Almond trees belong to the subgenus Amygdalus (L.) Focke, an Irano-Turanian complex of Prunus including more than 30 species (Browicz and Zohary 1996) that radiated recently (Ladizinsky 1999; Potter et al. 2002; Yazbek, unpublished data). Although several Amygdalus species have been sporadically used for human consumption, only P. dulcis was domesticated to produce sweet almonds. The spatio-temporal origin of domestication is still controversial although several lines of evidence suggest that P. dulcis domestication originated in the Fertile Crescent during the first half of the Holocene (Browicz and Zohary 1996; Ladizinsky 1999; Willcox et al. 2009; Delplancke 2011). Archeobotanic remains of P. dulcis show that almond trees were already cultivated about 11 000 years ago (Willcox et al. 2008) and used throughout the Near East, complementing meat and other plant food (Martinoli and Jacomet 2004).

Prunus orientalis (Duhamel) is one of the wild counterparts of the cultivated almond tree. This taxon is one of the most common Amygdalus representatives occurring in the Near Eastern Mediterranean. It is widespread from northeast Iraq to south and central Anatolia, and commonly grows in contact with P. dulcis orchards. Because it shows substantial genetic differentiation from almond trees, P. orientalis is not considered the sole potential wild ancestor of P. dulcis (Zeinalabedini et al. 2010). The ancestry of the latter species remains controversial, with a probable diffuse domestication process featuring several wild species that contributed to its current genetic pool (see Zeder 2006).

Several hybridization events involving the almond tree and wild relatives from the Amygdalus group have been reported. For instance, spontaneous wild-to-crop gene flow was detected in several Italian almond orchards, in which self-compatibility (i.e., species are otherwise self-incompatible) and specific morphological characters had presumably been introgressed from Prunus webbii (Spach) (Socias i Company 1998; Godini 2000). Moreover, crop-to-wild exchanges have long been suspected because of the wide range of intermediate phenotypes observed throughout western and central Asian species (Grasselly 1977; Grasselly and Crossa-Raynaud 1980; Denisov 1988; Browicz and Zohary 1996; Gradziel 2009). Such pervasive gene flow is consistent with the mating system (i.e., self-incompatibility), insect-mediated pollination (Dicenta and Garcia 1993; Socias i Company 1998), and the perennial life cycle of almond, which promotes outcrossing and hybridization (Goodwillie et al. 2005; Petit and Hampe 2006). Finally, gene flow among Amygdalus taxa might be facilitated because a large proportion of the group shares a similar diploid chromosome number (2n = 16 chromosomes), including P. dulcis and P. orientalis (Grasselly 1977; Corredor et al. 2004), which may lead to viable hybrids (Browicz and Zohary 1996). In the current context of global genetic erosion, there is an urgent need for a comprehensive understanding of the genetic relationships and the coexistence between cultivated, feral, and wild Prunus species in their centers of origin.

Because the reality of genetic exchanges between the cultivated form P. dulcis and one of its most widespread wild relatives, P. orientalis, has never been examined, we investigate gene flow and hybridization between these two key almond species in the Fertile Crescent, their supposed native area, using nuclear and chloroplastic markers. We outline the reciprocal genetic contributions of one species to the other, relying on microsatellite genotyping [hereafter simple sequence repeats (SSRs)], a category of molecular markers widely used for investigating evolutionary relationships between lineages that diverged recently. Moreover, by combining highly polymorphic, bi-parentally inherited nuclear SSRs with nonrecombinant, maternally inherited chloroplastic SSRs, we aim to:

  1. Assess whether nuclear and chloroplastic microsatellites are efficient markers for delineating species in the Amygdalus complex.
  2. Characterize the genetic diversity of P. dulcis and P. orientalis.
  3. Investigate the relative genetic contribution of a common wild species to the gene pool of the cultivated almond crops.
  4. Assess the level of crop-to-wild gene flow.

Material and methods

Sampling and plant material

A total of 428 and 134 individuals were collected for P. dulcis and P. orientalis, respectively (Table 1). Sampling included 24 traditional orchards of P. dulcis and seven spontaneous wild populations of P. orientalis, located in the western part of the Fertile Crescent (i.e., Lebanon, Turkey, and Syria). Species were identified using morphological characters described by Browicz and Zohary (1996). The domesticated species P. dulcis is a nonspiny tree, containing numerous brachyblasts (i.e., short shoots) bearing relatively large leaves. Its wild counterpart, P. orientalis, is a smaller and subspinescent shrub, characterized by white tomentose shoots, leaves, and fruits (Fig. 1). Allopatric and sympatric populations of P. dulcis and P. orientalis were sampled by collecting fresh young leaves (dried and stored in silica gel) from each individual, and by registering the global positioning system coordinates of each sampled tree. Allopatric populations included 324 individuals for P. dulcis (19 populations, referred to as the ‘P. dulcis group’, hereafter ‘D’) and 49 for P. orientalis (two populations referred to as the ‘P. orientalis group’, hereafter ‘O’), respectively. Both species co-occurred in five sampling sites, representing 104 individuals for P. dulcis (referred to as the ‘P. dulcis sympatric’ group, hereafter ‘Ds’) and 85 for P. orientalis (referred to as the ‘P. orientalis sympatric’ group, hereafter ‘Os’).

Table 1. Sampling effort. The taxonomic identification of specimens, the population type (allopatric or sympatric) and name, its geographic coordinates, and the number of genotyped specimens are provided for each surveyed population and for each marker type.

| Group | Population | Country | Latitude (° N) | Longitude (° E) | Nuclear SSRs | Chloroplast SSRs |
|---|---|---|---|---|---|---|
| Allopatric Prunus dulcis (D) | Al Shahar | Syria | 32.66 | 36.64 | 7 | 3 |
| | Bire Akaar | Lebanon | 34.59 | 36.24 | 5 | 1 |
| | Dahr al Djabal | Syria | 32.67 | 36.66 | 22 | 8 |
| | El Nebi Habil | Syria | 33.60 | 36.06 | 5 | 5 |
| | Hior Al Louz | Syria | 32.73 | 36.66 | 16 | 6 |
| | Jourt Hatar | Syria | 32.65 | 36.60 | 2 | 1 |
| | Subtotal | | | | 324 | 88 |
| Sympatric P. dulcis (Ds) | Ciftehan | Turkey | 37.52 | 34.74 | 29 | 5 |
| | Waadi Kafar Soun | Syria | 35.81 | 36.51 | 7 | 4 |
| | Subtotal | | | | 104 | 41 |
| Sympatric Prunus orientalis (Os) | Ciftehan | Turkey | 37.52 | 34.70 | 24 | 5 |
| | Waadi Kafar Soun | Syria | 35.74 | 36.48 | 4 | 4 |
| | Subtotal | | | | 85 | 45 |
| Allopatric P. orientalis (O) | Gazentep | Turkey | 37.06 | 37.53 | 24 | 3 |
| | Subtotal | | | | 49 | 3 |
| | Total | | | | 562 | 177 |

SSRs, simple sequence repeats.
Figure 1.

 Specimens of Prunus dulcis, Prunus orientalis and putative hybrid, with focus on diagnostic morphological characters. The domesticated P. dulcis shows a tree habitus (A) and large green leaves (B). Putative hybrid shows an intermediate phenotype with large green and tomentose leaves (C). In contrast, the wild P. orientalis is a shrub (D, foreground) with tomentose leaves (E) and thorny shoots (F).

SSR genotyping

DNA was extracted from silica-dried leaves using the DNeasy® 96 plant kit (Qiagen, Hilden, Germany) with two modifications: samples were lysed 2 h at 65°C, and DNA was eluted in 200 μL of buffer AE. SSR amplification was performed in 96-well plates in a Mastercycler (Eppendorf, Hamburg, Germany) with the following parameters: 10 min at 95°C, 45 cycles of 30 s at 94°C, 90 s at 58°C, 90 s at 72°C, and 30 min at 72°C. Individuals were randomly distributed on PCR plates, with three wells per plate used as negative and migration controls.

All individuals were genotyped with twelve nuclear microsatellite loci (UDP96-001, UDP96-018, UDP96-003, UDP97-401, UDP98-408, UDP98-409, pchgms1, pchgms3, BPPCT017, BPPCT001, BPPCT007, and BPPCT025; Cipriani et al. 1999; Testolin et al. 2000; Dirlewanger et al. 2002), using four multiplexed PCRs. As chloroplasts have smaller effective population sizes than nuclear genomes (Powell et al. 1995) and may thus harbor lower variation at the intrapopulation level because of quicker allele fixation, a subsample of 177 of the 562 individuals was selected by optimizing the geographic coverage of the samples and was further genotyped with ten chloroplastic SSRs (TPScp1, TPScp2, TPScp3, TPScp4, TPScp5, TPScp7, TPScp8, TPScp9, TPScp10, and TPScp11; Ohta et al. 2005), using two multiplexed PCRs. Five loci (TPScp1, TPScp2, TPScp4, TPScp7, and TPScp8) were later excluded from the dataset because of a high percentage of missing values, which indicated amplification difficulties and/or the presence of null alleles. For nuclear and chloroplastic markers, multiplexed PCRs were carried out with the Type It Microsatellite PCR Kit® (Qiagen) in a final volume of 10 μL, containing 1× Qiagen Master Mix, 0.2 μM of a primer mix (0.4 μM for the chloroplast primers) and 2 ng/μL of template DNA. The reproducibility of reactions was checked using samples replicated on the different plates and reached 95.1% (i.e., the error rate was <0.05). GeneScan was performed on an ABI 3130 XL 16 capillary-sequencer (ABI Prism Applied Biosystems, Foster City, CA, USA), and allele calling was performed by two independent investigators using Genemapper (Applied Biosystems).

Insights on the genetic diversity using nuclear and chloroplastic data

For nuclear SSRs, allelic richness (Ar, the number of distinct alleles), allele size range (Range), observed heterozygosity (Ho), and unbiased expected heterozygosity (He) were estimated for each population including more than 15 individuals (Table S1). For chloroplastic SSRs, the number of distinct alleles (Ar), the effective number of alleles (Ne, computed as $N_e = 1/\sum_i p_i^2$, with $p_i$ the frequency of the ith allele), and the unbiased expected heterozygosity (H) were estimated for each species. The diversity estimates relied on a rarefaction procedure to obtain comparable sampling efforts among groups (Petit et al. 1998; 15 individuals per population for the nuclear dataset and 48 individuals per species for the chloroplastic dataset). The estimates were averaged over 1000 resampled datasets, and computations were performed using custom R scripts (R Development Core Team 2011; available from the first author on request). The results were averaged at the species level for the nuclear datasets. Statistical differences among species diversities were assessed by random permutation tests (1000 permutations, using custom R scripts available from the first author). Finally, the partitioning of genetic variation among species (i.e., P. dulcis versus P. orientalis) and sampling sites was quantified with a hierarchical analysis of molecular variance (AMOVA; significance levels were tested with 1000 permutations), using Arlequin (Excoffier and Lischer 2010). Here as well, only sampling sites with more than 15 individuals were included in the analysis.
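The rarefied chloroplast diversity estimates described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the choice of subsampling without replacement, and the toy allele calls are assumptions made for illustration, and the formulas are the standard definitions of the effective number of alleles and of unbiased expected heterozygosity for haploid data.

```python
import random
from collections import Counter


def allele_stats(alleles):
    """Return (Ar, Ne, H) for haploid allele calls at a single locus."""
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)
    ar = len(freqs)                    # number of distinct alleles
    ne = 1.0 / sum_p2                  # effective number of alleles
    h = n / (n - 1) * (1.0 - sum_p2)   # unbiased expected heterozygosity
    return ar, ne, h


def rarefied_stats(alleles, size, n_resamples=1000, seed=0):
    """Average (Ar, Ne, H) over random subsamples of a fixed size.

    Assumption: each subsample is drawn without replacement.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_resamples):
        subsample = rng.sample(alleles, size)
        for i, value in enumerate(allele_stats(subsample)):
            totals[i] += value
    return tuple(total / n_resamples for total in totals)


# Toy data (invented): 60 allele calls at one chloroplast locus.
toy_locus = ["a"] * 40 + ["b"] * 15 + ["c"] * 5
ar, ne, h = rarefied_stats(toy_locus, size=48)
```

Subsampling every group to the same size keeps Ar comparable between species that were sampled unevenly (129 versus 48 chloroplast genotypes here), because the raw number of alleles grows with sample size.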

Species boundaries and Bayesian estimation of admixture levels

Species boundaries were investigated using three distinct approaches. First, the chloroplastic SSRs were analyzed with a median joining network of haplotypes, using NETWORK (Bandelt et al. 1999). Second, the nuclear SSRs were investigated using a principal component analysis performed among specimens, using the 'ade4' R CRAN package (R Development Core Team 2011). Third, for nuclear and chloroplastic SSRs, admixture proportions of P. dulcis and P. orientalis samples were estimated using a model-based Bayesian clustering of individuals, as implemented in STRUCTURE 2.3.1 (Pritchard et al. 2000). This software uses an MCMC framework, in which the algorithm explores a parameter space considering individual admixture proportions, locus-specific ancestries, population allele frequencies, and the expected admixture of the dataset, assuming a user-defined number of groups K. The likelihood of each iteration was then evaluated by computing the probability of the model predictions given the empirical data (the computation assumes Hardy–Weinberg equilibrium within the K groups). The MCMC algorithm was set up for 200 000 burn-in steps (i.e., an initiation phase without results recording), followed by 1 000 000 steps for data acquisition (the remaining parameters were left at their default values), assuming admixture. Each analysis was replicated ten times, and only runs with the highest likelihood values were kept for further investigations. The computations considered K values ranging between two and ten groups, and the optimal number of groups was assessed using the deltaK criterion (Evanno et al. 2005; Fig. S1). The same procedure was applied for the chloroplastic (129 and 48 individuals for P. dulcis and P. orientalis, respectively) and the nuclear (428 P. dulcis and 134 P. orientalis individuals) datasets, with haploid and diploid parameterization of the model, respectively. The STRUCTURE outputs were handled using the SIMIL R script collection (Alvarez et al. 2008).
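The deltaK criterion mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script used in the study: the function name and the input layout (a dictionary mapping K to replicate log-likelihoods) are hypothetical, the second-order difference is taken on the replicate means, and the log-likelihood values are invented.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute deltaK for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of log-likelihoods from replicate runs.
    deltaK = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L the replicate mean.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # the criterion is undefined at the ends of the K range
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Invented replicate log-likelihoods, two runs per K.
toy_runs = {
    1: [-1001.0, -999.0],
    2: [-801.0, -799.0],
    3: [-791.0, -789.0],
    4: [-786.0, -784.0],
}
scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
```

In this toy example the likelihood rises sharply up to K = 2 and then plateaus, so deltaK peaks at K = 2. Because the criterion relies on a second-order difference, it cannot score the smallest or the largest K that was run.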

Coalescent models to estimate population sizes and gene flow

The effective population size of P. dulcis and P. orientalis and the magnitude of gene flow among species were inferred using coalescent-based methods implemented in Migrate-n (Beerli and Felsenstein 2001). Analyses were performed using a subset of six nuclear SSRs that followed a stepwise mutation model (i.e., UDP98-408, pchgms3, BPPCT007, UDP96-018, UDP96-003, and BPPCT017; the stepwise mutation model assumption was checked by examining the frequency distribution of allele sizes, in repeat lengths, for each locus) and assumed two populations (i.e., P. dulcis versus P. orientalis specimens, as defined from morphological identifications). Using a maximum-likelihood approach and applying the MCMC search implemented in Migrate-n, the algorithm estimates the effective population size of each species (as Θ = 4Neμ, the effective population size scaled by the mutation rate) and their reciprocal gene flow (as M = m/μ, the migration rate scaled by the mutation rate). The heuristic searches relied on a preliminary run to refine the parameter search space (the initial values of Θ and M were estimated from FSTs; the searches included ten short chains of 1 × 10⁶ generations with 5000 recorded genealogies followed by four long chains of 25 × 10⁶ generations with 50 000 recorded genealogies, and a burn-in of 10 000 generations was applied). Demographic parameters and their statistical significance were estimated from five additional independent runs (hereafter the 'final runs') that were initiated using estimates obtained from the preliminary run. Accordingly, Θ and M were initiated using normal distributions (mean = 7 and 18, standard deviation = 1 and 2, for Θ and M, respectively) and searches included five short chains of 12 × 10⁵ generations with 1000 recorded genealogies, followed by two long chains of 2 × 10⁶ generations with 10 000 recorded genealogies (a burn-in of 1 × 10⁶ generations was applied).
All chains used the Brownian motion approximation as the mutation model and relied on adaptive heating to maximize the visited space (default parameters). The convergence of chains within runs was assessed with the Gelman–Rubin criterion (default parameter). The estimates were considered accurate when the 99% confidence intervals of demographic parameters overlapped in at least two final runs. Finally, the statistical significance of gene flow was assessed with likelihood ratio tests that relied on alternative models considering M either as absent (i.e., Mdulcis-to-orientalis = Morientalis-to-dulcis = 0) or asymmetric (i.e., Mdulcis-to-orientalis and Morientalis-to-dulcis were alternatively set to null). The complete test procedure was performed in Migrate-n.


Genetic diversity of almond trees, as revealed by nuclear and chloroplast markers

The nuclear SSRs (Table 2) revealed high levels of genetic diversity (Ar = 8.74 ± 1.24 alleles per locus, Range = 31.70 ± 2.74) and high heterozygosities (Ho = 0.73 ± 0.06 and He = 0.85 ± 0.05). In contrast, chloroplast markers showed limited polymorphism (Ar = 2.65 ± 1.67 alleles per locus) and a low effective number of alleles (Ne = 1.61 ± 0.66).

Table 2. Genetic diversity of Prunus dulcis and Prunus orientalis, as revealed by nuclear and chloroplast SSRs. The average and standard deviation (in parentheses) across all loci are provided.

| Species | Ar nucl* | Range nucl* | Ho nucl* | He nucl* | N cp† | Ar cp† | Ne cp† | H cp† |
|---|---|---|---|---|---|---|---|---|
| P. dulcis | 8.61 (1.00) | 32.93 (2.45) | 0.73 (0.06) | 0.85 (0.04) | 129 | 2.89 (1.82) | 1.40 (0.37) | 0.24 (0.19) |
| P. orientalis | 8.87 (1.47) | 30.46 (3.02) | 0.73 (0.05) | 0.85 (0.05) | 48 | 2.40 (1.67) | 1.82 (0.86) | 0.34 (0.32) |
| Global | 8.74 (1.24) | 31.70 (2.74) | 0.73 (0.06) | 0.85 (0.05) | 177 | 2.65 (1.67) | 1.61 (0.66) | 0.29 (0.25) |

Ar, allelic richness; Range, allele size range; Ho, observed heterozygosity; He, expected heterozygosity; Ne, effective number of alleles; H, unbiased expected heterozygosity; SSRs, simple sequence repeats.

The sampling effort (N) and several diversity estimates are provided.

*Diversity estimates from nuclear SSR markers.

†Diversity estimates from chloroplastic SSR markers, $N_e = 1/\sum_i p_i^2$ and $H = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $p_i$ is the frequency of the ith allele and $n$ is the sample size.

Both Prunus species appeared similarly diversified (Table 2 and Table S1), as attested by nuclear (Arnucl = 8.61 ± 1.00 and 8.87 ± 1.47, Honucl = 0.73 ± 0.06 and 0.73 ± 0.05, and Henucl = 0.85 ± 0.04 and 0.85 ± 0.05 for P. dulcis and P. orientalis, respectively) and chloroplastic markers (Arcp = 2.89 ± 1.82 and 2.40 ± 1.67, Necp = 1.40 ± 0.37 and 1.82 ± 0.86, Hcp = 0.24 ± 0.19 and 0.34 ± 0.32 for P. dulcis and P. orientalis, respectively). No significant differences among species were detected for any diversity estimate.

The AMOVA outlined similar patterns in the genetic partition of P. dulcis and P. orientalis (Table 3). Both species showed similar percentages of genetic diversity throughout the investigated variation levels (FSC = 0.09 and 0.11, FIS = 0.11 and 0.10, FIT = 0.18 and 0.20, for P. dulcis and P. orientalis, respectively). In addition, most of the variation occurred within individuals, representing 81.55% and 80.09% of the genetic diversity within P. dulcis and P. orientalis, with only 3.07%, 8.78%, and 9.32% of the diversity occurring among species, among populations, and among individuals, respectively. Finally, species differentiation was significant but low, with genetic variance among species (FCT = 0.03) being lower than genetic variance among populations within species (FSC = 0.09).

Table 3. Analysis of molecular variance of nuclear simple sequence repeats, considering the partitioning of four levels of genetic variation (a–d) for Prunus dulcis and Prunus orientalis.
| Variation level | Species | Sum of squares | Variance component | Variation percentage | Fixation indices |
|---|---|---|---|---|---|
| a. Among species | Global | 91.09 | 0.17 | 3.07 | FCT = 0.03*** |
| b. Among populations within species | P. dulcis | 389.51 | 0.46 | 8.58 | FSC = 0.09*** |
| | P. orientalis | 146.59 | 0.58 | 10.66 | FSC = 0.11*** |
| | Global | 536.11 | 0.49 | 8.78 | FSC = 0.09*** |
| c. Among individuals within pop. | P. dulcis | 1857.99 | 0.53 | 9.87 | FIS = 0.11*** |
| | P. orientalis | 645.85 | 0.51 | 9.24 | FIS = 0.10*** |
| | Global | 2503.85 | 0.52 | 9.32 | FIS = 0.11*** |
| d. Within individuals | P. dulcis | 1573.00 | 4.35 | 81.55 | FIT = 0.18*** |
| | P. orientalis | 554.50 | 4.39 | 80.09 | FIT = 0.20*** |
| | Global | 2127.50 | 4.37 | 78.83 | FIT = 0.21*** |

Significance levels were tested with 1000 permutations; ***significant at 0.001.
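As a reading aid for Table 3, the sketch below shows how the variation percentages and fixation indices of a four-level AMOVA follow from the variance components. It assumes the standard hierarchical definitions (FCT, FSC, FIS, and FIT as ratios of nested variance components); the function name is hypothetical and the inputs are the rounded global components from Table 3, so the percentages differ slightly from the published ones.

```python
def amova_summary(var_species, var_pops, var_inds, var_within):
    """Derive variation percentages and fixation indices from variance components.

    Assumed definitions (standard hierarchical AMOVA):
      FCT = sigma_a / sigma_T
      FSC = sigma_b / (sigma_b + sigma_c + sigma_d)
      FIS = sigma_c / (sigma_c + sigma_d)
      FIT = (sigma_a + sigma_b + sigma_c) / sigma_T
    """
    components = (var_species, var_pops, var_inds, var_within)
    total = sum(components)
    return {
        "percent": [100.0 * v / total for v in components],
        "FCT": var_species / total,
        "FSC": var_pops / (var_pops + var_inds + var_within),
        "FIS": var_inds / (var_inds + var_within),
        "FIT": (var_species + var_pops + var_inds) / total,
    }


# Global variance components as printed in Table 3 (rounded to two decimals).
summary = amova_summary(0.17, 0.49, 0.52, 4.37)
for name in ("FCT", "FSC", "FIS", "FIT"):
    print(name, round(summary[name], 2))
```

With these inputs the four indices round to the values reported in the Global rows of Table 3 (0.03, 0.09, 0.11, and 0.21).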

Species limits and gene flow among almond trees

The chloroplast and nuclear datasets differed in their ability to discriminate P. dulcis from P. orientalis specimens. The chloroplast SSRs revealed 18 haplotypes (H1 to H18; Fig. 2A), among which five were shared by both species (H5, H8, H9, H12, and H17) and were found in 76% of the global dataset (including respectively 85% of P. dulcis and 52% of P. orientalis specimens). The remaining haplotypes were species-specific and showed lower frequencies (except H7 that occurred in 33% of the specimens of P. orientalis). Notably, both species differed slightly in terms of private haplotypes (H1, H2, H3, H4, H6, H11, H14, H15, and H16 versus H7, H10, and H18, for P. dulcis and P. orientalis, respectively). In addition, P. orientalis included several haplotypes that were frequent but geographically restricted (H5, H7, H8, and H9), while P. dulcis included a frequent widespread haplotype (H5) along with several rare and geographically restricted variants (H1, H2, H3, H6, H11, H14, H15, H16, and H17). Finally, the median joining network (Fig. 2A) outlined unclear genetic limits between the two Prunus species. These results were corroborated by the Bayesian clustering of specimens based on chloroplast SSRs (optimal K value determined following Evanno’s method, see further details in Fig. S1). Still, although chloroplastic haplotypes did not delineate clear species limits, they revealed consistent phylogeographic patterns (Fig. 2C). Haplotypes from Turkey (H1, H8, H17, and H18) were genetically distant from those mostly occurring in Syria and Lebanon (H2, H3, H6, H7, H9, H10, H11, H12, H14, H15, and H16). The remaining haplotype (H5) was mostly observed in P. dulcis specimens and showed the widest geographic distribution.

Figure 2.

 Insights from the chloroplastic simple sequence repeats as revealed by median joining networks (A and C) and model-based Bayesian clustering of specimens (B and D). On median joining networks, 18 distinct haplotypes are displayed as pie-charts reflecting the proportions of specimens occurring in (A) allopatric or sympatric populations (i.e., D/Ds and O/Os for Prunus dulcis and Prunus orientalis, respectively) and in (C) the three surveyed countries (i.e., Lebanon, Syria and Turkey). The radius of pies reflects the frequency of haplotypes in the global dataset, dots along edges correspond to mutational steps. Results from the model-based Bayesian clustering are displayed using barplots. Specimens were assigned to K = 2 genetic groups, defined using STRUCTURE. Each specimen is represented as a vertical bar where blue or red sectors reflect assignment probabilities to each of the two groups. Specimens are sorted according to their taxonomical status (B) or their geographical origin (D).

Nuclear SSRs revealed a clear genetic differentiation between P. dulcis and P. orientalis (Fig. 3). Indeed, both species were discriminated along the first two eigenaxes of the principal components analysis (Fig. 3A, accounting for 24% of the observed variance). These results were largely corroborated by the Bayesian clustering of specimens (Fig. 3B,C), where the deltaK spectrum (following Evanno’s method) identified K = 3 as the most likely number of groups (see further details in Fig. S1). Intraspecific patterns were revealed for P. dulcis, where specimens were split into two groups: one geographically widespread and another restricted to Syria and Lebanon (Fig. S2). This signal could partly reflect the Lebanon and Taurus Mountains biogeographic splits (respectively parallel to the Mediterranean coast from southern Lebanon into Syria and extending from eastern to southwestern Turkey). Finally, P. orientalis specimens were included in a single group. With respect to interspecific variation, P. dulcis and P. orientalis were discriminated from each other throughout the Bayesian clusters, but species limits appeared fuzzy in several cases. For instance, two specimens identified morphologically as P. dulcis were assigned to the P. orientalis genetic pool by the Bayesian clustering. Moreover, several specimens were assigned to both genetic pools (e.g., 14 and 18 specimens of P. dulcis and P. orientalis, respectively) with assignment probabilities ranging between 0.1 and 0.9. These specimens were collected either in sympatric or allopatric locations (Fig. 3C).
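The 0.1 to 0.9 criterion used above to flag specimens assigned to both genetic pools can be sketched as follows. This is only an illustration: the function name, the decision to pool the two P. dulcis clusters into a single pool, and the membership coefficients are assumptions, not the authors' procedure or data.

```python
def classify_specimen(q_dulcis_a, q_dulcis_b, q_orientalis, low=0.1, high=0.9):
    """Label a specimen from its K = 3 membership coefficients.

    Assumption: the two P. dulcis clusters are pooled, so each specimen is
    summarised by one P. orientalis membership value between 0 and 1.
    """
    total = q_dulcis_a + q_dulcis_b + q_orientalis
    q = q_orientalis / total  # renormalise in case the inputs were rounded
    if q < low:
        return "dulcis pool"
    if q > high:
        return "orientalis pool"
    return "admixed"


# Invented membership coefficients for three hypothetical specimens.
labels = [
    classify_specimen(0.60, 0.35, 0.05),
    classify_specimen(0.02, 0.03, 0.95),
    classify_specimen(0.30, 0.20, 0.50),
]
```

A specimen whose morphological identification disagrees with its label (for example a tree identified as P. dulcis but labeled "orientalis pool") corresponds to the mismatched cases reported in the text.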

Figure 3.

 Focus on nuclear simple sequence repeats to highlight gene flow between Prunus dulcis and Prunus orientalis. (A) Principal components analysis of genotypes. (B) Admixture proportions of specimens –as estimated using STRUCTURE with K = 3 groups– displayed in a barplot. Each specimen is represented as a vertical bar, with blue, light red, dark red sectors reflecting assignment probabilities to each of the K = 3 groups. (C) Admixture proportions of specimens (figured as red or blue dots for P. dulcis and P. orientalis) to each of the K = 3 groups, are displayed as a ternary plot. Each specimen is represented as a point, with blue or red color reflecting the taxonomic status. For (A), (B) and (C), the specimens are labeled according to their taxonomical status and/or their origin (i.e. D/Ds and O/Os correspond to P. dulcis and P. orientalis specimens collected in allo- or sympatric populations, respectively).

The magnitude of gene flow, as well as the effective population sizes of P. dulcis and P. orientalis, estimated using a coalescent-based approach are summarized in Table 4. The estimates obtained from four out of five final runs showed overlapping confidence intervals and only the most likely results were provided. The analysis revealed large and comparable effective population sizes for both species (Θdulcis = 7.33 and Θorientalis = 7.11). Furthermore, substantial gene flow among species was detected, with Morientalis-to-dulcis = 16.96 and Mdulcis-to-orientalis = 15.64. In addition, likelihood ratio tests outlined gene flow as a significant parameter: scenarios considering absent or asymmetric gene flow produced significantly less explicative models than the full model allowing symmetric gene flow (Table 4).
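The scaled estimates can be converted into numbers of immigrants per generation with the relation given in the footnotes of Table 4, Nem = ¼ Θsink × Msource-to-sink. The short sketch below applies it to the full-model point estimates; the function name is hypothetical and confidence intervals are ignored.

```python
def immigrants_per_generation(theta_sink, m_source_to_sink):
    """Nem = (1/4) * Theta_sink * M_source-to-sink (relation from Table 4)."""
    return 0.25 * theta_sink * m_source_to_sink


# Full-model point estimates reported in the text.
into_dulcis = immigrants_per_generation(7.33, 16.96)      # wild-to-crop
into_orientalis = immigrants_per_generation(7.11, 15.64)  # crop-to-wild
print(round(into_dulcis, 1), round(into_orientalis, 1))
```

Under these point estimates the two directions give about 31 and 28 immigrants per generation, which is in line with the symmetric gene flow described above.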

Table 4. Maximum-likelihood estimates of effective sizes (Θ) and gene flow (M) of Prunus dulcis and Prunus orientalis populations, as revealed by nuclear SSRs. The most likely estimates and 95% confidence intervals (in parentheses) over all loci are provided for (a) the full or (b) partial models. The partial models consider that gene flow is either totally absent or asymmetric.

| Model* | Θdulcis† | Θorientalis† | Mdulcis-to-orientalis‡ | Morientalis-to-dulcis‡ | AIC§ | Log-likelihood§ | P-value§ |
|---|---|---|---|---|---|---|---|
| a. Full model | | | | | | | |
| With gene flow | 7.33 (0.48) | 7.11 (0.63) | 15.64 (3.79) | 16.96 (2.13) | 273.78 | −132.89 | |
| b. Partial models | | | | | | | |
| Asymmetric gene flow | 7.33 | 7.11 | 0 | 16.96 | 195047.56 | −97520.78 | 1 × 10⁻⁶ |
| Asymmetric gene flow | 7.33 | 7.11 | 15.64 | 0 | 225026.45 | −112510.23 | 1 × 10⁻⁶ |
| Without gene flow | 7.33 | 7.11 | 0 | 0 | 420889.73 | −210442.87 | 1 × 10⁻⁶ |

SSRs, simple sequence repeats.

*The analysis includes 428 P. dulcis and 134 P. orientalis specimens; the estimates are computed from a subset of six nuclear SSRs that follow a stepwise mutation model.

†Θ = 4Neμ is an estimate of the effective population size, scaled by the mutation rate (μ).

‡Msource-to-sink = m/μ is an estimate of the gene flow magnitude among source and sink populations, scaled by the mutation rate (μ). The number of immigrants in a sink population can be computed as Nem = ¼Θsink × Msource-to-sink.

§Likelihood ratio test. The Akaike information criterion (AIC), the log-likelihood and the P-value (H0: Partial = Full model) of models are provided for either the full or partial (i.e., constraining gene flow to null) models.
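The AIC and log-likelihood columns of Table 4 can be cross-checked with a short sketch. It assumes the usual definition AIC = 2k − 2 lnL and guesses the number of free parameters k for each model (four for the full model, three when one migration rate is fixed to zero, two without gene flow); these counts are an assumption, chosen because they reproduce the printed AIC values to within rounding. The likelihood ratio statistic uses the textbook form 2(lnL_full − lnL_partial), which may differ in detail from the test implemented in Migrate-n.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


def lrt_statistic(lnl_full, lnl_partial):
    """Textbook likelihood ratio statistic for nested models."""
    return 2 * (lnl_full - lnl_partial)


# (log-likelihood, assumed number of free parameters, AIC printed in Table 4)
models = {
    "full": (-132.89, 4, 273.78),
    "no dulcis-to-orientalis": (-97520.78, 3, 195047.56),
    "no orientalis-to-dulcis": (-112510.23, 3, 225026.45),
    "no gene flow": (-210442.87, 2, 420889.73),
}

for name, (lnl, k, printed_aic) in models.items():
    print(name, round(aic(lnl, k), 2), printed_aic,
          round(lrt_statistic(models["full"][0], lnl), 2))
```

The full model has by far the lowest AIC, and the likelihood ratio statistics for the three partial models are all in the hundreds of thousands, which is consistent with the very small P-values reported.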


Relative performance of nuclear and chloroplastic molecular markers in discriminating species

Chloroplastic and nuclear markers differed in their ability to discriminate Prunus species, a result corroborating other studies that outlined the limited taxonomic resolution of chloroplastic markers, when compared to the nuclear ones (Rieseberg et al. 1996). Nuclear SSRs provided robust and highly informative signals, consistent with earlier studies showing the suitability of such markers to discriminate species and detect hybridization among Prunus species (Cipriani et al. 1999). In contrast, chloroplastic SSRs had a moderate transferability among species and showed low levels of polymorphism (Table 2). Furthermore, the detected haplotypes were more informative about phylogeographic patterns than for delimiting species boundaries (Fig. 2). These results could reflect limitations in taxonomic (177 individuals investigated with chloroplastic SSRs versus 562 for the nuclear SSRs) or genetic sampling (only four polymorphic markers), possibly resulting in underestimated levels of species differentiation. More likely, pervasive gene flow among species, with chloroplastic lineages evolving largely independently from species boundaries, but constrained by geographic features, might explain our results. Such patterns were indeed already reported from other plant species (Petit et al. 2005) and could result from unbalanced contributions of pollen and seeds to gene flow – a process causing more reticulated signals for chloroplast than nuclear markers (Petit et al. 2005). As a consequence, we mainly focused on nuclear genotypes to detect interspecific gene flow.

Genetic diversity of domesticated and wild almond trees

Our investigation revealed genetic diversity levels in domesticated almond trees that were as high as those reported from wild tree species. Indeed, P. dulcis and its wild counterpart, P. orientalis, appeared highly heterozygous (average of 0.73 and 0.85 for Ho and He, respectively, Table 2). Furthermore, genetic diversity was higher within individuals (78.83%) than between populations (8.78%), a pattern similar to that observed for other tree species that might be explained by both high levels of pollen flow and life cycle characteristics of trees (juvenile phase and overlapping generations, Austerlitz et al. 2000). These results were also consistent with the self-incompatible mating system of both species. In addition, P. dulcis and P. orientalis appeared similarly diversified (Table 2) when considering the number of alleles (Na), allelic richness (Ar), heterozygosities (He and Ho), and coalescent-based estimations of effective population sizes (θ = 7.33 and 7.11 for P. dulcis and P. orientalis, respectively). Finally, the two species differed slightly in terms of phylogeographic structure, with a stronger regional differentiation observed in the domesticated P. dulcis (see Fig. S2). Our results thus contrasted with the high level of diversity loss usually observed in many annual seed-propagated crops (maize: Matsuoka et al. 2002; common bean: Papa et al. 2007; wheat: Haudry et al. 2007) but were congruent with insights revealed from perennial crops. For example, several cultivated perennials retained relatively high variation throughout domestication (e.g., tropical fruit trees, Hollingsworth et al. 2005; grapevine, Myles et al. 2011). Our results also suggested that the domestication of almond trees might not have caused a substantial reduction in the original gene pool, because extant almond trees appeared as diversified as other wild Prunus species. This hypothesis would require further investigation (e.g., using sequence data). 
Indeed, the signature of bottlenecks might have been underestimated by our SSR markers, owing to their high mutation rates that are able to quickly replenish diversity losses typical of domestication processes (Glemin and Bataillon 2009). Alternatively, cultural practices might also partly explain our results. Almond orchards are propagated through seed reproduction, a strategy that has probably maintained high diversity levels in the domesticated forms (Grasselly and Crossa-Raynaud 1980). Finally, gene flow between crop and wild species could also account for the high observed diversities (e.g., Mariette et al. 2010).
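
For readers less familiar with the diversity statistics discussed above, the following sketch shows how observed (Ho) and expected (He) heterozygosity are commonly computed at a single SSR locus. It is an illustration only: the genotypes are hypothetical and are not data from this study, the function names are invented for the example, and the uncorrected estimator He = 1 − Σp_i² is assumed, which may differ from the estimator used to produce Table 2.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with p_i the allele frequencies (no sample-size correction)."""
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


# Hypothetical diploid genotypes (allele sizes in bp) at one locus.
toy_locus = [(150, 154), (150, 150), (154, 158), (158, 162), (150, 162)]

if __name__ == "__main__":
    print("Ho =", observed_heterozygosity(toy_locus))
    print("He =", expected_heterozygosity(toy_locus))
```

Multilocus values such as those reported in Table 2 would be averages of these per-locus quantities over all markers.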

Contributions of wild Prunus orientalis to domesticated cultivars

Our results clearly outlined ongoing gene flow between the wild P. orientalis and the domesticated P. dulcis (Figs 2 and 3, Table 4). Accordingly, several P. dulcis specimens were either admixed (with Bayesian assignment probabilities ranging between 0.1 and 0.9) or even genetically clustered within the P. orientalis genetic group (Fig. 3B,C). Furthermore, coalescent-based estimations of gene flow (Table 4) revealed that over the complete sampling area, P. orientalis contributed about 31 immigrants per generation to the P. dulcis pool (Θdulcis = 7.33 and Morientalis-to-dulcis = 16.96, Nem = ¼ΘM = 31). These results were corroborated by insights from other crops that revealed substantial genetic contributions from wild local species to the domesticated pool. For instance, similar patterns were observed for maize (Matsuoka et al. 2002), the olive tree (Breton et al. 2008), several European grapevine cultivars (Myles et al. 2011), apples (Coart et al. 2006), and bread wheat (Caldwell et al. 2004). In addition, wild species have long been used as a source of genetic novelty in breeding programs and almond cultivars are no exception (Denisov 1988; Socias i Company 1998). Still, the present study outlined gene flow that was most likely an unintended genetic contribution from a wild species to domesticated almonds (i.e., the sampling included only traditional orchards, with no modern cultivars). From an ethnobotanic perspective, these results raise the question of whether wild-to-crop exchanges remained exclusively spontaneous or whether traditional practices could have facilitated the fixation of introgressed wild genes in domesticated almond trees. The latter case was reported from many traditional agro-systems and involved, for instance, fig trees (Achtak et al. 2010) or olive trees (Aumeeruddy-Thomas et al. 2009). 
For almonds, several authors reported the direct use of wild species as rootstocks (Denisov 1988; Martinoli and Jacomet 2004), and in Italy, Godini (2000) and Socias i Company (1998) reported that self-compatibility (the allele Sf at the S-locus) could have been transferred spontaneously into cultivars from P. webbii, another wild relative. Additional ethnobotanical investigations are needed because traditional practices in almond breeding remain largely unknown.

Genetic transfers from the domesticated Prunus dulcis to its wild relative

Nuclear SSRs showed that genetic exchanges between P. dulcis and P. orientalis were bidirectional and outlined substantial crop-to-wild gene flow (Figs 2 and 3 and Table 4). Indeed, the coalescent-based approach revealed that, over the complete sampled area, about 28 domesticated migrants entered the P. orientalis pool at each generation (Table 4, Θorientalis = 7.11 and Mdulcis-to-orientalis = 15.64, Nem = ¼ΘM = 28). These results were consistent with mentions of hybridization within the Amygdalus species group (Browicz and Zohary 1996) and confirmed that P. dulcis genes could spontaneously be introgressed into their wild relatives. Such introgression would be highly likely if genetically modified almond cultivars (e.g., Agrobacterium-mediated transformation, see Gradziel 2009) were introduced in the Western Mediterranean. Such genetic transfers can have significant evolutionary consequences, especially if the inserted transgene is adaptive under natural conditions (see Felber et al. 2007 for a review). For instance, traits such as enhanced fertility (e.g., Kron and Husband 2009), resistance to pests (Fladung et al. 2006) or viruses (e.g., plum pox virus, Scorza et al. 2007) could favor the emergence of highly competitive phenotypes if transferred into wild species. In addition, genetic exchange among related Prunus species (Grasselly 1977) could cause an uncontrolled spread of transgenes across the Amygdalus species complex (i.e., the bridge species concept, Felber et al. 2007).
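
The migrant numbers quoted for both directions of gene flow follow from the conversion Nem = ¼ΘM given in the text. The short sketch below reproduces that arithmetic from the Table 4 point estimates; the function name is hypothetical, and the sketch ignores the uncertainty around Θ and M.

```python
def migrants_per_generation(theta: float, m: float) -> float:
    """Nem = (1/4) * theta * M, the conversion quoted in the text."""
    return 0.25 * theta * m


# Point estimates quoted in the text (Table 4).
wild_to_crop = migrants_per_generation(theta=7.33, m=16.96)  # into P. dulcis
crop_to_wild = migrants_per_generation(theta=7.11, m=15.64)  # into P. orientalis

if __name__ == "__main__":
    print(f"P. orientalis -> P. dulcis: {wild_to_crop:.1f} migrants per generation")
    print(f"P. dulcis -> P. orientalis: {crop_to_wild:.1f} migrants per generation")
```

Rounded to whole migrants, the two values are 31 and 28, matching the figures reported for wild-to-crop and crop-to-wild gene flow, respectively.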

As might be expected for a crop and its wild relative (Zohary 1984), the present study revealed substantial and ongoing gene flow between P. dulcis and P. orientalis. The magnitude of the detected gene flow was consistent with the self-incompatible mating system of almond tree species and supported earlier mentions of admixture among cultivated and wild germplasms (Ortega and Dicenta 2003; Gradziel 2009).

Furthermore, we provided genetic evidence that wild lineages could have spontaneously contributed to the current cultivated gene pool. These results could confirm that the domestication of almonds might have been diffuse and characterized by recurrent genetic exchanges among the domesticated forms and the local wild relatives (Zeder 2006). Our results also highlighted the importance of including wild relatives when documenting the origins of almond domestication using genetic data, because gene flow from wild relatives can bias distance-based ancestry inferences (e.g., see van Heerwaarden et al. 2011).

Finally, our study revealed that crop-to-wild gene flow occurred commonly among the domesticated almond and at least one of its wild relatives. These results suggested that transgenes could potentially introgress into P. orientalis populations and further outlined the need for detailed characterization of crop-to-wild gene flow within the Amygdalus species complex. Therefore, ad hoc containment strategies of transgenes might be necessary if genetically modified almond cultivars are grown in sympatry with their wild relatives.


The present study was funded by the ‘Fruit Med Project’ and distributed by the French Agropolis Foundation. N. Arrigo and N. Alvarez were funded by the Swiss National Science Foundation (grant No. 132747 and an Ambizione fellowship PZ00P3_126624, respectively). Genotyping was done in the ‘Service Commun de Marqueurs Génétiques en Ecologie’ of the CEFE. The authors are very grateful to Rebecca Rundell for constructive comments on the manuscript and to fieldwork collaborators for their help during sample collections: A. Chehade, L. Chalak, B. Hamadeh, A. El Bitar (Lebanese Agricultural Research Institute); S. Beyazit, S. Cerce; K. Gündünz (Mustapha Kemal University, Turkey); and A. Al Ibrahem (General Commission for Scientific Agricultural Research, Syria), A. El Ibrahim (Centre National de Recherche Agronomique, Syria).

Data archiving statement

Data for this study will be available at Dryad: doi:10.5061/dryad.5f41fq18.