Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes

Authors

  • Yong-Bi Fu

    Corresponding author
    1. Plant Gene Resources of Canada, Saskatoon Research Centre, Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
      I would like to dedicate this intellectual contribution to Dr. Ken W. Richards for his contribution to and retirement from the Plant Gene Resources of Canada
    Search for more papers by this author

Yong-Bi Fu, Plant Gene Resources of Canada, Saskatoon Research Centre, Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada. Tel: 306-956-7642; Fax: +306-956-7246; E-mail: yong-bi.fu@agr.gc.ca

I would like to dedicate this intellectual contribution to Dr. Ken W. Richards for his contribution to and retirement from the Plant Gene Resources of Canada

Abstract

Cultivated flax (Linum usitatissimum L.) is the earliest oil and fiber crop and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness. Genetic studies have demonstrated that winter cultivated flax is closely related to oil and fiber cultivated flax and shows little relatedness to its progenitor, pale flax (L. bienne Mill.), but winter hardiness is one major characteristic of pale flax. Here, we assessed the genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter) through population-based resequencing at 24 genomic regions, and revealed a winter group of cultivated flax that displayed close relatedness to the pale flax samples. Overall, the cultivated flax showed a 27% reduction of nucleotide diversity when compared with the pale flax. Recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness.

Introduction

Cultivated flax (Linum usitatissimum L.) is a multiple purpose crop being utilized for oil and fiber and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness (Allaby et al. 2005; Fu et al. 2012). Recent discovery of the early domestication event for capsular indehiscence (Fu 2011) not only supports this argument, but also stimulates more searches for clues on the early domestication history. Dehiscent flax (i.e., cultivated flax with spontaneously opening capsules) is genetically unique and displays close relatedness to its wild progenitor, pale flax (L. bienne Mill. or previously L. usitatissimum L. subsp. angustifolium [Huds.] hell.; Hammer 1986). In contrast, winter flax (i.e., cultivated flax with a vernalization requirement) is closely related to oil or fiber forms of cultivated flax and distantly related to its progenitor (Fu 2011). Possibly, these findings are clouded with inadequate sampling of diverse flax and/or limited genomic sampling with insufficient molecular markers (Uysal et al. 2010; Fu et al. 2012). Given the facts that capsular dehiscence and winter hardiness are two major characteristics of pale flax and that cultivated flax was spread from the warm Near East to the cold Europe (Maier and Schlichtherle 2011), we hypothesized that winter hardiness was among those flax traits that human domesticated early (Fu 2011). We further reasoned that sampling more ancestral genetic diversity (Charlesworth 2010) may help to reveal closer relatedness between winter flax and pale flax, as some winter flax may have experienced differential domestication pressure over time and still carry more ancestral polymorphism.

Assessments of genetic relationships among various groups of cultivated flax with unique domestication-associated traits can provide insights into its domestication paths, as trait-specific groups should carry unique genetic traces of plant domestication accumulated over time (Fu 2011) and groups with early versus recent domesticated traits may display different levels of genetic relatedness to its progenitor (Zohary 1999). Early efforts were made to group cultivated flax based on specific flax traits (Elladi 1940; Dillman 1953; Kulpa and Danert 1962) to facilitate flax germplasm conservation, utilization, and research. The commonly referred or applied groups of cultivated flax are oil flax (i.e., cultivated flax with improved oil composition), fiber flax (i.e., cultivated flax with improved fiber characters), dehiscent flax, and winter flax (Diederichsen and Fu 2006). Interestingly, these four traits are associated with flax domestication (Hammer 1984; Uysal et al. 2012). Generally, cultivated flax is an annual, self-pollinating crop, has variable seed dormancy, grows fast with large variation in the generative plant parts, and has early flowering, almost indehiscent capsules, and large seeds. However, pale flax is a winter annual or perennial plant with narrow leaves and dehiscent capsules, and usually displays large variation in the vegetative plant parts and variable growth habit (Diederichsen and Hammer 1995; Uysal et al. 2012).

Pale flax has been identified as the wild progenitor of cultivated flax (Tammes 1928; Gill 1987; Fu et al. 2002; Fu and Allaby 2010). The archeological records of pale flax were obtained first from Tell Abu Hureyra in northern Syria (11,200–10,500 years ago) (Hillman 1975) and then throughout the Near East by the 8th millennium BC (Zohary and Hopf 2000). The archeological finds from Tell Ramad in Syria (9000 years ago) revealed the first occurrence of cultivated forms of flax with an increase in seed size (van Zeist and Bakker–Heeres 1975). archeological evidence also existed for flax spreading from the Near East to Europe and the Nile Valley (Maier and Schlichtherle 2011). The recent archeological finds in southwest Germany revealed larger flax seeds in the earlier, than later, phase of the Late Neolithic (4000–2500 cal. BC) (Herbig and Maier 2011). The flax varieties that spread into the Danube valley were winter oil varieties. However, summer fiber varieties developed in eastern Europe also spread into central Europe and replaced the original varieties (Helbaek 1959; Diederichsen and Hammer 1995). All modern fiber varieties may have originated from eastern Europe (Helbaek 1959). Nowadays, flax is cultivated in more than 60 countries around the world (Fu 2005). The rest of the early history of flax domestication, however, remains unknown (Zohary and Hopf 2000; Allaby et al. 2005).

The objective of this study was to assess genetic diversity and genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter flax) through population-based resequencing at 24 genomic regions. Recent development of genomic resources in Linum species through Roche 454 pyrosequencing technology (454 Life Sciences, Branford, CT) (Fu and Peterson 2012) made the genetic sampling of flax genome more feasible than before.

Materials and Methods

All flax accessions studied here were obtained from the flax collection at the Plant Genetic Resources of Canada (PGRC; Table 1). They include 10 pale flax accessions from Turkey and Greece and 38 cultivated flax accessions from 26 countries. The selection of pale flax accessions is limited due to the lack of widely distributed pale flax germplasm and the selected ones represent only the central part of its natural distribution spanning the western Europe and the Mediterranean, north Africa, western and southern Asia, and the Caucasus regions (Diederichsen and Hammer 1995). The cultivated flax accessions were selected based on previous phenotypic and genetic studies (e.g., see Diederichsen and Fu 2006) to represent four major intraspecific groups of cultivated flax (dehiscent, fiber, oil, and winter flax). The winter flax accessions sampled cultivated flax developed with winter hardiness from eight countries. The dehiscent flax accessions represent the primitive form of cultivated flax with dehiscent capsules and have been long accumulated from flax cultivation in the cultivated flax gene pool (Hegi 1925). For this study, the dehiscent flax accessions were empirically verified for capsular dehiscence, and the selected pale flax accessions were assessed for their taxonomic identity in the greenhouse.

Table 1.  List of 48 accessions of wild and cultivated flax sequenced, with their species/group, origin country, sad2 haplotype, inferred cluster, and label.
CN1Species/group2Description3Origin3H- sad23Cluster3Label4
  1. 1CN, Canadian National accession number at Plant Gene Resources of Canada (PGRC), Saskatoon, Canada; T, temporary number for accessions that were acquired, but not yet added to the PGRC germplasm collection.

  2. 2Lb, Linum bienne; Lu, Linum usitatissimum. Four letters (d, f, o, w) represent four trait-specific groups of cultivated flax (dehiscent, fiber, oil, winter), respectively.

  3. 3Description of an accession includes the record, if available, for varietal or local name, location, and feature; Origin, origin of country; H- sad2, sad2 haplotype obtained from Fu et al. (2012); four clusters inferred using the BEAST program.

  4. 4Accession label is consisted of the first letter for species (P, L. bienne) or group of cultivated flax (D, dehiscence; F, fiber; O, oil; W, winter), followed by the numbers distinguishing among accessions within a species or group.

T19719LbIsland of EviaGRCIC2P1
T19716LbRhodes airportGRCIC2P2
113606LbSamsunTURXIC3P3
113622LbAntalyaTURIIC1P4
113627LbSinopTURIVC2P5
113628LbKarabükTURVC2P6
113630LbKastamonuTURIXC3P7
113633LbZonguldakTURIXC3P8
113638LbÇanakkaleTURIIC1P9
113642LbTrabzonTURXIC3P10
97606Lu-d ESPIIIC1D1
100852Lu-dGrandalPRTIIIC1D2
100910Lu-dGrandalPRTIIIC1D3
97769Lu-dAberticoPRTIIIC1D4
97473Lu-d RUSIIIC1D5
98833Lu-d RUSIIIC1D6
97605Lu-d RUSIIIC1D7
100837Lu-d TURIIIC1D8
98986Lu-fCristaBELIXC4F1
101017Lu-fBaladiCHNVIIIC4F2
101388Lu-fSaskaiCZEIXC4F3
98475Lu-fFlachskopfDEUVIIIC4F4
101111Lu-fVikingFRAXC4F5
98946Lu-fTalmune FiberNLDIXC4F6
101120Lu-fLianaPOLXC4F7
97325Lu-fKotowieckiPOLVIIIC4F8
18991Lu-fNikeRUSIXC4F9
101397Lu-fPskovski 2976UKRXC4F10
18974Lu-oCDC BethuneCANXC4O1
100832Lu-oBarbarigoCZEVIIC4O2
101171Lu-oHermesFRAXC4O3
18989Lu-oAtalanteFRAIXC4O4
101265Lu-oAmasonGBRVIC4O5
98256Lu-oArrevetiINDVIC4O6
97888Lu-oTomagoanIRNIXC3O7
101268Lu-oRaisaNLDIXC4O8
100917Lu-oRalugaROMIXC4O9
33399Lu-oBisonUSAXC4O10
98178Lu-w1285-SAFGXC3W1
96915Lu-wUruguay 36/49AUSVIC4W2
97009Lu-wBeladi Y 6903EGYIXC3W3
97004Lu-w ETHVIC4W4
98509Lu-w ISRIXC4W5
97102Lu-w PAKIXC4W6
96960Lu-w SYRIXC3W7
96848Lu-w TURIXC4W8
100828Lu-w TURXC3W9
100829Lu-w TURVIIIC4W10

DNA extraction

Plants were grown from seed for two to three weeks for cultivated flax and up to two months for pale flax in a greenhouse at the Saskatoon Research Centre, Agriculture and Agri-Food Canada. Young leaves were individually collected, freeze-dried, and stored at –20°C. A freeze-dried leaf sample of one individual plant from each accession was selected, and its genomic DNA was extracted with the DNEasy Plant Mini kit (Qiagen, Mississauga, Ontario, Canada). Extracted DNA was quantified with a Thermo Scientific NanoDrop 8000 spectrometer (Fisher Scientific Canada, Toronto, Ontario, Canada).

PCR and Sanger resequencing

Sanger resequencing was performed on 24 confirmed contigs available in the Linum genomic resources developed through the Roche 454 pyrosequencing technology (Fu and Peterson 2012). The contig selection was mainly based on its polymorphism and quality, as gene annotations on all developed contigs were incomplete and unverified (Fu and Peterson 2012). The PCR primers for 24 loci were designed using the on-line Primer Quest tool (Integrated DNA Technologies, Coralville, IA) (Table 2). The conditions for PCR were: 1× KAPA 2G Buffer A containing 1.5 mM MgCl2 (KAPA Biosystems, Woburn, MA), 1× KAPA Enhancer 1, 0.2 mM each dNTP, 0.4 pmol/µl each forward and reverse primers, 100 ng of the same genomic DNA template samples as used above for next-generation sequencing, and 0.5 U KAPA 2G Robust polymerase in a final volume of 25 µl; touchdown PCR cycled at 95°C for 3 min followed by 10 cycles of 95°C for 10 sec, 60°C decreasing 0.5°C per cycle for 15 sec, 72°C for 30 sec followed by 25 cycles of 95°C for 10 sec, 55°C for 15 sec, 72°C for 20 sec, followed by a final extension of 72°C for 30 sec. A 3-µl sample of each PCR product was separated on 1.5% agarose for 2 h at 120 V. PCR was performed on either a DYAD or PTC-200 thermocycler (Bio Rad, Mississauga, Ontario, Canada). PCR products were cleaned following the method outlined by Rosenthal et al. (1993) and submitted for Sanger sequencing at the DNA Technologies Laboratory at the Canadian National Research Council's Plant Biotechnology Institute (Saskatoon, Saskatchewan, Canada).

Table 2.  List of 24 primer pairs used by Sanger resequencing of 24 contigs representing polymorphic genomic regions in the 48 Linum accessions, along with the polymorphism and gene annotation information.
Primer1Sequence (5′>3′)1Tm (°C)2CL2Ts2Nh2π2Scaffold|GO2
  1. 1The primer set was labeled for the contig, followed by B and A for right and left primers for the contig. More primer information is available in Table S2 of Fu and Peterson (2012).

  2. 2Tm, annealing temperature; CL, contig length flanking by the primer set; Ts, the total number of segregating sites; Nh, the number of haplotypes observed; π, the estimate of nucleotide diversity; Scaffold|GO, the scaffold number and gene annotation number available at http://WWW.linum.ca, and gn means no gene annotation found for the contig.

031B/ACTCATCTTCTTCTTCCTTACATCTGACG/AACAGGACGCCCGAATGAATTG56.6/58.134610 50.0064sc453|gn
049B/ATGCAGGTGTGCCTGAATCTGACAT/AACAGGCCTTGGTGGGTCTAATGA60.8/60.333413 80.0078sc401|gn
071B/AAGGACCATTGTGTTGCAAGCATCC/CCAATCATCTTTGGATCTGTCCAGG60.2/57.528315170.0079sc297|g13204
145B/AGGACAAGGGTTCATTTCGTGAAAGCG/AGTRGCATCCTCGGAACTTCTCTT60.4/58.63643 50.0033sc530|g23272
151B/AACAAAGACACCAATGCTCCCTCCT/TCCRGGCATGGAAAGATATTAAGT60.5/55.036310 50.0077sc142|gn
204B/ATGTTTATTGACATAATTGGACGAAA/AACGCCCTTACGAATGRACAYTA51.4/56.52254 50.0042sc181|gn
221B/ATGTAGGGATAGCGAACGATAGTAAC/CCCTTTCATTCCACGGTAGCAA55.6/57.537012 80.0114sc186|gn
242B/AACTCTAACAGACAAGGCCACCGAT/GCCATACAAGCATGGATCCTGTCA59.9/59.23044 90.0061sc475|g20873
246B/AAATTCAGGGAGCGACACAGCCAGA/CAACCGTCGACAAGTTGGCAAGAA62.5/60.02595 50.0029sc1078|g37692
281B/AAACTCTGCTCTCATTCCTGCCGAA/ACCTCGAGTACATCTCGTTCGCAT60.3/59.72857 60.0060sc584|g24349
316B/ATGTGATCAATTGTGAAGACGAA/ATAATCTGCGTGCTCCCTCT52.4/55.922710 70.0129sc1937|gn
360B/ACCCAGAAGWCAAACTGATGTATGC/CCAGTGTTAGGTTTAAGCGTGCAG55.9/58.22878140.0068sc741|g29697
440B/AATCGTTCGTGGTCATTGGTTTGCC/ATGTGCGATGGCACCATGGAAATG60.2/60.523127 70.0411sc299|g8719
449B/AGATTCGTCGTCGTGTCAATG/CCACGGCAAACTTAGCAAAT53.8/54.02482 30.0018sc719|g33222
469B/ACTGATAGACCGCTATGGAACGTAG/AGGCTGAACTGCGAGAAAGTGGT56.4/60.944115 90.0078sc672|g27554
503B/ACATCGCCAAGCAACACTTCTCCAT/AGGTTGGAAAGGAGTACGAGCTGA59.9/59.631310170.0062sc983|g36290
524B/AGCAAGCCATACATGTGCCAGATTTGC/GCATTGATAGTGTTCTGATGCTGCCG61.0/60.12774 70.0033sc689|g27494
550B/ATCCATGTTTCTACGCAGTGAGG/TGCTCTGCAAGTGATGTTCATTGT56.9/57.43307 50.0062sc1204|g40050
586B/ACACTACCTTCTTCGAGGTGTGCAA/TCACAGCAGGATCATCACCGAACA58.8/60.32221 20.0183sc67|g3272
590B/AGTCAAGTGTATACGATTTCAACAAG/GGAAGGCACCAGTGACTACAAT52.3/57.02318 90.0095sc977|g11464
632B/ATGGGATAAATCGAAATCTGAGAGGA/GGTGCGTTTCACAGATTTAGCAGTCC55.4/60.02566 50.0151sc411|g17731
676B/ACCCTGGTTTACTCTCTCTGGTCAA/CCTTCGGCCGTGTTACGTTGTTT58.0/60.225715100.0104sc1159|gn
677B/ACTGGKATGCTRAATTGTGTTCTGC/GGCCACCTCTTCAAATTCTGCGAT56.6/59.81845100.0062sc1616|gn
712B/AGTTGAAATATCTAAACATTGCTGCTGA/CGTGGCTCAATTTAATGGTGACGG54.3/58.52492 30.0024sc436|g18214
Total  6886203480.0079 

Sequence analysis

All sequencing products were assembled with Vector NTI Suite's ContigExpress v9.0.0 (Invitrogen, Carlsbad, CA) and aligned using MUSCLE v3.6 (Edgar 2004). Aligned sequences with required length and quality were deposited into GenBank under accessions JN845641--JN846695 and JN861766 and those without are given in Table S1. Population genetic analyses of aligned DNA sequences were performed using DnaSP program (Librado and Rozas 2009). Several measures of sequence variation were obtained, and they are the number of segregating sites, haplotype number, nucleotide diversity (π; Tajima 1983), the signal of selection (i.e., deviation from neutrality; Tajima 1989; Fu and Li 1993), and the frequency of recombination (i.e., the minimum number of recombination events; Hudson and Kaplan 1985). The comparative diversity analyses were also done for different loci and various Linum groups. Haplotype analyses with and without gaps were performed using the DnaSP program. The positions of SNPs and indels for each haplotype were generated.

The genetic relationships of the 48 Linum samples were analyzed based on concatenated sequences using the Bayesian Markov chain Monte Carlo approach available in the BEAST v1.4 (Drummond and Rambaut 2007), as the concatenation approach tends to yield more accurate trees than the consensus one (Gadagkar et al. 2005). The maximum clade credibility (MCC) phylogenies were generated with a relaxed uncorrelated lognormal clock and with tree priors as constant size, expansion, or exponential growth. The substitution model was under an HYK model with gamma distribution for site heterogeneity. The rest of the options were applied with default values. This Bayesian approach should yield more informative phylogeny, as it directly calculates ultrametric phylogenies based only on observed data and model parameters and incorporates both the branch-length errors and the topological uncertainties (Rutschmann 2006). For comparison, the distance-based NeighborNet (Bryant and Moulton 2004) of the 48 samples was also generated using the SplitsTree4 (Huson and Bryant 2006) with the default options of Uncorrected_P and EqualAngle. The NeighborNet should display detailed reticulations where recombination may occur and yield more information for understanding the genetic relationships.

The optimal genetic structure of the 48 samples was also inferred based on concatenated sequences with a model-based Bayesian method available in the program STRUCTURE v2.2.3 (Pritchard et al. 2000; Falush et al. 2007). The STRUCTURE program was run 20 times for each subpopulation (K) value, ranging from 2 to 10, using the admixture model with 10,000 replicates for burn-in and 10,000 replicates during analysis. The final population subgroups were determined based on (1) likelihood plot of these models, (2) the change in the second derivative (ΔK) of the relationship between K and the log-likelihood (Evanno et al. 2005), and (3) stability of grouping patterns across 20 runs. For a given K with 20 runs, the run with the highest likelihood value was selected to assign the posterior membership coefficients to each accession. A graphical bar plot was then generated with the posterior membership coefficients. To assess the consistency of structural inference, an additional analysis was also made with the Bayesian method available in the BAPS software (Corander et al. 2008). Individual samples were clustered using the model for linked (or concatenated) markers and 20 replicate runs of the algorithm with the upper-bound values (K) for the number of clusters ranging between 2 and 10.

An analysis of molecular variance (AMOVA) was performed based on concatenated sequences using Arlequin v3.01 (Excoffier et al. 2005) to quantify nucleotide variation between species and among various groups and inferred clusters of Linum accessions. Three models of genetic grouping were considered: pale flax versus cultivated flax; five groups of pale flax and cultivated flax; and four clusters of Linum samples inferred using the BEAST program. The significance of variance components and intergroup genetic distances (or pairwise group Fst) for each model was tested with 10,000 random permutations. The analysis also generated group-specific Fst values in each model.

Coalescent simulation for bottleneck

The intensity of the bottlenecks associated with flax domestication was estimated following the procedures described in Haudry et al. (2007) using Hudson's ms program (Hudson 2002). The procedures applied simple demographic model of reduction in effective population size, assumed that an ancestral population experienced an instantaneous change in effective population size many generations ago (t) and no population expansion after the bottleneck. The bottleneck intensity α was defined as the ratio of the wild population size (Na) to cultivated population size (Np). Higher values of α correspond to more severe bottlenecks. The model had five parameters (Na, Np, τ, θwild, and 4Nc) and the last three are the time after the bottleneck, the ancestral nucleotide diversity, and the population recombination at the locus, respectively. In this simulation, we assumed Na= 30,000 similar to those predicted in wheat and barley domestication (Badr et al. 2000; Haudry et al. 2007) and cultivated flax had gone through (t =) 9000 generations (or years) of domestication, so that τ= 0.15α. The estimates of θwild and 4Nc at each locus for pale flax were obtained in this study. A set of 19 values of α was explored on a grid ranging from 1 (no reduction in effective population size) to 10 (severe reduction in effective population size), with 5000 simulations and an effective sequence length of 184 to 441 bp. The proportion of 5000 runs that simulated π is within 20% of the observed π was calculated for each α value for each of five domestication groups (dehiscent, fiber, oil, winter, and all cultivated flax samples). The average bottleneck intensity for each domestication group was estimated following Haudry et al. (2007) by calculating a multilocus likelihood as the product over 24 locus-specific likelihoods and maximizing the multilocus likelihood with respect to α. A 95% confidence interval was also constructed around the estimate of α by determining the value of α at which the log-likelihood value was 2 log-likelihood units lower than the maximized likelihood.

Results

Nucleotide polymorphism

The Sanger resequencing generated a total of 1152 sequences of 24 DNA fragments for 48 Linum samples (Table 2). These DNA fragments represented 24 unlinked loci sampled across the flax genome. Sixteen fragments were associated with predicted gene functions, mainly with different proteins, but were not fully annotated for further diversity analysis. The DNA fragments varied in length ranging from 184 to 441 bp and averaging 289 bp. The total length of 24 concatenated sequences for each sample was 6886 bp. The number of segregating sites per DNA fragment ranged from 1 to 27 and averaged 8.5. The number of haplotypes detected per DNA fragment ranged from 2 to 17 and averaged 2.

For pale flax, the number of segregating sites per fragment ranged from 1 to 25 and averaged 6.4 and the estimated nucleotide diversity ranged from 0.0027 to 0.0603 and averaged 0.0108 (Table 3). For all the cultivated flax samples, the number of segregating sites per fragment ranged from 1 to 25 and averaged 7.0 and the estimated nucleotide diversity ranged from 0.0008 to 0.0351 and averaged 0.0076. For all 24 loci, the overall nucleotide diversity was larger for pale flax (0.0097) than for cultivated flax (0.0071). For the four groups of cultivated flax, large variation in nucleotide polymorphism was observed (Table 4). The number of segregating sites per fragment ranged from 0 to 25 and averaged 4.6 for the dehiscent flax; 0 to 7 and 2.5 for the fiber flax; 0 to 22 and 4 for the oil flax; and 0 to 22 and 4.5 for the winter flax. The estimated nucleotide diversity ranged from 0 to 0.0522 and averaged 0.0075 for the dehiscent flax; 0 to 0.0157 and 0.0033 for the fiber flax; 0 to 0.0217 and 0.0051 for the oil flax; and 0 to 0.0578 and 0.0072 for the winter flax. For all 24 loci, the highest estimated nucleotide diversity was 0.0071 for the dehiscent flax, followed by the winter flax (0.0069), the oil flax (0.0053), and the fiber flax (0.0034).

Table 3.  Comparative nucleotide polymorphisms between pale flax and cultivated flax at 24 sampled genomic regions.
Primer S 1 π1 D 1 D/F1 Rm1 S π D D/F Rm
  1. 1Four polymorphism parameters are S for the number of segregating sites; π, the nucleotide diversity (Tajima 1983); D, selection test by Tajima's D (Tajima 1989); D/F, significant results obtained by Fu and Li's D* and Fu and Li's F* (Fu and Li 1993); Rm, the minimum number of recombination events (Hudson and Kaplan 1985); and significance of test, ns P > 0.05, # P≈ 0.05,*P < 0.05,**P < 0.01.

  Pale flax Cultivated flax
031B/A 90.01020.454  0 40.00401.126  0
049B/A 30.00421.152  0140.0089–0.343  1
071B/A100.0104–0.754  4110.0070–0.739  4
145B/A 20.00301.642  0 40.00491.967#ns|# 1
151B/A 90.01671.310*|# 0 30.00420.959  0
204B/A 10.00271.303  0 60.0056–0.344  0
221B/A110.0099–0.281  1100.01051.458  0
242B/A 50.00720.931#|ns 2 40.00560.992  2
246B/A 40.0041–0.943  0 30.0026–0.158  0
281B/A 40.00530.264  0 70.00630.200  0
316B/A 90.02511.219  0 60.00780.627  0
360B/A 80.01210.026  1 70.0055–1.013  3
440B/A250.06031.585  0250.03530.503  0
449B/A 20.00431.642  0 20.0008–1.102  0
469B/A 60.00651.455  0150.0088–0.202  0
503B/A 40.00480.204  1 90.0063–0.510  3
524B/A 20.00270.222  0 40.0034–0.041  2
550B/A 70.0065–0.584  0 50.00511.057  0
586B/A 20.0136–0.184  0 10.01250.976  0
590B/A 50.01160.981#|ns 0 70.0068–0.158  1
632B/A 70.0096–0.318  0 60.01612.778**ns|** 0
676B/A130.0162–0.458  1 80.0081–0.286  0
677B/A 40.00870.143  1 50.0054–0.605  2
712B/A 20.00320.120  0 10.00231.643  0
Total1540.00970.555 111670.00710.323 19
Table 4.  Comparative nucleotide polymorphisms among four groups of cultivated flax at 24 sampled genomic regions.
Primer S 1 π1 D 1 D/F1 Rm1 S π D D/F Rm
  1. 1Four polymorphism parameters are S for the number of segregating sites; π, the nucleotide diversity (Tajima 1983); D, selection test by Tajima's D (Tajima 1989); D/F, significant results obtained by Fu and Li's D* and Fu and Li's F* (Fu and Li 1993); Rm, the minimum number of recombination events (Hudson and Kaplan 1985); and significance of test, ns P > 0.05, # P≈ 0.05,*P < 0.05,**P < 0.01; nd, no data.

  Dehiscent group Fiber group
031B/A00.0000nd nd00.0000nd nd
049B/A90.01160.545*|ns010.0006–1.112 0
071B/A100.01400.138 020.00270.222 0
145B/A20.0019–0.448 040.00571.7724#ns|#0
151B/A30.00530.458 000.0000nd nd
204B/A30.0033–1.448 030.0042–0.431 0
221B/A70.01011.411 070.00890.900 0
242B/A20.00260.069 060.0064–0.366 1
246B/A20.0026–0.448 010.00211.303 0
281B/A30.00561.601 030.00370.021 0
316B/A00.0000nd nd00.0000nd nd
360B/A10.0011–1.055 040.0033–1.245 0
440B/A250.05220.248 000.0000nd nd
449B/A10.00170.334 000.0000nd nd
469B/A80.00740.258 060.0040–1.103 0
503B/A90.0122–0.060 150.0046–0.783 1
524B/A30.0027–1.448 020.00391.743# 1
550B/A40.00651.697ns|#000.0000nd nd
586B/A40.01080.182 110.00290.820 0
590B/A30.0040–0.813 000.0000nd nd
632B/A10.00251.444 070.01572.383***|**0
676B/A60.01101.022 050.0063–0.329 0
677B/A40.00940.081 120.0024–1.401 0
712B/A10.00251.444 010.00211.303 0
Total1110.00710.345 3600.00340.276 3
  Oil group Winter group
031B/A00.0000nd nd10.0006–1.112 0
049B/A60.00811.108 050.00711.334#|ns1
071B/A30.0032–0.507 030.00390.097 0
145B/A40.00591.953#ns|#040.00551.591ns|#0
151B/A10.0008–1.112 010.00190.820 0
204B/A10.00210.820 040.0043–1.245 0
221B/A70.00680.025*|ns070.00891.356*|#0
242B/A30.0042–0.130 060.00730.198 2
246B/A10.00211.303 010.00180.820 0
281B/A70.00940.329 040.00520.143 0
316B/A10.0009–1.112 000.0000nd nd
360B/A90.01150.026 370.0091–0.348 0
440B/A220.0217–2.053****|**0220.05782.405****|**0
449B/A00.0000nd nd20.0016–1.401 0
469B/A80.01002.093**|**0110.0089–0.255 0
503B/A20.00260.526 140.00530.686 1
524B/A20.0020–0.691 040.0044–0.521 0
550B/A50.00540.024#|ns050.00540.024#|ns0
586B/A00.0000nd nd00.0000nd nd
590B/A00.0000nd nd40.0055–0.400 0
632B/A70.01542.041**|**060.01752.1482*#|**0
676B/A40.0038–1.667##|#040.0037–1.245 0
677B/A20.0040–0.184 020.00420.019 0
712B/A10.00221.464 010.00211.303 0
Total960.00530.003 41080.00690.736 4

Selection, recombination, and bottleneck

For pale flax, no significant deviation from neutrality measured with Tajima's D was detected for any loci assayed, but two deviations from neutrality (one significant and one marginally significant) were observed for cultivated flax (Table 3). However, if based on Fu and Li's D* and F* tests, there were three possible significant deviations from neutrality for pale flax. For the four groups of cultivated flax, the largest number of significant (and/marginally significant) tests for deviation from neutrality based on Tajima's D was 5 for the oil flax, followed by the winter flax (2), the fiber flax (2) and the dehiscent flax (0) (Table 4). If based on Fu and Li's D* and F* tests, the largest number of significant (and/marginally significant) tests for deviation from neutrality was 7 for the oil flax, followed by the winter flax (6), the fiber flax (2) and the dehiscent flax (2).

The recombination analysis performed with the DnaSP program revealed large variation in recombination frequency with respect to species and group (Tables 3 and 4). The total number of recombination events at the 24 loci was 11 for the 10 pale flax samples and 19 for the 38 cultivated flax samples. The total number of recombination events at the 24 loci was four for the oil and winter flax and three for the dehiscent and fiber flax.

The coalescent simulations assuming a simple demographic model with observed values of related parameters revealed the extent of domestication bottleneck ranging from 1.5 to 2 for the four groups of cultivated flax and the whole cultivated flax samples (Table 5). Specifically, based on the estimated π for each group, the bottleneck intensity was estimated to be 2 for the oil flax group and 1.5 for the other groups. The estimates of the 95% confidence interval were also large, ranging from 1 to 3.0, depending on the group of interest.

Table 5.  The estimates of bottleneck intensity by coalescent simulations and proportional nucleotide variations among pale flax and four groups of cultivated flax obtained from the analysis of molecular variance at 24 sampled genomic regions.
Group (size)Intensity of bottleneck1Group-specific FstPairwise group Fst2
DehiscentFiberOilWinter
  1. 1The values in parentheses represent the 95% confidence intervals, estimated with 2 log-likelihood units lower than the maximum likelihood estimate of bottleneck intensity.

  2. 2The significance of test with ns P > 0.05,**P < 0.001,***P < 0.0001.

Pale (10) 0.2220.254***0.276***0.216***0.162**
Dehiscent (8)1.5 (1.2–1.7)0.254 0.500***0.424***0.351***
Fiber (10)1.5 (1.2–1.9)0.299  0.103**0.161**
Oil (10)2.0 (1.1–2.4)0.276   0.056ns
Winter (10)1.5 (1.0–3.0)0.254    
Cultivated (38)1.5 (1.0–1.9)     
Mean /range 0.261/0.222–0.299    

Genetic relationship

The BEAST program generated three MCC trees for the 48 Linum samples with three tree priors as constant size, expansion, and exponential, respectively. The phylogenies with the first two tree priors were exactly the same, although estimated branch lengths (or evolutionary rates) varied. The MCC tree with tree prior as expansion mirrored more closely with the NeighborNet by SplitsTrees4 described below. The MCC tree with tree prior as exponential had a cluster with mixed memberships from pale, winter, oil, dehiscent flax samples and was slightly less compatible with the NeighborNet. Figure 1 showed the MCC tree of the 48 Linum samples obtained with the tree prior as expansion. The cluster at the top (C1) consisted of eight dehiscent flax samples and two pale flax samples (P9 and P4), followed by a small cluster (C2) of four pale flax samples (P5, P1, P2, P6). The next cluster down (C3) had four pale flax samples (P8, P7, P10, P3), four winter flax samples (W3, W9, W7, W1), and one oil flax sample (O7). The bottom large cluster (C4) consisted of 25 samples representing fiber, oil, and winter flax. The detailed members of each cluster are given in Table 1.

Figure 1.

The maximum clade credibility trees of the 48 Linum accessions representing pale flax and four groups of cultivated flax obtained by the BEAST program based on 24 sampled genomic regions. The node bar for Length_95%_HPD is shown. The first capital letter of the sample label represents the flax group (Table 1). Four major clusters (C1–C4) are labeled on the branches. The ancestral winter flax group is highlighted.

Clearly, the winter flax samples were divided into two groups; one with C4 mixed with other cultivated flax samples and the other with C3 closer to some pale flax. The cluster C3 is unique and thus named as the ancestral winter flax group (Fig. 1), as this winter flax group displayed substantial ancestral polymorphism with and close relatedness to pale flax samples. The four ancestral winter flax samples were originated from Afghanistan, Syria, Turkey, and Egypt; the oil flax sample came from Iran; and four pale flax samples were collected from Samsun, Kastamonu, Zonguldak, and Trabzon regions of Turkey. All the members of the ancestral winter group were associated with three sad2 haplotypes (IX, X, XI; Table 1). Quantifying nucleotide variation among four inferred clusters of Linum samples revealed a significant (P < 0.0001) differentiation among these inferred clusters, which explained 35.8% nucleotide variation. The ancestral winter flax group (C3) was significantly differentiated from the other three clusters (Table 6).

Table 6.  The AMOVA results at 24 sampled genomic regions for four clusters of Linum accessions inferred by the BEAST program.
Cluster (size)1Cluster-specific FstPairwise cluster Fst2
C2C3C4
  1. 1The members of each cluster are given in Table 1. C3 represents the ancestral winter flax group.

  2. 2The significance of test with*P < 0.001,**P < 0.0001.

C1 (10)0.3370.306**0.317**0.392**
C2 (4)0.373 0.340*0.369**
C3 (9)0.339  0.366**
C4 (25)0.371   
Mean/range0.355/0.337–0.373   

The NeighborNet of the 48 Linum samples obtained (Fig. 2) revealed essentially the same patterns of genetic relationships as those in the MCC tree, but with higher resolution for recombination at the individual sample level. The winter flax samples also were divided into two clusters; one with six members was closely related to the oil and fiber flax samples, and the other with four members was closely related to the oil flax sample from Iran and became closer to four pale flax samples. The whole dehiscent group was closely related to the pale flax samples. The fiber flax samples were placed in a cluster with a large articulation and mixed with the oil flax samples.

Figure 2.

The NeighborNets of the 48 Linum accessions representing pale flax and four groups of cultivated flax obtained by the SplitsTree4 program based on 24 sampled genomic regions. The first capital letter of the sample label represents the flax group (Table 1). The four major clusters (C1–C4) obtained by the BEAST program (Fig. 1) are outlined and C3 is the ancestral winter flax group.

Genetic structure

The model-based inference of genetic structure within the 48 Linum accessions by STRUCTURE considered K= 2--10 clusters and revealed five optimal clusters with the highest log-likelihood value of –3400.3. The inference of the optimal number of clusters gained further support from the change in the second derivative (ΔK) of the relationship between K and the log-likelihood (results not shown). Figure 3A shows the inferred genetic structure and ancestry for the 48 Linum samples for three runs with the highest log-likelihood values under K= 4, 5, and 6. Clearly, the changes of ancestry between K= 4 and 5 and between K= 5 and 6 were not extensive. Under K= 5 (i.e., the five optimal clusters), the 10 pale flax samples were divided into four ancestral groups; one was species specific and three were shared with the dehiscent, winter, or oil flax samples. Interestingly, the largest number of Linum samples (6) sharing ancestry with the pale flax was observed in the winter flax, followed by those in the dehiscent flax (4), the oil flax (2), and the fiber flax (1).

Figure 3.

Genetic structure and ancestry of the 48 Linum accessions representing pale flax and four groups of cultivated flax inferred by STRUCTURE (A) and BAPS (B) based on 24 sampled genomic regions. Five optimal clusters were inferred by STRUCTURE and four optimal clusters by BAPS. Each sample is labeled on the bottom of graphical bars and the first capital letter of the sample label represents the Linum group (see Table 1). Note that the corresponding clusters may have different colors.

The model-based inference of genetic structure by BAPS revealed only four optimal clusters with little mixed ancestry (Fig. 2B). The pale flax was divided into two clusters, one of which was shared with one oil and four winter flax samples. The dehiscent flax formed one unique cluster, while the other cultivated flax samples formed another cluster. Clearly, a large number of fiber, winter, and oil flax samples were genetically related, except for those in the cluster mixed with pale flax.

Characterization of a priori genetic structure present in the 48 Linum samples using the Arlequin program revealed 15.7% nucleotide variation present between pale flax and cultivated flax and 26.1% residing among five Linum groups (one pale flax and four cultivated flax groups). The pale flax samples appeared to have the smallest group-specific Fst value (0.222), followed by the dehiscent and winter group (0.254), the oil group (0.276), and the fiber group (0.299) (Table 5). The pairwise group differentiations were large, ranging from 0.056 (for the group pair oil flax and fiber flax) to 0.500 (for the group pair dehiscent flax and fiber flax) and averaging 0.250 (Table 5).

Discussion

This study represents the first large resequencing effort to sample Linum genomic regions for the assessment of flax nucleotide diversity and inference of flax domestication history. The effort generated an interesting finding of an ancestral winter group of cultivated flax that displayed close relatedness to pale flax. A related diversity analysis revealed an overall 27% reduction of nucleotide diversity in cultivated flax when compared with the pale flax. Additional analyses showed that recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness.

Few estimates of nucleotide diversity are available in Linum species (Fu et al. 2012). This study generated a new, useful set of nucleotide diversity estimates for two Linum species. The estimates at the 24 sampled genomic regions were higher (0.0071--0.0097) than those at the sad2 locus (0.0017--0.0052). For cultivated flax, the trait-specific group with the highest estimate of nucleotide diversity was dehiscent, followed by winter, oil, and fiber flax (Table 4). For the sad2 locus, however, the trait-specific group with the highest estimate of nucleotide diversity was winter, followed by oil, fiber, and dehiscent flax (Fu et al. 2012). Overall, the estimates of nucleotide diversity for these two species appeared to be compatible with those reported for outcrossing crops such as maize (Wright et al. 2005) and much higher than those for other inbreeding species such as wheat and barley (e.g., see Table 3 of Haudry et al. 2007). These findings are surprising, as a self-fertilization rate of 95% or higher was reported in cultivated flax (Robinson 1937). However, the mating system and gene flow in the wild populations of pale flax remain unknown, although two distinct genetic backgrounds were detected in pale flax accessions collected from Turkey and associated with site elevation and longitude (Uysal et al. 2010, 2012). Also, it is possible that the high estimates of nucleotide diversity reflect the effect of sampling genomic regions only with the most polymorphism.

The impact of domestication on cultivated flax seems to be only moderate at the sampled genomic regions. First, the overall reduction of nucleotide diversity (27%) in cultivated flax with respect to pale flax was not large, when compared with those for the inbreeding species such as wheat and barley (e.g., see Table 3 of Haudry et al. 2007, but also see Kilian et al. 2007). When trait-specific groups of cultivated flax are considered, the impact appears to be large, ranging from 27% to 65% and is compatible with those previously reported (Haudry et al. 2007). Second, the overall selection at these genomic regions was relatively weak, as significant deviations from neutrality were not extensive across all the genomic regions assayed (Tables 3 and 4). Third, the estimated intensities of domestication bottleneck for cultivated flax and trait-specific groups were also weak, ranging from 1.5 to 2, implying that the effective population size after domestication was onefold smaller than the effective population size in the wild progenitor population. These levels of bottleneck were considerably weak, when compared with those inferred in wheat (3; Haudry et al. 2007) and rice (3.5; Li et al. 2011). However, more extensive coalescent simulations for bottleneck are desirable with an expanded genomic coverage and outgroup sequence.

The BEAST program clustered the assayed winter flax samples into two groups, one of which was closely related to pale flax (Fig. 1). This ancestral winter flax group was significantly differentiated from the other three clusters including the other group of winter flax (Table 6). The genetic division in the winter flax samples gained further support not only from the NeighborNet analysis with the SplitsTree4 (Fig. 2), but also from the Bayesian inferences of genetic structure with the STRUCTRUE and BAPS programs (Fig. 3). The ancestral winter flax group was consistently formed with compatible inferences of ancestry for each member, although these Bayesian inferences varied between two methods such as in the optimal cluster number. Also, the STRUCTURE program seems to yield more information on ancestry for the ancestral winter flax group than the BAPS program. This may reflect the weakness of the BAPS Bayesian method or the effect due to the violation of linked marker with concatenated unlinked sequences.

The discovery of the ancestral winter flax group provides the first set of genetic evidence for early domestication for flax winter hardiness. As mentioned earlier, winter hardiness and capsular dehiscence are two major characteristics of pale flax (Diederichsen and Hammer 1995; Uysal et al. 2012). Previous genetic studies (Uysal et al. 2010; Fu 2011; Fu et al. 2012) showed that the dehiscent flax displayed more genetic similarity to its pale flax, but the winter flax displayed more genetic similarity to oil and fiber flax. The analysis here revealed the genetic division of the assayed winter flax samples; one group displayed more genetic similarity to pale flax. This is consistent with our original reasoning that the winter flax may have experienced differential domestication pressure and some of them still carry substantial ancestral polymorphism from pale flax (Charlesworth 2010). In contrast, the fiber flax samples displayed little ancestral polymorphism from pale flax at these genomic regions (see Fig. 3).

Another interesting result associated with the ancestral winter flax group is its inclusion of the oil flax sample from Iran. This result also has some implications. First, it supports the previous reasoning that flax was domesticated initially for oil, rather than fiber, use (Allaby et al. 2005). Second, it is consistent with the reasoning from the sad2 locus that multiple independent pathways of domestication of flax occurred after the initial domestication for oil use (Fu et al. 2012). Similarly, as cultivated flax was spread into Europe, winter hardiness was improved along with the selection for oil and fiber traits (Maier and Schlichtherle 2011), so that the nonancestral winter flax samples were well mingled with oil and fiber flax samples (Figs. 1 and 2).

Our study could be further improved for more informative inferences with enlarged sampling in various Linum groups and genomic coverage. However, extra efforts are still needed to collect pale flax samples from other regions of its species distribution and to assemble more trait-specific groups of flax germplasm (Diederichsen and Fu 2006; Uysal et al. 2012). The effects of genomic sampling cannot be completely excluded, as the 24 genomic regions were selected mainly based on the polymorphism. Expanding the genomic coverage would help to minimize such sampling effects. Also, 16 of the 24 genomic regions were associated with functional genes (encoding proteins) and should represent the transcribed regions of the flax genome, but it remains unknown that the detected polymorphism was truly ancestral variation from pale flax with respect to winter hardiness. Answering this question would require further investigation of genes or genomic regions knowingly associated with winter hardiness, but such genomic resources currently are still lacking.

The findings presented here are encouraging for searching clues on flax domestication processes. Winter hardiness was among those flax traits that human domesticated early (Fu 2011). This study, along with those companion investigations (e.g., see Uysal et al. 2010; Fu 2011; Fu et al. 2012), helps to establish the early domestication events associated with human selection for oil, fiber, capsular indehiscence, and winter hardiness. These efforts constitute the first important step to unravel the complex sequence and timing of human selection on flax over the last 9000 years. With the development of more informative genomic resources, more ancestral variation will be identified and utilized to establish domestication events. More effort is needed to model, test, and date the domestication paths with these established events. Ultimately, the flax domestication history can be reliably described and better understood.

Acknowledgments

The author would like to thank G. W. Peterson for his technical assistance for the research and three anonymous journal reviewers for their helpful comments on an early version of the manuscript.

Ancillary

Advertisement