Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes

Cultivated flax (Linum usitatissimum L.) is the earliest oil and fiber crop and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness. Genetic studies have demonstrated that winter cultivated flax is closely related to oil and fiber cultivated flax and shows little relatedness to its progenitor, pale flax (L. bienne Mill.), but winter hardiness is one major characteristic of pale flax. Here, we assessed the genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter) through population-based resequencing at 24 genomic regions, and revealed a winter group of cultivated flax that displayed close relatedness to the pale flax samples. Overall, the cultivated flax showed a 27% reduction of nucleotide diversity when compared with the pale flax. Recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness.


Introduction
Cultivated flax (Linum usitatissimum L.) is a multiple purpose crop being utilized for oil and fiber and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness (Allaby et al. 2005;. Recent discovery of the early domestication event for capsular indehiscence (Fu 2011) not only supports this argument, but also stimulates more searches for clues on the early domestication history. Dehiscent flax (i.e., cultivated flax with spontaneously opening capsules) is genetically unique and displays close relatedness to its wild progenitor, pale flax (L. bienne Mill. or previously L. usitatissimum L. subsp. angustifolium [Huds.] hell.; Hammer 1986). In contrast, winter flax (i.e., cultivated flax with a vernalization requirement) is closely related to oil or fiber forms of cultivated flax and distantly related to its progenitor (Fu 2011). Possibly, these findings are clouded with inadequate sampling of diverse flax and/or limited genomic sampling with insufficient molecular markers (Uysal et al. 2010;. Given the facts that capsular dehiscence and winter hardiness are two major characteristics of pale flax and that cultivated flax was spread from the warm Near East to the cold Europe (Maier and Schlichtherle 2011), we hypothesized that winter hardiness was among those flax traits that human domesticated early (Fu 2011). We further reasoned that sampling more ancestral genetic diversity (Charlesworth 2010) may help to reveal closer relatedness between winter flax and pale flax, as some winter flax may have experienced differential domestication pressure over time and still carry more ancestral polymorphism.
Assessments of genetic relationships among various groups of cultivated flax with unique domestication-associated traits can provide insights into its domestication paths, as trait-specific groups should carry unique genetic traces of plant domestication accumulated over time (Fu 2011) and groups with early versus recent domesticated traits may display different levels of genetic relatedness to its progenitor (Zohary 1999). Early efforts were made to group cultivated flax based on specific flax traits (Elladi 1940;Dillman 1953;Kulpa and Danert 1962) to facilitate flax germplasm conservation, utilization, and research. The commonly referred or applied groups of cultivated flax are oil flax (i.e., cultivated flax with improved oil composition), fiber flax (i.e., cultivated flax with improved fiber characters), dehiscent flax, and winter flax (Diederichsen and Fu 2006). Interestingly, these four traits are associated with flax domestication (Hammer 1984;Uysal et al. 2012). Generally, cultivated flax is an annual, self-pollinating crop, has variable seed dormancy, grows fast with large variation in the generative plant parts, and has early flowering, almost indehiscent capsules, and large seeds. However, pale flax is a winter annual or perennial plant with narrow leaves and dehiscent capsules, and usually displays large variation in the vegetative plant parts and variable growth habit (Diederichsen and Hammer 1995;Uysal et al. 2012).
Pale flax has been identified as the wild progenitor of cultivated flax (Tammes 1928;Gill 1987;Fu et al. 2002;Fu and Allaby 2010). The archeological records of pale flax were obtained first from Tell Abu Hureyra in northern Syria (11,,500 years ago) (Hillman 1975) and then throughout the Near East by the 8th millennium BC (Zohary and Hopf 2000). The archeological finds from Tell Ramad in Syria (9000 years ago) revealed the first occurrence of cultivated forms of flax with an increase in seed size (van Zeist and Bakker-Heeres 1975). archeological evidence also existed for flax spreading from the Near East to Europe and the Nile Valley (Maier and Schlichtherle 2011). The recent archeological finds in southwest Germany revealed larger flax seeds in the earlier, than later, phase of the Late Neolithic (4000-2500 cal. BC) (Herbig and Maier 2011). The flax varieties that spread into the Danube valley were winter oil varieties. However, summer fiber varieties developed in eastern Europe also spread into central Europe and replaced the original varieties (Helbaek 1959;Diederichsen and Hammer 1995). All modern fiber varieties may have originated from eastern Europe (Helbaek 1959). Nowadays, flax is cultivated in more than 60 countries around the world (Fu 2005). The rest of the early history of flax domestication, however, remains unknown (Zohary and Hopf 2000;Allaby et al. 2005).
The objective of this study was to assess genetic diversity and genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter flax) through population-based resequencing at 24 genomic regions. Recent development of genomic resources in Linum species through Roche 454 pyrosequencing technology (454 Life Sciences, Branford, CT) (Fu and Peterson 2012) made the genetic sampling of flax genome more feasible than before.

Materials and Methods
All flax accessions studied here were obtained from the flax collection at the Plant Genetic Resources of Canada (PGRC ;  Table 1). They include 10 pale flax accessions from Turkey and Greece and 38 cultivated flax accessions from 26 countries. The selection of pale flax accessions is limited due to the lack of widely distributed pale flax germplasm and the selected ones represent only the central part of its natural distribution spanning the western Europe and the Mediterranean, north Africa, western and southern Asia, and the Caucasus regions (Diederichsen and Hammer 1995). The cultivated flax accessions were selected based on previous phenotypic and genetic studies (e.g., see Diederichsen and Fu 2006) to represent four major intraspecific groups of cultivated flax (dehiscent, fiber, oil, and winter flax). The winter flax accessions sampled cultivated flax developed with winter hardiness from eight countries. The dehiscent flax accessions represent the primitive form of cultivated flax with dehiscent capsules and have been long accumulated from flax cultivation in the cultivated flax gene pool (Hegi 1925). For this study, the dehiscent flax accessions were empirically verified for capsular dehiscence, and the selected pale flax accessions were assessed for their taxonomic identity in the greenhouse.

DNA extraction
Plants were grown from seed for two to three weeks for cultivated flax and up to two months for pale flax in a greenhouse at the Saskatoon Research Centre, Agriculture and Agri-Food Canada. Young leaves were individually collected, freeze-dried, and stored at -20 • C. A freeze-dried leaf sample of one individual plant from each accession was selected, and its genomic DNA was extracted with the DNEasy Plant Mini kit (Qiagen, Mississauga, Ontario, Canada). Extracted DNA was quantified with a Thermo Scientific NanoDrop 8000 spectrometer (Fisher Scientific Canada, Toronto, Ontario, Canada).

PCR and Sanger resequencing
Sanger resequencing was performed on 24 confirmed contigs available in the Linum genomic resources developed through the Roche 454 pyrosequencing technology (Fu and Peterson 2012). The contig selection was mainly based on its polymorphism and quality, as gene annotations on all developed contigs were incomplete and unverified (Fu and Peterson 2012). The PCR primers for 24 loci were designed using the on-line Primer Quest tool (Integrated DNA Technologies, Coralville, IA) (  Biosystems, Woburn, MA), 1× KAPA Enhancer 1, 0.2 mM each dNTP, 0.4 pmol/μl each forward and reverse primers, 100 ng of the same genomic DNA template samples as used above for next-generation sequencing, and 0.5 U KAPA 2G Robust polymerase in a final volume of 25 μl; touchdown PCR cycled at 95 • C for 3 min followed by 10 cycles of 95 • C for 10 sec, 60 • C decreasing 0.5 • C per cycle for 15 sec, 72 • C for 30 sec followed by 25 cycles of 95 • C for 10 sec, 55 • C for 15 sec, 72 • C for 20 sec, followed by a final extension of 72 • C for 30 sec. A 3-μl sample of each PCR product was separated on 1.5% agarose for 2 h at 120 V. PCR was performed on either a DYAD or PTC-200 thermocycler (Bio Rad, Mississauga, Ontario, Canada). PCR products were cleaned following the method outlined by Rosenthal et al. (1993) and submitted for Sanger sequencing at the DNA Technologies Laboratory at the Canadian National Research Council's Plant Biotechnology Institute (Saskatoon, Saskatchewan, Canada).

Sequence analysis
All sequencing products were assembled with Vector NTI Suite's ContigExpress v9.0.0 (Invitrogen, Carlsbad, CA) and aligned using MUSCLE v3.6 (Edgar 2004). Aligned sequences with required length and quality were deposited into Gen-Bank under accessions JN845641-JN846695 and JN861766 and those without are given in Table S1. Population genetic analyses of aligned DNA sequences were performed using DnaSP program (Librado and Rozas 2009). Several measures of sequence variation were obtained, and they are the number of segregating sites, haplotype number, nucleotide diversity (π; Tajima 1983), the signal of selection (i.e., deviation from neutrality ;Tajima 1989;Fu and Li 1993), and the frequency of recombination (i.e., the minimum number of recombination events; Hudson and Kaplan 1985). The comparative diversity analyses were also done for different loci and various Linum groups. Haplotype analyses with and without gaps were c 2012 The Authors. Published by Blackwell Publishing Ltd.
performed using the DnaSP program. The positions of SNPs and indels for each haplotype were generated. The genetic relationships of the 48 Linum samples were analyzed based on concatenated sequences using the Bayesian Markov chain Monte Carlo approach available in the BEAST v1.4 (Drummond and Rambaut 2007), as the concatenation approach tends to yield more accurate trees than the consensus one (Gadagkar et al. 2005). The maximum clade credibility (MCC) phylogenies were generated with a relaxed uncorrelated lognormal clock and with tree priors as constant size, expansion, or exponential growth. The substitution model was under an HYK model with gamma distribution for site heterogeneity. The rest of the options were applied with default values. This Bayesian approach should yield more informative phylogeny, as it directly calculates ultrametric phylogenies based only on observed data and model parameters and incorporates both the branch-length errors and the topological uncertainties (Rutschmann 2006). For comparison, the distance-based NeighborNet (Bryant and Moulton 2004) of the 48 samples was also generated using the Split-sTree4 (Huson and Bryant 2006) with the default options of Uncorrected P and EqualAngle. The NeighborNet should display detailed reticulations where recombination may occur and yield more information for understanding the genetic relationships.
The optimal genetic structure of the 48 samples was also inferred based on concatenated sequences with a modelbased Bayesian method available in the program STRUC-TURE v2.2.3 (Pritchard et al. 2000;Falush et al. 2007). The STRUCTURE program was run 20 times for each subpopulation (K ) value, ranging from 2 to 10, using the admixture model with 10,000 replicates for burn-in and 10,000 replicates during analysis. The final population subgroups were determined based on (1) likelihood plot of these models, (2) the change in the second derivative ( K ) of the relationship between K and the log-likelihood (Evanno et al. 2005), and (3) stability of grouping patterns across 20 runs. For a given K with 20 runs, the run with the highest likelihood value was selected to assign the posterior membership coefficients to each accession. A graphical bar plot was then generated with the posterior membership coefficients. To assess the consistency of structural inference, an additional analysis was also made with the Bayesian method available in the BAPS software (Corander et al. 2008). Individual samples were clustered using the model for linked (or concatenated) markers and 20 replicate runs of the algorithm with the upper-bound values (K ) for the number of clusters ranging between 2 and 10.
An analysis of molecular variance (AMOVA) was performed based on concatenated sequences using Arlequin v3.01 (Excoffier et al. 2005) to quantify nucleotide variation between species and among various groups and inferred clusters of Linum accessions. Three models of genetic grouping were considered: pale flax versus cultivated flax; five groups of pale flax and cultivated flax; and four clusters of Linum samples inferred using the BEAST program. The significance of variance components and intergroup genetic distances (or pairwise group Fst) for each model was tested with 10,000 random permutations. The analysis also generated groupspecific Fst values in each model.

Coalescent simulation for bottleneck
The intensity of the bottlenecks associated with flax domestication was estimated following the procedures described in Haudry et al. (2007) using Hudson's ms program (Hudson 2002). The procedures applied simple demographic model of reduction in effective population size, assumed that an ancestral population experienced an instantaneous change in effective population size many generations ago (t) and no population expansion after the bottleneck. The bottleneck intensity α was defined as the ratio of the wild population size (N a ) to cultivated population size (N p ). Higher values of α correspond to more severe bottlenecks. The model had five parameters (N a , N p , τ , θ wild , and 4Nc) and the last three are the time after the bottleneck, the ancestral nucleotide diversity, and the population recombination at the locus, respectively. In this simulation, we assumed N a = 30,000 similar to those predicted in wheat and barley domestication (Badr et al. 2000;Haudry et al. 2007) and cultivated flax had gone through (t =) 9000 generations (or years) of domestication, so that τ = 0.15α. The estimates of θ wild and 4Nc at each locus for pale flax were obtained in this study. A set of 19 values of α was explored on a grid ranging from 1 (no reduction in effective population size) to 10 (severe reduction in effective population size), with 5000 simulations and an effective sequence length of 184 to 441 bp. The proportion of 5000 runs that simulated π is within 20% of the observed π was calculated for each α value for each of five domestication groups (dehiscent, fiber, oil, winter, and all cultivated flax samples). The average bottleneck intensity for each domestication group was estimated following Haudry et al. (2007) by calculating a multilocus likelihood as the product over 24 locus-specific likelihoods and maximizing the multilocus likelihood with respect to α. A 95% confidence interval was also constructed around the estimate of α by determining the value of α at which the log-likelihood value was 2 loglikelihood units lower than the maximized likelihood.

Nucleotide polymorphism
The Sanger resequencing generated a total of 1152 sequences of 24 DNA fragments for 48 Linum samples (Table 2). These DNA fragments represented 24 unlinked loci sampled across the flax genome. Sixteen fragments were associated with predicted gene functions, mainly with different proteins, but were not fully annotated for further diversity analysis. The DNA fragments varied in length ranging from 184 to 441 bp and averaging 289 bp. The total length of 24 concatenated sequences for each sample was 6886 bp. The number of segregating sites per DNA fragment ranged from 1 to 27 and averaged 8.5. The number of haplotypes detected per DNA fragment ranged from 2 to 17 and averaged 2.
For pale flax, the number of segregating sites per fragment ranged from 1 to 25 and averaged 6.4 and the estimated nucleotide diversity ranged from 0.0027 to 0.0603 and averaged 0.0108 (Table 3). For all the cultivated flax samples, the number of segregating sites per fragment ranged from 1 to 25 and averaged 7.0 and the estimated nucleotide diversity ranged from 0.0008 to 0.0351 and averaged 0.0076. For all 24 loci, the overall nucleotide diversity was larger for pale flax (0.0097) than for cultivated flax (0.0071). For the four groups of cultivated flax, large variation in nucleotide polymorphism was observed ( Table 4). The number of segregating sites per fragment ranged from 0 to 25 and averaged 4.6 for the dehiscent flax; 0 to 7 and 2.5 for the fiber flax; 0 to 22 and 4 for the oil flax; and 0 to 22 and 4.5 for the winter flax. The estimated nucleotide diversity ranged from 0 to 0.0522 and averaged 0.0075 for the dehiscent flax; 0 to 0.0157 and 0.0033 for the fiber flax; 0 to 0.0217 and 0.0051 for the oil flax; and 0 to 0.0578 and 0.0072 for the winter flax. For all 24 loci, the highest estimated nucleotide diversity was 0.0071 for the dehiscent flax, followed by the winter flax (0.0069), the oil flax (0.0053), and the fiber flax (0.0034).

Selection, recombination, and bottleneck
For pale flax, no significant deviation from neutrality measured with Tajima's D was detected for any loci assayed, but two deviations from neutrality (one significant and one marginally significant) were observed for cultivated flax ( Table 3). However, if based on Fu and Li's D * and F * tests, there were three possible significant deviations from neutrality for pale flax. For the four groups of cultivated flax, the largest number of significant (and/marginally significant) tests for deviation from neutrality based on Tajima's D was 5 for the oil flax, followed by the winter flax (2), the fiber flax (2) and the dehiscent flax (0) ( Table 4). If based on Fu and Li's D * and F * tests, the largest number of significant (and/marginally significant) tests for deviation from neutrality was 7 for the oil flax, followed by the winter flax (6), the fiber flax (2) and the dehiscent flax (2).
The recombination analysis performed with the DnaSP program revealed large variation in recombination frequency with respect to species and group (Tables 3 and 4). The total number of recombination events at the 24 loci was 11 for the 10 pale flax samples and 19 for the 38 cultivated flax samples. The total number of recombination events at the 24 loci was four for the oil and winter flax and three for the dehiscent and fiber flax.
The coalescent simulations assuming a simple demographic model with observed values of related parameters revealed the extent of domestication bottleneck ranging from 1.5 to 2 for the four groups of cultivated flax and the whole cultivated flax samples (Table 5). Specifically, based on the estimated π for each group, the bottleneck intensity was estimated to be 2 for the oil flax group and 1.5 for the other groups. The estimates of the 95% confidence interval were also large, ranging from 1 to 3.0, depending on the group of interest.

Genetic relationship
The BEAST program generated three MCC trees for the 48 Linum samples with three tree priors as constant size, expansion, and exponential, respectively. The phylogenies with the first two tree priors were exactly the same, although estimated branch lengths (or evolutionary rates) varied. The MCC tree with tree prior as expansion mirrored more closely with the NeighborNet by SplitsTrees4 described below. The MCC tree with tree prior as exponential had a cluster with mixed memberships from pale, winter, oil, dehiscent flax samples and was slightly less compatible with the NeighborNet.  Table 1.
Clearly, the winter flax samples were divided into two groups; one with C4 mixed with other cultivated flax samples and the other with C3 closer to some pale flax. The cluster C3 is unique and thus named as the ancestral winter flax group (Fig. 1), as this winter flax group displayed substantial ancestral polymorphism with and close relatedness to pale flax samples. The four ancestral winter flax samples were originated from Afghanistan, Syria, Turkey, and Egypt; the oil flax sample came from Iran; and four pale flax samples were collected from Samsun, Kastamonu, Zonguldak, and Trabzon regions of Turkey. All the members of the ancestral winter group were associated with three sad2 haplotypes (IX, X, XI; Table 1). Quantifying nucleotide variation among four inferred clusters of Linum samples revealed a significant (P < 0.0001) differentiation among these inferred clusters, which explained 35.8% nucleotide variation. The ancestral winter flax group (C3) was significantly differentiated from the other three clusters (Table 6). The NeighborNet of the 48 Linum samples obtained (Fig. 2) revealed essentially the same patterns of genetic relationships as those in the MCC tree, but with higher resolution for recombination at the individual sample level. The winter flax samples also were divided into two clusters; one with six members was closely related to the oil and fiber flax samples, and the other with four members was closely related to the oil flax sample from Iran and became closer to four pale flax samples. The whole dehiscent group was closely related to the pale flax samples. The fiber flax samples were placed in a cluster with a large articulation and mixed with the oil flax samples.

Genetic structure
The model-based inference of genetic structure within the 48 Linum accessions by STRUCTURE considered K = 2-10 clusters and revealed five optimal clusters with the highest log-likelihood value of -3400.3. The inference of the optimal number of clusters gained further support from the change in the second derivative ( K ) of the relationship between K and the log-likelihood (results not shown). Figure 3A shows the inferred genetic structure and ancestry for the 48 Linum samples for three runs with the highest log-likelihood values under K = 4, 5, and 6. Clearly, the changes of ancestry between K = 4 and 5 and between K = 5 and 6 were not extensive. Under K = 5 (i.e., the five optimal clusters), the 10 pale flax samples were divided into four ancestral groups; one was species specific and three were shared with the dehiscent, winter, or oil flax samples. Interestingly, the largest number of Linum samples (6) sharing ancestry with the pale flax was observed in the winter flax, followed by those in the dehiscent flax (4), the oil flax (2), and the fiber flax (1).
The model-based inference of genetic structure by BAPS revealed only four optimal clusters with little mixed ancestry (Fig. 2B). The pale flax was divided into two clusters, one of which was shared with one oil and four winter flax samples. The dehiscent flax formed one unique cluster, while the other cultivated flax samples formed another cluster. Clearly, a large number of fiber, winter, and oil flax samples were genetically related, except for those in the cluster mixed with pale flax.
Characterization of a priori genetic structure present in the 48 Linum samples using the Arlequin program revealed    (Table 1). Four major clusters (C1-C4) are labeled on the branches. The ancestral winter flax group is highlighted.
15.7% nucleotide variation present between pale flax and cultivated flax and 26.1% residing among five Linum groups (one pale flax and four cultivated flax groups). The pale flax samples appeared to have the smallest group-specific Fst value (0.222), followed by the dehiscent and winter group (0.254), the oil group (0.276), and the fiber group (0.299) ( Table 5).
The pairwise group differentiations were large, ranging from 0.056 (for the group pair oil flax and fiber flax) to 0.500 (for the group pair dehiscent flax and fiber flax) and averaging 0.250 (Table 5). 630

Discussion
This study represents the first large resequencing effort to sample Linum genomic regions for the assessment of flax nucleotide diversity and inference of flax domestication history. The effort generated an interesting finding of an ancestral winter group of cultivated flax that displayed close relatedness to pale flax. A related diversity analysis revealed an overall 27% reduction of nucleotide diversity in cultivated flax when compared with the pale flax. Additional analyses showed that recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness. Few estimates of nucleotide diversity are available in Linum species . This study generated a new, useful set of nucleotide diversity estimates for two Linum species. The estimates at the 24 sampled genomic regions were higher (0.0071-0.0097) than those at the sad2 locus (0.0017-0.0052). For cultivated flax, the trait-specific group with the highest estimate of nucleotide diversity was dehiscent, followed by winter, oil, and fiber flax (Table 4). For the sad2 locus, however, the trait-specific group with the highest estimate of nucleotide diversity was winter, followed by oil, fiber,   Table 1). Note that the corresponding clusters may have different colors. and dehiscent flax . Overall, the estimates of nucleotide diversity for these two species appeared to be compatible with those reported for outcrossing crops such as maize (Wright et al. 2005) and much higher than those for other inbreeding species such as wheat and barley (e.g., see Table 3 of Haudry et al. 2007). These findings are surprising, as a self-fertilization rate of 95% or higher was reported in cultivated flax (Robinson 1937). However, the mating system and gene flow in the wild populations of pale flax remain unknown, although two distinct genetic backgrounds were detected in pale flax accessions collected from Turkey and associated with site elevation and longitude (Uysal et al. 2010(Uysal et al. , 2012. Also, it is possible that the high estimates of nucleotide diversity reflect the effect of sampling genomic regions only with the most polymorphism. The impact of domestication on cultivated flax seems to be only moderate at the sampled genomic regions. First, the overall reduction of nucleotide diversity (27%) in cultivated flax with respect to pale flax was not large, when compared with those for the inbreeding species such as wheat and barley (e.g., see Table 3 of Haudry et al. 2007, but also see Kilian et al. 2007). When trait-specific groups of cultivated flax are considered, the impact appears to be large, ranging from 27% to 65% and is compatible with those previously reported (Haudry et al. 2007). Second, the overall selection at these genomic regions was relatively weak, as significant deviations from neutrality were not extensive across all the genomic regions assayed (Tables 3 and 4). Third, the estimated intensities of domestication bottleneck for cultivated flax and trait-specific groups were also weak, ranging from 1.5 to 2, implying that the effective population size after domestication was onefold smaller than the effective population size in the wild progenitor population. These levels of bottleneck were considerably weak, when compared with those inferred in wheat (3; Haudry et al. 2007) and rice (3.5; Li et al. 2011). However, more extensive coalescent simulations for bottleneck are desirable with an expanded genomic coverage and outgroup sequence.
The BEAST program clustered the assayed winter flax samples into two groups, one of which was closely related to pale flax (Fig. 1). This ancestral winter flax group was significantly differentiated from the other three clusters including the other group of winter flax ( Table 6) from the NeighborNet analysis with the SplitsTree4 (Fig. 2), but also from the Bayesian inferences of genetic structure with the STRUCTRUE and BAPS programs (Fig. 3). The ancestral winter flax group was consistently formed with compatible inferences of ancestry for each member, although these Bayesian inferences varied between two methods such as in the optimal cluster number. Also, the STRUCTURE program seems to yield more information on ancestry for the ancestral winter flax group than the BAPS program. This may reflect the weakness of the BAPS Bayesian method or the effect due to the violation of linked marker with concatenated unlinked sequences.
The discovery of the ancestral winter flax group provides the first set of genetic evidence for early domestication for flax winter hardiness. As mentioned earlier, winter hardiness and capsular dehiscence are two major characteristics of pale flax (Diederichsen and Hammer 1995;Uysal et al. 2012). Previous genetic studies (Uysal et al. 2010;Fu 2011; showed that the dehiscent flax displayed more genetic similarity to its pale flax, but the winter flax displayed more genetic similarity to oil and fiber flax. The analysis here revealed the genetic division of the assayed winter flax samples; one group displayed more genetic similarity to pale flax. This is consistent with our original reasoning that the winter flax may have experienced differential domestication pressure and some of them still carry substantial ancestral polymorphism from pale flax (Charlesworth 2010). In contrast, the fiber flax samples displayed little ancestral polymorphism from pale flax at these genomic regions (see Fig. 3).
Another interesting result associated with the ancestral winter flax group is its inclusion of the oil flax sample from Iran. This result also has some implications. First, it supports the previous reasoning that flax was domesticated initially for oil, rather than fiber, use (Allaby et al. 2005). Second, it is consistent with the reasoning from the sad2 locus that multiple independent pathways of domestication of flax occurred after the initial domestication for oil use . Similarly, as cultivated flax was spread into Europe, winter hardiness was improved along with the selection for oil and fiber traits (Maier and Schlichtherle 2011), so that the nonancestral winter flax samples were well mingled with oil and fiber flax samples (Figs. 1 and 2).
Our study could be further improved for more informative inferences with enlarged sampling in various Linum groups and genomic coverage. However, extra efforts are still needed to collect pale flax samples from other regions of its species distribution and to assemble more trait-specific groups of flax germplasm (Diederichsen and Fu 2006;Uysal et al. 2012). The effects of genomic sampling cannot be completely excluded, as the 24 genomic regions were selected mainly based on the polymorphism. Expanding the genomic coverage would help to minimize such sampling effects. Also, 16 of the 24 genomic regions were associated with functional genes (encoding pro-teins) and should represent the transcribed regions of the flax genome, but it remains unknown that the detected polymorphism was truly ancestral variation from pale flax with respect to winter hardiness. Answering this question would require further investigation of genes or genomic regions knowingly associated with winter hardiness, but such genomic resources currently are still lacking.
The findings presented here are encouraging for searching clues on flax domestication processes. Winter hardiness was among those flax traits that human domesticated early (Fu 2011). This study, along with those companion investigations (e.g., see Uysal et al. 2010;Fu 2011;), helps to establish the early domestication events associated with human selection for oil, fiber, capsular indehiscence, and winter hardiness. These efforts constitute the first important step to unravel the complex sequence and timing of human selection on flax over the last 9000 years. With the development of more informative genomic resources, more ancestral variation will be identified and utilized to establish domestication events. More effort is needed to model, test, and date the domestication paths with these established events. Ultimately, the flax domestication history can be reliably described and better understood. Table S1. Aligned Sanger sequences for two contigs (C586 and C667) in 48 Linum samples. Table S2. List of 24 primer pairs used by Sanger resequencing of 24 contigs representing polymorphic genomic regions in the 48 Linum accessions, along with the polymorphism and gene annotation information.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.