QTL affecting fitness of hybrids between wild and cultivated soybeans in experimental fields

The objective of this study was to identify quantitative trait loci (QTL) affecting fitness of hybrids between wild soybean (Glycine soja) and cultivated soybean (Glycine max). Seed dormancy and seed number, both of which are important for fitness, were evaluated by testing artificial hybrids of G. soja × G. max in a multiple-site field trial. Generally, the fitness of the F1 hybrids and hybrid derivatives from self-pollination was lower than that of G. soja due to loss of seed dormancy, whereas the fitness of hybrid derivatives with higher proportions of G. soja genetic background was comparable with that of G. soja. These differences were genetically dissected into QTL for each population. Three QTLs for seed dormancy and one QTL for total seed number were detected in the F2 progenies of two diverse cross combinations. At those four QTLs, the G. max alleles reduced seed number and severely reduced seed survival during the winter, suggesting that major genes acquired during soybean adaptation to cultivation have a selective disadvantage in natural habitats. In progenies with a higher proportion of G. soja genetic background, the genetic effects of the G. max alleles were not expressed as phenotypes because the G. soja alleles were dominant over the G. max alleles. Considering the highly inbreeding nature of these species, most hybrid derivatives would disappear quickly in early self-pollinating generations in natural habitats because of the low fitness of plants carrying G. max alleles.


Introduction
Many crop species have evolved through recurrent cycles of hybridization with their wild and/or weedy relatives followed by differentiation (Harlan 1992). Gene flow from crops to their wild relatives has been commonly observed in many crop species (Ellstrand et al. 1999; Ellstrand 2003). Because gene flow is difficult to control completely once genetically modified (GM) crops are planted in the field, there is concern that transgenes will persist in the gene pools of wild relatives and cause negative environmental effects. Possible consequences of transgene introgression include the evolution of aggressive weeds from hybrid derivatives (Warwick et al. 2009), effects on nontarget insects (O'Callaghan et al. 2005), and changes in the genetic diversity of wild populations (Levin et al. 1996; Lu 2008).
The probability of transgene introgression from a crop species into a wild species depends largely on the fitness of the F1 hybrid and subsequent generations. Fitness may be defined as the relative ability of an individual to survive and reproduce successfully in a given environment, with the fittest individuals leaving the greatest number of offspring (Jenczewski et al. 2003). Fitness is a characteristic not only of the entire genome but also of individual genes and chromosomal segments (Harrison 1990). The persistence of crop transgenes within the genomes of crop wild relatives depends on the fitness conferred by the transgene and by linked genomic regions (Gressel 1999; Jenczewski et al. 2003; Stewart et al. 2003). The fitness of plants carrying domestication-related genes is assumed to be lower than that of their wild relatives when tested in natural habitats (De Wet and Harlan 1975). Transgenes would therefore be expected to disappear from natural populations when they are linked with domestication-related genes that confer a selective disadvantage in wild habitats, such as genes that reduce seed dormancy or seed shattering (Gressel 1999; Stewart et al. 2003).
On the other hand, chromosomal blocks can introgress at a higher rate than expected when they contain advantageous gene combinations with positive fitness consequences (Rieseberg et al. 1996). In some cases, hybrids between crop plants and their wild relatives may be as fit as or even fitter than their parents; examples have been found in Brassica (Snow et al. 1999; Di et al. 2009), Raphanus (Hovick et al. 2012), and Sorghum (Sahoo et al. 2010). Baack et al. (2008) found that some alleles from cultivated sunflower (Helianthus annuus L.) are favored in a noncrop environment and in wild genetic backgrounds. Depending on the effect of the inserted gene itself, transgene transfer could change allele frequencies by conferring a selective advantage on the recipient (Hails 2000; Gepts and Papa 2003; Jenczewski et al. 2003; Snow et al. 2010; Hartman et al. 2012).
Genetically modified soybean (Glycine max) is economically important and accounted for 81% of the worldwide soybean planting area in 2012 (81 million ha; James 2012). The annual wild species Glycine soja is found in eastern and northeastern China, Japan, Korea, and the Russian Far East (Carter et al. 2004). In Japan, G. soja is widely distributed in disturbed habitats such as riverbanks, roadsides, and even the edges of soybean fields (Kuroda et al. 2005, 2006b, 2007). No reproductive barriers have been observed between G. max and G. soja, and crosses between them produce fertile F1 hybrids (Singh and Hymowitz 1989; Carter et al. 2004). The risk of transgene dispersal within Glycine is assumed to be very low in Japan because (1) outcrossing rates between G. max and G. soja are generally less than 1% (Nakayama and Yamaguchi 2002; Kuroda et al. 2008; Mizuguti et al. 2009), (2) natural F1 hybrids between G. max and G. soja are rare in Japan, and (3) plants derived from those hybrids survived only one to a few years in natural habitats (Kuroda et al. 2005, 2006b, 2007). However, the genetic and ecological mechanisms underlying this lack of persistence remain unclear.
During the domestication of soybean, G. max evolved from G. soja to have large, nondormant seeds and a determinate, nontwining growth habit, traits that may affect fitness in natural habitats. The seed dormancy of wild soybean is caused by the physical structure of the seed coat, which usually does not imbibe water immediately after immersion (Rolston 1978; Ohara and Shimamoto 1994). In contrast, G. max bears seeds with little or no dormancy because uniform and rapid germination is important for soybean cultivation and food processing. Oka (1983) analyzed reproductive success under seminatural conditions using hybrid derivatives between G. max and G. soja and found that plants with high seed dormancy and high seed production survived successfully. Thus, knowledge of the genomic regions affecting fitness-related traits can help explain why hybrid derivatives between G. max and G. soja are rare in natural habitats. Although domestication-related quantitative trait loci (QTL) for traits such as seed size and growth habit have been reported previously (e.g., Liu et al. 2007), no attempt has been made to identify genetic factors affecting the number of seeds per plant and winter seed survival in the soil.
In this study, artificial F1 hybrids, F2 populations, and backcross populations were produced from two cross combinations of G. soja (W, wild) and non-GM G. max (D, domesticated). The two combinations differed in growth habit and represented northern and southern Japanese germplasm, reflecting the assumption that gene flow from GM G. max to G. soja could occur in both northern and southern Japan. The fitness of the hybrids and their derivatives was compared with that of their G. soja and non-GM G. max parents in three regions of Japan: north, central, and south. On the basis of the results, we discuss the likelihood that transgenes from G. max would persist in G. soja populations. This is the first report of QTLs affecting fitness-related traits, such as winter seed survival and seed number per plant, of G. soja × G. max hybrids in experimental fields.

Materials and Methods
Plant materials
F1 hybrids between wild and cultivated soybean
F1 hybrids between G. soja and non-GM G. max were produced for two cross combinations. One combination (W1 × D1) was developed from a cross between the G. soja accession "JP036034" (W1), collected in Aomori Prefecture, northern Japan, and the non-GM G. max cultivar "Ryuhou" (D1), which is widely grown in northern Japan. The other combination (W2 × D2) was developed from a cross between the G. soja accession "JP110755" (W2), collected in Hiroshima Prefecture, southern Japan, and the non-GM G. max cultivar "Fukuyutaka" (D2), which is widely grown in southern Japan (Table 1). The wild soybean accessions used in these crosses were obtained from the Genebank of the National Institute of Agrobiological Sciences. The non-GM G. max cultivars D1 and D2 were obtained from the Tohoku Agricultural Research Center and the Kyushu Okinawa Agricultural Research Center, respectively.

F2 populations
Because soybean is highly inbreeding, F2 populations, the generation expected to arise from self-pollinated hybrids in natural habitats, were developed for testing. Two F2 populations, one representing the northern region (W1 × D1: 204 individuals) and the other representing the southern region (W2 × D2: 204 individuals), were developed from seeds obtained by self-pollination of a single F1 hybrid plant per cross (Table 1).

Backcross populations
To confirm the effect of G. max genes in a predominantly G. soja background, backcross (BC) populations were developed for both the W1 × D1 and W2 × D2 combinations using G. soja as the recurrent parent (Table 1, Fig. 1). Two BC1F1 populations (W1 × D1: 68 individuals; W2 × D2: 160 individuals) were obtained by crossing each F1 hybrid (donor plant) with the corresponding G. soja accession (recurrent parent). Successful crossing was confirmed using 60 simple sequence repeat (SSR) markers, three per linkage group for all 20 linkage groups. Next, two BC2F1 populations (W1 × D1: 60 individuals; W2 × D2: 40 individuals) were developed by crossing one selected BC1F1 plant per population with G. soja. These BC1F1 plants were selected on the basis of their genotypes as carrying the G. max alleles at the major QTLs for seed dormancy and total seed number identified in the F2 populations. To investigate fitness after an additional generation of self-pollination, seeds obtained by self-pollination of the selected BC1F1 plants were used to develop two BC1F2 populations (W1 × D1: 150 individuals; W2 × D2: 150 individuals).
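As a rough guide to what a "predominantly G. soja background" means, the expected share of recurrent-parent genome in each backcross generation follows from the halving rule: each backcross halves the donor share. The Python sketch below is a minimal illustration that ignores selection and linkage drag; the BC1F1 plants used here were deliberately selected for G. max QTL alleles, so their actual composition departs from this expectation. The function name is illustrative only.

```python
from fractions import Fraction


def expected_recurrent_fraction(n_backcrosses: int) -> Fraction:
    """Expected share of the recurrent (G. soja) genome in a BCnF1 population.

    Assumes no selection and no linkage drag. The F1 (n = 0) carries 1/2,
    and each further backcross to G. soja halves the remaining donor share.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return 1 - Fraction(1, 2 ** (n_backcrosses + 1))


for label, n in [("F1", 0), ("BC1F1", 1), ("BC2F1", 2)]:
    share = expected_recurrent_fraction(n)
    print(label, share, float(share))
```

Self-pollination does not change the expected share, which is why a BC1F2 population has, on average, the same G. soja background as the BC1F1 plant it came from.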

Field locations
Of the 204 F2 plants in the W1 × D1 population, 104 were grown, together with 10 parental plants and several F1 hybrids (Table 1), at 1 m × 1 m spacing at the Tohoku Agricultural Research Center (39.5°N, 140.4°E; Akita Prefecture, northern Japan; Appendices A1 and A2), hereafter referred to as the "north field." The other 100 F2 plants, together with the parents and F1 hybrids, were grown at the same spacing at the Western Region Agricultural Research Center (34.5°N, 133.4°E; Hiroshima Prefecture, southern Japan; Appendices A1 and A2), hereafter referred to as the "south field." The 204 F2 plants of the W2 × D2 population were divided between the north and south fields in the same manner as the W1 × D1 population (Table 1). Because many climbing plants with a predominantly G. soja background are difficult to maintain and evaluate, the backcross populations (BC1F1, BC2F1, and BC1F2) were grown, also at 1 m × 1 m spacing, only at the National Institute of Agrobiological Sciences (36.0°N, 140.1°E; Ibaraki Prefecture, central Japan; Appendices A1 and A2), hereafter referred to as the "central field." Seeds were scarified by scratching the seed coat with a razor blade and germinated in small pots at the beginning of July in 2005, in mid-June in 2006, and at the end of May in 2007. The seedlings were transplanted to the field in mid-July each year. For each plant, three stakes with a net strung between them were used to guide the twining stems. From October to November, mature pods were harvested by hand twice a week. Standard agricultural practices were followed, including fertilizer application (650 kg/ha of fertilizer containing nitrogen, phosphate, and potassium in a 3:10:10 ratio; 1000 kg/ha of fused magnesium phosphate; and 1000 kg/ha of limestone), weeding, and insecticide application to control stink bugs and common cutworm.

Trait measurement
Of the 11 fitness-related traits evaluated (Table 2), 10 were treated as quantitative traits and one (seed coat color) as a qualitative trait. Two seed dormancy-related traits, winter seed survival (DORM_1) and seed hardness (DORM_2), were evaluated using the seeds from individual plants. Because all hard seeds from randomly selected lines germinated after mechanical abrasion on wet filter paper or soil at room temperature, hard seeds were treated as viable, nongerminated seeds. Seed production-related traits, namely total seed number (PROD_1), total seed weight (PROD_2), 100-seed weight (PROD_3), total pod number (PROD_4), stem dry weight (PROD_5), and stem length (PROD_6), were recorded for each plant after its seeds had matured (Table 2). Because the two southern parents (W2 and D2), the W2 × D2 F1 plants, and most of the W2 × D2 F2 plants flowered late in the north field, these plants (7 of 10 W2, all 5 D2, all 5 W2 × D2 F1, and 73 of 104 W2 × D2 F2 plants) were removed whole from the field before the first snowfall and dried in a greenhouse to obtain mature seeds. Traits in the backcross populations were evaluated by the same methods as in the F2 populations. The number of days from sowing to first flowering was recorded as FLOW. The total number of seeds expected to germinate in the following year (SURV) was estimated as PROD_1 multiplied by DORM_1. The mean and standard deviation of each trait and the correlation coefficient between each pair of traits were calculated; for segregating populations, the median and range are reported instead of the mean and standard deviation. Differences among G. soja, G. max, and F1 hybrids were analyzed separately for each field location in each year with the Mann-Whitney U-test or the Kruskal-Wallis test.
All statistical analyses were conducted using R version 2.9.2 (R Development Core Team 2009).
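The SURV estimate is a per-plant product of two measurements. The Python sketch below assumes that DORM_1 is recorded as a percentage (as the Add values reported for it suggest) and therefore divides by 100; the plant records are hypothetical, not data from this study. It also shows the median-and-range summary used for segregating populations.

```python
from statistics import median


def expected_survivors(total_seeds: int, winter_survival_pct: float) -> float:
    """SURV = PROD_1 x DORM_1, with DORM_1 given as a percentage (assumption)."""
    return total_seeds * winter_survival_pct / 100.0


# Hypothetical per-plant records: (PROD_1, DORM_1 in %), not data from this study.
plants = [(420, 90.0), (150, 10.0), (600, 50.0), (0, 0.0)]
surv = [expected_survivors(n, d) for n, d in plants]
print(surv)  # [378.0, 15.0, 300.0, 0.0]

# Segregating populations are summarized by median and range.
print(median(surv), (min(surv), max(surv)))  # 157.5 (0.0, 378.0)
```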

Genotyping
Total DNA of each putative F1 seed was extracted from a small piece of cotyledon tissue using an EZ1 DNA Tissue kit (Qiagen, Tokyo, Japan). Total DNA of F2, BC1F1, BC2F1, and BC1F2 individuals was extracted from 100 mg of fresh leaf tissue. DNA concentration was adjusted to between 5 and 25 ng/µL by comparison with known concentrations of standard λ DNA on a 1.5% agarose gel. A total of 720 SSR markers from SoyBase (http://soybase.org/) were screened for polymorphisms between the parents. Five additional markers were included to track three classical soybean loci, I, T, and Dt1 (Appendix A3). Three markers, dCHS1 (Matsumura et al. 2005), AY262686B, and AY262686Z, were used to track the I locus, which controls seed coat color and might be related to winter seed survival. A single-base indel marker, sF3′H1, reported by Toda et al. (2002), was used to detect the T locus, which controls pubescence color and interacts with the I locus. An SSR marker, LFsoy3, was designed to track the Dt1 locus, which might be related to stem length and total seed number. These markers were amplified using KOD-plus polymerase (Toyobo, Osaka, Japan) according to the manufacturer's instructions in a GeneAmp 9700 PCR system (Applied Biosystems, Tokyo, Japan), and polymorphisms were scored from banding patterns on 12% polyacrylamide gels. Successful crossing was confirmed by analyzing DNA from putative F1 seeds at the polymorphic SSR marker Satt207, which has a different allele in each of the four parents (W1, 177 bp; D1, 234 bp; W2, 210 bp; D2, 231 bp). To genotype F2, BC1F1, BC2F1, and BC1F2 individuals, polymorphic markers were selected at intervals of about 20 cM based on the composite soybean map from SoyBase (http://soybase.org/). Multiplex PCR with four fluorescent labels (6-FAM, VIC, NED, and PET) was performed to detect segregation patterns within each population.
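The Satt207 check works because each parent carries a distinct allele: a true hybrid seed shows both the maternal and the paternal fragment, whereas a self-pollinated seed shows only the maternal one. The Python sketch below encodes that rule using the allele sizes listed above; the ±1 bp sizing tolerance and the function name are assumptions for illustration, not the authors' scoring procedure.

```python
# Satt207 allele sizes (bp) reported for the four parents.
SATT207_BP = {"W1": 177, "D1": 234, "W2": 210, "D2": 231}


def classify_putative_f1(observed_bp, mother, father, tol=1):
    """Classify a putative F1 seed from its Satt207 fragment sizes (hypothetical helper).

    Returns 'hybrid' if both parental alleles are present, 'self' if only the
    maternal allele is present, and 'unclear' otherwise. `tol` is an assumed
    sizing tolerance in bp.
    """
    def seen(size):
        return any(abs(b - size) <= tol for b in observed_bp)

    has_mother = seen(SATT207_BP[mother])
    has_father = seen(SATT207_BP[father])
    if has_mother and has_father:
        return "hybrid"
    if has_mother:
        return "self"
    return "unclear"


print(classify_putative_f1([177, 234], "W1", "D1"))  # hybrid
print(classify_putative_f1([210], "W2", "D2"))       # self
```

A small tolerance is enough here because the closest pair of parental alleles (D1 and D2) differs by 3 bp.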
The PCR reaction mixture had a total volume of 5 µL, containing 1.7 µL of template DNA, 2.5 µL of 2× Qiagen Multiplex PCR Master Mix, 0.5 µL of a four-primer mix (1.25 µmol/L each), and 0.3 µL of water. PCR amplification was performed in a GeneAmp 9700 (Applied Biosystems) or iCycler (Bio-Rad, Tokyo, Japan) thermal cycler programmed with an initial activation step at 95°C for 15 min; 40 cycles of 30 sec at 94°C for denaturation, 90 sec at 57°C for annealing, and 60 sec at 72°C for extension; and a final extension at 60°C for 30 min. For analysis, 3 µL of PCR product was mixed with 10 µL of Hi-Di formamide (Applied Biosystems) and 15 nL of GeneScan-500 LIZ size standard (Applied Biosystems) and denatured at 95°C for 5 min. Denatured samples were analyzed using a 3100 Genetic Analyzer (Applied Biosystems), and the output was analyzed using GeneMapper 3.0 software (Applied Biosystems).
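The 5 µL recipe above can be scaled for a batch of reactions. The Python sketch below checks that the per-reaction volumes sum to 5 µL and computes a premix of everything except template DNA for n reactions; the 10% pipetting overage and the dictionary keys are assumptions for illustration, not part of the reported protocol.

```python
# Per-reaction volumes (µL) from the protocol above.
PER_REACTION_UL = {
    "template_dna": 1.7,
    "master_mix_2x": 2.5,   # 2x Qiagen Multiplex PCR Master Mix
    "primer_mix": 0.5,      # four-primer mix, 1.25 µmol/L each
    "water": 0.3,
}

# Sanity check: the components should add up to the 5 µL reaction volume.
assert abs(sum(PER_REACTION_UL.values()) - 5.0) < 1e-9


def premix_volumes(n_reactions: int, overage: float = 0.10) -> dict:
    """Volumes (µL) for a premix without template, with an assumed overage."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(vol * factor, 2)
        for name, vol in PER_REACTION_UL.items()
        if name != "template_dna"  # template is added to each well separately
    }


print(premix_volumes(96))
```

Leaving the template out of the premix lets each well receive its own 1.7 µL of DNA after the shared 3.3 µL of premix is dispensed.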

Linkage map construction
Linkage maps were constructed for the F2, BC1F1, BC2F1, and BC1F2 populations using JoinMap ver. 3.0 software (Van Ooijen and Voorrips 2001) according to the method of Han et al. (2005). Recombination frequencies were converted into map distances using the Kosambi mapping function (Kosambi 1944).
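The Kosambi function maps a recombination fraction r (0 ≤ r < 0.5) to a distance of 25 ln[(1 + 2r)/(1 − 2r)] cM, and its inverse is r = 0.5 tanh(2d) with d in Morgans. A minimal Python sketch of both directions (function names are illustrative; JoinMap applies this conversion internally):

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) from a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi function: recombination fraction from a distance in cM."""
    d_morgans = d_cm / 100.0
    return 0.5 * math.tanh(2 * d_morgans)


print(round(kosambi_cm(0.1), 2))  # 10.14
print(round(kosambi_r(20.0), 4))
```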

QTL analysis
QTL analysis of phenotypic data from the BC1F1, BC2F1, and BC1F2 individuals was conducted with MultiQTL ver. 2.6 software according to Peng et al. (2003). For phenotypic data of the F2 individuals from the two field environments (north and south), a single-QTL multiple-environment model was fitted to scan the entire genome (Korol et al. 1998, 2001). Significance thresholds (α = 0.05) for putative QTLs were determined from 10,000 runs of a permutation test (Churchill and Doerge 1994). Multiple interval mapping (Kao et al. 1999) was then conducted to reduce background variation by taking into account the effects of QTLs on other chromosomes. For QTLs that remained significant after the permutation test (α = 0.05), the position, additive and dominance effects, and percentage of variance explained (PVE) are reported.
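The permutation logic behind a genome-wide threshold can be sketched in a few lines. The Python example below is a deliberate simplification: it uses single-marker regression (R² of phenotype on the number of G. max alleles) rather than the multiple-environment interval mapping implemented in MultiQTL, runs 200 permutations instead of the 10,000 used here, and works on simulated data; function names are illustrative. Shuffling phenotypes destroys any genotype-phenotype association while preserving marker structure, so the 95th percentile of the genome-wide maximum statistic approximates an α = 0.05 threshold in the sense of Churchill and Doerge (1994).

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between genotype codes x and phenotypes y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def genome_wide_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=1):
    """Permutation threshold for the genome-wide maximum single-marker R^2.

    markers: list of marker columns, each a list of 0/1/2 codes
    (number of G. max alleles carried by each individual).
    """
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any genotype-phenotype association
        maxima.append(max(r_squared(g, y) for g in markers))
    maxima.sort()
    k = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[k]


# Simulated F2: 200 plants, 20 independent markers segregating 1:2:1;
# marker 0 carries a QTL at which each G. max allele lowers the trait.
sim = random.Random(0)
n_ind, n_markers = 200, 20
markers = [[sim.choice([0, 1, 1, 2]) for _ in range(n_ind)] for _ in range(n_markers)]
pheno = [80 - 15 * g + sim.gauss(0, 10) for g in markers[0]]

thr = genome_wide_threshold(markers, pheno, n_perm=200)
obs = [r_squared(g, pheno) for g in markers]
print(f"threshold R^2 = {thr:.3f}; marker 0 R^2 = {obs[0]:.3f}")
```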

Cultivated and wild soybean
The following domestication-related traits generally differed between the G. soja and G. max parents of both combinations (W1 × D1 and W2 × D2) tested at all three field locations (north, central, and south) in 2005 (Table 3). The means of G. max were higher than those of G. soja for PROD_3, whereas the means of G. soja were generally higher than those of G. max for DORM_1, DORM_2, PROD_1, PROD_4, PROD_6, SURV, and FLOW. In contrast, PROD_2 and PROD_5 did not differ notably between G. soja and G. max; in particular, the means of G. max for PROD_2 tended to be similar to or higher than those of G. soja in the regions where each cultivar is recommended for cultivation. Although no G. max data were obtained from the south field in 2005, we confirmed these trends in 2004 (A. Kaga & Y. Kuroda, unpublished data).
Table 3 footnotes: different letters among parents and populations at each field location indicate a significant difference at the 5% level by the Mann-Whitney U-test or Kruskal-Wallis test; trait abbreviations are as defined in Table 2; superscript 1 indicates the number of seed-producing individuals.

F1 hybrids and F2 populations
The phenotypic values of the F1 and F2 generations at most field locations were intermediate between those of G. soja and G. max for DORM_1, DORM_2, PROD_1, PROD_3, PROD_4, PROD_6, SURV, and FLOW (Table 3, Fig. 2A and B). However, the means of PROD_2 and PROD_5 in the F1 and F2 generations tended to be similar to or higher than those of G. soja in the region where the G. max parent is recommended for cultivation. Most of the G. soja seeds dug up from the soil in spring had not imbibed water, whereas the G. max seeds had rotted. Seeds from F1 and F2 plants included all three types: hard seeds that did not absorb water, water-absorbing viable seeds, and rotten seeds. DORM_2 was positively correlated with DORM_1 (P < 0.05) in the F2 generations of W1 × D1 (seeds harvested from the north field, R² = 0.81; from the south field, R² = 0.69; Appendix A4) and W2 × D2 (north field, R² = 0.85; south field, R² = 0.60). DORM_1 was associated with the maternally inherited seed coat color of the F3 seeds and with the pubescence color of the F2 plants on which they were produced (Table 4). G. soja has black seeds and brown pubescence, and G. max has yellow seeds and white pubescence. DORM_1 was high for black or brown seeds produced by F2 plants with brown pubescence, and most of those seeds had not imbibed water when tested in spring (brown seeds, 75.9%; black seeds, 75.5%). Seeds from F2 plants with other pubescence colors had relatively low DORM_1. In particular, brown seeds produced by F2 plants with white pubescence (22 of 27 F2 plants) were severely cracked or split and could not be recovered the following spring (DORM_1, 0.2%).
PROD_1 of the F1 plants was generally intermediate between those of G. soja and G. max for both the W1 × D1 and W2 × D2 combinations (Table 3, Fig. 2A and B). An exception was found in the north field, where PROD_1 of the W1 × D1 F1 plants (average 688) was similar to or higher than that of the G. soja parent (average 421).
The mean values of PROD_2 and PROD_5 in the F1 generation were also higher than those of the parents. In the next generation, PROD_1 of several F2 individuals was similar to or higher than that of the G. soja parent. This transgressive variation in PROD_1 may be explained by heterosis or by within-field positional effects on plant size, given the significant (P < 0.05) positive correlations between PROD_1 and plant size-related traits such as PROD_5 and PROD_6 (Appendix A4).
SURV differed between G. soja and G. max in both the W1 × D1 and W2 × D2 combinations because G. soja had both high PROD_1 and high DORM_1, whereas G. max had low PROD_1 and zero DORM_1 (Table 3, Fig. 2A and B). Average SURV of the F1 plants was intermediate between those of G. soja and G. max in each combination. Variation was greater in the F2 progenies than in the F1 plants because of genetic segregation for PROD_1 and DORM_1.

Backcross populations
For the backcross populations (BCs; BC1F1, BC2F1, and BC1F2) from both combinations, plants were grown only in the central field in 2006 and 2007. The phenotypic differentiation between G. soja and G. max in the central field was similar to that seen in the other fields in 2005. All trait values of the BCs were clearly shifted toward those of the G. soja recurrent parents. For both combinations, the medians of the BC1F1 and BC2F1 populations were very close to the means of G. soja for all traits (Table 3). In contrast, the shift in the BC1F2 populations for DORM_1, DORM_2, PROD_1, and SURV was not as obvious as in the BC1F1 and BC2F1 populations. Backcross plants grown in 2006 and 2007, when the soil surface was covered with mulch sheets, were vigorous, whereas the F1 and F2 generations, grown without mulch sheets in 2005, were less vigorous. Because G. soja had higher PROD_1 and DORM_1 than G. max, SURV of G. soja was higher than that of G. max in both the W1 × D1 and W2 × D2 combinations in all three backcross generations (Table 3, Fig. 2C-H). The medians of SURV in the BC1F1 and BC2F1 generations were very close to that of G. soja; nevertheless, both DORM_1 and PROD_1 varied within the BC1F1 and BC2F1 generations (Fig. 2C-H). Some individuals could potentially yield large numbers of dormant seeds because their PROD_1 exceeded that of G. soja while their DORM_1 was similar.
Figure legend: *w and *c indicate G. soja (wild) and G. max (cultivated) homozygote excess, respectively, at the designated SSR locus. Arrows indicate QTLs with allelic effects opposite to those predicted by the parental phenotype for traits differing between G. soja and G. max. (N) or (S) next to an arrow indicates that the effect was observed only in the north field or south field, respectively.

QTL analysis for F2 populations
Of the 720 markers screened, 359 and 378 revealed clear polymorphisms between G. soja and G. max in the W1 × D1 and W2 × D2 populations, respectively. Of these, 212 and 208 markers were used to construct the F2 linkage maps of the W1 × D1 and W2 × D2 populations, respectively (Table 1, Fig. 3). Although gaps of more than 30 cM were observed between Satt285 and Satt414 on LG-J in all populations and generations, the SSR markers were otherwise distributed evenly across the soybean genome, and marker order was conserved between the W1 × D1 and W2 × D2 maps as well as between those maps and the composite map of Song et al. (2004). The total lengths of the linkage maps developed here were about 2500 cM for the F2 and BC1F1 populations, comparable to those of the SSR-based linkage maps developed by Song et al. (2004; 2524 cM) and Liu et al. (2007; 2383 cM). Several markers (1.4% and 3.4% of the markers in the W1 × D1 and W2 × D2 populations, respectively) showed segregation ratios that deviated significantly (P < 0.05) from the expected 1:2:1 ratio of G. soja homozygotes, heterozygotes, and G. max homozygotes. Although most markers with segregation distortion were scattered over several linkage groups and were not consistent between the W1 × D1 and W2 × D2 populations, five distorted markers were adjacent to one another in the upper half of LG-C1 in the W2 × D2 population (Fig. 3A). Paracentric inversions and reciprocal translocations, which can cause pollen and ovule sterility and have been found between a specific Chinese accession of G. soja and G. max (Singh and Hymowitz 1988; Palmer et al. 2000), might also account for the segregation distortion in these Japanese germplasm sources.
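The 1:2:1 check above amounts to a chi-square goodness-of-fit test with two degrees of freedom. The Python sketch below assumes the standard Pearson statistic and the tabulated 5% critical value of 5.991; the exact test and any multiple-testing handling used by the authors are not stated, and the counts in the example are hypothetical.

```python
def chi_square_121(n_soja_hom, n_het, n_max_hom):
    """Pearson chi-square against the 1:2:1 F2 expectation (df = 2)."""
    observed = (n_soja_hom, n_het, n_max_hom)
    total = sum(observed)
    expected = (total / 4, total / 2, total / 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Upper 5% point of the chi-square distribution with 2 df.
CHI2_CRIT_DF2_P05 = 5.991


def is_distorted(n_soja_hom, n_het, n_max_hom):
    """True if the marker deviates from 1:2:1 at P < 0.05."""
    return chi_square_121(n_soja_hom, n_het, n_max_hom) > CHI2_CRIT_DF2_P05


# Hypothetical counts for 204 F2 plants at two markers.
print(round(chi_square_121(51, 102, 51), 3), is_distorted(51, 102, 51))  # 0.0 False
print(round(chi_square_121(70, 100, 34), 3), is_distorted(70, 100, 34))  # 12.784 True
```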
In total, 28 and 27 QTLs related to seed dormancy, seed production, and flowering phenology were detected in the F2 generations of the W1 × D1 and W2 × D2 populations, respectively (Fig. 3, Appendix A5). Among them, QTLs in three regions (LG-A2, -C2, and -D1b) had large effects on seed dormancy, and a QTL in one region (LG-L) had a significant effect on seed production.

Seed dormancy
Eight and six QTLs associated with seed dormancy were detected in the W1 × D1 and W2 × D2 populations, respectively (Fig. 3, Appendix A5). The G. max alleles at all of these QTLs had additive effects (Add) that decreased DORM_1 (Add, -3% to -37%; PVE, 6.4%-76.2%) and DORM_2 (Add, -1% to -26%; PVE, 6.2%-42.7%). Three major QTLs, located on LG-A2, -C2, and -D1b, were associated with DORM_1 and DORM_2 in both populations (Fig. 3A). The G. max allele at the QTL on LG-A2 had a larger additive effect in the W2 × D2 population than in the W1 × D1 population. The QTL on LG-A2 was located near the I locus, and the QTL on LG-C2 was close to the T locus. The additive effect of the QTL on LG-A2 tended to be larger for seeds harvested from the south field than for seeds harvested from the north field, whereas the additive effects of the QTLs on LG-C2 and LG-D1b tended to be smaller for seeds from the south field than for seeds from the north field.
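Additive and dominance effects can be illustrated with the textbook F2 definitions: the additive effect is half the difference between the two homozygote means (here expressed per G. max allele, so a negative value matches the Add sign convention above), and the dominance effect is the deviation of the heterozygote mean from the homozygote midpoint. The Python sketch below computes both from marker-class values at a single locus. It is a simplified single-marker calculation, not the multiple-environment interval-mapping estimates produced by MultiQTL, and the data and function names are hypothetical.

```python
from statistics import mean


def f2_effects(soja_hom, het, max_hom):
    """Additive (a) and dominance (d) effects from F2 genotype-class values.

    a is the effect per G. max allele, so a < 0 means the G. max allele lowers
    the trait. d is the heterozygote's deviation from the homozygote midpoint.
    """
    m_ss, m_sm, m_mm = mean(soja_hom), mean(het), mean(max_hom)
    a = (m_mm - m_ss) / 2
    d = m_sm - (m_ss + m_mm) / 2
    return a, d


def dominant_allele(a, d, tol=0.1):
    """Say which allele the heterozygote resembles, using the ratio d/a."""
    if a == 0:
        return "no additive effect"
    ratio = d / a  # > 0: heterozygote shifted toward the G. max homozygote
    if abs(ratio) <= tol:
        return "additive (no clear dominance)"
    allele = "G. max" if ratio > 0 else "G. soja"
    degree = "partially dominant" if abs(ratio) < 1 else "dominant or overdominant"
    return f"{allele} allele {degree}"


# Hypothetical DORM_1 values (%) for the three marker classes at one locus.
a, d = f2_effects([88.0, 92.0, 90.0], [40.0, 42.0, 38.0], [18.0, 22.0, 20.0])
print(a, d, dominant_allele(a, d))  # -35.0 -15.0 G. max allele partially dominant
```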

Seed production
QTLs with large effects on seed production-related traits (PROD_1, PROD_2, PROD_3, PROD_4, PROD_5, and PROD_6) were located near the marker LFsoy3, which tracks the Dt1 locus, in both the W1 × D1 and W2 × D2 populations (Fig. 3B). The G. max alleles at these QTLs, except for PROD_3, had additive effects that decreased the phenotypic values, but the magnitude of the effects differed between test locations (Appendix A5). In both populations, the additive effects for PROD_1, PROD_2, and PROD_5 were greater in the south field than in the north field. Although QTL effects for PROD_4 would be expected to be consistent with those for PROD_1, the two did not appear to be related in these populations. Pods with only one or two seeds were more frequent on plants in the north field than on plants in the south field (data not shown), which may explain this inconsistency.

QTL analysis for BC1F1 populations
Seed dormancy
Three QTLs for seed dormancy were detected in each population (Fig. 3A, Appendix A5). Although the QTL for DORM_2 on LG-A2 was detected only in the W2 × D2 population, the QTL for DORM_1 on LG-A2 was detected in both combinations (W1 × D1 and W2 × D2), suggesting that the G. max allele has a consistent genetic effect even in a predominantly wild genetic background. The G. max allele at this QTL on LG-A2 had a large negative effect on DORM_1 (Add, -7% to -8%; PVE, 21.2%-22.3%).

QTL analyses for BC2F1 and BC1F2 populations
The linkage maps for the W1 × D1 and W2 × D2 BC2F1 populations were constructed using 103 and 72 markers, respectively (Table 1, Fig. 3). These markers were located in regions that were heterozygous in the selected BC1F1 plants. In addition, BC1F2 populations were developed from seeds obtained by self-pollination of the two selected BC1F1 plants (W1 × D1 and W2 × D2), and partial linkage maps were constructed for them using 105 and 72 markers, respectively. Marker order in each linkage map was well conserved between the W1 × D1 and W2 × D2 populations as well as among the F2, BC1F1, and BC2F1 populations (Fig. 3A and B). Entire linkage groups (LG-A1, -C1, -I, and -M in W1 × D1 and LG-C1, -B2, -D2, -G, -J, and -O in W2 × D2) had been replaced with the G. soja genome in the two selected BC1F1 plants and therefore in the BC2F1 and BC1F2 populations.
In the BC2F1 generation, which had a higher percentage of G. soja genetic background than the BC1F1 generation but still segregated for the selected fitness-related alleles from G. max, 10 QTLs were detected in each of the W1 × D1 and W2 × D2 populations (Fig. 3, Appendix A5). As in the BC1F1 generation, most QTLs in the BC2F1 generation differed from those detected in the F2 generation. Unlike in the BC1F1 generation, the DORM_1 and DORM_2 QTLs on LG-A2 were not detected in the W1 × D1 combination (Fig. 3A).
In the BC1F2 generation, which had a similar percentage of G. soja background to the BC1F1 generation but included plants homozygous for the selected fitness-related alleles from G. max, 19 and 17 QTLs were detected in the W1 × D1 and W2 × D2 populations, respectively (Fig. 3, Appendix A5). The major QTLs for seed dormancy on LG-A2, -C2, and -D1b (Fig. 3A) and for seed number on LG-L (Fig. 3B) were well conserved between the F2 and BC1F2 generations, except for the DORM_1 QTL on LG-A2 in the W1 × D1 population, which was detected in the F2 but not in the BC1F2 generation.

Life history in relation to hybrid derivatives
In a previous study, hybrid derivatives that had arisen from gene flow between G. soja and G. max were found growing in several natural habitats in Japan (Kuroda et al. 2010). Because seed coat hardness, a phenotype related to seed dormancy (Table 4), is largely determined by the maternal G. soja plant, F1 seeds produced on G. soja by pollen from G. max can survive in the soil for several years, and the resulting F1 plants can grow in the wild alongside G. soja. Here, FLOW of the F1 hybrids tended to be similar to that of the G. soja parent or intermediate between those of the G. soja and G. max parents (Table 3), indicating that the flowering of natural F1 hybrids and local G. soja could overlap in the parts of Japan where natural hybrids have been identified. Because of genetic segregation in the F2 progenies, flowering-time overlap with G. soja will be reduced in that generation. However, once secondary gene flow from an F1 hybrid to G. soja has occurred, most backcross progenies are expected to have flowering times relatively similar to that of G. soja (Table 3). Given that outcrossing rates in wild soybean populations have been reported as 9.3%-19% (Fujita et al. 1997) and 0%-6.3% (Kuroda et al. 2008), our results suggest that G. max alleles can persist at some frequency in wild populations as long as gene flow continues at or near these maximum rates.
Under the experimental field conditions, the total seed number (PROD_1) of the F1 hybrid was similar to or less than that of the corresponding G. soja parent (Table 3). PROD_1 of most F2 progenies was less than that of G. soja, although some F2 individuals showed a PROD_1 similar to or greater than that of the G. soja parent (Fig. 2A and B). As the proportion of G. soja background increased through backcrossing, the frequency of hybrid derivatives with a PROD_1 similar to that of G. soja also increased (Fig. 2C-F). However, after one round of self-pollination of the BC1F1 progenies, BC1F2 plants with short plant height and low seed production, like those seen in the F2 progenies, appeared again (Fig. 2G and H).
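The expected share of G. soja genome in each generation follows from standard Mendelian expectations. The minimal Python sketch below assumes G. soja is the recurrent parent and ignores linkage drag and selection; the function name `expected_soja_fraction` is illustrative, not from this study. Each backcross halves the remaining G. max contribution, while self-pollination leaves the expected proportion unchanged, which is why BC1F2 plants carry about the same G. soja background as BC1F1 plants.

```python
from fractions import Fraction


def expected_soja_fraction(backcrosses: int) -> Fraction:
    """Expected G. soja genome proportion of a hybrid derivative.

    Starts from the F1 (half G. soja). Each backcross to G. soja averages the
    current genome with a pure G. soja genome. Self-pollination reshuffles
    genotypes but does not change the expected ancestry proportion, so an F2
    or BC1F2 has the same expectation as the F1 or BC1F1 it came from.
    """
    frac = Fraction(1, 2)  # F1: one genome copy from each parent
    for _ in range(backcrosses):
        frac = (frac + 1) / 2  # half of the new genome comes from G. soja
    return frac


# Number of backcrosses to G. soja behind each generation (selfing adds none)
BACKCROSSES = {"F1": 0, "F2": 0, "BC1F1": 1, "BC1F2": 1, "BC2F1": 2}

for name, n in BACKCROSSES.items():
    print(f"{name}: {expected_soja_fraction(n)}")
```

These are population averages (1/2, 3/4, and 7/8); individual plants vary around them because of segregation, which is what allows low-fitness BC1F2 plants to reappear.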
Most G. max seeds died in the soil during the winter, whereas the G. soja seeds survived (DORM_1, Table 3). Although DORM_1 of the F1 hybrids was intermediate between G. max and G. soja, the F2 progenies showed wide variation in DORM_1 (Fig. 2A and B). DORM_1 of the F2 progenies was related to seed color (Table 4). As the proportion of G. soja background was increased by backcrossing with G. soja, the seed morphology (i.e., seed coat color and size) became closer to that of G. soja, and DORM_1 of the BC1F1 progenies increased (Fig. 2C-F). However, after one round of self-pollination of the BC1F1 progenies, BC1F2 seeds/plants with low DORM_1 appeared (Fig. 2G and H). To understand this further, the phenotypic variation observed in the hybrid progenies was genetically dissected into QTLs by constructing genetic linkage maps.

Seed dormancy-related QTLs
Seedling emergence represents the interface between two demographic events: seed production and seedling recruitment. Because seed dormancy-related traits determine the timing of seedling emergence, the physiology of seed dormancy has a large effect on fitness. Good water permeability is an important trait for uniform and rapid germination in G. max cultivation and food processing. Conversely, rapid water uptake is known to cause cell damage in the cotyledon (Powell and Matthews 1978) and is disadvantageous to the winter survival of G. soja in natural habitats. This physiological difference has been characterized by many researchers, who measured traits such as seed water imbibition or seed hardness over several days under germinable conditions. However, the conditions used to evaluate seed dormancy differ considerably between artificial and natural settings in terms of duration, water availability, and temperature. Even G. max seeds, which imbibe water during winter, were able to survive the winter of 2006 (Table 3), indicating that water imbibition does not always lead to loss of seed viability. In this study, three major QTLs affecting both DORM_1 and DORM_2, located on LG-A2, -C2, and -D1b (Fig. 3A), were generally consistent across generations and cross combinations. A significant, high correlation between DORM_1 and DORM_2 was observed (Appendix A4): seeds from hybrid derivatives that carried G. max alleles at those QTLs imbibed water easily and appear to have rotted in the soil over the winter. In particular, the G. max allele for DORM_1 on LG-A2 was found to be partially dominant to the G. soja allele, because its effect appeared in the BC1F1 progenies, and it had a large effect in reducing survival rate in the W2 × D2 population (Appendix A5B). Therefore, such strong G. max alleles may reduce the winter survival of seeds produced by an F1 hybrid plant as well as by later-generation progenies.
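Whether an allele is partially dominant can be quantified with the classical degree-of-dominance ratio d/a, where a is half the difference between the two homozygote means and d is the deviation of the heterozygote mean from their midpoint. The Python sketch below assumes genotype means for a single QTL are available; the function names and the survival values are hypothetical illustrations, not estimates from Appendix A5. With a defined relative to the G. soja homozygote, d/a = 1 means the G. soja allele is completely dominant, 0 means additive action, and values between 0 and -1 mean the G. max allele is partially dominant, the pattern inferred above for DORM_1 on LG-A2.

```python
def degree_of_dominance(mean_soja_hom: float, mean_het: float, mean_max_hom: float) -> float:
    """Return d/a for one QTL.

    a: half the difference between homozygote means (positive when the
       G. soja homozygote has the higher value, e.g. for winter survival).
    d: deviation of the heterozygote mean from the homozygote midpoint.
    """
    a = (mean_soja_hom - mean_max_hom) / 2
    if a == 0:
        raise ValueError("homozygote means are equal; d/a is undefined")
    d = mean_het - (mean_soja_hom + mean_max_hom) / 2
    return d / a


def describe(ratio: float) -> str:
    """Translate d/a into a verbal dominance class."""
    if ratio >= 1:
        return "G. soja allele completely dominant (or overdominant)"
    if ratio > 0:
        return "G. soja allele partially dominant"
    if ratio == 0:
        return "additive"
    if ratio > -1:
        return "G. max allele partially dominant"
    return "G. max allele completely dominant (or overdominant)"


# Hypothetical winter-survival proportions for one QTL (illustration only)
ratio = degree_of_dominance(mean_soja_hom=0.90, mean_het=0.30, mean_max_hom=0.10)
print(round(ratio, 2), describe(ratio))
```

In practice, the genotype means would come from the QTL analysis itself, and the ratio carries the sampling error of all three means.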
Nevertheless, the magnitudes of allele effects at the three major DORM_1/DORM_2 QTLs differed slightly depending on the cross combination. For example, the QTL on LG-A2 had the strongest effect of the three QTLs in the W2 × D2 population, whereas its effect was similar to those of the other two QTLs in the W1 × D1 population (Fig. 3A). This explains the different levels of seed winter survival between the W1 × D1 and W2 × D2 combinations. All the previously reported QTLs caused water imbibition when the alleles at those loci were from G. max. In a G. max × G. soja population, Keim et al. (1990) detected four QTLs on LG-A2, -L, and -D1b by evaluating imbibition of F4 seeds for 7 days at room temperature. In contrast, Sakamoto et al. (2004) and Liu et al. (2007) identified two QTLs, located on LG-C2 and -D1b, by evaluating imbibition of seeds for 12 h and 24 h at room temperature, respectively. Glycine gracilis is an intermediate form between G. max and G. soja that originated in northeastern China (Hymowitz 2004). Three QTLs (on LG-C2, -D1b, and -I) were identified in a G. max × G. gracilis population by testing imbibition of seeds for 24 h at 25°C (Watanabe et al. 2004). These results indicate that QTLs on LG-C2 and -D1b are common among G. max × G. soja populations, but that a QTL on LG-A2 is not consistently detected in such populations. Similarly, in this study, no QTL for seed hardness (DORM_2) was detected on LG-A2 in the W1 × D1 population (Fig. 3A). It is therefore very interesting that QTLs for seed winter survival (DORM_1), which required long-term evaluation in the field, were identified in the W1 × D1 combination in approximately the same region on LG-A2 where QTLs for DORM_1 and DORM_2 were detected in the W2 × D2 combination. One possible explanation is that, if the seed coat of a G. max parent resists water imbibition, the effect of a QTL on LG-A2 may appear when seeds imbibe water during long-term evaluation.
The slow imbibition rate of the D1 parent also supports this explanation and suggests that there is allelic variation within G. max for a seed hardness QTL on LG-A2.
Based on the map locations of gene-derived markers and the magnitude of QTL effects, the DORM_1/DORM_2 QTLs on LG-A2 and LG-C2 are tightly linked to the I locus and T locus, respectively, and the genes responsible for DORM_1 are either I and T themselves or genes closely linked to those loci (Fig. 3A). The I allele, which suppresses seed coat pigmentation, is dominant to the i allele, and the T allele, which confers pigmented pubescence, is dominant to the t allele (Bernard and Weiss 1973). Hybrid derivatives without a black seed coat (i.e., those carrying the I allele) showed low seed survival (Table 4); thus, the I allele is related to the water imbibition ability of G. max, possibly through a physical characteristic of the seed coat. Epistatic interaction between the I and T loci has been reported to cause seed coat cracking when the alleles at both the I and T loci are recessive and homozygous (Lindstrom and Vodkin 1991). Such cracked F3 seeds, produced on several F2 individuals, imbibed water quickly and failed to survive the winter (Table 4). Thus, through their influence on seed viability or survival, epistatic interactions contribute to the reduced fitness of progenies derived from self-pollination, despite the low proportion of double-recessive individuals in those progenies.
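The low proportion of double-recessive individuals can be made concrete. Assuming the F1 is heterozygous at both loci (Ii Tt) and that I and T assort independently because they lie on different linkage groups (LG-A2 and LG-C2), the Python sketch below enumerates F2 genotypes under Mendelian segregation without selection. The ii tt class, whose maternal seed coat determines the cracking of the F3 seeds it bears, is expected at 1/16. Gene symbols follow the text; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product


def f2_genotype_probabilities() -> dict[str, Fraction]:
    """Genotype probabilities in the F2 of an Ii Tt F1, assuming independent assortment."""
    gametes = [i + t for i, t in product("Ii", "Tt")]  # IT, It, iT, it: each 1/4
    probs: dict[str, Fraction] = {}
    for egg, pollen in product(gametes, repeat=2):  # 16 equally likely unions
        # sorted() puts the dominant (uppercase) allele first, e.g. "Ii"
        i_locus = "".join(sorted(egg[0] + pollen[0]))
        t_locus = "".join(sorted(egg[1] + pollen[1]))
        genotype = i_locus + " " + t_locus
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 16)
    return probs


probs = f2_genotype_probabilities()
print("ii tt:", probs["ii tt"])
```

In an F2 family of 100 plants, this corresponds to only about six double-recessive plants, yet their seeds add to the losses already caused by the G. max alleles at the LG-A2 and LG-C2 QTLs.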

Seed production-related QTLs
The genes for domestication-related traits, which differentiate crops from their wild relatives, are not randomly distributed across crop genomes (Ross-Ibarra 2005; Kaga et al. 2008). In this study, QTLs with large contributions to seed production-related traits, representing distinct differences between G. soja and G. max, tended to be concentrated in a particular genomic region on LG-L (Fig. 3B). Those QTLs were common to both cross combinations (W1 × D1 and W2 × D2) as well as across generations. One possible reason for the high positive correlation of total seed number (PROD_1) with plant size-related traits such as stem dry weight (PROD_5) and stem length (PROD_6) (Appendix A4) is a gene related to stem elongation. Classically, stem termination in soybean is known to be controlled by two loci, Dt1 and Dt2 (Bernard 1972). Because Dt1 and Dt2 have been mapped on LG-L and LG-G, respectively (Cregan et al. 1999), the QTL with a strong contribution to stem length (PROD_6) on LG-L in this study is likely to be the Dt1 locus (Fig. 3B). Our results indicate that the G. max allele at this locus reduces the number of seeds produced by hybrids between G. max and G. soja, as previously reported by Wang et al. (2004). Intriguingly, QTLs for seed weight (PROD_3) as well as other seed production-related traits were closely linked to marker LFsoy3, which was designed to detect a soybean homolog of PsTFL1a, a gene controlling stem termination in Pisum (Foucher et al. 2003). Further studies are necessary to clarify the pleiotropic effect of soybean TFL1a on these traits. The G. max allele at the QTL for PROD_6 on LG-L was confirmed to have a moderate negative effect in the BC1F1 and BC2F1 populations, but, unlike in the progenies from self-pollination, it had no effect on PROD_1 (Fig. 3B, Appendix A5). A QTL for both PROD_6 and PROD_1 was identified again on LG-L in the BC1F2 population.
These results indicate that the G. max allele is recessive to the G. soja allele because its effects were detected only in progenies generated by self-pollination.
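The reasoning that a recessive G. max allele is expressed only after self-pollination can be checked with single-locus genotype frequencies. The Python sketch below assumes Mendelian segregation, no selection, and G. soja as the recurrent parent; the function names are illustrative. It tracks the frequency of G. max homozygotes: none arise in the F1, BC1F1, or BC2F1, whereas they reach 1/4 in the F2 and 1/8 in the BC1F2.

```python
from fractions import Fraction

# Single-locus genotype frequencies: (soja/soja, soja/max, max/max)
Genotypes = tuple[Fraction, Fraction, Fraction]


def self_pollinate(g: Genotypes) -> Genotypes:
    """Homozygotes breed true; heterozygotes segregate 1:2:1."""
    ss, sm, mm = g
    return (ss + sm / 4, sm / 2, mm + sm / 4)


def backcross_to_soja(g: Genotypes) -> Genotypes:
    """Cross every plant to a soja/soja parent, which contributes only soja alleles."""
    ss, sm, mm = g
    return (ss + sm / 2, sm / 2 + mm, Fraction(0))


F1: Genotypes = (Fraction(0), Fraction(1), Fraction(0))
BC1F1 = backcross_to_soja(F1)
generations = {
    "F1": F1,
    "F2": self_pollinate(F1),
    "BC1F1": BC1F1,
    "BC2F1": backcross_to_soja(BC1F1),
    "BC1F2": self_pollinate(BC1F1),
}
for name, (ss, sm, mm) in generations.items():
    print(f"{name}: max/max = {mm}")
```

Under this model, the phenotypic effect of a fully recessive allele can surface only in selfed generations, which matches the pattern described above for PROD_1 on LG-L; the same two functions can be chained to follow later generations.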

Flowering phenology-related QTL
Photosensitivity is also an important plant response that is heavily involved in the control of flowering as well as in successful seed production. There were clear differences between the W1 × D1 and W2 × D2 populations in days to first flower (FLOW) (Table 3). The W1 × D1 population, representing northern Japanese germplasm, had shorter FLOW than the W2 × D2 population, representing southern Japanese germplasm. This difference reflects the adaptive strategies of G. soja and G. max in Japan. In northern Japan, the growing season is relatively short; thus, the W1 × D1 population might respond to warm temperatures and start to produce seeds during the short period of moderate climate even if the plants are not large. In contrast, because autumn is relatively long in southern Japan, the W2 × D2 population might respond to photoperiod and start to produce seeds only after the plants have grown large.
Based on the locations of SSR markers linked to previously reported flowering loci, the FLOW QTLs on LG-O (W1 × D1 population), -L (W1 × D1 and W2 × D2 populations), and -I (W1 × D1 population) found in this study (Fig. 3B) are thought to correspond to the classical maturity loci E2 (Bernard 1971), E3 (Buzzell 1971), and E4 (Buzzell and Voldeng 1980), respectively. The other FLOW QTLs with a large effect (i.e., that on LG-H) or a moderate effect (i.e., those on LG-E and -F in the W2 × D2 population [Appendix A5B] and on LG-D1b and -K in the W1 × D1 population [Appendix A5A]) have not been previously described and might be new loci for flowering time in soybean. Although a QTL for days to flowering on LG-C2 has been reported in a G. max × G. gracilis population (Yamanaka et al. 2001; Watanabe et al. 2004) and in a G. max × G. soja population (Liu et al. 2007), no flowering time QTL was consistently identified at that location in this study.

Evolutionary aspect of fitness-related QTLs and conclusions
Natural selection is expected to act on the phenotypes of individuals that constitute G. soja populations, including hybrids between G. soja and G. max. Moreover, the phenotype of the hybrid progenies is influenced by the genetic variability of both G. max and G. soja in response to a heterogeneous environment such as the natural habitat of G. soja. The results obtained here should be considered estimates obtained under conditions of maximum plant growth and seed production, because the hybrid derivatives were widely spaced in the field (i.e., at intervals of 1 m); the results might have differed had the plants been evaluated under conditions favoring high seedling mortality and restricted seed production in a competitive native weed population.
Genotype-dependent phenotypic response to different environments is common in quantitative traits and is referred to as phenotypic plasticity (Bradshaw 1965). In particular, the genes for wide adaptability that might have accumulated during human selection of G. max are probably different from those accumulated during the ecological adaptation of G. soja, and they are likely to control more than the obvious morphological differences between the two species. For this reason, this study examined the effects of G. max genes in two types of hybrids between G. soja and G. max, tested in two regions of Japan.
A large number of genes, and their interactions with environmental changes during plant growth, are thought to influence seed production. Nevertheless, the only QTL with a strong effect on PROD_1 that distinguished G. soja from G. max across different regions was the one identified on LG-L (Fig. 3B). The number of loci involved in PROD_1 may have been underestimated because of the limited ability to detect QTLs involved in complex epistatic interactions; QTLs for traits such as PROD_4 and FLOW, which might be expected to affect PROD_1, were not always detected as QTLs for PROD_1.
Until recently, little was known about the effect of G. max alleles within a predominantly G. soja genetic background. In this study, the genetic effects of those G. max alleles were not expressed as phenotypes in the BC1F1 and BC2F1 generations, indicating that most G. soja alleles are dominant to G. max alleles; one notable exception was the QTL for seed dormancy on LG-A2 (Fig. 3A). Snow et al. (1999) indicated that after two or three generations of backcrossing, hybrid derivatives into which crop alleles have been introgressed can be just as competitive and successful as wild plants. In this study, PROD_1 and DORM_1 in the BC1F1 and BC2F1 generations approached the values for G. soja as the proportion of G. soja genetic background increased (Table 3). Although QTLs at which G. max alleles increased PROD_1 and DORM_1 were not consistent across generations and cross combinations (Fig. 3, Appendix A5), these alleles may have the potential to increase the fitness of hybrid derivatives. Individual plants with higher fitness than G. soja in terms of SURV could be found in most generations of both the W1 × D1 and W2 × D2 populations (Fig. 2).
In contrast, QTLs at which G. max alleles had negative effects on fitness were consistently detected in both cross combinations and in different generations. In particular, QTLs for DORM_1 on LG-A2, -C2, and -D1b (Fig. 3A) and for PROD_1 on LG-L (Fig. 3B) were found in both cross combinations. This is one reason why hybrid derivatives do not survive in natural habitats (Kuroda et al. 2005, 2006b, 2007) and why genetic differentiation is maintained between G. soja and G. max (Maughan et al. 1996; Powell et al. 1996; Xu and Gai 2003; Kuroda et al. 2006a). It was previously argued that hybrids between wild and crop species should be less fit than their wild parents because of the burden that crop traits introduce into wild plants (De Wet and Harlan 1975). Current knowledge of the genetic basis of domestication traits suggests that few genomic regions are usually involved in domestication (White and Doebley 1998; Gross and Olsen 2010); thus, these regions could be purged quite rapidly within the first few generations after hybridization, with no long-term impact on fitness. Our results support these studies and suggest that the risk of transgene dispersal into the wild soybean gene pool is generally low in Japan. Simulation studies of the extent to which G. max alleles persist under a mixed mating system (i.e., considering the relative proportions of progenies from both self-fertilization and outcrossing events) are required to improve the assessment of environmental transgene dispersal from GM soybeans.
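As a possible starting point for such simulations, the Python sketch below iterates deterministic single-locus genotype frequencies under mixed mating. It assumes an outcrossing rate `t`, a fraction `m` of outcross pollen coming from a G. max crop, and viability selection against the G. max allele with selection coefficient `s` and dominance `h`. All parameter names and values are hypothetical; the model ignores linkage, drift, seed-bank dynamics, and multilocus effects, and it is not the authors' method. The example outcrossing rate of 0.1 is chosen only because it falls within the 0-19% range cited earlier.

```python
def next_generation(P, H, Q, t, m, s, h):
    """One generation of mixed mating, crop pollen immigration, and viability selection.

    P, H, Q: frequencies of soja/soja, soja/max, and max/max plants.
    t: outcrossing rate (fraction of seeds sired by outcross pollen).
    m: fraction of outcross pollen coming from a G. max crop (hypothetical).
    s: selection coefficient against max/max; h: dominance of the max allele.
    """
    q = H / 2 + Q                   # G. max allele frequency among adults
    q_pollen = (1 - m) * q + m      # outcross pollen pool, including crop pollen
    # Selfed seeds: homozygotes breed true, heterozygotes segregate 1:2:1
    P_self, H_self, Q_self = P + H / 4, H / 2, Q + H / 4
    # Outcrossed seeds: ovule allele from the mother, pollen allele from the pool
    P_out = (1 - q) * (1 - q_pollen)
    H_out = (1 - q) * q_pollen + q * (1 - q_pollen)
    Q_out = q * q_pollen
    P1 = (1 - t) * P_self + t * P_out
    H1 = (1 - t) * H_self + t * H_out
    Q1 = (1 - t) * Q_self + t * Q_out
    # Viability selection, then renormalize so frequencies sum to 1
    w_P, w_H, w_Q = 1.0, 1.0 - h * s, 1.0 - s
    mean_w = P1 * w_P + H1 * w_H + Q1 * w_Q
    return P1 * w_P / mean_w, H1 * w_H / mean_w, Q1 * w_Q / mean_w


def simulate(generations, t, m, s, h, start=(1.0, 0.0, 0.0)):
    """Return the G. max allele frequency after each generation."""
    P, H, Q = start
    trajectory = []
    for _ in range(generations):
        P, H, Q = next_generation(P, H, Q, t, m, s, h)
        trajectory.append(H / 2 + Q)
    return trajectory


# Hypothetical scenario: 10% outcrossing, 5% of outcross pollen from G. max,
# and a recessive G. max allele that kills 90% of homozygous seeds over winter
trajectory = simulate(generations=200, t=0.1, m=0.05, s=0.9, h=0.0)
print(f"G. max allele frequency after 200 generations: {trajectory[-1]:.4f}")
```

Setting `m=0` shows how quickly an introgressed allele is purged once gene flow stops. Lowering `t` increases homozygosity, so in a predominantly selfing population recessive crop alleles would be expected to be exposed to selection faster than under random mating.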