Identification of QTL for seed yield and agronomic traits in 944 soybean (Glycine max) RILs from a diallel cross of early-maturing varieties

Increasing soybean yield plays a key role in meeting the high demand for protein in Europe and other countries. The aim of this study was to dissect the genetic architecture underlying seed yield, plant height, protein yield and thousand-seed weight in early-maturing soybean. To this end, we performed QTL mapping based on 944 RILs derived from a half-diallel crossing design of five parents. We identified five to eight QTL for each of the four agronomic traits, and some explained a considerable proportion of the genotypic variance. The three major QTL showed pleiotropic effects on two or more traits. Fine characterization revealed the maturity genes E1 and E3, and the stem growth habit gene Dt2, as likely candidates underlying these QTL. In general, the allele increasing seed yield also resulted in taller plants, which needs to be considered during selection due to an increased risk of lodging. Collectively, our results underline the strong effect of some loci, such as the E1 gene, on a range of traits including seed yield, making them attractive targets for marker-assisted selection.

The aim of this study was to investigate the genetic architecture of seed yield, the seed yield component trait thousand-seed weight, plant height and protein yield. The study was based on 944 RILs derived from eight crosses among five early-maturing European soybean cultivars. In particular, our objectives were to (a) perform QTL mapping for the four traits, (b) fine-map identified QTL, (c) analyse QTL co-localizations for six traits by also considering the quality traits protein content and oil content, and (d) draw conclusions for the potential of marker-assisted selection for the investigated traits in soybean breeding.

| Phenotypic data
The field trials and the phenotypic data of the four traits, seed yield, plant height, protein yield and thousand-seed weight, have been described previously (Kurasch, Hahn, Leiser, Starck, et al., 2017). In brief, all RILs and their parents were classified according to their maturity date relative to each other as the earliest, middle and latest maturing genotypes, and these groups were then grown as three trials.
The trials shared overlapping genotypes: 38 lines were included in both Trial 1 and Trial 2, and 66 lines were grown in both Trial 2 and Trial 3. The trials were grown in a partially replicated design with 20% of the lines grown in replication. At each location a different set of lines was replicated. The three trials were grown at three locations in Germany (Eckartsweier, 48°31′47″N, 7°51′8″E; Hohenheim, 48°42′42″N, 9°12′41″E; Neuenstein, 49°12′23″N, 9°34′54″E) in 2014, in four-row yield plots of 9 m² (1.5 × 6 m) with 65 seeds m⁻².
The harvested seeds were dried to the same moisture content before evaluating dry matter seed yield and thousand-seed weight. Plant height was measured as the distance between the ground and the last trifoliate leaf at the beginning of maturity, and protein yield was derived as the product of protein content and seed yield. Best linear unbiased estimates (BLUEs) were calculated across locations and were used for QTL mapping (Figure S1).
Heritability was estimated following the BLUP-based approach suggested by Piepho and Möhring (2007). The estimated heritability for seed yield, plant height, protein yield and thousand-seed weight was 0.77, 0.91, 0.70 and 0.94, respectively.

| Linkage map construction
The linkage map was constructed as described in our previous study (Zhu et al., 2020). In brief, we obtained sequence data by genotyping-by-sequencing (Elshire et al., 2011), and the raw data were then processed and converted into ABH format. Next, the genetic linkage map was constructed using the R packages R/qtl and ASMap (Broman et al., 2003; Taylor & Butler, 2017), and finally the R package Mapfuser was used to build a consensus map (van Muijen et al., 2017). Our final linkage map included 20 linkage groups, with a cumulative distance of 3,202.68 cM and a total of 10,893 markers distributed over 3,605 unique positions. The average distance between unique marker positions across the whole consensus map was 0.89 cM.
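The reported average spacing can be checked from the map summary alone. The short Python sketch below does this under two conventions, both of which are assumptions about how the average may have been computed (the text does not say): dividing the map length by the number of unique positions, or by the number of intervals between positions within linkage groups. Variable names are illustrative.

```python
# Map summary values as reported in the text.
total_length_cm = 3202.68   # cumulative map length in cM
unique_positions = 3605     # unique marker positions
linkage_groups = 20         # number of linkage groups

# Convention 1 (assumed): map length divided by the number of unique positions.
per_position = total_length_cm / unique_positions

# Convention 2 (assumed): map length divided by the number of intervals,
# i.e. unique positions minus one per linkage group.
per_interval = total_length_cm / (unique_positions - linkage_groups)

print(round(per_position, 2), round(per_interval, 2))
```

Both conventions round to the reported 0.89 cM, so the summary figures are mutually consistent whichever was used.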

| QTL mapping
We used PlabMQTL to perform single-family QTL mapping by composite interval mapping with an additive genetic model (Utz, 2012). The number of cofactors was chosen by the modified Bayesian information criterion (Baierl et al., 2006). QTL scanning was carried out at 1 cM intervals, and an empirical LOD threshold was determined by 2,000 random permutations for each trait at a genome-wide error rate of α ≤ 0.1. The support interval was specified as a 1-LOD fall-off (Darvasi & Soller, 1997).
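The idea behind a permutation-based empirical threshold can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the PlabMQTL procedure: it uses single-marker regression on two-class markers instead of composite interval mapping, the markers are simulated independently rather than placed on a genetic map, and all function and variable names are hypothetical.

```python
import math
import random

def lod_single_marker(genotypes, phenotypes):
    """LOD score of a one-marker regression for a two-class (A/B) marker."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_full = 0.0
    for allele in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_full += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0.0:
        return 0.0        # no phenotypic variation at all
    if rss_full == 0.0:
        return math.inf   # marker classes explain all variation
    return (n / 2.0) * math.log10(rss_null / rss_full)

def permutation_threshold(marker_matrix, phenotypes, n_perm=1000, alpha=0.10, seed=1):
    """Empirical genome-wide threshold: the (1 - alpha) quantile of the
    maximum LOD over all markers across phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        max_lods.append(max(lod_single_marker(m, shuffled) for m in marker_matrix))
    max_lods.sort()
    index = max(0, min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1))
    return max_lods[index]

# Simulated toy data: 100 lines, 50 unlinked markers, a trait with no QTL.
rng = random.Random(42)
n_lines, n_markers = 100, 50
markers = [[rng.choice("AB") for _ in range(n_lines)] for _ in range(n_markers)]
trait = [rng.gauss(0.0, 1.0) for _ in range(n_lines)]

threshold_10 = permutation_threshold(markers, trait, n_perm=200, alpha=0.10)
threshold_05 = permutation_threshold(markers, trait, n_perm=200, alpha=0.05)
print(threshold_10, threshold_05)
```

Because the maximum over all markers is recorded in each permutation, the resulting quantile controls the genome-wide rather than the per-marker error rate; a stricter α gives a threshold that is at least as high.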
Two hundred five-fold cross-validation runs were conducted to assess the frequency of QTL detection. The proportion of explained genotypic variance was assessed as $p_G = R^2_{\mathrm{adj}} / h^2$, where $R^2_{\mathrm{adj}}$ is the adjusted proportion of phenotypic variance explained by the QTL and $h^2$ is the heritability of the trait.
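As a small worked illustration of this ratio, the Python sketch below divides an adjusted R² by the heritabilities reported above. The function name and the example R² value are hypothetical and chosen only to show how the same phenotypic R² translates into different proportions of genotypic variance depending on the heritability of the trait.

```python
def explained_genotypic_variance(r2_adj, h2):
    """Proportion of genotypic variance explained by a QTL: R2_adj / h2."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return r2_adj / h2

# Heritabilities as reported in the text; the R2_adj value is hypothetical.
heritability = {
    "seed yield": 0.77,
    "plant height": 0.91,
    "protein yield": 0.70,
    "thousand-seed weight": 0.94,
}
example_r2_adj = 0.30

for trait_name, h2 in heritability.items():
    share = explained_genotypic_variance(example_r2_adj, h2)
    print(trait_name, round(100 * share, 1))
```

The lower the heritability, the larger the share of genotypic variance that a given phenotypic R² represents.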
Multi-family mapping was conducted with the R package mppR using an additive model that connects families via shared parents (parental model) (Garin et al., 2017). An allele effect was estimated for each parental line, which is less dependent on the genetic background because it is estimated across all families containing this parent (Blanc et al., 2006). Here, we assumed that the residual terms are cross-specific, fitted by REML using ASReml-R (Butler et al., 2009). The process of QTL detection included simple interval mapping to select cofactors, followed by composite interval mapping (CIM) to identify QTL. The confidence interval was calculated as a drop of 1.0 in -log10(p-value) from the peak of the CIM profile. The empirical LOD threshold was determined by 1,000 permutation runs at a genome-wide error rate of α ≤ 0.05 for each trait, where the parental model was still used, but the heterogeneous residual term (HRT) fitted by REML was replaced by the default HRT to save computation time. The same simplification was used in five-fold cross-validation to determine the LOD threshold with 1,000 runs and to obtain the QTL detection frequency.
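Both the 1-LOD fall-off used in single-family mapping and the 1.0 drop in -log10(p-value) used here follow the same logic: starting from the peak of the profile, the interval is extended in both directions while the profile stays within the chosen drop. A minimal Python sketch is given below; the function name and the profile values are invented for illustration, and the convention of ending the interval at the last position still within the drop (rather than the first position outside it) is an assumption.

```python
def drop_interval(positions, scores, drop=1.0):
    """Return (left, peak, right) positions of the region around the highest
    score in which the profile stays within `drop` units of the peak."""
    peak_idx = max(range(len(scores)), key=lambda i: scores[i])
    cutoff = scores[peak_idx] - drop
    left = peak_idx
    while left > 0 and scores[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(scores) - 1 and scores[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak_idx], positions[right]

# Hypothetical profile on a 1 cM grid (values invented for illustration).
grid_cm = [10, 11, 12, 13, 14, 15, 16, 17]
profile = [2.1, 3.0, 4.6, 5.2, 4.9, 4.1, 2.8, 2.0]
print(drop_interval(grid_cm, profile))  # prints (12, 13, 14)
```

The sketch handles a single peak per profile; with several QTL on one chromosome, it would have to be applied separately around each peak.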
For genome-wide association mapping, we used a model incorporating a kinship matrix and a fixed effect defining the biparental family to which a genotype belongs, as described previously (Zhu et al., 2020). Multiple testing was controlled by a Bonferroni-corrected significance threshold (p < .05). The explained genotypic variance was calculated by fitting all significantly associated markers one by one in a linear model ($p_{G\text{-Single}}$) or together in a joint linear model ordered by the strength of their association ($p_{G\text{-Joint}}$). The allele substitution effect was obtained from the regression coefficient of the linear model when only one marker was considered.
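A Bonferroni correction divides the desired family-wise error rate by the number of tests. The Python sketch below illustrates the resulting per-marker cut-off under the assumption, not stated in the text, that all 10,893 markers of the consensus map were tested; the function name is hypothetical.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cut-off keeping the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumption: every marker on the consensus map (10,893) is tested once.
n_markers = 10893
cutoff = bonferroni_threshold(n_markers)
print(cutoff, -math.log10(cutoff))
```

Under this assumption the cut-off is roughly 4.6e-6, or about 5.3 on the -log10 scale; with fewer tested markers the threshold would be correspondingly less stringent.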

| RESULTS
In this study, we performed QTL mapping for seed yield, plant height, protein yield and thousand-seed weight, using 944 RILs derived from a half-diallel experimental mating design of five typical early-maturing European soybean varieties. Six, five, seven and eight QTL were identified for these four traits, respectively (Figure 1, Table 1).
Among them, the QTL qSY1/qPH2/qPY2 on chromosome 6 was found for the three traits seed yield, plant height and protein yield, and explained the highest proportion of genotypic variance within families, ranging from 21.21% to 59.21% for seed yield, 31.29% to 75.98% for plant height and 21.76% to 60.04% for protein yield.
We identified another two QTL underlying these three traits, qSY3/qPH4/qPY5 on chromosome 18 and qSY4/qPH5/qPY6 on chromosome 19, with a relatively high proportion of explained genotypic variance for each trait. Interestingly, the estimated effects of the three QTL differed between families. There was also a high detection frequency for the QTL on chromosome 6 for all three traits and for the QTL on chromosomes 18 and 19 for plant height.
Moreover, the QTL qSY6/qTSW8 on chromosome 20 was found to control both seed yield and its component thousand-seed weight.
We also adopted multi-family QTL mapping as a complementary strategy to compare with the results from single-family mapping (Figure 1, Table S1). We identified 12 QTL for seed yield with a global proportion of explained phenotypic variance (R²) of 50.88%, 12 QTL for plant height with a global R² of 68.39%, nine QTL for protein yield with a global R² of 44.66% and 23 QTL for thousand-seed weight with a global R² of 60.86%. Four, four, five and six of the QTL for each of the four traits, respectively, were also identified by single-family mapping (Table 1). The QTL on chromosome 6 explained the largest R² with 17.11% for seed yield, followed by the QTL on chromosome 19 with an R² of 10.59% and the QTL on chromosome 18 with 4.36% explained variance. The same three QTL were also the major QTL for plant height and protein yield, with the QTL on chromosome 6 always explaining the largest proportion of phenotypic variance. QTL underlying protein yield were all also identified for seed yield, and the QTL on chromosome 20 also for protein content (Zhu et al., 2020).
The number of identified QTL was highest for thousand-seed weight, with 23 QTL on 12 chromosomes, and the QTL on chromosome 20 explained the highest R² with 7.28%. In addition to the two linkage mapping approaches, we performed genome-wide association mapping. The positions of all of the identified QTL were either in the same regions or very close to the results from single-family mapping and multi-family mapping (Table S2-5, Figure S2). We calculated the frequency of the seed yield-increasing allele of these three QTL in each of the three field trials (Figure S4). qSY1/qPH2, which likely corresponds to E1, showed a clear trend with increasing frequency of the seed yield-increasing allele from the earliest maturing genotypes (Trial 1) to the latest maturing genotypes (Trial 3). Note that early and late do not refer to the maturity group here, but to the classification of this population into three groups. For qSY3/qPH4 and qSY4/qPH5, by contrast, no such pattern was discernible. For the QTL pair qSY6/qTSW8 of seed yield and thousand-seed weight, the peak region on chromosome 20 was approximately between 27 and 32 Mb (Figure S3).
To obtain a more complete picture of QTL co-localization, we combined the QTL for the four traits from the present study with QTL for protein content and oil content identified previously in the same population (Zhu et al., 2020). The upset diagram shows that two QTL control four traits, another two QTL affect three traits, and three QTL have effects on two of the traits (Figure 3). Most of the QTL were specific for one trait, with the largest number for thousand-seed weight.
Owing to the co-localization of the QTL on chromosomes 6, 18, 19 and 20 between investigated traits, we chose the segregating marker nearest the QTL position in each of the eight families to investigate the allelic effect for the respective traits (Figure 4, Figure S5). qSY1  In addition, a significant effect was observed in family P4 × P5, for which the QTL was not identified, but can be expected to segregate (Figure S6). As for qSY3/qPH4 on chromosome 18, the differences between genotypic groups within families in which this QTL was identi- it showed the strongest variation among families. In general, for the identified QTL pairs the allele that increases seed yield also increases plant height. The QTL pair qSY6/qTSW8 was identified in two families (P1 × P2, P2 × P4) with comparable differences for both traits. This QTL has also recently been identified for protein content and oil content in these two families (Zhu et al., 2020). Further analyses revealed that the allele that increases seed yield and thousand-seed weight also increases oil content, but reduces protein content (Figure S5).
We further investigated the effect of QTL combinations on seed yield and plant height by classifying the whole population into eight groups based on the genotypic state at the three co-located QTL, qSY1/qPH2, qSY3/qPH4 and qSY4/qPH5 (Figure 5). An upward trend with an increasing number of the beneficial alleles for seed yield was observed for both traits. Plants carrying all three favourable alleles had the highest average seed yield, but were also amongst the tallest ones. The largest differences between groups were 0.9 Mg/ha for seed yield and 50.0 cm for plant height.
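The grouping described here amounts to enumerating the 2³ = 8 combinations of allele states at three loci and averaging the traits within each combination. The Python sketch below shows one way to do this; the record layout, the function name and the toy values are all invented for illustration and are not data from this study. Heterozygous or missing genotypes are not handled.

```python
from collections import defaultdict
from itertools import product

QTL = ("qSY1/qPH2", "qSY3/qPH4", "qSY4/qPH5")

def stack_groups(lines):
    """Group lines by allele state ('+' or '-') at the three QTL and return
    {state_tuple: (n, mean_seed_yield, mean_plant_height)}."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[tuple(line["alleles"])].append(line)
    summary = {}
    for state in product("-+", repeat=len(QTL)):  # 2**3 = 8 possible groups
        members = buckets.get(state, [])
        if members:
            n = len(members)
            summary[state] = (
                n,
                sum(m["sy"] for m in members) / n,
                sum(m["ph"] for m in members) / n,
            )
    return summary

# Invented toy records: seed yield (sy) in Mg/ha, plant height (ph) in cm.
toy_lines = [
    {"alleles": "---", "sy": 3.0, "ph": 60.0},
    {"alleles": "---", "sy": 3.2, "ph": 64.0},
    {"alleles": "+--", "sy": 3.5, "ph": 85.0},
    {"alleles": "+++", "sy": 4.0, "ph": 110.0},
]

for state, (n, mean_sy, mean_ph) in stack_groups(toy_lines).items():
    print("".join(state), state.count("+"), n, mean_sy, mean_ph)
```

Sorting the groups by the number of "+" alleles then makes any trend in the group means directly visible.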

| DISCUSSION
The acreage of soybean cultivation in Europe is constantly increasing due to the high demand for soybean for human consumption and animal feed. In order to bridge the protein gap in Europe and for soybean to become economically competitive or even superior to other crops, soybean yield needs to be improved further. It is well known that seed yield is a complex trait, controlled by many genes with small effect. An alternative is, therefore, indirect selection on yield component traits or other correlated traits, as these often show a higher heritability and a less complex genetic architecture. In this study, we performed QTL mapping for the four traits seed yield, plant height, protein yield and thousand-seed weight, in order to understand the genetic control underlying these traits in early-maturing soybean germplasm, as a step towards marker-assisted breeding.

| Identification of QTL for the investigated traits
Single-family mapping identified QTL for all four traits, with most of them also being detected by multi-family mapping (Table 1, Table S1). Interestingly, the proportion of explained genotypic variance was rather high for some QTL, reaching up to 50% or higher in some families. Some QTL for the different traits are located in very close (no more than 1 cM distance) or overlapping QTL regions and can thus be regarded as the same or closely linked QTL.

| Fine characterization of the major QTL on chromosomes 6, 18, and 19
In this study, we found the maturity gene E1 to be located in the genomic region of the major QTL on chromosome 6. E1 has been cloned (Xia et al., 2012) and, in addition to regulating photoperiodic response and thus maturity, it has been shown to affect plant height and node number of the main stem (Liu et al., 2011). Further support for E1 underlying this QTL came from the analysis of the allele frequencies in the three trials, where the seed yield-increasing allele increased in frequency from the early to the late genotypes (Figure S4). This illustrates that it is not only a seed yield or plant height QTL, but is also related to maturity. In addition, the analysis of E1 in the five parents revealed allelic differences: all carry photoperiod-insensitive alleles, as expected for these early-maturing varieties, but P5 carries e1-as, whereas P1-P4 carry e1-nl (Kurasch, Hahn, Leiser, Starck, et al., 2017). e1-nl is reported as a dysfunctional allele with a deletion encompassing the entire E1 gene, while e1-as is presumed to be a weaker allele caused by a missense mutation that reduces the repression of flowering-inducing GmFT genes by E1 (Xu et al., 2013).
Consistently, this QTL for plant height occurred in all four families involving P5, but not in any of the other crosses (Figure S6). Overall, E1 is a very likely candidate gene for this major QTL on chromosome 6, affecting maturity, plant height, seed yield and protein yield.
FIGURE 2 Fine characterization of the identified co-located QTL for seed yield and plant height. Plots show the result of single marker regression plotted against the physical position of the markers (Gmax Wm82.a2.v1). Circles with black margin indicate the markers with the highest LOD value. Inverted triangles represent the position of the known phenology genes
For the QTL on chromosome 18, the growth habit gene Dt2 is a strong candidate (Figure 2). In soybean, the two genes Dt1 and Dt2 regulate stem growth habit in an epistatic interaction (Bernard, 1972; Lockhart, 2014; Ping et al., 2014). In the presence of wild-type Dt1, which maintains indeterminate growth, the dominant Dt2 allele results in semi-determinate plants. A potential advantage of indeterminate growth types in regions of higher latitude is the overlap of vegetative and generative growth stages, which allows higher yields under short growing seasons. Semi-determinate types are useful, as they produce fewer nodes compared to indeterminate types, making them less susceptible to lodging. As evident from the segregation of this locus, both alleles and thus growth types appear to be prevalent in early-maturing European soybean germplasm. Generally, growth habit, flowering time and maturity can directly determine plant morphology, such as plant height, and thereby lead to a change in yield (Cao et al., 2017; Cober & Morrison, 2010; Xia et al., 2012; Zhang et al., 2015). Indeed, several studies reported higher yield to occur with larger plant height and later maturity (Cober & Morrison, 2010; Kabelka et al., 2004; Kim et al., 2012), and also in this population seed yield was highest in the latest maturing trial and significantly positively correlated with plant height. Interestingly, we identified this presumed Dt2 QTL in three families (P1 × P2, P1 × P3 and P2 × P4), indicating that parents P1 and P4 carry one allele and parents P2 and P3 the other allele (Figure S6).
Consequently, the QTL can be expected to segregate in two out of the four families involving parent P5. However, the QTL was not detected in any of the four families with P5, which, as mentioned, is the only parent to carry the e1-as allele. Only in families homozygous for e1-nl did we identify this QTL. This indicates a dependency of the effect of Dt2 on the allelic state at E1. Dt2 encodes a MADS-box transcription factor and has recently been shown to affect a range of target genes that are involved in the regulation of different traits including flowering, maturity and water-use efficiency. In particular, it has also been shown to function as a direct activator of the floral identity genes GmSOC1, GmAP1 and GmFUL, which are likely involved in flowering control. As photoperiod-insensitive alleles are often used in early-maturing soybean, this interplay between Dt1, Dt2 and E1 appears interesting and requires further research.
Another soybean maturity locus, E3 (Watanabe et al., 2009), was also found close to the peak marker of the major QTL on chromosome 19 (Figure 2). The stem growth habit gene Dt1 is also located on chromosome 19 at 45.18 Mbp and thus a bit further apart from the QTL peak than E3. However, this QTL was also identified in families with identical Dt1 sequence, making it an unlikely candidate. In contrast to the presumed E1 QTL on chromosome 6, the frequency of the alleles at this QTL showed no change between the three maturity trials (Figure S4). Notably, this does not rule out E3 as the gene underlying this QTL and may be caused by a more complex interplay between the different maturity loci and a stronger effect of the E1 alleles in this germplasm and these environments.
FIGURE 3 Summary of co-located QTL for six traits. The green bars show the number of QTL for each trait and the blue bars show the number of QTL for each combination. OC, oil content; PC, protein content; PH, plant height; PY, protein yield; SY, seed yield; TSW, thousand-seed weight
In conclusion, the known phenology genes E1, E3 and Dt2 are likely candidates to underlie the major QTL identified here, illustrating their strong pleiotropic effects on a range of agronomically important traits, including seed yield.

| Characterization of pleiotropic QTL
Our previous phenotypic analysis of this population revealed a strong positive correlation between seed yield, plant height and protein yield (Kurasch, Hahn, Leiser, Starck, et al., 2017). In line with this, we also found the abovementioned three medium- to major-effect QTL to affect all three traits, illustrating an at least in part shared genetic control. In total, seven QTL with effects on more than one trait were identified (Figure 3). Such co-located QTL can be due to pleiotropic effects of a single QTL, as illustrated by the example of the maturity gene E1, or due to closely linked QTL. From a breeding perspective, the latter are not necessarily different from the former because, depending on the distance between the QTL and the recombination rate in that chromosomal region, linked QTL can be close to impossible to separate.
FIGURE 4 Effect of alleles of identified pleiotropic QTL in all eight families. (a-c) Effect of three QTL located on chromosomes 6, 18 and 19, respectively, on seed yield and plant height. (d) Effect of the QTL located on chromosome 20 on seed yield and thousand-seed weight. Asterisks show the level of significance of the difference between the two groups in a family at p < .05, .01 and .001; ns, not significant
Regarding the utilization of marker-assisted selection, it is important to know the effect of the different alleles on each of the target traits.
Is one allele favourable for both or all target traits, or is it positive for one trait but negative for another? We therefore assessed the effects of alleles of co-located QTL on the target traits in all eight families (Figure 4, Figure S5). qSY1/qPH2, the major QTL on chromosome 6, resulted in a difference of 0.3-0.7 Mg/ha for seed yield and 28.7-41.6 cm for plant height between genotypic groups in four families, illustrating a certain dependency of this QTL on the genetic background. Thus, the allele carried by parent P5 (presumably e1-as) resulted in taller plants with a higher yield under these environmental conditions, compared to the other allele (presumably e1-nl). The QTL qSY3/qPH4 and qSY4/qPH5 appear to be more stable across genetic backgrounds, and also have positive effects on both seed yield and plant height. It has been reported that plant height is positively associated with yield (Liu et al., 2011; Lü et al., 2017), but some studies also reported contrasting results (Contreras-Soto et al., 2017). Taller plants, however, are more prone to lodging, which should also be considered in breeding programmes.
Another interesting example is the QTL on chromosome 20, affecting seed yield and thousand-seed weight, as well as protein content and oil content. We have recently reported that this QTL was identified in four families, with an inverse effect on protein content and oil content. The allele that increases protein content by 1% reduces oil content by approximately 0.4% in each family (Zhu et al., 2020). Here, we show that the effects of this QTL on seed yield, thousand-seed weight and oil content are positively correlated (Figure S5).
This illustrates that selection for the higher seed yield allele at this QTL will come at the expense of a lower protein content. This may be compensated by a higher protein yield and in addition, could be counteracted by selection for other protein content-increasing QTL.
However, it would be disadvantageous if a high protein content in the seed is required for specialty soybean processing. In conclusion, our results illustrate the complex interplay between agronomically important traits, which is also reflected by pleiotropically acting QTL. In breeding programmes, index selection may be a good strategy to simultaneously improve several target traits.
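To make the idea of index selection concrete, the Python sketch below ranks genotypes by a weighted sum of standardized trait values. This is only one simple form of a selection index, chosen here for illustration; the trait values, the weights (including the negative weight on plant height to penalize lodging risk) and all names are hypothetical and would in practice be set by the breeding objective.

```python
from statistics import mean, pstdev

def selection_index(records, weights):
    """Weighted sum of standardized trait values for each genotype."""
    stats = {}
    for trait_name in weights:
        values = [r[trait_name] for r in records]
        stats[trait_name] = (mean(values), pstdev(values))
    scores = {}
    for r in records:
        scores[r["name"]] = sum(
            w * (r[t] - stats[t][0]) / stats[t][1] for t, w in weights.items()
        )
    return scores

# Invented toy genotypes: seed yield in Mg/ha, plant height in cm,
# protein content in percent.
records = [
    {"name": "G1", "seed_yield": 3.0, "plant_height": 70.0, "protein_content": 42.0},
    {"name": "G2", "seed_yield": 3.6, "plant_height": 100.0, "protein_content": 40.0},
    {"name": "G3", "seed_yield": 3.4, "plant_height": 80.0, "protein_content": 41.5},
]
# Hypothetical weights: reward yield and protein content, penalize height.
weights = {"seed_yield": 1.0, "plant_height": -0.5, "protein_content": 0.5}

scores = selection_index(records, weights)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

Standardizing each trait before weighting keeps the index from being dominated by whichever trait happens to have the largest numerical scale.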

| QTL stacking for seed yield and effects on plant height
Given the quantitative nature of seed yield, stacking of favourable seed yield QTL alleles might be an effective approach in marker-assisted soybean breeding programmes. The three pleiotropic QTL on chromosomes 6, 18 and 19 were selected to observe their combined effects on seed yield and plant height in the whole population.
Owing to the similar sign of the effects on both traits, both showed an increase with an increasing number of favourable seed yield alleles (Figure 5). Thus, marker-assisted selection for higher seed yield inevitably results in the selection of taller and later-maturing plants.
These effects of selection on pleiotropically acting QTL may be balanced by the already mentioned index selection.
Based on the successful experience from rice and maize, Liu et al. (2020) proposed that an ideal plant architecture was essential for a 'green revolution' of soybean. This not only includes an appropriate plant height, but also other characteristics such as shorter internode lengths, more internodes, fewer branches and others. Thus, a logical next step is to dissect the genetic control of such plant architectural traits, which may be aided by soybean functional genomics. Collectively, our results and the characterization of identified QTL illustrate the potential of marker-assisted selection, but also its limitations, highlighting that selection must always consider all target traits in order to identify superior genotypes as our future varieties.
FIGURE 5 Effect of QTL stacking. Boxplots showing the effect on seed yield and plant height of combining alleles at three QTL (qSY1, qSY3 and qSY4) with the allele decreasing seed yield (-) or the allele increasing seed yield (+). Seed yield is shown in orange and plant height in red. n refers to the number of genotypes of each genotypic group, Ø SY and Ø PH are the average of seed yield and plant height, respectively

ACKNOWLEDGEMENTS
This work was funded by the German Federal Ministry of Food and Agriculture (BMEL, grant numbers 2814500110, 2814EPS011) and by the Federal Ministry of Education and Research of Germany (BMBF, grant numbers 031B0339A, 031B0339B).

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

AUTHOR CONTRIBUTION
TW and VH designed the study. VH and WLL collected phenotypic and genotypic data. XZ performed the analyses. XZ, VH, WLL and TW wrote the paper.