A tetrasomic inheritance model and likelihood‐based method for mapping quantitative trait loci in autotetraploid species

Summary Dissecting the genetic architecture of quantitative traits in autotetraploid species is a methodologically challenging task, but a pivotally important goal for breeding globally important food crops, including potato and blueberry, and ornamental species such as rose. Mapping quantitative trait loci (QTLs) is now a routine practice in diploid species but is far less advanced in autotetraploids, largely due to a lack of analytical methods that account for the complexities of tetrasomic inheritance. We present a novel likelihood‐based method for QTL mapping in outbred segregating populations of autotetraploid species. The method accounts properly for sophisticated features of gene segregation and recombination in an autotetraploid meiosis. It may model and analyse molecular marker data with or without allele dosage information, such as that from microarray or sequencing experiments. The method developed outperforms existing bivalent‐based methods, which may fail to model and analyse the full spectrum of experimental data, in the statistical power of QTL detection, and accuracy of QTL location, as demonstrated by an intensive simulation study and analysis of data sets collected from a segregating population of potato (Solanum tuberosum). The study enables QTL mapping analysis to be conducted in autotetraploid species under a rigorous tetrasomic inheritance model.


Introduction
Most agronomic traits targeted in plant or animal breeding programmes are quantitative or complex traits, including yield and quality traits, and resistance to various biotic and abiotic stresses. Phenotypic variation of these traits is under polygenic control, and to a significant extent is also influenced by environmental factors. By using abundantly distributed genomic DNA polymorphisms, this variation can be mapped onto specific chromosomal regions known as quantitative trait loci (QTLs). Models and methods for QTL mapping have been well established in diploid species. However, the corresponding methods are far less advanced in polyploid species, particularly for autopolyploids, even though this group encompasses evolutionarily and economically important plants and aquaculture animals, including potato (Solanum tuberosum, the world's third most important food crop), leek, and horticultural crops such as blueberry and rose. This is largely attributed to the complexities in gene segregation and recombination under polysomic inheritance.
In an autotetraploid genome, such as cultivated potato, the four homologous chromosomes may pair in three possible ways and may show random or preferential bivalent formation in different species or genotypes (Bourke et al., 2017). Alternatively, the four chromosomes may form a quadrivalent, which may lead to the phenomenon of double reduction, where identical alleles carried on the sister chromatids enter into the same gamete, causing systematic allelic segregation distortion. This may occur with a frequency of up to 25% in autotetraploids, but it never occurs in a diploid or allopolyploid meiosis (Luo et al., 2006). We have also shown that recombination frequency between a pair of loci can be as high as 75% under a tetrasomic model, compared with 50% in diploids (Luo et al., 2006). These complexities highlight the substantial differences in the patterns of gene segregation and recombination in autopolyploids compared with diploid species. Additional complexities include a high level of heterozygosity stemming from the outbreeding nature of autotetraploids and a much wider spectrum of gene segregation compared with diploids (Bingham, 1980). Much research has focused on developing theory and methods for QTL analysis in autotetraploids. Methods have been proposed for genetic linkage analysis and QTL mapping in experimental populations of autopolyploid species (Hackett et al., 2001(Hackett et al., , 2014Cao et al., 2005;Xu et al., 2014), and widely practised in QTL mapping analyses in polyploid species (Massa et al., 2015;Da Silva et al., 2017;Massa et al., 2018;Mengist et al., 2018;Bourke et al., 2019;da Silva Pereira et al., 2019). However, these studies have been based on various assumptions that have substantially avoided some complexities of the analyses, but in doing so have ignored some essential features of autotetrasomic inheritance and practical data analysis. Specifically, these refer to different patterns of gene segregation and recombination due to different pairings of homologous chromosomes during meiosis of autotetraploids. Strictly speaking, none of the existing methods in the literature has thoroughly incorporated these into the development of a method for QTL analyses in autotetraploid species as detailed in the following. To fill this theoretical and methodological gap in the field of quantitative genetics, we developed a novel likelihood-based method for mapping of QTLs in outbred segregating populations of autotetraploid species.

Description
In general, there are two key components involved in the development of methods for mapping QTLs (Sen & Churchill, 2001), as detailed in the following. The first component is development of a quantitative genetic model, which links the QTL genes to their phenotypic effects on the trait. The second key step is tetrasomic linkage analysis involving a tested QTL and its surrounding genetic markers.

An outbred autotetraploid mapping population and data notation
The mapping population is created from crossing two autotetraploid parents, P 1 and P 2 , and presents the first generation of segregation and recombination of genes carried by the two parental individuals. We consider a linkage map of m molecular marker loci, M 1 , M 2 , . . ., M m , at each of which there could be up to eight different alleles segregating within the full-sib mapping population. Let r j (j ¼ 1; 2; . . .; m À 1) be the recombination frequency in the jth marker interval flanked by markers M j and M j+1 , and a j (j ¼ 1; 2; . . .; m) be the coefficient of double reduction at the jth marker. The parents together with n offspring individuals are scored at the marker loci. Let o i ¼ ðo i;j Þ j¼1;...;m be a vector of marker phenotype for the ith offspring individual at the m marker loci. Similarly, p 1 ¼ ðp 1;j Þ j¼1;...;m or p 2 ¼ ðp 2;j Þ j¼1;...m is the marker phenotype for P 1 or P 2 at the m marker loci. o i;j , p 1;j , and p 2;j are given by 1 9 8 vectors for the jth marker locus, where 1 (or 0) indicates the presence (or absence) of a particular allele. The marker data may be in a form with or without allele dosage information. The trait phenotypic data y i 2 Y and the offspring marker data o i 2 O (i ¼ 1; . . .; n) are modelled through the likelihood function of the model parameters X ¼ fX 1 ; X 2 ; X 3 g, as shown in the following: The model parameters in the likelihood function are organized as follows: ðr j ; g 1;j ; g 2;j ; g 1;jþ1 ; g 2;jþ1 ; q P 1 ; q P 2 Þ; ða; r; g 1 ; g 2 Þg and will be explained in the following sections on formulation of the probability distributions involved in the aforementioned likelihood function.

Orthogonal quantitative genetic model in autotetraploids
The genotypic value of the kth genotype (Q k q ð4ÀkÞ ) at a putative biallelic QTL to be mapped is modelled through the orthogonal model G k ¼ l þ w k1 h 1 þ w k2 h 2 þ w k3 h 3 þ w k4 h 4 , for k ¼ 0; 1; :::; 4, representing the number of trait-phenotype-increasing alleles. Here, µ and r 2 are the population mean and residual variance, h i (i ¼ 1; . . .; 4) are the monogenic, digenic, trigenic, and quadrigenic genetic effects of the QTL, and w kj (k ¼ 0; 1; . . .; 4; j ¼ 1; . . .; 4) are the corresponding orthogonal contrast scales for the genetic effects of genotype k for the jth contrast (j ¼ 1; 2; :::; 4). The rationale, statistical properties, and parameter estimation of the orthogonal quantitative genetic model are detailed in our recent work (Chen et al., 2018).

Tetrasomic linkage analysis for quantitative trait locus mapping in autotetraploids
We previously worked out the probability distribution of 136 possible two-locus gamete genotypes in the offspring from crossing two autotetraploid parental individuals (Luo et al., 2004). By incorporating a biallelic QTL into this two-locus linkage analysis, we have worked out the three-locus gamete genotype distribution, q q^m , of an autotetraploid individual under a quadrivalent pairing model (Table 1), which is fully characterized by the double reduction parameter a at the marker locus closest to the centromere, and the recombination frequency parameters. This probability distribution involves a total of 2080 (136 9 16 À 6 9 12 À 6 9 4) different gamete genotypes at the QTL and its flanking markers. This number reduces to 64 if the homologous chromosomes undergo bivalent pairing in meiosis, as summarized in Supporting Information Table S1. Accordingly, one can work out q qjm , the conditional probability of a gamete QTL genotype given the genotype of the flanking markers. By assuming random union of gametes from the two parents, we worked out the joint marker-QTL-marker zygote genotype probability distribution involving a total of 4326 400 (2080 9 2080) or 4096 (64 9 64) possible zygote genotypes under quadrivalent or bivalent pairing.
In practice, many offspring genotypes may be identical because there are a smaller number of segregating alleles at the marker loci and can thus be sorted computationally together with their corresponding probabilities. We have developed a computer-based algorithm and program to handle any number of segregating alleles at the marker loci. These enable calculation of Prfq ik jz i;j z i;jþ1 ; X 2 g in Eqn 1, which is the conditional probability of the QTL genotype of individual i given its flanking marker genotype and the model parameters X 2 ¼ ðr j ; g 1;j ; g 2;j ; g 1;jþ1 ; g 2;jþ1 ; q P 1 ; q P 2 Þ. Here, z i;j z i;jþ1 is the genotype configuration for offspring i in the marker interval j flanked by markers M j and M j+1 , r j is the recombination frequency in the jth marker interval, g i,j is the genotype of parent i (i ¼ 1; 2) at the jth marker locus, and q P i is the marker-QTL-marker genotype configuration of parent i (i.e. with known linkage phase of the marker and QTL alleles). Although the QTL alleles and linkage phase are unknown in practice, we may search all possible configurations over the likelihood function (Eqn 1) and determine the most likely QTL configuration. Moreover, we established r 12 ¼ r 1 þ r 2 À 4r 1 r 2 =3 to relate the recombination frequencies in the three-locus tetrasomic linkage analysis, as detailed in Notes S1.

Conditional probability of flanking marker genotype
Prfz i;j z i;jþ1 jo i ; X 3 g in Eqn 1 is the conditional probability of the zygote genotype of individual i at the flanking markers j and j + 1 given all the marker phenotypes on the linkage group and the parental marker genotypes g 1 and g 2 at the marker loci. X 3 ¼ fa; r; g 1 ; g 2 g, with a being a vector of the coefficient of double reduction at the marker loci, r is a vector of the recombination frequencies between the adjacent marker loci, and g 1 and g 2 are the parental genotypes at the marker loci. We previously developed a model based on the hidden Markov method to calculate the probability (Leach et al., 2010), as detailed in the present notation in Methods S1.

Quantitative trait locus interval mapping under different patterns of homologous chromosome pairing
It has been established earlier herein that the conditional probability distribution of QTL genotypes given flanking marker genotypes in an autotetraploid segregating population depends on the pattern of pairing between homologous chromosomes in meiosis. The probability distribution has been Table 1 Probability distribution of diploid gamete genotypes at a quantitative trait locus (QTL) and its flanking marker loci from a quadrivalent meiosis of an autotetraploid individual.
Where c ij ¼ r i 1 ð1 À r 1 Þ 2Ài r j 2 ð1 À r 2 Þ 2Àj , a is the coefficient of double reduction at locus A and a 0 ¼ 1 À a, r 12 is the recombination frequency between locus A and B, r 0 12 ¼ 1 À r 12 , and r 1 (or r 2 ) is the recombination frequency between the QTL and its left (or right) flanking marker.  (Table S1) pairing. These probability distributions may be plugged into a statistically appropriate method for the QTL mapping analysis if one knows which of the two chromosome pairings occurs in the species of interest. An obvious question arises that chromosome pairing behaviour is usually unknown a priori, and the homologous chromosomes may show a mixture of bivalent and quadrivalent pairings, as observed in many autotetraploid species, including potato (Quiros, 1982;Jones et al., 1996;Bradshaw, 2007;Bourke et al., 2015). To tackle the problem, we have first shown in Table 2 that the gamete genotype distribution under mixed chromosomal pairing has an almost identical pattern to that under quadrivalent pairing, except for the difference in value of the coefficient of double reduction. We show that the coefficient of double reduction at a locus under the mixed pairing model (a 0 ) can be related to the coefficient in the complete quadrivalent pairing model (a) through the simple relationship a 0 = ka (Methods S2), where k is the frequency of quadrivalent chromosome pairing in meiosis. The deviations between the true distribution g 0 i under the mixed pairings (the fifth column in Table 2) and the approximate distribution f i are highlighted in bold. Note that, for a given recombination frequency between the two loci, these deviations will be smaller when k is larger (i.e. when there is a higher frequency of quadrivalent pairing), which will make estimation of gamete probabilities under mixed chromosome pairing more precise. Conversely, when k takes its smallest value of zero (i.e. complete bivalent pairing), the loss of information will be greatest. The Kullback-Leibler divergence from f i to g 0 i is given in the present context by: Eqn 2 Note that if 0.00 < r < 0.05, D KL (g 0 ||f) will vary between 0 and 0.017, reflecting that very little information will be lost by approximating g 0 i with f i . In particular, note from Table 2 that the proportion of gametes with genotypes involving double reduction is the same for the true distribution and the approximate distribution with mixed chromosome pairing (i.e.
. The methods we have previously developed may therefore be used to estimate the average coefficient of double reduction in autotetraploids undergoing a mixture of quadrivalent and bivalent pairings in meiosis (Luo et al., 2000(Luo et al., , 2004. These results rationalize use of the QTL mapping method developed in the present study under the quadrivalent pairing model in the case where homologous chromosomes actually undergo a mixture of quadrivalent and bivalent pairing. This will be tested through an intensive simulation study in the following.

Model parameter estimation
We work out the maximum likelihood estimates (MLEs) of the QTL genotype means as defined in the above,Ĝ k andr 2 , through the EM algorithm (Dempster et al., 1977) and, in turn, the MLEs of the model parameters X 1 ¼ ðl; h 1 ; h 2 ; h 3 ; h 4 ; r 2 Þ through iteratively calculating the following two steps.
The E step calculates the probability of the ith individual having the kth QTL genotype in the jth marker interval given its phenotype and marker data as given in the following: f ðy i jq ik ;X 1 Þ Prfq ik jz i;j z i;jþ1 ;X 2 g Prfz i;j z i;jþ1 jo i ;X 3 g P 4 k¼0 f ðy i jq ik ;X 1 Þ Prfq ik jz i;j z i;jþ1 ;X 2 g Eqn 3 Table 2 Probability distribution of diploid gamete genotypes at two linked loci with homologous chromosomes showing quadrivalent, bivalent, or a mixture of the two pairing patterns in an autotetraploid meiosis from an individual with genotype

Gametes
(1 i; j; k; l 4) Double reduction occurred at Gamete genotype probabilities in mixed pairing meiosis Quadrivalent meiosis Bivalent meiosis The coefficient of double reduction in a population undergoing complete quadrivalent (or mixed bivalent and quadrivalent) chromosome pairing is given by a (or a 0 ), where a0 ¼ ka and k is the proportion of quadrivalent pairing in a mixture of quadrivalent and bivalent pairings. Dashes denote genotypes incompatible with bivalent pairing. The terms in bold are those stemming from the mixed homologous chromosome pairings.

New Phytologist
Derivation of this equation is detailed in Methods S3. The M step then updates the estimates of the genetic parameters from: The iterative algorithm is initiated using the sample variance for r 2 and by using K-means clustering to derive initial genotypic values for G k . As the E and M steps are repeated iteratively following Eqns 3-5, the likelihood function will increase and the estimated parameters will converge to the MLEs,Ĝ k andr 2 . The genetic effects at the QTL can be solved fromĜ k using: Calculation of w ij , the orthogonal scales for the genetic effects of QTL genotype i (i ¼ 0; 1; :::; 4) for the jth contrast (j ¼ 1; 2; :::; 4), depends on the probability distribution of the QTL genotypes, which can be obtained from the E step in Eqn 3 as detailed in our previous work (Chen et al., 2018). It should be noted that some of the QTL genetic effects may not be estimable for some parental QTL genotype configurations. For example, the trigenic and quadrigenic genetic effects will be indeterminable for the parental QTL genotype configuration QQqq 9 qqqq because no relevant offspring QTL genotypes would be generated from the parental QTL genotypes.
With the MLEs of the QTL genotype effects, the likelihood ratio statistic for testing the presence of a QTL at a location characterized by r j1 , the recombination frequency between the QTL and its left flanking marker, is calculated as: where LOD is the logarithm of the odds.G andr 2 are the estimates of genotypic effects and variance under the no QTL model, given by the mean and variance of the trait calculated from all individuals, assuming the phenotypes to be independently identically normally distributed.
Any possible location within each marker interval may be tested for the presence of QTLs to generate a LOD score profile for each chromosome. Permutation of the offspring phenotypic trait values over the corresponding marker data was used to see how the LOD scores distribute under the null hypothesis that there is no QTL present (Churchill & Doerge, 1994).
The foregoing analysis relies on the availability of information of QTLs and marker genotype and linkage phase information, which is unknown in practice. Assuming parent P 1 has the higher trait value implying a larger number of trait-increasing alleles Q, there are nine possible genotype configurations for P 1 and P 2 , listed as (1) (9) Qqqq Â qqqq. Taking all possible marker-QTL linkage phases into consideration, there will be up to 92 possible combinations of parental QTL genotypes. QTL interval mapping is carried out for each possible parental QTL genotype configuration. The Bayesian information criterion (BIC) (Schwarz, 1978) is calculated at each location tested as follows: Eqn 8 whereL is the maximized value of the likelihood function of the model, n is the population size, and k is the number of parameters estimated by the model (i.e. the population mean, genetic effects and the residual variance). The most likely parental QTL configuration should have the lowest BIC value, and the estimated genotypic value of parent P 1 must be higher than that of parent P 2 . Note that there are eight parental QTL configurations for which at least one parental genotype does not appear in the offspring, listed as (1) Under these eight models, we expect to obtain an identical likelihood profile with one of the other 84 parental QTL configurations. For example, the same likelihood profile is expected for genotype configuration (1) QQQQ Â qqqQ and an alternative configuration, QQQQ Â QQQq. However, the genetic effects would differ in sign under these two models, allowing the correct model to be distinguished, given that the monogenic effect must be positive by definition (Chen et al., 2018).

Simulation data analysis
A series of simulation data sets were generated under simulation models I and II, which simulated different pairings of homologous chromosomes during meiosis, marker densities on simulated chromosomes, and particularly generated marker data either with (model I) or without (model II) allele dosage information, as detailed in Notes S2. These simulation data sets were analysed using the QvMethod (the method developed in the present study for modelling quadrivalent chromosome pairing) and/or the BvMethod (the method formulated in the study for modelling bivalent pairing). The methods showed 100% power to detect QTL with a trait heritability of 5% or 10% in a first-generation segregating (S 1 ) population of 300 individuals from crossing two autotetraploid parental lines differing by one, two, or three traitincreasing alleles under the corresponding chromosome pairing model. Prediction accuracy of the parental QTL genotype configuration depends heavily on the heritability of the simulated QTL and improves when the difference in the number of the trait- increasing alleles between the parental QTL genotypes increases. Incorrect prediction of QTL genotype configuration may lead to bias in the estimation of higher order genic (e.g. tri or quadrigenic) effects at the QTL (Tables S2, S3). The inferred location of QTLs was within 10 cM of the simulated position in > 70% cases when the heritability was 10%, whereas an expected reduction in accuracy to 44-67% was observed when heritability was only 5%. Similar results were obtained for simulation model II where marker allele dosage information is unknown, as explained in Notes S2 (Tables S4, S5). These results indicate that both BvMethod and QvMethod enable powerful detection of QTLs in outbred autotetraploid populations of a reasonable size and can accurately estimate QTL location and genetic effect parameters when the pairing behaviour of homologous chromosomes is known a priori to involve exclusively bivalent or exclusively quadrivalent pairing during meiosis.
To explore the robustness and appropriateness of implementing BvMethod or QvMethod to analyse the data generated under mixed quadrivalent and bivalent chromosomal pairing, we conducted a series of simulations involving 20 biallelic markers on a 100 cM chromosome, with parental QTL genotype QQQq Â Qqqq and a trait heritability of 10%. Each population was generated from a given frequency k of quadrivalent chromosome pairing, ranging from k = 0 (complete bivalent pairing) to k = 1 (complete quadrivalent pairing). It must be noted that when BvMethod is implemented with the simulation data generated with k > 0, a proportion of offspring marker genotypes will be incompatible (i.e. not expected to be observed) under the assumption of bivalent chromosome pairing. Incompatibilities may arise in two different ways. First, due to the occurrence of double reduction; for example, a one-locus gamete with genotype A 1 A 1 must have resulted from double reduction in a parent with genotype A 1 A 2 A 3 A 4 . Second, the offspring haplotype must come from no more than two parental chromosomes if bivalent pairing occurs during meiosis. For example, a three-locus gamete genotype A 1 B 1 C 1 /A 2 B 3 C 4 is incompatible with bivalent pairing. Table 3 summarizes the means and SEs of the MLEs of genetic effects at the QTL, and the empirical power across varying frequencies of quadrivalent pairing from k = 0 to k = 1. Individuals with incompatible genotypes at any of the 20 markers were removed from the data set before analysis with BvMethod. As the frequency of quadrivalent chromosomal pairing increased, estimation of higher order (digenic and trigenic) QTL genetic effects from BvMethod became increasingly biased. Note that the offspring genotypes needed for estimation of the quadrigenic effect (h 4 ) are not present under the bivalent chromosomal pairing model. QvMethod estimated the higher order genetic effects much more accurately than BvMethod did, though note that the variance is inherently higher for higher level genetic effects, as detailed in Notes S3. The proportion of correctly predicted The simulated parental quantitative trait locus (QTL) genotype was QQQq Â Qqqq. The proportion of quadrivalent chromosome pairing is given by k. Q genotype0 (Q genotype1 ) represents the proportion of predicted parental QTL genotype configurations exactly matching (differing by only a single allele) the simulated QTL genotypes. The mean and SEs of parameter estimates are given based on 200 replicate simulations. Individuals with at least one marker genotype that is unexpected from the theoretical genotype distribution under a model of bivalent chromosome pairing are removed from the simulated data sets before analysis with BvMethod. Accuracy (cM) is the distance between the true QTL location and its inferred location. Heritability was equal to 10%.

Research
New Phytologist parental QTL genotype configurations dropped dramatically using BvMethod, even when the proportion of quadrivalent pairing was low (k = 0.25), with QvMethod correctly predicting parental QTL genotype 68% of the time, compared with only 49% using BvMethod.
QvMethod is substantially more accurate than BvMethod in locating the QTLs when the homologous chromosomes show a mixture of both bivalent and quadrivalent pairings. As quadrivalent pairing increases in frequency, the accuracy of BvMethod drops markedly.
For example, at k = 0.50, the QTL detected is 5.68 AE 0.98 cM away from the true QTL location using BvMethod, but this is narrowed down to only 2.17 AE 0.59 cM by using QvMethod. At higher values of k, the improvement in accuracy through use of QvMethod becomes more pronounced. Note also that performance of QvMethod is comparable to that of BvMethod even when the homologous chromosomes undergo pure bivalent pairing (k = 0). A similar pattern of results was obtained in the analyses where individual incompatible genotypes were treated as missing data, as others (Hackett et al., 2014 have done elsewhere (Tables S6, S7). The degree of improvement in accuracy through use of QvMethod will naturally depend on the particular parental genotype configuration (Tables S6, S7), and hence on how extreme the difference is between the true offspring genotype distribution and the genotype distribution obtained under the assumption of bivalent pairing.

Experimental data analysis
We demonstrated the utility of QvMethod in real experimental data analysis using single nucleotide polymorphism (SNP) and phenotype data sets collected by Bradshaw et al. (2008) and repeatedly analysed by Hackett et al. (2014Hackett et al. ( , 2017. The data were collected from 190 offspring individuals generated from a cross between two autotetraploid parental potato lines, 'Stirling' and 12601ab1. As described elsewhere , this outbred segregating population was scored for 12 quantitative traits. The methods described in Hackett et al. (2001Hackett et al. ( , 2014 have recently been engineered into the Windows-supported software TETRAPLOIDSNPMAP, which enables genetic linkage map reconstruction and QTL mapping analysis in autotetraploids . The software is illustrated by mapping QTLs for the 12 agronomic traits on linkage group V reconstructed with 119 SNP markers, which corresponds to potato (S. tuberosum) chromosome V. Individual marker genotypes incompatible with the assumption of bivalent pairing were removed from the data by the authors. To demonstrate the method we have developed (i.e. QvMethod) and enable a direct comparison with the method in TETRAPLOIDSNPMAP (abbreviated here as H2017), we focused here on QTL mapping analysis with the 12 traits on linkage group V (though we have scanned QTLs for these traits over all 12 potato chromosomes). In the real data analysis, an empirical mapping resolution was defined as the size (in centimorgans) of the chromosomal interval over which the LOD score profile dropped by a value of 2.0 on both sides of the QTL peak. Fig. 1 illustrates the LOD score profiles of QTL detection for QvMethod (red lines) and H2017 (green lines) for the 12 potato agronomic traits. The significance thresholds (dotted lines) were calculated from 100 permutation tests of the LOD score test statistic for the corresponding method. QvMethod clearly shows a markedly higher statistical power for detecting the QTLs than the bivalent-pairing-based H2017 method does. In particular, QvMethod detected QTLs for four of the 12 traits (tuber size, Fry10, Globodera cyst counts, and flower colour) that were missed by the H2017 method ( Fig. 1; Table 2). Table 4 shows that QvMethod confers a substantially higher QTL mapping resolution than the H2017 method does. Table 2 also shows that the QTLs detected by QvMethod usually explain a larger proportion of genetic variance of the trait (Vg%) compared with the QTLs detected by H2017. These observations suggest that H2017, a method strictly based on a bivalent chromosomal pairing model, may fail to detect QTLs that have a small genetic effect on a quantitative trait. For a given Vg%, the estimated proportion of additive genetic variance of the QTLs detected,q, indicates that the QTLs detected by both methods contribute largely additive genetic variation to the trait phenotype variation. The notable exception is tuber yield, which may reflect the particularly complex genetic architecture of this trait, both in potato (Schonhals et al., 2017) and other crops (Shi et al., 2009). Note that three of the four QTLs that were missed by H2017 (Fry10, Gtc, Fc) have a relatively small additive genetic variance, reflecting the fact that H2017 only models additive QTL genetic effects.

Significance of chromosomal pairing patterns in quantitative trait locus mapping efficiency
The previously mentioned simulation study and real data analysis demonstrated that QvMethod outperforms other methods that assume only bivalent chromosomal pairing in the meiosis of autotetraploids, both in terms of QTL detection power and mapping resolution. This thus urges the essentiality of modelling the complex features of tetrasomic gene segregation and recombination for QTL mapping in autotetraploid species and motivated us to explore the difference in statistical efficiency between the two types of method.
We explored how the different chromosomal pairings affect the genetic structure of mapping populations. It is clear from Table 2 that the genotypic distribution of the mapping population differs substantially according to the mode of chromosome pairing (bivalent or quadrivalent), not only in the probability of a genotype, but also in the number of possible genotypes. In particular, if homologous chromosomes undergo quadrivalent pairing, there could be a substantial proportion of individuals in the mapping population carrying gametes from double reduction events (hereafter referred to as double reduction gametes) and hence being incompatible with the assumption of bivalent pairing. Fig. 2(a) (red dotted line) shows that the expected proportion of individuals carrying double reduction gametes increases with distance from the centromere and could be as high as 40%, but only a small number of incompatible offspring genotypes are observable. Furthermore, for three of the 12 possible parental genotype Ó 2020 The Authors New Phytologist Ó 2020 New Phytologist Trust New Phytologist (2021) 230: 387-398 www.newphytologist.com configurations, no incompatible offspring genotypes can be observed. A substantial gap therefore exists between the incompatible genotypes that can be observed directly from the marker data (Fig. 2a, coloured solid lines) and the expected number of incompatible genotypes (Fig. 2a, red dotted line). We also investigated the influence of excluding individuals with observable incompatible marker data on estimation of the level of double reduction, based on 100 repeated simulations with 200 offspring individuals from crossing two autotetraploid parents. The distribution of double reduction estimates is far below the simulated value for each of nine possible parental genotype configurations when the incompatible data are removed (Fig. 2b). This shows how excluding only a small (i.e. the observable) fraction of incompatible individuals is very sensitive in weakening the signal of double reduction. These observations largely, if not fully, explain why the development of QvMethod in this study outperforms the bivalentpairing-based methods when all or some of the homologous chromosomes undergo quadrivalent pairing during meiosis, and indicate that the bivalent-pairing-based methods, strictly speaking, cannot truly model and analyse the full spectrum of data from real QTL mapping experiments in autotetraploid species.

Discussion
Mapping of QTLs is a key initial step towards understanding the molecular mechanisms underpinning quantitative genetic variation. It has taken decades from the conception of an idea (Sax, 1923) to maturation of the methods (Lander & Botstein, 1989;Sen & Churchill, 2001) in diploid species, and QTL mapping is now a routine genetic analysis in almost all evolutionarily and economically important diploid species. However, linkage analysis with autotetraploids has remained a historical challenge since the work of prominent geneticists such as J. B. S. Haldane, K. Mather, and R. A. Fisher (Haldane, 1930;Mather, 1936;Fisher, 1947). Although a breakthrough on this topic was made in our previous work (Luo et al, 2004(Luo et al, , 2006Leach et al., 2010), linkage analysis involving QTLs under a tetrasomic inheritance model, the theoretical basis for mapping QTLs, has remained a theoretical bottleneck and a methodological gap in the field of quantitative genetic analysis with autotetraploid species.
Here, we have presented a theoretical and methodological breakthrough that enables QTL mapping to be carried out in autotetraploid species on the basis of a rigorous tetrasomic inheritance model. The methods account properly for the key features of gene  Hackett et al. (2001Hackett et al. ( , 2014Hackett et al. ( , 2017 pioneered the development of statistical methods for QTL mapping in autotetraploids that assume exclusively bivalent chromosomal pairing, and have been widely cited in the literature. This assumption may substantially simplify the statistical analyses so as to enable the methodology development, but it comes at the price of sacrificing the key features of tetrasomic inheritance in autotetraploid species. The empirical QTL mapping resolution is defined as the chromosomal region in centimorgans over which the logarithm of odds score drops by a value of 2.0 on each side of the QTL peak. Vg (%) is the proportion of genetic variance explained by the QTL detected. In the predicted parental QTL genotype, Q (or q) indicates the trait increasing (or decreasing) allele.l,ĥ 1 ,ĥ 2 ,ĥ 3 ,ĥ 4 , andr are the estimated population mean, monogenic, digenic, trigenic, quadrigenic effects, and residual error for the QTL detected based on the orthogonal contrast quantitative genetic model.q is the estimated proportion of genetic variance explained by additive QTL genetic effects. The 12 traits are yield (fresh tuber weight in kilograms per plot), Ht (canopy height), size (tuber), shape (tuber), DM (dry matter), Fry4 and Fry 10 (frying colour at 4°C and 10°C), Mat (maturity), Fb4 (foliage blight), Tb% (tuber blight), Gtc (Globodera pallida cyst counts), and Fc (flower colour). Dashes denote indeterminable parameters, because the relevant offspring genotypes are not observable for the given predicted parental QTL genotypes. ns denotes nonsignificant QTL. H2017 represents the method described in Hackett et al. (2017). Quadrivalent chromosomal pairing has been observed in various autotetraploid plants, with a frequency varying from up to 10% in kiwi , 20-30% in potato (Bourke et al., 2015), and up to 36% in Pennisetum orientale (Deniz & Dogru, 2006). In these cases, bivalent-based methods cannot deal with the full spectrum of the data because there will be a large proportion of observed data that is incompatible with the expected distribution under the bivalent pairing assumption. In practice, it has been proposed to exclude the incompatible data from QTL mapping analysis when using bivalent-based methods (Hackett et al., 2014da Silva Pereira et al., 2019). We have shown that the chance to directly observe the incompatible data is very low or even impossible, because a substantial proportion of individuals carry gametes from double reduction events that are undetectable in the mapping population and therefore cannot be excluded from the data. This creates a serious problem, whereby the genetic structure of the mapping population deviates significantly from that expected under bivalent chromosome pairing, even when the observable incompatible data are removed. Our development of QvMethod accounts for quadrivalent chromosome pairing and therefore outperforms the bivalent-based methods both in terms of statistical power of QTL detection and accuracy of locating QTLs, as demonstrated through simulation study and analysis with real data from a segregating population of autotetraploid potato (S. tuberosum). Recently, Bourke et al.
(2019) explored factors affecting QTL analysis in polyploids by combining the computer software 'TETRAORIGIN', which was designed by Zheng et al. (2016) to predict the parental origin of marker alleles in autotetraploid populations, and a simplified additive genetic model into QTL mapping analysis. However, the method described in Bourke et al. (2019) lacks an essential component; that is, genetic linkage analysis between QTLs and surrounding markers for QTL mapping in an outbred autotetraploid segregating population, as we have presented here.
The QTL mapping method presented here is based on a biallelic quantitative genetic model detailed in our recent work (Chen et al., 2018). A full multiple allele quantitative genetic model for a tetraploid individual, such as the one defined in Kempthorne (1957), is statistically intractable because it involves a total of 96 genetic parameters (Hackett et al., 2001). This number increases to 120 in a segregating population from crossing two parental lines divergent at all four alleles of a QTL (8 monogenic, 28 digenic, 48 trigenic, and 36 quadrigenic effects). A simplified version of this multiple allele model has been implemented by either setting linear constraints on the genetic effects, as in Hackett et al. (2001Hackett et al. ( , 2014Hackett et al. ( , 2017, or by modelling only some of the genetic effects of a full model (e.g. additive effects), as in Bourke et al. (2019). Using these simplified models, estimates of the genetic parameters may be biased and difficult to interpret because many other genetic effects (e.g. interactive effects between QTL alleles) are ignored. To solve the intractability problem and to adequately estimate the genetic parameters in a full model, we proposed an orthogonal contrastbased model for modelling quantitative genetic effects in various autotetraploid populations, as detailed in Chen et al. (2018), and implement it in the present QTL mapping method. The biallelic model focuses on genetic effects of QTL alleles on trait phenotypes and models their increasing or decreasing effects on the traits. It is statistically tractable and completely models all genetic effects (additive and interactive) at a putative QTL. In particular, the model does not imply that there are only two alleles on the molecular level (i.e. sequence variants), but instead the two 'alleles' define the increasing or decreasing quantitative genetic effects of QTL alleles on the trait.
In addition to trait phenotype data, molecular marker data present the other essential component in any QTL analysis. New-generation genomic DNA sequencing techniques enable discovery and generation of a large number of DNA sequence polymorphic markers in a tested genome. Genotyping by sequencing (GBS) is relatively straightforward in diploid species, though serious consideration must be given to several major sources of variation in collecting and processing the sequencing data for accurate identification of allele-specific sequencing reads (Geijn et al., 2015). GBS in tetraploids is a much more challenging task and involves distinguishing the number of each constituent allele (i.e. the allele dosage) in a heterozygote genotype (e.g. Uitdewilligen et al., 2013). Though methods for diagnosis of allele dosage from DNA sequencing data in tetraploids are beginning to emerge (Margarido & Heckerman, 2015;Gerard et al., 2018), this task remains an incredible challenge and is ripe for further investigation. The QTL methods developed in this study can utilize marker data with or without known allele dosage information. Hackett et al. (2014) showed that use of marker dosage allele information may improve the efficiency of QTL mapping analysis. An open question arises as to what extent the biased diagnosis of allele dosage would affect the efficiency of mapping QTLs in autotetraploids.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.
Methods S1 Conditional probability of flanking marker genotypes given their surrounding marker data.
Methods S2 Relationship between the coefficients of double reduction at linked loci under the mixed chromosome pairing model.

Methods S3
The EM algorithm for maximum likelihood estimation of QTL genetic effects.
Notes S1 Relationship between recombination frequencies of QTL and flanking markers under quadrivalent and bivalent chromosome pairing models.

Notes S2 Simulation model and parameters.
Notes S3 Variance in the estimation of genetic effect parameters.
Table S1 Probability distribution of diploid gamete genotypes at a QTL and its flanking marker loci from a bivalent meiosis of an autotetraploid individual.

Table S2
Parameter estimation from BvMethod based on 100 simulations with biallelic SNP markers under a bivalent chromosome pairing model.     Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
New Phytologist is an electronic (online-only) journal owned by the New Phytologist Foundation, a not-for-profit organization dedicated to the promotion of plant science, facilitating projects from symposia to free access for our Tansley reviews and Tansley insights.
Regular papers, Letters, Viewpoints, Research reviews, Rapid reports and both Modelling/Theory and Methods papers are encouraged. We are committed to rapid processing, from online submission through to publication 'as ready' via Early Viewour average time to decision is <26 days. There are no page or colour charges and a PDF version will be provided for each article.