**Favourable case: genotypes can be deduced from double chromatograms.** In favourable situations, the two sequences can be deduced, after DAS, from ambiguous (double) chromatograms. This generally occurs when both polymorphism and sequence length are limited. The most frequent alleles are often identified from the sequences of homozygous individuals, so diploid sequence genotypes can be deduced relatively easily from ambiguous double sequences. If the two mixed sequences are of moderate length, there is generally no shift between their corresponding chromatogram peaks. It is then possible to analyse the sequence files, written with the IUPAC ambiguity code, using dedicated software (Dixon, 2009; Garrick *et al.*, 2010; Stephens & Donnelly, 2003). These programs estimate the probability that each genotype or allele has been correctly inferred (but see Garrick *et al.*, 2010). Contrary to what is often supposed, the presence of indels is not a problem for them. When the two sequences become shifted relative to each other at some point in the chromatogram, the programs cited above cannot be used. The shift, however, often allows the scientist to distinguish the two mixed sequences visually. Sequence genotypes may then be obtained in some favourable cases, generally when polymorphism is low. Cloning may thus not be required to determine both sequences. With such markers, however, cloning a few individuals remains useful to test whether the marker behaves as a single diploid locus (SDL test).
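
The subtraction step described above can be sketched in Python. A known frequent allele (for example one read from a homozygote) is removed from an IUPAC-coded double sequence to reveal the second allele. This is a minimal illustration under stated assumptions, not the method implemented by the programs cited. It assumes no shift between the two sequences, so each position holds one base from each allele. It also assumes that only single-base or two-base IUPAC codes occur, and it returns the first compatible allele rather than a probability. The function names `other_allele` and `resolve_genotype` are hypothetical.

```python
from typing import List, Optional, Tuple

# IUPAC codes for one or two bases; a diploid locus shows at most two bases per site
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def other_allele(ambiguous: str, known: str) -> Optional[str]:
    """Subtract a known allele from an IUPAC-coded double sequence.

    Returns the second allele, or None if `known` cannot be one of the two
    mixed sequences. Assumes both alleles are read without any shift.
    """
    if len(ambiguous) != len(known):
        return None
    bases = []
    for code, base in zip(ambiguous.upper(), known.upper()):
        allowed = IUPAC.get(code)
        if allowed is None or base not in allowed:
            return None  # incompatible site: `known` is not part of this genotype
        remaining = allowed - {base}
        # Homozygous site: keep `base`; heterozygous site: take the other base
        bases.append(remaining.pop() if remaining else base)
    return "".join(bases)


def resolve_genotype(ambiguous: str, known_alleles: List[str]) -> Optional[Tuple[str, str]]:
    """Try each known allele in turn and return the first compatible pair."""
    for allele in known_alleles:
        partner = other_allele(ambiguous, allele)
        if partner is not None:
            return allele, partner
    return None


print(resolve_genotype("ACRTY", ["GGGGG", "ACGTC"]))
```

When several known alleles are compatible with the same double sequence, this sketch simply keeps the first one. The dedicated programs instead weigh the alternatives and report probabilities.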

**In the general case, cloning is required at least for some individuals.** Figure 4 summarizes the explanations below. It shows the different phases of this general situation as a function of polymorphism, measured as observed heterozygosity (*H*_{o}).

After a cloning step, the strategy that minimizes the number of sequencing reactions is to sequence only two clones. Then, when both clones are identical, clone sequences are added one by one until the second allele is obtained. However, this solution would be tedious and ill-suited to the organization of laboratory work. I thus decided to focus on simple strategies satisfying the constraint that at most two sequencing sessions are carried out per individual. Two such strategies are possible.

In strategy 1, the first step consists of DAS for a set of individuals. For the unreadable (heterozygous) individuals only, it is followed by a second step of CS in which *S* clones are sequenced per individual, *S* being chosen to minimize the average total number of sequences. In strategy 2, the first step consists of cloning and sequencing a first set of *S*_{1} clones per individual. Then, for the individuals whose clones all display the same sequence, a second set of *S*_{2} clones is sequenced. *S*_{1} and *S*_{2} are chosen to minimize the average total number of sequences.

To determine the threshold value of *H*_{o} below which strategy 1 is better than strategy 2, we used the following parameters and computations. *T*_{1} (respectively *T*_{2}) is the average number of sequences per individual needed to obtain a proportion (1 − μ) of fully determined genotypes (both alleles known) under strategy 1 (respectively strategy 2). Let *S*_{μ} be the number of sequences needed to ensure that the proportion of individuals whose genotypes are not fully determined does not exceed μ. The first step of strategy 1 requires one direct sequence per individual. The second step requires *S*_{μ} clones for each heterozygous individual, i.e. for a proportion *H*_{o} of individuals. Summing both steps, we obtain:

$$T_1 = 1 + H_o\,S_\mu$$

For a heterozygous individual, each clone carries either allele with probability 1/2. The probability that all *S* clones are identical is therefore $2 \times (1/2)^{S} = 2^{1-S}$, and the proportion of undetermined genotypes is $\mu = 2^{1-S_\mu} H_o$. Therefore, *S*_{μ} is the smallest integer greater than or equal to $1 - \ln(\mu/H_o)/\ln 2$. As expected, for realistic parameter values (μ < *H*_{o}), the smaller μ is, the larger *S*_{μ* becomes. Under strategy 1, the individuals with undetermined genotypes are identified individually by the direct sequencing step, and all of them are heterozygotes (Fig. 4).
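
The quantities above can be computed directly. The following minimal Python sketch assumes that strategy 1 costs one direct sequence per individual plus *S*_{μ} clones for each heterozygote, i.e. $T_1 = 1 + H_o S_\mu$. It also includes a small Monte Carlo check of the $2^{1-S}$ probability, assuming that each clone independently carries either allele with probability 1/2. The function names are hypothetical.

```python
import math
import random


def s_mu(mu: float, ho: float) -> int:
    """Smallest integer S such that 2**(1 - S) * ho <= mu (assumes 0 < mu < ho)."""
    return math.ceil(1 - math.log(mu / ho) / math.log(2))


def t1(mu: float, ho: float) -> float:
    """Strategy 1 (assumed cost): one direct sequence per individual,
    then S_mu clones for each heterozygote (a fraction ho of individuals)."""
    return 1 + ho * s_mu(mu, ho)


def p_all_identical(s: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(all s clones of a heterozygote are identical)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        clones = [rng.random() < 0.5 for _ in range(s)]  # True = allele 1
        if all(clones) or not any(clones):
            hits += 1
    return hits / trials


print(s_mu(0.01, 0.6), t1(0.01, 0.6))
print(p_all_identical(4), 2 ** (1 - 4))
```

Rounding *S*_{μ} up to an integer follows the definition in the text. A non-rounded variant would give slightly smaller values of *T*_{1}.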

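Under strategy 2, every individual receives *S*_{1} sequences. The *S*_{2} additional sequences are needed only for individuals whose *S*_{1} clones are all identical. These are all homozygotes, plus the heterozygotes that yielded identical clones by chance. The average number of sequences required to characterize genotypes under strategy 2, as a function of *S*_{1} and *S*_{2}, is therefore:

$$T_2 = S_1 + S_2\left[2^{1-S_1}H_o + (1 - H_o)\right]$$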
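
As a sanity check, the formula for *T*_{2} can be compared with a direct simulation of strategy 2. The minimal Python sketch below assumes that individuals are heterozygous with probability *H*_{o} and that each clone of a heterozygote carries either allele with probability 1/2. The function names and the chosen parameter values are hypothetical.

```python
import random


def t2(s1: int, s2: int, ho: float) -> float:
    """Expected number of sequences per individual under strategy 2 (formula above)."""
    return s1 + s2 * (2 ** (1 - s1) * ho + (1 - ho))


def simulate_t2(s1: int, s2: int, ho: float, n: int = 100_000, seed: int = 7) -> float:
    """Monte Carlo mean of sequences per individual under strategy 2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        total += s1  # first session: S1 clones for every individual
        if rng.random() < ho:
            # heterozygote: each clone carries allele 1 or 2 with probability 1/2
            clones = [rng.random() < 0.5 for _ in range(s1)]
            all_same = all(clones) or not any(clones)
        else:
            all_same = True  # a homozygote always yields identical clones
        if all_same:
            total += s2  # second session only when the first S1 clones agree
    return total / n


print(t2(3, 4, 0.6), simulate_t2(3, 4, 0.6))
```

The two printed values should agree to about two decimal places. The remaining difference comes only from Monte Carlo noise.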

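A heterozygote remains undetermined only if all of its *S*_{1} + *S*_{2} clones are identical. The constraint on μ therefore gives *S*_{2} = *S*_{μ} − *S*_{1}. Replacing *S*_{2} by this expression, a function of *H*_{o}, μ and *S*_{1}, in *T*_{2} gives:

$$T_2 = S_1 + \left(S_\mu - S_1\right)\left[2^{1-S_1}H_o + (1 - H_o)\right], \qquad S_\mu = \left\lceil 1 - \frac{\ln(\mu/H_o)}{\ln 2} \right\rceil$$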

The next step is to determine, for a given value of μ, the value of *S*_{1} that minimizes *T*_{2}. This requires studying the variation of the function *T*_{2}(*S*_{1}), which shows that a single value of *S*_{1} minimizes *T*_{2} (Appendix S2). The minimum relevant number of clones to sequence is 2. Beyond 10 sequences, the probability that a heterozygote displays only identical clones drops below $2^{-9} \approx 2 \times 10^{-3}$, which corresponds to a very low proportion of genotypes that are not fully determined. Therefore, *T*_{2} can be computed for all relevant integer values of *S*_{1}, starting from two, until the minimum is found; this yields the optimal *S*_{1}. This procedure produced the results presented in Table 3 and Fig. 4.
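
The scan described above can be sketched in a few lines of Python. This minimal version takes the expression for *T*_{2} with *S*_{2} = *S*_{μ} − *S*_{1} and the integer *S*_{μ} defined earlier. It relies on the single-minimum property reported in Appendix S2, stopping at the first increase. The function names and the `s_max = 10` cap are illustrative assumptions.

```python
import math
from typing import Tuple


def s_mu(mu: float, ho: float) -> int:
    """Smallest integer S such that 2**(1 - S) * ho <= mu (assumes 0 < mu < ho)."""
    return math.ceil(1 - math.log(mu / ho) / math.log(2))


def t2_of_s1(s1: int, mu: float, ho: float) -> float:
    """T2 after substituting S2 = S_mu - S1."""
    s2 = s_mu(mu, ho) - s1
    return s1 + s2 * (2 ** (1 - s1) * ho + (1 - ho))


def optimal_s1(mu: float, ho: float, s_max: int = 10) -> Tuple[int, float]:
    """Scan S1 = 2, 3, ... and stop at the first increase of T2."""
    best_s1, best_t = 2, t2_of_s1(2, mu, ho)
    for s1 in range(3, min(s_max, s_mu(mu, ho)) + 1):
        t = t2_of_s1(s1, mu, ho)
        if t >= best_t:
            break  # past the single minimum
        best_s1, best_t = s1, t
    return best_s1, best_t


print(optimal_s1(0.01, 0.6), optimal_s1(0.001, 0.7))
```

Early stopping is valid only because *T*_{2}(*S*_{1}) has a single minimum. Without that property, the loop would have to evaluate every value up to `s_max`.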

Table 3. Threshold values of *H*_{o} for different proportions μ of nonfully determined genotypes. *T*: mean number of sequences per individual.

| | μ = 0.001 | μ = 0.01 | μ = 0.03 |
|---|---|---|---|
| Threshold *H*_{o} | 0.59 | 0.60 | 0.61 |
| *T* (at threshold *H*_{o}) | 7 | 5.15 | 4.3 |

Fig. 4 shows the minimum average number of sequences per genotype [min(*T*_{1}, *T*_{2})] as a function of *H*_{o}, for a proportion of nonfully determined genotypes μ = 0.01. The threshold *H*_{o} below which strategy 1 is more efficient than strategy 2 is close to 0.60. Strikingly, when stringency varies (μ = 0.001 or 0.03), the threshold *H*_{o} is nearly invariant, lying between 0.59 and 0.61 (Table 3). The mean number of sequences per individual never exceeds 4.3 for μ = 0.03, 5.15 for μ = 0.01 and 7 for μ = 0.001. These maxima are reached at the threshold values of *H*_{o}.
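
The threshold can be located numerically by comparing *T*_{1} with the minimum of *T*_{2} along a grid of *H*_{o} values. The following minimal Python sketch makes three assumptions. First, strategy 1 costs $T_1 = 1 + H_o S_\mu$. Second, *S*_{2} = *S*_{μ} − *S*_{1}. Third, *S*_{μ} is the rounded-up integer defined in the text. The function names and the grid resolution are hypothetical.

```python
import math
from typing import Optional


def s_mu(mu: float, ho: float) -> int:
    """Smallest integer S such that 2**(1 - S) * ho <= mu (assumes 0 < mu < ho)."""
    return math.ceil(1 - math.log(mu / ho) / math.log(2))


def t1(mu: float, ho: float) -> float:
    """Strategy 1: one direct sequence per individual + S_mu clones per heterozygote."""
    return 1 + ho * s_mu(mu, ho)


def min_t2(mu: float, ho: float) -> float:
    """Strategy 2: T2 minimized over S1 = 2..S_mu, with S2 = S_mu - S1."""
    s = s_mu(mu, ho)
    return min(s1 + (s - s1) * (2 ** (1 - s1) * ho + (1 - ho))
               for s1 in range(2, s + 1))


def threshold_ho(mu: float, grid: int = 1000) -> Optional[float]:
    """First Ho on a regular grid at which strategy 2 needs fewer sequences."""
    for i in range(1, grid + 1):
        ho = i / grid
        if ho > mu and min_t2(mu, ho) < t1(mu, ho):  # skip Ho <= mu (S_mu < 2)
            return ho
    return None


for mu in (0.001, 0.01, 0.03):
    print(mu, threshold_ho(mu))
```

Under these assumptions, the sketch places the threshold near 0.60 for μ = 0.01 and near 0.61 for μ = 0.03, consistent with Table 3. Because *S*_{μ} is rounded up here, the value of *T* at the threshold can differ slightly from the table.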

When strategy 2 is better (i.e. when *H*_{o} is above the threshold), the optimal number of clones to sequence in the first round is larger than two. It is three for μ = 0.03 or 0.01 (for all values of *H*_{o}), and four for μ = 0.001. With μ = 0.01, the number of sequences needed to obtain an individual genotype never exceeds 8 (a maximum of 3 + 5 sequences per individual, reached when *H*_{o} > 0.64). The average number of sequences per determined genotype, however, is much lower (4–5) over this range of *H*_{o} values (Fig. 4).
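
The worst case for a single individual under strategy 2 is *S*_{1} + *S*_{2}, i.e. an individual that goes through both sessions. The minimal Python sketch below returns the optimal pair (*S*_{1}, *S*_{2}) together with *T*_{2}, so the jump from 3 + 4 to 3 + 5 sequences around *H*_{o} = 0.64 can be checked for μ = 0.01. It assumes *S*_{2} = *S*_{μ} − *S*_{1} with the integer *S*_{μ} defined earlier. The name `plan_strategy2` is hypothetical.

```python
import math
from typing import Tuple


def s_mu(mu: float, ho: float) -> int:
    """Smallest integer S such that 2**(1 - S) * ho <= mu (assumes 0 < mu < ho)."""
    return math.ceil(1 - math.log(mu / ho) / math.log(2))


def plan_strategy2(mu: float, ho: float) -> Tuple[int, int, float]:
    """Return (S1, S2, T2) for the S1 that minimizes T2, with S2 = S_mu - S1."""
    s = s_mu(mu, ho)
    options = [
        (s1, s - s1, s1 + (s - s1) * (2 ** (1 - s1) * ho + (1 - ho)))
        for s1 in range(2, s + 1)
    ]
    return min(options, key=lambda option: option[2])


for ho in (0.62, 0.70, 0.90):
    s1, s2, t2 = plan_strategy2(0.01, ho)
    print(f"Ho={ho:.2f}: S1={s1}, S2={s2}, worst case {s1 + s2}, mean {t2:.2f}")
```

For *H*_{o} just below 0.64, the plan is 3 + 4 sequences; above 0.64, *S*_{μ} increases to 8 and the plan becomes 3 + 5.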