Disagreement in F ST estimators: A case study from sex chromosomes

Abstract Sewall Wright developed F ST for describing population differentiation and it has since been extended to many novel applications, including the detection of homomorphic sex chromosomes. However, there has been confusion regarding the expected estimate of F ST for a fixed difference between the X‐ and Y‐chromosome when comparing males and females. Here, we attempt to resolve this confusion by contrasting two common F ST estimators and explain why they yield different estimates when applied to the case of sex chromosomes. We show that this difference is true for many allele frequencies, but the situation characterized by fixed differences between the X‐ and Y‐chromosome is among the most extreme. To avoid additional confusion, we recommend that all authors using F ST clearly state which estimator of F ST their work uses.


| BACKG ROU N D
Genetic sex determination is common in both plants and animals (Bachtrog et al., 2014) and the pair of chromosomes where the sex-determination gene resides are referred to as the sex chromosomes. Sex chromosomes are hypothesized to often emerge from autosomes once they have acquired a novel mutation for sex determination (Abbott, Nordén, & Hansson, 2017;Bachtrog et al., 2014). Linked, sexually antagonistic alleles can help to drive a novel sex-determination allele to a higher frequency in the population ( van Doorn & Kirkpatrick, 2010) and mechanisms that reduce recombination between sexually antagonistic loci and the novel sex-determination locus are selectively favoured (Charlesworth, Charlesworth, & Marais, 2005;Rice, 1987). Due to the reduction in recombination, deleterious mutations accumulate and gradually decay the gene content within this region (Bachtrog, 2013;Blaser, Grossen, Neuenschwander, & Perrin, 2012). In some systems, largescale deletions or expansions of repetitive elements occur and lead to heteromorphic sex chromosomes (Bachtrog, 2013;Charlesworth et al., 2005). As a result of this process, sex chromosomes exist on a spectrum between harbouring a single nucleotide polymorphism (SNP) responsible for sex determination with no reduction in recombination from the surrounding region, as seen in fugu (Kamiya et al., 2012), to the highly decayed and heteromorphic sex chromosomes observed in many eutherian mammals (Bellott et al., 2014;Cortez et al., 2014).
Two common methods have been developed to identify sex chromosomes at different points on this spectrum using next-generation sequencing data sets. In the more advanced stages of sex chromosome evolution the X-and Y-chromosome share little genomic content. As a result, short reads from the Y-chromosome align poorly to an X-chromosome reference, resulting in a higher coverage in females than in males. These differences in coverage between males and females can be used to detect putatively nonrecombining regions of sex chromosomes (Fraïsse, Picard, & Vicoso, 2017;Huylmans, Toups, Macon, Gammerdinger, & Vicoso, 2019;Pal & Vicoso, 2015;Roesti, Moser, & Berner, 2013;Vicoso & Bachtrog, 2011, 2013Vicoso, Kaiser, & Bachtrog, 2013). In the less advanced stages of sex chromosome evolution, the X-and Y-chromosome differ only by a few base substitutions. Therefore, short reads from the X-and Y-chromosome align in nearly equal proportions to an X-chromosome reference and the identification of sex chromosomes instead relies on differences in allele frequencies in males and females (Böhne et al., 2019;Conte, Gammerdinger, Bartie, Penman, & Kocher, 2017;Dixon, Kitano, & Kirkpatrick, 2018;Fontaine et al., 2017;Gammerdinger, Conte, Acquah, Roberts, & Kocher, 2014;Gammerdinger, Conte, Baroiller, D'Cotta, & Kocher, 2016;Gammerdinger, Conte, Sandkam, Penman, & Kocher, 2019;Gammerdinger, Conte, Sandkam, & Ziegelbecker, 2018;Toups, Rodrigues, Perrin, & Kirkpatrick, 2018;Veltsos et al., 2019). Typically, SNPs are first identified among subpopulations of males and females, and regions with high levels of genetic differentiation between males and females are presumed to be sex-linked. This genetic differentiation between males and females is often described in terms of F ST .
F ST is a relative measure of population differentiation (Cruickshank & Hahn, 2014) and was outlined along with other Fstatistics by Sewall Wright (Wright, 1949). Estimates of F ST have been used for many novel applications, such as examining parallel adaptation in sticklebacks (Hohenlohe, Bassham, Etter, & Cresko, 2010), introgression in canaries (Lopes et al., 2016) and local adaptation in high-altitude populations of Tibetans (Peng et al., 2011;Xu et al., 2011). Recent work has used estimates of F ST to identify and describe the divergence between relatively homomorphic sex chromosomes (Bergero, Gardner, Bader, Yong, & Charlesworth, 2019;Böhne et al., 2019;Conte et al., 2017;Dixon et al., 2018;Fontaine et al., 2017;Gammerdinger et al., 2014Gammerdinger et al., , 2016Gammerdinger et al., , 2018Gammerdinger et al., , 2019Kirkpatrick & Guerrero, 2014;Natri, Shikano, & Merilä, 2013;Rodrigues & Dufresnes, 2017;Toups et al., 2018;Veltsos et al., 2019). However, there is a discrepancy within the literature regarding the expected estimate of F ST for a fixed difference between the X-and Y-chromosome. When comparing males and females for an allele that is either fixed on the X-or Y-chromosome of an XY pair, some studies expect an F ST estimate of 0.333 (Gammerdinger et al., 2014(Gammerdinger et al., , 2016(Gammerdinger et al., , 2018Kirkpatrick & Guerrero, 2014;Toups et al., 2018), while other studies expect an F ST estimate of 0.5 (Böhne et al., 2019;Fontaine et al., 2017;Rodrigues & Dufresnes, 2017). The difference in these expectations is not typically justified, nor is the specific estimator of F ST employed stated, leading to some confusion in the field.
A recent study highlighted inconsistencies between different estimators of F ST (Berner, 2019) and in particular pointed out that, for an SNP fixed on the Y-chromosome, one common estimator of F ST yields a value of 0.333 (Nei, 1973) while another yields 0.5 (Weir & Cockerham, 1984). How other popular estimators behave for an SNP that is alternatively fixed between the X-and Y-chromosome, and, importantly, why these discrepancies arise, have yet to be systematically reviewed.
Here, we aim to clarify why such discrepancies in the expected estimates of F ST can arise when comparing males and females for alternatively fixed alleles between the X-and Y-chromosome. Note that this difference in expectations of F ST is symmetric for ZW systems, so this analysis will only describe an XY system. Importantly, while this analysis focuses on the specific case of sex chromosomes, we also illustrate that the difference between F ST estimators can be substantial for a wide range of allele frequencies, making direct comparisons of F ST estimates between studies problematic in many contexts. Last, we apply a variety of population genetics software packages, which often generically refer to F ST , to estimate F ST for alternatively fixed alleles between the X-and Y-chromosome under various sampling schemes. Because these programs use different estimators and corrections for sample size and composition, a diverse range of expected F ST values can be recovered (0.16-0.67) and, as a result, further complicates the interpretation of experimental studies that use F ST to assess sex chromosome differentiation.

Since Wright introduced F ST
where 2 p is the variance in the allele frequency for p and p and q are the average allele frequencies across the subpopulations for p and q, respectively (Hedrick, 2005;Weir & Cockerham, 1984). When comparing the sex chromosomes of males and females, we will define subpopulation 1 to be males and subpopulation 2 to be females, while p is the frequency of the allele on the X-chromosome and q is the frequency of the allele on the Y-chromosome (Table 1). (1)

| E S TIMATOR S OF F S T
In practice, the parameter values for allele frequencies are unknown and thus many methods have been proposed to estimate F ST . Here, we contrast two common estimators of F ST , which we will denote as F ST Nei and F

ST Hudson
. The difference between these estimators has been previously discussed by others (Bhatia, Patterson, Sankararaman, & Price, 2013;Charlesworth, 1998), but not specifically in the context of sex chromosomes. While some estimators handle multiple alleles and multiple subpopulations, we will be considering only the biallelic state for two subpopulations. We make these simplifications because they provide a direct comparison between estimators and they reflect a situation with alternatively fixed alleles on the X-and Y-chromosome when comparing males and females. Also, we will focus on the example of a fixed difference between the Xand Y-chromosome because it is the fundamental component of these elevated estimates of F ST . However, F ST estimates for various degrees of difference in the allele frequencies between males and females, going from equal frequencies in both sexes to alternatively fixed differences between the X-and Y-chromosome, can be found in Figure S1. In an empirical study, the identification of an XY system would typically show a region overrepresented with these fixed differences between the X-and Y-chromosome.

| Nei's estimator of F ST
One estimator of F ST comes from Nei (1973) and is often referred to as G ST . G ST uses heterozygosity data to estimate F ST . G ST quantifies the difference between the total heterozygosity of the population and the average heterozygosity of the subpopulations and normalizes this difference by the total heterozygosity of the population. It was defined by Nei (1973) as: When considering two alleles in two subpopulations, this estimator can be simplified to: Thus, by comparing males and females for an allele that is alternatively fixed on the X-and Y-chromosome using Nei's (1973) estimator, the expected estimate of F ST is 0.333.
A similar estimator of F ST , called γ ST , is generally used in the context of haplotypes and estimates F ST using nucleotide diversity.
Nucleotide diversity, π, is the mean number of nucleotide differences between two randomly selected sequences from a population. γ ST is described as the difference between the total nucleotide diversity of the population and the average nucleotide diversity of the subpopulations normalized to the total nucleotide diversity of the population (Nei, 1982). Nei (1982) defined γ ST as: where π T the total nucleotide diversity of the population and π S is the mean of the subpopulations' nucleotide diversities. Notably, when considering a single, biallelic SNP, G ST and γ ST are equivalent (Nei, 1982). As a result, we will describe nucleotide diversities in terms of p and q, since nucleotide diversities and heterozygosities are equivalent for a SNP.
We introduce γ ST because it will lead to the most direct comparison between Nei's (1973) estimator and Hudson, Slatkin, et al. (1992) estimator in the next section.
π T estimates nucleotide diversity from p and q, the mean of p and q across the subpopulations, respectively. Importantly, π T makes comparisons between all alleles in the whole population. When considering a biallelic SNP in two subpopulations, π T in Nei's (1982) estimator can be simplified to: Using the values from Table 1, we can estimate π T as 0.375. The nucleotide diversity, π, of each subpopulation can be computed using Nei and Li's (1979) definition for this statistic and averaged together to become π S . For a biallelic SNP in two subpopulations, π S can be simplified to: (2) When utilizing the values in Table 1, π S is 0.25 and therefore Equation 4 recovers an expected estimate of F ST to be 0.333.

| Hudson, Slatkin and Maddison's estimator of F ST
A second, alternative estimator of F ST comes from Hudson, Slatkin, et al. (1992). This estimator considers the difference in the average nucleotide diversity between subpopulations and the average subpopulation nucleotide diversity and then normalizes this difference to the average nucleotide diversity between subpopulations. This estimator of F ST is defined as: where π W is similar to Nei's (1982) π S , except π W excludes pairwise comparisons of haplotypes against themselves and is thus dependent on the subpopulation sample sizes. π W can be expressed as: where n 1 and n 2 are the number of diploid individuals sampled from subpopulation 1 and 2, respectively. (A full derivation that can be found in the supplementary information of Bhatia et al. (2013). Note that in the Bhatia et al. (2013) derivation, n 1 and n 2 represent allele counts not diploid individual counts as are used here). As the subpopulation sample sizes go to infinity, π W will approach π S and thus the difference between π S and π W is often negligible with large subpopulation sample sizes. π B is an alternative estimator of nucleotide diversity, defined by Nei and Li (1979) as π XY , and it can be quite numerically different from π T . π XY is the mean number of nucleotide differences between two randomly selected DNA sequences, each of which is drawn from separate subpopulations. In the case of a biallelic SNP in two subpopulations, Hudson, Slatkin, et al. (1992) estimator for π B can be rewritten as: The values in Table 1 yield an estimate of π B to be 0.5. As subpopulation sample sizes approach infinity, estimating F ST for a biallelic locus in two subpopulations with this estimator can be written as: As subpopulation sizes go to infinity and using either Equation 7 or 10 with the values in Table 1, we arrive at an estimate of F ST approaching 0.5. F ST estimates for finite sample sizes using this estimator are demonstrated in Figure 1. Importantly, regardless of the subpopulation sample sizes employed, Nei's (1973) estimator and Hudson, Slatkin, et al. (1992) estimator are always quite different for the case of sex chromosomes (Figure 1).

| Why is there a difference in the expected estimate of F ST ?
By comparing Equations 4 and 7 with large subpopulation sample sizes, it is clear that the important difference in Nei's (1973Nei's ( , 1982 (7) Various estimates of F ST for a fixed difference between the X-and Y-chromosome when (a) using equal subpopulation sample sizes for two subpopulations, males and females, and (b) using unequal subpopulation sample sizes for the two subpopulations, males and females, while keeping the total sample size constant Weir and Cockerham, 1984Hudson, Slatkin and Maddison, 1992Nei, 1973Nei and Chesser, 1983 Weir and Cockerham, 1984Hudson, Slatkin and Maddison, 1992Nei, 1973Nei and Chesser, 1983 estimator and Hudson, Slatkin, et al. (1992) estimator arises in how they handle π T and π B . Nei's (1982) π T uses two randomly drawn sequences from the population as a whole, while Hudson, Slatkin, et al. (1992) π B requires that the two randomly drawn sequences be from the separate subpopulations. Figure 2 illustrates the difference between these two estimators for (a) a biallelic SNP present on an autosome in two subpopulations and for (b) sex chromosomes when comparing males and females. Figure 3 shows the F ST estimates produced from Nei's (1973) estimator and Hudson, Slatkin, et al. (1992) estimator when considering infinitely large subpopulation sample sizes, as well as the difference between these two estimators. Interestingly, the regions corresponding to an SNP that is alternatively fixed between the X-and Y-chromosome are among the regions where the difference in these estimators is highest.
However, these estimators can differ substantially across the range of plausible allele frequencies observed in two subpopulations. For example, SNPs that are not yet fixed differences between the X-and Y-chromosome also show this discordance between F ST estimators ( Figure 3; Figure S1). Additionally, the difference in these two F ST estimates when p 1 is 0.20, p 2 is 0.80 and both subpopulation samples sizes are 20 is slightly larger than the difference produced by sex chromosomes. This importantly illustrates that the disagreement in F I G U R E 2 Comparison of the nonzero components of π B in Hudson, Slatkin, et al. (1992) estimator and π T in Nei's (1973) estimator for biallelic SNPs in (a) two subpopulations and (b) an XY system. Each bar under the alleles represents a nonzero comparison that occurs in the formulation of π B or π T . The curly bracket beneath females in the sex chromosome comparison illustrates that females are homomorphic for this allele despite being diploid and thus only one nonzero comparison is made Hudson, Slatkin andMaddison, 1992 Nei, 1973 Two biallelic subpopulations Sex chromosomes Nei, 1973 Males M ales Females Females Hudson, Slatkin and Maddison, 1992 F I G U R E 3 Visualizations of Nei (1973), Hudson, Slatkin, et al. (1992) and the difference between the two estimators. (a) Estimates of F ST using the Nei (1973) estimator with white being no differentiation and dark blue being complete differentiation. (b) Estimates of F ST using the Hudson, Slatkin, et al. (1992) estimator given infinitely large subpopulation sizes with white being no differentiation and dark blue being complete differentiation. (c) A heatmap of the difference between Hudson, Slatkin, et al. (1992) estimator and Nei's (1973) estimator for F ST (Hudson, Slatkin, et al. (1992) minus Nei (1973)) given infinitely large subpopulation sample sizes and the allele frequencies of p in subpopulations 1 and 2. Warmer colours show more difference between the estimators, while cooler colours show less difference between the estimators. Because the assignment of p and q along with subpopulation 1 and 2 is arbitrary, we have placed black boxes at all of the locations that could fit the description of a fixed difference between the X-and Y-chromosome and provided an arrow to the scenario we outlined in Table 1

| Additional corrections to F ST
There are several estimators of F ST that attempt to provide corrections for sampling biases and can cause further deviations from the expected estimate of F ST . Some estimators, such as that proposed by Hudson, Slatkin, et al. (1992), change with the total number of individuals sampled (Hudson, Boos, & Kaplan, 1992;Hudson, Slatkin, et al., 1992;Nei & Chesser, 1983) (Figure 1a; Table 2).
As similarly pointed out by Berner (2019), Weir and Cockerham's (1984) estimator appears to respond dramatically to unequal numbers of males and females. As the proportion of the males in a constant sample size increases, the total variance of the sample increases and thus decreases the estimate of F ST (Figure 1b).
Regardless, in the case of a fixed difference between the X-and Y-chromosome, p is defined as 0.75. Thus, there is little need to correct for subpopulation sample sizes because this differentiation is similar whether a single male and female are analysed or a very large number of each sex are considered. However, this type of sample size correction may be applicable when considering SNPs that are more frequent on the Y-chromosome but not yet fixed or if it is TA B L E 2 Software packages for estimating F ST and their estimates using mock input. These input files contained fixed differences between the X-and Y-chromosome for various sample sizes of males and females  Hudson, Slatkin, et al. (1992) a This implementation appears to use a w i = n i − 2 n 1 + n 2 − 4 weighting factor. b These estimates are most consistent with Nei and Chesser (1983), which is also discussed in Hudson, Boos, et al. (1992). c These metrics appear to use a w i = n i n 1 + n 2 weighting factor, while Nei (1982) and Nei and Chesser (1983) state that in most practices the subpopulations can be assumed to be weighted equally. d The referenced estimator is consistent with F′ ST in Nei (1987). unknown whether an SNP is alternatively fixed between the X-and Y-chromosome.
An additional correction considers the number of subpopulations sampled (Hedrick, 2005;Weir & Cockerham, 1984). This correction is related to an infinite island model that assumes that the researcher is sampling a few subpopulations from a larger metapopulation. Because there are only two subpopulations, males and females, these corrections are probably unsuitable in this context.
Some estimators are particularly concerned with correcting for this bias (Hedrick, 2005;Meirmans & Hedrick, 2011); however, this correction for highly polymorphic loci is unlikely to be necessary for the biallelic locus in question.
While some of these corrections are probably inappropriate, authors may be using them as some software packages refer to their implementation generically as F ST (Table 2). Table 2 and Figure 1 highlight the wide range of results that researchers could get for estimating F ST depending on the subpopulation sample sizes and estimator employed. One may argue that any large deviation away from zero in F ST estimates is sufficient enough evidence for sex chromosomes. However, estimates of F ST that include the various aforementioned corrections may never reach 0.333 or 0.5 (Table 2) and thus the expected maximum estimate of F ST for a particular data set should ideally be considered and stated. Otherwise, deviations from the theoretical maximum due to these corrections could lead to an erroneous interpretation that there are no fixed differences between the X-and Y-chromosome.
While the particular case of variants on sex chromosomes leads to some of the largest differences between these estimators, substantial differences can occur under alternative scenarios as well, and it would often be helpful to know how much of the variance between studies is driven by how F ST is estimated. For instance, whether sexually antagonistic selection can explain the range of F ST values that are found between males and females of different species (Cheng & Kirkpatrick, 2016;Flanagan & Jones, 2017;Lucotte, Laurent, Heyer, Ségurel, & Toupance, 2016;Wright et al., 2018;Wright, Rogers, Fumagalli, Cooney, & Mank, 2019) has recently been the subject of debate (Kasimatis, Nelson, & Phillips, 2017;Kasimatis, Ralph, & Phillips, 2019 (Hudson, Slatkin, et al., 1992;Nei, 1986;Weir, 1996;Wright, 1949) was not considered. In the future, we strongly urge researchers to justify their estimator, so that appropriate F ST estimators are employed and estimates from various studies can be comparable.

| CON CLUS IONS
When considering fixed differences between the X-and Y-chromosome, we conclude that it is appropriate to use Nei's (1973) estimator since it is most consistent with the work of Wright and others. However, both Nei's (1973) and Hudson, Slatkin, et al. (1992) estimators are useful estimators of differentiation and there could be questions, such as those regarding polymorphisms that are not fully linked to the X-or Y-chromosome, which are better answered with different estimators that implement some of the previously mentioned corrections. Moving forward, we encourage researchers to state which estimator they choose, their rationale for that choice and what the expected estimate of F ST is for the data set they are investigating.

ACK N OWLED G EM ENTS
We would like to thank Matthew Hahn, Mark Kirkpatrick and Thomas Kocher for their thoughtful and timely insights as we have tried to differentiate between these estimators of F ST . This work was funded by an ISTPlus Fellowship to W.J.G. and by an ERC grant (Project P 28842) to B.V.  Table 2.

CO N FLI C T S O F I NTE R E S T
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data needed for reproducing these conclusions are within this work and Supporting Information.