Accurate Bayesian inference of sex chromosome karyotypes and sex-linked scaffolds from low-depth sequencing data

The identification of sex-linked scaffolds and the genetic sex of individuals, i.e. their sex karyotype, is a fundamental step in population genomic studies. If sex-linked scaffolds are known, single individuals may be sexed based on read counts of next-generation sequencing data. If both sex-linked scaffolds as well as sex karyotypes are unknown, as is often the case for non-model organisms, they have to be jointly inferred. For both cases, current methods rely on arbitrary thresholds, which limits their power for low-depth data. In addition, most current methods are limited to euploid sex karyotypes (XX and XY). Here we develop BeXY, a fully Bayesian method to jointly infer the posterior probabilities for each scaffold to be autosomal, X-or Y-linked and for each individual to be any of the sex karyotypes XX, XY, X0, XXX, XXY, XYY and XXYY. If the sex-linked scaffolds are known, it also identifies autosomal trisomies and estimates the sex karyotype posterior probabilities for single individuals. As we show with downsampling experiments, BeXY has higher power than all existing methods. It accurately infers the sex karyotype of ancient human samples with as few as 20,000 reads and accurately infers sex-linked scaffolds from data sets of just a handful of samples or with highly imbalanced sex ratios, also in the case of low-quality reference assemblies. We illustrate the power of BeXY by applying it to both whole-genome shotgun and target enrichment sequencing data of ancient and modern humans, as well as several non-model organisms.

scaffolds often require special treatment in population genomic analyses, for example for fair filtering on read depth and minor allele frequency, genotype calling and Hardy-Weinberg checks, but also because they bias estimates of genetic diversity such as heterozygosity (Ellegren, 2009), estimates of mutation rates (Ellegren, 2007), genome-wide association studies (Gao et al., 2015) and assessments of population genetic structure (Benestan et al., 2017).If genetic sex and/or sex-linked scaffolds are unknown, they need to be characterized using dedicated methods that can be easily integrated in population genomic analyses.
The mechanisms of sex determination are remarkably variable (The Tree of Sex Consortium et al., 2014), and genetic sex determination through sex chromosomes has evolved independently many times throughout eukaryotes (Graves, 2008).However, two major systems can be outlined.In therian mammals, beetles, many flies and some fish, males are heterogametic (XY) and females are homogametic (XX).In contrast, in birds, snakes, butterflies and some other fish, females are heterogametic (ZW) and males are homogametic (ZZ) (Bachtrog et al., 2014;Graves, 2008;Stöck et al., 2021).
Visual identification of the sex based on morphology or behaviour can be challenging or even impossible (Fairbairn et al., 2007), for example for species without sexual dimorphism (Kocijan et al., 2011), but also for juveniles (Kocijan et al., 2011), ancient or historical samples (Buonasera et al., 2020), forensic samples (e.g.hair, Madel et al., 2016), as well as samples of unknown origin, e.g.environmental samples or faeces (Peppin et al., 2010;Sastre et al., 2009).
Molecular sexing through genetic data is an alternative to sexing from observations as it is accurate, simple, cheap, time-efficient, applicable to any tissue and can often piggyback on other genetic analyses (Buonasera et al., 2020).For example, molecular sexing from next-generation sequencing data exploits the difference in sex chromosome ploidy between males and females: in XY-systems, roughly half as many reads are expected to map to the X-chromosome in males than females, and no reads are expected to map to the Ychromosome in females (Palmer et al., 2019).The same logic applies to ZW-systems.These unique mapping patterns can be used (i) to deduce the genetic sex of individuals (sexing), (ii) to deduce the type (i.e.autosomal or sex-linked) of scaffolds or (iii) to deduce both jointly in case both are unknown.In the following, we describe these three major scenarios.
Current methods to sex individuals assume a reference genome of which the scaffold types are known.A typical case is the sexing of ancient human samples, for which two main methods are currently used: R y and R x .R y (Skoglund et al., 2013) calculates the fraction of reads mapping to the Y-chromosome relative to the total number of reads mapping to both sex chromosomes (X and Y).R x (Mittnik et al., 2016) calculates the average of the fraction of reads mapping to the X chromosome relative to the number of reads mapping to each autosome.Both methods then assign the sex based on thresholds determined on small data sets (30 and 20 samples, respectively).Nonetheless, most current studies (e.g.Allentoft et al., 2015;Furtwängler et al., 2020;Lipson et al., 2022;Liu, Hunter-Anderson, et al., 2022;Margaryan et al., 2020;Reitsema et al., 2022;Scheib et al., 2018) use the same thresholds for sexing without considering differences in sequencing methods, sequencing depth or sample treatment.
The methods R y and R x are also both limited to the euploid karyotypes XY and XX, although sex karyotypes aneuploidies in humans are relatively common with an estimated incidence of 1 in 440 births (Breman & Stankiewicz, 2021).Several studies that used large cohorts detected aneuploid individuals, but only by manually inspecting individuals with ambiguous mapping statistics (Ebenesersdóttir et al., 2018;Rivollat et al., 2020;Villalba-Mouco et al., 2021).To our knowledge, there exists currently only a single method that can identify sex aneuploidy in humans (seGMM, Liu, Zeng, et al., 2022), and it does so based on the mean and standard deviation of normalized mapping counts to X and Y using predefined cutoffs.
In the second scenario, the goal is to infer the scaffold types, assuming that the genetic sex of the individuals is known.Scaffold types are often unknown for non-model organisms with low-quality and fragmented draft genome assemblies due to high cost and challenges associated with complete genome assembly (Ellegren, 2014).Approaches to identify sex-linked scaffolds include synteny-based alignments to closely related species with a chromosomal-level genome assembly (Grabherr et al., 2010) or comparison of the sequencing depth of each scaffold between males and females (Palmer et al., 2019).The tool findZX (Sigeman et al., 2022) implements a pipeline to identify X/Z chromosomes through differences in sequencing depth and heterozygosity.The method discoverY identifies Y-linked scaffolds through differences in sequence similarity and sequencing depth between males and females (Rangavittal et al., 2019).
In the third scenario, both the sex karyotypes of individuals as well as the scaffold types are unknown, as is often case for nonmodel organisms.The method SATC (Nursyifa et al., 2022) infers both sex and sex-linked scaffold in a two step procedure.It first conducts a principal component analysis (PCA) on normalized depth and uses Gaussian mixture clustering to identify males and females.It then uses a t-test to identify sex-linked scaffolds as those for which the normalized depth differs between the two sex groups.However, SATC relies on hard thresholds and cannot identify Y-linked scaffolds.The method SeXY (Cabrera et al., 2022) uses a synteny-based approach to identify sex-linked scaffolds and performs sexing based on the X to autosome depth ratios.While SeXY can perform sexing with a single individual at low depth and was tailored to also work with reference genomes of divergent taxa, its sexing is based on arbitrary thresholds.Other methods (Gautier, 2014;Robledo-Ruiz et al., 2023) can also infer both sex and sex-linked loci but are designed for SNP data sets (e.g.RAD sequencing) and classify each locus individually.
None of the tools allows for aneuploid sex karyotypes.
Here, we present a Bayesian method designed to work in all three scenarios described above.Our method, which we call Bayesian estimation of X and Y, or BeXY for short, jointly infers the type of all scaffolds as well as the sex karyotype for all individuals.In contrast to existing methods, BeXY (i) does not rely on hard thresholds but instead infers probabilities for scaffold types and sex karyotypes, (ii) is flexible in respect of incorporation of existing knowledge on sex-linked scaffolds and sex karyotypes in the model, (iii) can identify both X-and Y-linked scaffolds, (iv) can detect aneuploid sex karyotypes, (v) can detect trisomies for scaffolds known to be autosomal and (vi) retains high accuracy for low-depth and highly fragmented genome assemblies.To illustrate our method, we applied it to wholegenome shotgun and target enrichment sequencing data of ancient and modern humans as well as to data from several non-model organisms.Using simulations, we show that our method outperforms current methods in all scenarios.

| The model
For readability, we will present the model using the XY notation, although the model applies equally well to ZW-systems.
Consider a reference genome consisting of C scaffolds.Let n rc denote the number of sequencing reads of sequencing run r = 1, … , R mapped to scaffold c.These mapping statistics can be obtained easily from indexed BAM-files either with the idxstats command in Samtools (Li et al., 2009) or with the BAMDiagnostics command in ATLAS (Link et al., 2017).Each sequencing run r = (i, m) corresponds to a certain individual i = 1, … , I sequenced with a specific method m = 1, … , M such as, for instance, target enrichment sequencing, whole-genome shotgun sequencing or restriction site associated DNA sequencing (RAD).These methods likely differ in their mapping characteristics and are thus modelled with their own set of parameters.Individuals may have been sequenced with multiple runs and potentially different sequencing methods and the model described here allows to make use of all that data jointly.
We model the vector of counts n r = n r1 , … , n rC of sequencing run r using a Dirichlet-Multinomial (DM) distribution (Johnson et al., 1997) as Here, N r = ∑ c n rc is the total number of mapped reads of run r and the parameter vector (r) = 1 (r), … , C (r) models both the expected number and the variance of reads mapping to each scaffold.
We model n rc of run r = (i, m) using two major components: the ploidy p ic of scaffold c in individual i , and some method-and scaffoldspecific mapping attractor mc .These mapping attractors may differ among scaffolds due to, for instance, scaffold length or mappability as more reads are expected to map to longer scaffolds or to scaffolds with fewer repetitive regions.We model c (r) as Here, r is an run-specific parameter that allows for noise due to erroneously mapped reads and m is a positive scaling parameter that models the variance of the Dirichlet-Multinomial distribution.To avoid non-identifiability issues with m , we impose the constraint that ∑ C c=1 mc = 1.

| Ploidy
The ploidy p ic depends on two factors: the sex karyotype s i of individual i and the scaffold type, i.e. whether a scaffold is autosomal, X or Ylinked.We distinguish seven sex karyotypes, which correspond to the most frequent sex karyotypes observed in humans: the heterogametic sex (s i = XY), the homogametic sex (s i = XX), the monosomy s i = X0 as well as the three trisomies s i = XXY, s i = XYY and s i = XXX and the tetrasomy s i = XXYY.We will denote by A = {X0, XXY, XYY, XXX, XXYY} the set of considered aneuploid sex karyotypes.
In theory, the ploidy is given by the number of copies of each scaffold, for example p ic = 2 for all autosomal scaffolds or p ic = 1 for both X and Y-linked scaffolds of an individual with sex karyotype XY.However, in low-quality reference genomes, the ploidy ratio between the sex karyotypes often differs from the canonical cases due to noise (Nursyifa et al., 2022), for instance for a misassembled scaffold that consists of both autosomal and gonosomal sequences.
To account for this, we here model the ploidy using a continuous, scaffold-specific parameter c ∈ 0, 1 such that c = 0 corresponds to a canonical Y, c = 1 2 to a canonical autosome, c = 1 to a canonical X and intermediate values to deviations from these.The intermediate values are modelled using linear functions connecting the canonical cases as given in Table 1 and illustrated in Figure 1.

| Distribution of karyotypes
We assume that the distribution of sex karyotypes s i in a sample are described well by the frequency a of aneuploids and the frequency f of XX to XY karyotypes among euploids, such that where v = v X0 , … , v XXYY correspond to the relative frequencies of aneuploid karyotypes, with ∑ s∈A v s = 1.By default, we set v to the relative frequencies observed in humans (Skuse et al., 2018).

| Autosomal trisomies
It is straightforward to extend the BeXY model to account for autosomal trisomies.We introduce a binary parameter z ic that indicates whether individual i at scaffold c is diploid (z ic = 0) or triploid (z ic = 1 ).
The ploidies p ic for both cases are given in Table 1, where z ic = 0 results in a ploidy p ic = 2 and z ic = 1 results in a ploidy p ic = 3 at the canonical autosomes with c = 1 2 .To avoid the need for model choice, we allow z ic = 1 only for scaffolds known to be autosomal (i.e. with c ≈ 1 2 ).Since autosomal trisomies are lethal for most scaffolds, the user further needs to specify for which scaffolds trisomies are to be considered.
We assume z ic ∼ Bernouilli( ) and set = 1 1000 by default, which matches the frequency of trisomy 21 in humans.

| Bayesian inference
We use a Markov chain Monte Carlo (MCMC) scheme to generate samples from the posterior ℙ(s, , , , , z| N), where N denotes the R × C matrix of read counts with entries N rc = n rc .We update z ic and t c using Gibbs sampling and all other parameter using standard Metropolis-Hastings updates.See Figure S1 for a Directed Acyclic Graph (DAG) illustrating the full model and the Data S1 for a description of all prior distributions and MCMC updates used.

| Implementation
BeXY is a command-line tool implemented in C++ making use of the MCMC framework of the statistical library stattools (bitbu cket.org/ wegma nnlab/ statt ools).It implements three tasks: infer, inferScaffolds and sex.The task infer is used to infer sex karyotypes and scaffold types jointly, the task inferScaffolds is used to infer scaffold types if sex karyotypes are known, and the task sex is used to infer sex karyotypes, i.e.only the individual-and run-specific parameters s i and r , while using estimates of , m and m previously learned from a larger data set.BeXY reads mapping statistics obtained with either the idxstats command in Samtools (Li et al., 2009) or with the BAMDiagnostics command in ATLAS (Link et al., 2017).It is open-source and freely available through the git-repository at bitbu cket.org/ wegma nnlab/ bexy, along with a wiki detailing its usage and a custom Rpackage called bexy to visualize the results of BeXY and to classify individuals based on custom thresholds on the posterior probabilities of the sex karyotypes.

F I G U R E 1
Ploidy as a function of c for all sex karyotypes, obtained by following the line from the number of Y copies to the number of X copies of that karyotype.The red line exemplifies the ploidy function for the karyotype XXY.

Sex karyotype s
TA B L E 1 Ploidies p ic as a function of ploidy parameter c for all sex karyotypes.The first two columns with z ic = 0 are for the autosomal euploid case, while the latter columns with z ic = 1 are for the autosomal triploid case.

| Modern humans
We downloaded publicly available CRAM-files of 276 modern humans from the Simons Genome Diversity Project (SGDP, Mallick et al., 2016).Of these, 260 samples were sequenced using a PCRfree library preparation and 16 samples were sequenced using a PCR-based library preparation (Mallick et al., 2016).We used the idxstats command in Samtools to generate the input needed for BeXY.

| Non-model organisms
We ran BeXY on the six WGS data sets of varying assembly quality provided by Nursyifa et al. (2022): impala (Aepyceros melampus), muskox (Ovibos moschatus), waterbuck (Kobus ellipsiprymnus), grey whale (Eschrichtius robustus), leopard (Panthera pardus) and the Darwin's finches species complex encompassing 15 species.We used the idxstats files provided by Nursyifa et al. (2022) and ran BeXY and SATC on these data sets.We used the default filters of SATC for both BeXY and SATC, i.e. we removed scaffolds with <100 kb length and a normalized depth outside the range (0.3, 2).

| Downsampling experiments
The performance of BeXY likely depends on the sequencing depth of the individuals and, in case scaffold types are inferred, also on the sample size, the sex ratio among the samples, and the quality of the reference genome assembly.We tested the limits of BeXY and competing methods with respect to each of these challenging factors using dedicated downsampling experiments based on a subset of the ancient WGS data set consisting of all individuals with a depth > 2 × (n = 116) that were also all identified as euploid on the full data sets.

| Sexing individuals: euploids
To evaluate the robustness of sexing individuals with respect to lowdepth data of euploid individuals, we downsampled (with replacement) the read counts of all individuals in the subset to various depth ranging from 200,000 to 100 reads per individual.We ran BeXY with the task sex on 100 replicates of downsampled counts, and set the prior parameters on s to the values expected for a human data set, f = 1 2 and a = 1 440 (Breman & Stankiewicz, 2021).We used parameters , m and m inferred from a single run on the full data of all n = 116 individuals and considered all classifications with state posterior probabilities ℙ s i | n r > 0.9 as confident.We then compared the sex inferred from the downsampled data set with that inferred from the full data set.We also ran the two competing sexing methods R x (Mittnik et al., 2016) and R y (Skoglund et al., 2013) on the same data set using default parameters and counted the classifications "Sex assignment: The sample could not be assigned" (for R x ) as well as "Not Assigned" (R y ) as uncertain and all others as confident.

| Sexing individuals: aneuploids
To investigate the power of the task sex to detect aneuploid individuals at various depths, we simulated aneuploid individuals based on the samples in the subset.We used individuals with karyotype XY to simulate samples with karyotypes XXY, XYY and XXYY and individuals with karyotype XX to simulate samples with karyotypes X0 and XXX as follows: Let f x and f y denote the relative change in ploidies between the karyotype to simulate and the karyotype of the individual serving as template (e.g.f X = 2, f Y = 2 to simulate the karyotype XXYY from an XY individual, or f X = 0.5, f Y = 1 to simulate a karyotype X0 from an XX individual).We then sampled reads (with replacement) from the vector n r according to the probabilities Using this procedure, we simulated samples for various depth ranging from 200,000 down to 100 reads per sample.Using the same parameters , m and m as for euploids, we ran BeXY with the task sex on 100 replicates of downsampled counts and identified confident classifications as described above, both on a per-karyotype basis as well as for euploid and aneuploid karyotype classes, ensuring that all karyotype were equally frequent within a class.Since the prior on the number of aneuploid samples, a, has a large impact on the classification accuracy of euploid and aneuploid individuals, we ran BeXY with both a = 1 440 and a = 1 2 .We ran the competing sexing method seGMM (Liu, Zeng, et al., 2022) on the same data set.Since the available implementation of seGMM requires a VCF file to perform the clustering of the samples for subsequent sexing, we re-implemented their approach and calculated the relevant statistics needed for sexing (the mean and standard deviation of normalized read counts for the X and Y chromosome for XX and XY samples) based on the full counts.While re-implementing, we also fixed an obvious bug in their code that led to a misclassification of XXX individuals (XXX karyotypes are expected to have 1.5 times the number of reads mapping to the Xchromosome than XX karyotypes, not twice as many).We counted all classifications that could not be assigned as uncertain and all others as confident.

| Identifying autosomal trisomies
To investigate the power to detect autosomal trisomies at various depths, we simulated individuals with trisomy 21 based on the samples in the subset.We sampled reads (with replacement) from the vector n r as describes above and according to the probabilities Using this procedure, we simulated samples with trisomy 21 for various depth ranging from 200,000 down to 100 reads per sample.
We ran BeXY with the task sex on 100 replicates of these downsampled counts, setting --allowTrisomyForAutosomes 21 to allow for trisomies on chromosome 21 and using the same parameters , m and m used above to sex individuals.To test for false positive classifications, we additionally downsampled counts without simulating any trisomies as described above and analysed them with BeXY.We then considered all classifications with state posterior probabilities ℙ z ic | n r > 0.9 as confident.Since the prior on the number of samples with autosomal trisomy, , has a large impact on the classification accuracy, we ran BeXY with both = 1 1000 and = 1 10 .

| Inferring scaffold types
We also investigated the power to infer sex karyotypes scaffold types jointly, using the same downsampled counts as for the first downsampling experiment described above (only euploids).We ran BeXY with the task infer and identified confident classifications as described above.We ran the competing method SATC (Nursyifa et al., 2022) on the same data set.For comparability with BeXY, we turned off the normalized depth range filter by setting it to (0, 100) in SATC.SATC does not provide an estimate of classification uncertainty and we assumed all classifications as confident.But we note that SATC throws an error if no good candidates for sex scaffolds were found through clustering or when one of the inferred sex groups only had one member.We counted these cases as uncertain.
To evaluate the robustness of our method to small data sets or those with large sex ratio imbalances when jointly inferring sex karyotypes and scaffold types, we first downsampled all individuals in the subset to 20 K reads and then randomly compiled data sets with various XY:XX proportions differing in size and imbalance.We ran BeXY and SATC as described above.
To evaluate the robustness of our method with respect to low-quality reference genome assemblies, we simulated a lowquality reference genome based on the human reference genome GRCh38.To match the roughly exponential distribution of empirical low-quality reference genomes (e.g.waterbuck from Nursyifa et al., 2022), we defined length categories of 10 6 , 5 ⋅ 10 5 , 2 ⋅ 10 5 , 10 5 , 5 ⋅ 10 4 , 2 ⋅ 10 4 and 10 4 and split the chromosomes into scaffolds of these lengths.To limit mapping issues, we excluded the telomeres (15 kb at each end of each chromosome), the pseudo-autosomal regions (PAR) on the X and Y chromosome as well as the centromeres of all chromosomes (all downloaded from ncbi.nlm.nih.gov/ grc/ human ), with an additional buffer of 100 kb around each of these regions.After splitting, we further excluded all resulting scaffolds that overlapped by more than 90% with gap regions consisting of N (hgdow nload.soe.ucsc.edu/ goldenPath/hg38/database/gap.txt.gz,Nassar et al., 2023) as well as all scaffolds that overlapped by more than 90% with regions listed in the ENCODE Blacklist (Amemiya et al., 2019).We then re-mapped the sequencing data of all samples against this artificially created low-quality reference genome.To reduce computational time, we downsampled each fastq file to 10 millions reads using the sample option in seqtk v1.4 (github.com/ lh3/ seqtk ) prior to mapping.We then downsampled read counts to 1,000,000 reads per sample with replacement for 100 replicates and ran BeXY with the task infer and SATC.Because the performance of SATC turned out to be rather poor when no depth filter was used, we ran it only on scaffolds that had a normalized depth in the interval (0.3, 2), which corresponds to their default filter (Nursyifa et al., 2022).For BeXY, we only excluded scaffolds that had zero counts for all samples, as these are not identifiable.We considered all scaffold type classifications by BeXY as confident if their state posterior probabilities ℙ t c | n r > 0.9.SATC does not provide an estimate of uncertainty in classifying scaffold types and we assumed all classifications as confident.But as above, we considered all classifications as uncertain if SATC threw an error.
We counted all scaffolds that were removed by the filter as a uncertain classification.
n rc otherwise .

| Sexing for low-depth data
We first investigated the robustness of BeXY to low-depth sequencing data of euploid individuals and compared its performance to that of R x and R y .As shown in Figure 2a, BeXY correctly assigned the sex karyotype with high confidence to >97% of all samples even at only 1000 reads per sample and to 24% at 100 reads per sample, with a low false discovery rate (FDR) of around 3% at very low depth.The second most powerful methods was R x , which confidently classified a similar fraction of all samples correctly, albeit with a much elevated rate of misclassifications, which results in a FDR of up to 26% at 100 reads per sample.In contrast, R y confidently classified much fewer samples correctly at an even much higher FDR of up to 55%.
We then evaluated the power of BeXY to detect aneuploid sex karyotypes at low-depth data, and compared it to the competing method seGMM.We first evaluated the power of BeXY using a relaxed prior on the number of aneuploid individuals of a = 1 2 .As shown in Figure 2b, BeXY correctly assigned both euploid and aneuploid sex karyotypes with high confidence to >90% of all samples at 20,000 reads.We note that the classification accuracy of the euploid karyotypes XY and XX is lower than in Figure 2a, because the prior does not favour euploid karyotypes anymore at noisy counts.
seGMM correctly assigned >80% of euploid samples and only 47% of aneuploid samples at 20,000 reads.For lower read counts, seGMM has a higher power, but also a much elevated FDR of up to 80% for euploid samples and 50% for aneuploid samples.
In Figure 2c,d  When sexing individuals at low depth, the choice of the prior probability on an individual to be aneuploid may have a considerable impact on the performance of BeXY.To investigate this we repeated the above inference also with a stringent prior of a = 1 440 , which corresponds to the frequency of aneuploid sex karyotypes in humans and is strongly favouring euploid karyotypes.As a consequence, and in comparison to the case of a = 1 2 , euploid karyotypes required fewer reads (about 1000) to be classified correctly, while aneuploid karyotypes were wrongly classified as euploids (X0 and XXX as XX and XXY, XYY and XXYY as XY) at low depth (Figure S2).

| Autosomal trisomies for low-depth data
We evaluated the power of BeXY to detect autosomal trisomies from low-depth data.As shown in Figure 3, BeXY correctly identifies trisomy 21 with high confidence to >96% of all samples at 20,000 reads per sample and to 47% at 5000 reads per sample when using a prior of = 1 10 .At lower depths, the rate of misclassification increased and samples with trisomy were classified as euploid.BeXY correctly assigned autosomal euploidy with a high confidence to 97% of all samples at 10,000 reads per sample and to 57% at 100 reads per sample.Euploid samples were almost never misclassified as triploid, instead, the uncertainty increased at low depth.
When inferring autosomal trisomies at low depth, the choice of the prior probability on an individual to be triploid may have a considerable impact on the performance of BeXY.To investigate this we repeated the above inference also with a more stringent prior of and were wrongly classified as euploid at lower depth (Figure S3).

| Joint inference for low-depth data
We next compared the performance of BeXY to SATC in inferring sex karyotypes jointly with scaffold types.We first investigated the impact of sequencing depth using the full subset of 116 samples.If samples had >1000 reads, both methods performed well and correctly assigned the sex karyotype with high confidence to >99% and >95% of all samples respectively (Figure 4a).With lower depth, both methods lose power, but differed in that BeXY rarely confidently misclassified individuals while median misclassification rate for SATC was up to almost 40%.
We next investigated the impact of small sample sizes, imbalanced sex ratios, or both.In 98% and 72% of all simulations conducted, all samples were correctly classified by BeXY and SATC, respectively.In 27% and 87% of all cases with misclassifications, BeXY and SATC misclassified all samples to the opposite sex karyotype, suggesting that the challenge at low sample number or with imbalanced sex ratios lies in correctly identifying sex-

| Low-quality reference genome assemblies
We investigated the power of BeXY to identify scaffold types in the case of low-quality reference genome assemblies.The simulated reference genome consisted of 10,574 scaffolds ranging from 10 6 to 10 4 bases in length.As shown in Figure 4c, the median classification accuracy of BeXY was >99% for all larger scaffolds, but dropped slightly to about 93% and 89% for the smallest X-linked and Y-linked scaffolds simulated, respectively.SATC had similar power to identify autosomes (median of >98% for all scaffold lengths), although it interestingly misclassified a few long scaffolds as abnormally sex-linked in some replicates.While SATC does not classify Y-linked scaffolds, the median classification accuracy for X-linked scaffolds was around 75% for long scaffolds and dropped to 20% for short scaffolds of 10kb, albeit high variability between replicates.

| Application to ancient human data set
We inferred sex karyotypes of 954 ancient human samples sequenced with whole-genome shotgun sequencing with the task infer.As visualized in Figure 5a,c, all individuals were classified with high confidence as XY (n = 608) or XX (n = 345) except YGS-B2, which was classified as XXY with 100% posterior probability, in line with earlier reports (Ebenesersdóttir et al., 2018).We did not detect any sample with trisomy 21 in this data set.
We next run the task infer on 2457 ancient human samples sequenced with the 1240 k capture target enrichment technique.As visualized in Figure 5b,d, we identified 1141 samples as XX and 1303 samples as XY with high confidence (posterior probability for the given karyotype larger than 90%).We confirmed the karyotype of all three samples that were previously classified as aneuploid: ALM062.(XXY, 99%), and I18426 (XXX, 90.5%).For two samples, the karyotype remains unclear: I12437 had a posterior probability of 25.5% to be XYY and 74.5% to be XY and I20754 had a posterior probability of 34.6% to be XX and 65.4% to be XY.This last sample had the lowest read count of all samples (6983 reads) and was previously reported as being likely contaminated (Harney et al., 2021), which may explain its uncertain sex karyotype assignment.Finally, we identified a single sample, I19991, of having trisomy 21 with high confidence (99.5%), while all other samples were confidently classified has diploid on chromosome 21 (>99%).
To investigate the effect of the prior choice on aneuploid sex karyotypes, we re-analysed all individuals with the task sex and different values of a (Figure S4).As expected, the number of individuals classified as aneuploid increases when using less stringent priors.When using the extreme case of a = 1 2 , the maximum a posteriori estimate was the same sex karyotype for 99.7% and 99.6% of all samples as was obtained with the task infer above for the whole-genome and the 1240 k capture data set, with an additional three and ten individuals for which an aneuploid sex karyotypes was inferred with high confidence (posterior probability >90%), respectively.However, many of these are likely false positives, suggesting that a more stringent prior should be used.When using a = 1 440 that matches the expected frequency of aneuploid sex karyotypes in humans, the same individuals were classified as aneuploid as when using the task infer.Diversity Project (SGDP, Mallick et al., 2016).We first used BeXY to sex each sample using the parameters , m and m inferred on the full set of 954 ancient WGS samples used above.In that case, all but one sample were identified as aneuploid with high confidence (posterior probability >90%, Figure S5A,B) with samples reported as females inferred as XXY and samples reported as males as XYY.
This seems to be a consequence of the much higher fraction of reads of SGDP samples mapping to the Y-chromosome (around 1.0% for males and 0.15% for females) than of the ancient samples (0.25% for males and almost 0.0% for females, see Figures 5c and S5A).
A major difference between the ATLAS-pipeline we used to process the ancient samples and the pipeline used by the SGDP is that the former filters out reads with low mapping quality (<30) while was classified as XX in the original publication (Mallick et al., 2016), our classification as X0 is in line with its heterozygosity on the X chromosome, which was the lowest across all samples (including males).As this sample was sequenced based on a cell line, it is possible that this is an artefact that arose during cell line culturing (e.g.Bergström et al., 2020).

| Application to non-model organisms
We ran BeXY on the six mammal and bird WGS data sets provided by

| DISCUSS ION
With the rapid growth of population genomics studies for nonmodel organisms and ancient samples, there is a need for accurate sex karyotype and scaffold type assignment for low-depth samples.
We here present BeXY, a Bayesian method that jointly infers sex karyotypes and identifies sex-linked scaffolds from mapping statistics.Our method is flexible with respect to existing knowledge (e.g.scaffold types), retains high accuracy for low-depth samples, is robust to small data sets or those with highly imbalanced sex ratios, is capable of identifying autosomal trisomies, and outperforms all existing methods in the field.
One use case of BeXY are non-model organisms, for which both the genetic sex of individuals as well as scaffold types are unknown.
BeXY only requires mapping statistics, namely the number of reads mapping to each scaffold as well as scaffold lengths, and does not require any prior knowledge on sex karyotypes or sex-linked scaffolds.Importantly, BeXY introduces a continuous parametrization of the expected ploidy.This allows for the accommodation of noisy scaffolds showing aberrant ploidy ratios due to mapping issues or mistakes in the assembly, which was previously shown to complicate the identification of sex-linked scaffolds (Nursyifa et al., 2022).
Using simulations, we showed that BeXY has much higher power to accurately infer scaffold types than the competing methods SATC (Nursyifa et al., 2022), particularly for data sets with only few individuals or highly imbalanced sex ratios, and also for low-depth samples (1000 reads per sample).In contrast to SATC, BeXY also accurately identifies Y-linked scaffolds, which SATC puts in the same class as those with aberrant ploidy ratios.
The second major use case of BeXY is the genetic sexing of (single) individuals, for example for ancient humans.Genetically sexing individuals with BeXY has several advantages over existing methods such as R x , R y , SATC and seGMM.First, BeXY is a Bayesian method that calculates the posterior probabilities for each sex karyotype, whereas all existing methods that do sex assignment from read count data are based on hard thresholds.As we showed with simulations, the use of hard thresholds is problematic in the case low-depth samples, which were often un-or misclassified.Second, BeXY accurately identifies aneuploid sex karyotypes, which are expected to be found in most large data sets.There exists currently only a single method, seGMM, that identifies aneuploid karyotypes from read counts, and as we showed with simulations, BeXY has considerably higher power to accurately infer such karyotypes, only requiring about 20,000 reads per individual in the case of ancient human samples.
To sex an individual, BeXY only requires mapping statistics of the individual, along with estimates of parameters , m and m that have been inferred on a larger data set.It is important that these parameters were inferred not only from a data set of the same species, but also from data produced with the same sequencing method, as different methods may differ in their per-scaffold expected read counts.In the case of ancient human samples, for instance, wholegenome shotgun and target enrichment capture data differ considerably in their distribution of reads across chromosomes, as the former is mainly influenced by the length of each chromosome, and the latter mainly by the number of sites targeted per chromosome.
However, and as our analyses of modern human samples indicate, more subtle differences in both the preparation of sequencing libraries (e.g.PCR-based vs. PCR-free) as well as bioinformatic pipelines (e.g.mapping quality filter) may lead to considerable differences in mapping statistics and hence parameters learned on one data set may not be transferable to another data set easily and lead to errorprone inference of sex karyotypes.To avoid such issues, we thus recommend to use the task infer to learn all relevant parameters if the number of samples permits, i.e. if at least 5-10 samples are at hand.We assume this to be the case for most applications.
Since BeXY is a Bayesian method, the choice of the prior may have a big influence if data is limited.That is particularly true for a, the expected fraction of individuals with aneuploid sex karyotypes in the sample.Our simulations suggest that if a small value of a was used, such as the known frequency of humans with aneuploid sex karyotype a = 1 440 , individuals with euploid sex karyotypes should be accurately sexed with as few as 1000 reads, but individuals with aneuploid sex karyotypes would often be confidently misclassified as euploids as this little data can not overcome the prior.If a larger value of a = 1 2 was used, our simulations indicate that both euploids as well as aneuploids individuals should hardly ever be misclassified, but that about 20,000 reads per sample are required for confident classification.The analysis of real data, however, suggested that more reads may be required in practice as we inferred aneuploid sex karyotypes for several ancient human samples with a = 1 2 but not with a = 1 440 .Ultimately, the choice of prior is thus a question of sensitivity versus specificity: If the goal is to ensure that all aneuploid samples have been identified, e.g. to flag them for downstream analysis, a relaxed prior of a = 1 2 is advised.If, however, the goal is to infer aneuploids with a low false positive rate, a more stringent prior is advised.

BeXY
classifies each scaffold into one of four categories: the three canonical cases autosomal (t c = A), X-linked (t c = X), Y-linked (t c = Y), as well as the fourth case "different" (t c = D) for ploidies that differ from the canonical cases.To achieve such a classification, we model c through a mixture of four Beta distributions.For autosomes, we use a symmetrical Beta distribution c | t c = A ∼ Beta( , ), which has an expected value c = 1 2 .For X-linked scaffolds, we use a left-skewed distribution c | t c = X ∼ Beta( , 1) with  < 1, which has an expected value close to one.For Y-linked scaffolds, analogously, we use a right-skewed distribution c | t c = Y ∼ Beta(1, ) with  < 1, which has an expected value close to zero.Finally, for non-canonical scaffolds of type t c = D we use an uninformative Beta distribution c | t c = D ∼ Beta(1, 1) (see the Data S1 for a more detailed description of the parametrization).
we show the performance for each karyotype individually.As shown in Figure 2c, the karyotypes that contain a Y chromosome (XY, XXY, XYY and XXYY) require around 20,000 reads to be classified with an accuracy of >80% by BeXY, while the karyotypes that contain only X chromosomes (X0, XX and XXX) require only 2000 reads to obtain the same accuracy.This is explained by the short length of the Y chromosome, which results in small and rather noisy counts.As shown in Figure 2d, classification accuracy of seGMM was very variable across karyotypes, with the euploid karyotypes and XXY having a classification accuracy similar to that of BeXY and the karyotypes X0, XXX and XYY having low classification accuracy regardless of depth.At low depth, many karyotypes suffered from high misclassification rates, likely because the hard thresholds become unreliable.
is strongly favouring euploid autosomes.In this case, F I G U R E 2 Power to sex individuals.Shown are the median (line with symbols) and 98% confidence interval (shaded area) of the fraction of individuals classified as confidently correct (solid line) and confidently incorrect (dashed line) across 100 downsampling replicates.(a) The performance of BeXY compared to R x and R y on euploid karyotypes.(b) The power of BeXY compared to seGMM to classify both euploid and aneuploid karyotypes using a relaxed prior of a = 1 2 .(c) The same results of BeXY as in b, but plotted by karyotype.(d) The same results of seGMM as in b, but plotted by karyotype.almost always correctly classified, even at a depth of only 100 reads per sample, while samples with trisomy 21 required 50,000 reads to be classified correctly in most cases (98%) scaffolds.As shown in Figure 4b, the fraction of replicates in which all individuals were correctly classified was generally higher for BeXY than SATC, especially for data set with imbalanced sex ratios.If a sex karyotype was represented by only two individuals, for instance, SATC classified all samples correctly in less than 20% of the replicates.In comparison, BeXY still classified all samples correctly in 99% of the replicates under these conditions.Notably, BeXY also classified all samples correctly in more than 87% of the replicates if a sex karyotype was represented by a single individual -a case in which SATC does not attempt any classification.F I G U R E 3 Power to infer autosomal trisomies.The median (line with symbols) and 98% confidence interval (shaded area) of the fraction of individuals classified as confidently correct (solid line) and confidently incorrect (dashed line) as a function of sequencing depth across 100 downsampling replicates when the prior = 1 10 .The red line corresponds to the classification of individuals with simulated trisomy 21, while the blue line corresponds to the classification of euploid individuals.F I G U R E 4 Power to infer sex karyotypes of individuals and scaffold types jointly.(a) The median (line with symbols) and 98% confidence interval (shaded area) of the fraction of individuals classified as confidently correct (solid line) and confidently incorrect (dashed line) as a function of sequencing depth across 100 downsampling replicates for both BeXY and SATC.(b) The fraction of replicates where the sex karyotype of all individuals was correctly assigned for small (left) and imbalanced (middle and right) data sets.Note that SATC requires at least two individuals per karyotype and could not be evaluated for the data sets marked with an asterisk.(c) The power of BeXY and SATC to classify scaffold types for low-quality reference genomes.Shown are the median (line with symbols) and 98% confidence interval (shaded area) of the fraction of scaffolds classified as confidently correct as a function of scaffold length across 100 replicates for each scaffold type.Note that SATC does not classify Y-linked scaffolds.

F
Sex karyotype classification of ancient samples.(a, b) Ternary plot showing the posterior probabilities for each sample to have an XY, XX or aneuploid sex karyotype for the ancient WGS (a) and 1240 k target enrichment capture (b) data sets.The colour of the points refers to the posterior mode of the sex karyotype.The per karyotype posteriors of all samples identified as aneuploid or with uncertain sex karyotypes are shown shaded within a 7-cell matrix of the x-axis corresponds to having one, two or three copies of the X-chromosome and the y-axis corresponds to having zero, one or two copies of the Y-chromosome.The samples CLL011.A0101.TF1.1 and ALM062.A0101.TF1.1 were abbreviated to CLL011 and ALM062, respectively.(c, d) Scatter plot for the ancient WGS (c) and 1240 k target enrichment capture (d) data sets showing the percentage of reads mapped to Y against the percentage of reads mapped to X, with the same individuals highlighted as in a and b.

|
Application to modern human data set Many factors affect mapping statistics, including how sequencing libraries were generated as well as bioinformatic choices during mapping and filtering of reads.The accuracy of BeXY to sex an individual is therefore likely reduced if the parameters , m and m were inferred on a data set of different characteristics.To explore the sensitivity of BeXY to such deviations, we ran BeXY on publicly available CRAM-files for 276 modern humans of the Simons Genome the latter does not.Applying the same mapping quality filter on the SGDP CRAM-files leads to a noticeable reduction of reads mapping to the Y chromosome (FigureS5C).When sexing individuals with BeXY on that data set, only one individual was confidently classified as having an aneuploid sex karyotype (FigureS5C,D).However, most individuals had very uncertain sex karyotype classifications, suggesting that applying the additional mapping quality filter was not sufficient to correct for the mismatch between the two pipelines.We next re-estimated all parameters for the SGDP data by running the task infer of BeXY on the unfiltered CRAM files.That run resulted in most samples being confidently classified to have euploid sex karyotypes (FigureS5E,F) and only four samples were confidently classified to have aneuploid sex karyotypes: HGDP00725 (X0, 100%), HGDP01036 (XYY, 100%) HGDP01286 (XYY, 100%) and HGDP00546 (XYY, 95.8%).However, SGDP samples were generated with two different sequencing libraries and visual inspection showed that mapping statistics differed noticeably between these groups (FigureS5G).Specifically, the PCR-based samples generally had more reads mapping to the Y-chromosome, explaining why BeXY classified some of the XY individuals as having an extra Ychromosome (XYY).We therefore re-ran the task infer of BeXY while specifying the two library preparation protocols as distinct sequencing types (FigureS5G,H).In this case, no sample was classified has having trisomy 21 and only a single sample was classified confidently as aneuploid: HGDP00725 (X0, 100%).Although this sample Nursyifa et al. (2022)and compared the sex and scaffold-type assignment to that obtained with SATC.For all data sets, the two methods did not differ in the sex karyotype classifications.Both methods also mostly agreed in their scaffold-type assignments and classified on