Additional evaluations show that specific BWA‐aln settings still outperform BWA‐mem for ancient DNA data alignment

Xu et al. (2021) recently recommended a new parameterization of BWA‐mem as a superior alternative to the widely‐used BWA‐aln algorithm to map ancient DNA sequencing data. Here, we compare the BWA‐mem parameterization recommended by Xu et al. with the best‐performing alignment methods determined in the recent benchmarks of Oliva and colleagues (2021), demonstrating that BWA‐aln is still the gold‐standard for ancient DNA read alignment .

default parameters.
Here, we use the alignment performance metrics from Oliva et al. to directly compare the recommended BWA-mem parameterization reported in Xu et al. with the best performing alignment methods determined in the Oliva et al. benchmarks, and we make recommendations based on the results.

| ME THODS
We investigated the alignment performance of the parameterization recommended by Xu et al. (2021), that is, -k 19 and -r 2.5 (hereafter called BWA9) against several of the best performing strategies identified in Oliva et al. (namely,BWA1,BWA2,BWA8,Novo1IUPAC,Novo2IUPAC, and Novo2, see Table 1 for parameter settings).
Following the analytical framework of Oliva et al., our benchmark is based on simulated reads (including fragmentation, damage, and sequencing errors typical for ancient DNA samples; see (Oliva et al., 2021)) that were generated for each of the following three samples from the 1000 Genome Project dataset (1000Genomes Project Consortium et al., 2015, each coming from a distinct population, and were aligned to reference genome GRCh37: • Individual HG00119 from the British in England and Scotland population; GBR; labeled Europe in this study.
• Individual NA19471 from the Luhya population in Webuye, Kenya; LWK; labeled Africa in this study.
• Individual HG00513 from the Han Chinese population in Beijing, China; CHB; labeled East Asia in this study.
In addition to quantifying read alignment precision (i.e., the proportion of correctly aligned reads relative to all aligned reads) and proportion of aligned reads (i.e., the fraction of aligned reads relative to the total number of simulated reads) for each strategy, we tested the specificity (i.e., the fraction of unmapped reads) of these strategies for two sets of potential contaminants-that is, bacterial and dog reads-that were also used in Oliva et al. (2021).

| RE SULTS
BWA9 had a slight improvement in the proportion of total reads aligned relative to BWA-mem using default settings (BWA8), but this came at the cost of consistently lower precision (Figure 1, Figures   A1 and A2). These precision differences are particularly accentuated for reads between 30 and 60bp, the range of read lengths that is This is an open access article under the terms of the Creat ive Commo ns Attri bution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.  (Oliva et al., 2021), BWAaln (BWA1 and BWA2) is the most precise alignment method among the tested strategies, having moderately higher precision relative to BWA-mem for shorter reads while mapping a much higher percentage of reads overall (Oliva et al., 2021;van der Valk et al., 2021).
When comparing specificity against potential contaminants, BWA9 has a near-identical specificity to the default BWA-mem parameterization (BWA8) for dog reads, and slightly poorer specificity when testing with bacterial reads, but both parameterizations perform considerably worse than the tested NovoAlign (Novo1IUPAC, Novo2IUPAC, and Novo2) and BWA-aln (BWA1 and BWA2) strategies for dog reads ( Figure 2).
Finally, comparing running times of the two BWA-mem parameterizations for each of the three simulated human datasets showed that BWA9 is slightly quicker than BWA8 (Figure 3), confirming the results of ref. (Xu et al., 2021). Xu et al. (2021) report that BWA-mem produces alignment results that are comparable to a derivation of BWA-aln widely used in the ancient DNA field. Consequently, they recommend the use of a specific non-default BWA-mem parameterization for ancient DNA studies because of its superior runtime relative to BWA-aln. However, we find that this parameterization actually decreases alignment precision relative to BWA-mem using default settings for sequencing reads shorter than 70 bases, which are particularly abundant in ancient DNA samples. Moreover, BWA-mem is consistently out- (2021) we find that BWA-aln maintains a small but consistent advantage over BWA-mem across different levels of contamination for both tested alignment metrics (see Figure A3)-a result that is consistent with the findings in the present study using complementary metrics. Indeed, the lack of statistical support for the difference between the two alignment algorithms most likely results from the effect size being small relative to the variance observed across the tested replicates used in the Xu et al. study (see Figure 2 in Xu et al., 2021), leading to insufficient power to detect these differences.

| CON CLUS ION
Taken together, our results indicate that the BWA-aln strategies tested here provide a small but consistent improvement over the BWA-mem parameterization recommended by Xu et al. (2021) for simulated aDNA read datasets when evaluated using the complimentary sets of metrics employed in Xu et al. (2021) and the present study. Importantly, while the differences between the two alignment methods are relatively small (on the order of 0.1-0.5%; see Figure 1), they are sufficient to inflate reference bias in downstream analyses that can negatively impact inferences (Oliva et al., 2021).
Accordingly, despite having improved run times, we do not recommend that BWA-mem be prioritized over BWA-aln for research using short reads-such as ancient DNA, cell-free DNA, and forensic research fields. If run time is an issue for researchers, we recommend the use of NovoAlign using the free default parameterization, so long as an appropriate IUPAC reference can be generated. Readers interested in a more detailed discussion of these issues are directed to refs. (Oliva et al., 2021;Poullet & Orlando, 2020;Schubert et al., 2012;van der Valk et al., 2021) for recent benchmarks of different alignment strategies using short reads.

F I G U R E 2
Specificity of all tested alignment methods. Bacterial and dog reads were aligned to the GRCh37 reference using the seven tested parameterizations of four different alignment software, including an IUPAC reference-based alignment for a subset of the NovoAlign parameterizations (see key). The specificity corresponds to the number of unmapped reads, with higher values being better

ACK N OWLED G M ENTS
We thank Stefan Prost and an anonymous reviewer for their feedback and valuable comments. We thank the Novocraft Technologies team for providing access to their proprietary NovoAlign.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The scripts used to create the used datasets in this study are available in the github repository at: https://github.com/Adrie nOliv a/ Bench mark-aDNA-Mapping.  (Xu et al., 2021 parameterization; -k 19 -r 2.5) based on 1.5 million simulated reads F I G U R E A 1 Alignment precision relative to read length and mapping quality for the simulated European sample. See Figure 1 A PPEN D I X F I G U R E A 2 Alignment precision relative to read length and mapping quality for the simulated African sample. See Figure 1 F I G U R E A 3 Summary of alignment performance of BWA-mem relative to BWA-aln across increasing levels of contamination from results reported in Xu et al. supplementary table 4. Xu et al. (2021) summarise alignment results using two statistics: (1) the contamination rate after treatment (CRT; top panel), which measures the proportion of aligned contaminants relative to all contaminant reads, and (2) the loss rate of endogenous DNA (LRE; bottom panel), which records the proportion of unmapped endogenous reads relative to all endogenous reads. Notably, the reported mean and median values for these both metrics are consistently higher for BWA-mem relative to BWA-aln -as shown by the natural logarithm of the ratio of BWA-mem to BWA-aln being consistently above 0 (dashed line) for both metrics -indicating that BWA-mem tends to map more contaminant reads and less endogenous reads than BWA-aln across all tested contamination levels, whereby BWA-mem has poorer overall performance