What can a comparative genomics approach tell us about the pathogenicity of mtDNA mutations in human populations?

Abstract Mitochondrial disorders are heterogeneous, showing variable presentation and penetrance. Over the last three decades, our ability to recognize mitochondrial patients and diagnose these mutations, linking genotype to phenotype, has greatly improved. However, it has become increasingly clear that these strides in diagnostics have not benefited all population groups. Recent studies have demonstrated that patients from genetically understudied populations, in particular those of black African heritage, are less likely to receive a diagnosis of mtDNA disease. It has been suggested that haplogroup context might influence the presentation and penetrance of mtDNA disease; thus, the spectrum of mutations that are associated with disease in different populations. However, to date there is only one well‐established example of such an effect: the increased penetrance of two Leber's hereditary optic neuropathy mutations on a haplogroup J background. This paper conducted the most extensive investigation to date into the importance of haplogroup context on the pathogenicity of mtDNA mutations. We searched for proven human point mutations across 726 multiple sequence alignments derived from 33 non‐human species absent of disease. A total of 58 pathogenic point mutations arise in the sequences of these species. We assessed the sequence context and found evidence of population variants that could modulate the phenotypic expression of these point mutations masking the pathogenic effects seen in humans. This supports the theory that sequence context is influential in the presentation of mtDNA disease and has implications for diagnostic practices. We have shown that our current understanding of the pathogenicity of mtDNA point mutations, primarily built on studies of individuals with haplogroups HVUKTJ, will not present a complete picture. This will have the effect of creating a diagnostic inequality, whereby individuals who do not belong to these lineages are less likely to receive a genetic diagnosis.


| INTRODUC TI ON
Mitochondria are involved in a range of cellular functions such as apoptosis and cell death, calcium buffering and the generation of ATP by oxidative phosphorylation. Mitochondrial DNA (mtDNA) is a circular chromosome comprising of ~16 kbp and encodes 13 proteins, 22 tRNAs and 2 rRNAs. Cells contain hundreds or even thousands of copies of mtDNA. In cells or tissues where the mtDNA is homoplasmic, all mitochondrial genomic sequences are the same, which is the expected state. However, it is possible for more than one mtDNA genotype to exist. When we see two mtDNA genotypes in individual cells or a tissue type, this is a state known as heteroplasmy (Tuppen, Blakely, Turnbull, & Taylor, 2010). Patients with mitochondrial disorders normally exhibit heteroplasmy, where one of the genotypes in an mtDNA species has a pathogenic mutation.
Commonly, a biochemical defect will become apparent if the number of mutated sequences accounts for ≥60% of the mitochondrial genomic content, known as the threshold effect . It is estimated that the prevalence of mitochondrial disorders is ~1/4,300 within the adult European population and over 2/3 of these will be due to an mtDNA mutation (Gorman et al., 2015).
Mitochondria are inherited solely down the maternal lineage and, therefore, do not undergo bi-parental recombination (Elson et al., 2001). This has the effect that the evolution of mtDNA is defined by the emergence of distinct lineages called haplogroups. Databases such as MitoMap and Phylotree have compiled a wealth of information regarding human haplogroup lineages, mtDNA variation and disease association (Lott et al., 2013;Oven & Kayser, 2009). It should be noted that mtDNA accumulates single nucleotide variants (SNVs) at a higher rate than nuclear DNA (Song et al., 2005). This is useful for those looking at population histories as a sufficient phylogenetic signal can accumulate to study population histories (Howell, Elson, Howell, & Turnbull, 2007). However, all this variation presents challenges in the linkage of genotype to phenotype in the context of mtDNA disease.
Over the years, there has been considerable debate about the best way to link genotype to phenotype. The Yarham et al. (2011) pathogenicity scoring system is a well-recognized, widely used system in the mitochondrial community. It is weighted towards functional studies, namely cybrid and single fibre analysis, which can clearly link genotype to phenotype (Yarham et al., 2011). MitoTip is a new tool by MitoMap designed to provide an initial pathogenicity prediction for newly identified variants. It utilizes a frequentist and evolutionary approach, taking three key observations into account: variant history and conservation; variant location; and disruption to the secondary structure (Sonney et al., 2017).
Mitochondrial disorders have been most widely studied in patients with Caucasian European haplogroups and, whilst haplogroup divergence allows the opportunity to study global migration patterns, lack of knowledge of phylogenetic diversity in non-Caucasian and non-European haplogroups might reduce the accuracy of clinical diagnosis in these populations (van der Westhuizen et al., 2015).
Previous research in Black South African populations has shown discrepancies in the rate of diagnosis in the context of disease arising from mtDNA mutations (van der Westhuizen et al., 2015). This may be because either the pathogenic mutations or their presentation differs from those found in Caucasian Europeans. The phenotypic presentation of mitochondrial disease is also thought to differ between populations (Smuts et al., 2010) suggesting there is much still to learn about this group of diseases globally (van der Walt et al., 2012;van der Westhuizen et al., 2015). Here, additional evidence to support the importance of mitochondrial sequence context in the expression and penetrance of pathogenic mtDNA mutations is presented.
One means of exploring the impact of mtDNA sequence context is the use of sequences from non-human species. If a non-human animal that does not exhibit symptoms of mitochondrial disease harbours a proven point mutation associated with disease in humans, then exploring the surrounding sequence context of these species may give insight into the importance of haplogroup context in the presentation and manifestation of this group of mutations.
Previous research has suggested disease-associated point mutations are likely to be found in non-human species without the presence of disease. Magalhães (2005) searched a panel of consensus sequences from 12 primates and discovered a total of 46 human "disease-associated" mutations across the mitochondrial genomes of these species (Magalhães, 2005). Similarly, Kern and Kondrashov (2004) focused on the mt-tRNA genes and compiled single sequences from 106 species. They identified 52 pathogenic mutations across the mt-tRNAs and proposed four mechanisms for masking pathogenic mutations that fall in the stem regions of the molecules. However, both of these studies were conducted prior to the existence of an accepted methodology to link genotype to phenotype in the context of mtDNA disease. Thus on re-evaluation, the evidence to support a link between the variants/mutations they reported and disease in humans was often weak.
More recently, Queen, Steyn, Lord, and Elson (2017) performed a much larger study utilizing multiple sequence alignments from 33 non-human species. This more recent study also applied an accepted inclusion criterion for the pathogenic variants (Yarham et al., 2011). Queen et al. (2017) focused on the m.3243A > G mutation which is the most prevalent mitochondrial point mutation causing disease in humans. It is a common cause of mitochondrial myopathy, encephalopathy, lactic acidosis and stroke-like episodes (MELAS) amongst other phenotypes. Queen et al. (2017) studied the mt-tRNA-LEU(UUR) gene which is affected by the m.3243A > G mutation and found this pathogenic mutation was present amongst sequences from the dog (Canis lupus familiaris). Further exploration of the mt-tRNA-LEU(UUR) gene revealed two variants which change a G:U Wobble base pair and a mismatch pair to Watson-Crick like pairs within the D-stem. These changes to the secondary structure could mask the pathogenic effects of m.3243A > G in this species.
Four more pathogenic point mutations from mt-tRNA-LEU(UUR) were also identified in a selection of the non-human species and, like m.3243A > G, evidence of potential masking variants was present (Queen et al., 2017). Subsequently, O'Keefe, Queen, Meldau, Lord, and Elson (2018) searched across the seven mitochondrial complex I protein-encoding genes of the same 33 species. Again, pathogenic point mutations were found; however, at a much lower frequency.
Three proven pathogenic point mutations were found across the seven genes of complex I, in contrast to the five point mutations in a single mt-tRNA gene. Only one of the three mutations observed in the protein-encoding genes m.3308T > C, exhibited its disease-associated amino acid change as seen in humans.
Furthermore, the evidence supporting pathogenicity of this mutation is debated, particularly surrounding its role in left ventricular hypertrabeculation/noncompaction (Salas & Elson, 2012). This contrasting finding suggests that sequence context may be of less importance in the presentation and penetrance of mtDNA mutations in mt-protein-encoding genes . One explanation is the differential strength of purifying selection in these genes. In murine models, changes in the protein-encoding genes are rapidly eliminated, but changes in the mt-tRNAs persist for many more generations (Kauppila et al., 2016;Stewart et al., 2008). This phenomenon could help maintain interactions between the mitochondrial and nuclear proteins responsible for oxidative phosphorylation. Mitonuclear protein interactions have been shown to influence disease manifestation in some cases (Loewen & Ganetzky, 2018), and it has been suggested that supernumerary nuclear proteins could mask certain pathogenic mutations by stabilizing the protein complexes (Mimaki, Wang, McKenzie, Thorburn, & Ryan, 2012). It might also be related to differential selective pressure on mt-protein-encoding genes and mt-tRNA genes during the formation of primordial germ cells.
To expand our understanding of sequence context on the expression and penetrance of mitochondrial mutation, this study aims to continue the work of Queen et al. (2017) by identifying whether pathogenic point mutations are present in the remaining 21 tRNA genes and how the pathogenic effect seen in humans is suppressed. Queen et al. (2017) previously compiled 2,784 mt-tRNA sequences from 33 non-human species GenBank records. These species were restricted to the Chordata phylum; each species also required a minimum of 30 complete mitochondrial sequences in GenBank as part of the selection criteria. The mt-tRNA gene sequences from the Revised Cambridge Reference Sequence (rCRS), NC_012920.1, were added to the corresponding non-human species FASTA files.

| Multiple sequence alignment generation and quality control
Multiple sequence alignments were produced from the FASTA files using the ClustalW alignment algorithm (Thompson, Higgins, & Gibson, 1994). All multiple sequence alignments were quality-controlled. Sequences > 5 nucleotides longer or shorter than the rCRS were identified with a short script which uses the Biopython AlignIO module (Cock et al., 2009;O'Keefe, 2018). These sequences were then manually inspected and either removed or trimmed accordingly.
Similarly, the script was also utilized to identify sequences with ≥5 unknown bases ("N"). These sequences were then removed from the alignments.

| Variant scoring and analysis of the MitoTip scoring system
As the genomic location of the mt-tRNA genes can vary between species, all variants refer to the nucleotide positions within the rCRS. Equivalent positions were identified within the species alignments by their location within the individual genes. A list of human disease-associated variants was compiled from the MitoMap database (Lott et al., 2013) [Accessed: 26-05-2018. Each variant was scored for pathogenic status in accordance with the widely accepted Yarham et al. criteria (Yarham et al., 2011). FASTA sequences for each of the mt-tRNA genes were collected from the following GenBank records: NC_012920, NC_001643, NC_001644, NC_005089, NC_001655.2, NC_006853, NC_001323, NC_002081 and NC_002082.1. ClustalW alignments of the nine sequences for each mt-tRNA gene were performed. The Biopython AlignIO module was used to assess conservation at the site of each disease-associated variant as part of the pathogenicity scoring criteria set forth by Yarham et al. (2011).
A search of the MitoMap database looked for each of the variants MitoTip pathogenicity predictions (Sonney et al., 2017) [Accessed: 26-05-2018]. The pathogenicity predictions by MitoTip were compared with the pathogenicity status of each variant derived from the scoring system devised by Yarham et al. (2011). Furthermore, the conservation index and GenBank frequencies for each pathogenic mutation were queried using the MitoMaster SNV Query search tool (Lott et al., 2013) [Accessed: 08-06-2018].

| Secondary structure analysis and assessment of Watson-Crick like base pairing
The mamit-tRNA database was used to identify whether pathogenic mutations fell within a stem or loop of the mt-tRNAs secondary structure and the corresponding base for each of the pathogenic mutations within the stem regions (Pütz, Dupuis, Sissler, & Florentz, 2007). The maintenance of Watson-Crick base pairing for those within the stem regions was then assessed to identify changes at the corresponding base (O'Keefe, 2018).

| Assessment of tertiary structure interactions
Mt-tRNAs have nine core tertiary interactions which contribute to the folding of the cloverleaf molecule (Sprinzl, Horn, Brown, Ioudovitch, & Steinberg, 1998). Each of the pathogenic mutations and any corresponding bases involved in Watson-Crick like pairs were assessed to determine whether they are involved in one of these tertiary interactions. If so, the multiple sequence alignments were examined to identify any variation at the corresponding sites of these interactions (O'Keefe, 2018).

| Phylogenetic analysis and secondary structure modelling
We can investigate within-species variability, by analysing pathogenic mutations where the minor allele arises in ≥5 individuals in any given species. Clades were determined by the collective polymorphic sites within the mt-tRNA gene of the given species. Each clade and its frequency were then loaded into Network 4.6.1.6 for phylogenetic analysis (Bandelt, Forster, & Röhl, 1999). In addition, the sequence of each clade was modelled using the tRNA-SE SCAN search server (Lowe & Chan, 2016). Sequence source was set to vertebrate mitochondria with search mode set as default. The models were used to identify population variants which have the potential to mask these pathogenic mutations.
MitoMap was used to retrieve human mtDNA sequence records which contain the masking variants.

| RE SULTS
A total of 726 multiple sequence alignment files, comprising 22 mt-tRNAs from each of the 33 species, were produced. Quality control was used to ensure the removal of poor-quality sequences resulting in the removal of 50 Sequences across 10 species (Queen et al., 2017), see Table 1.  Table 2. Overall, the results suggest that MitoTip is specific at the cost of some sensitivity.

| Pathogenic point mutations in nonhuman species
MitoMap lists 217 mt-tRNA variants that have been associated with disease (Lott et al., 2013). As the assessment of pathogenicity shows, 113 of these are classified as definitely pathogenic mutations. Each of the species alignments was searched for the presence of any of these 217 variants. Across the 22 mt-tRNA genes, 175 variants presented as either polymorphic or monomorphic changes in ≥1 species. Fifty-eight of the changes seen are classed as definitely pathogenic mutations with 34 of these mutations being monomorphic and 24 being polymorphic, Table 3. These 58 definitely pathogenic mutations are dispersed across 19 of the 22 mt-tRNA genes.
mt-tRNA-GLN, mt-tRNA-THR and mt-tRNA-TYR do not harbour any proven pathogenic mutations in these species. Interestingly, ~82% of the monomorphic mutations arise in less than 10 species studied here and one, m.5703G > A, appeared in all primate species ( Figure 1). This mutation has been associated with early-onset disease presenting as muscle weakness, ophthalmoplegia and a loss of subcutaneous fat, resulting in an emaciated physique (Vives-Bauza et al., 2003). The MitoMaster SNV query tool searches 45,494 full-length sequences to produce a conservation index. This is the percentage of sequences which contain the same nucleotide as the rCRS at the query site (Lott et al., 2013). By consulting this, a clearer picture of conservation can be established. Table 4 demonstrates the conservation index and GenBank frequency of all 58 definitely pathogenic mutations seen in the non-human species.

| Positions of pathogenic mutations and the maintenance of Watson-Crick base pairing
Of the 58 pathogenic mutations found in the non-human species,  Table 3.

| Variability in bases involved in the nine tertiary interactions
The cloverleaf structure of mt-tRNAs undergoes a tertiary folding pattern to become an L-shaped 3D molecule. In order to achieve this, nine long-range folding interactions are required (Helm et al., 2000). Involvement in these tertiary interactions was noted for each cases showed no changes at the additional sites, Table 3.

| Phylogenetic analysis and secondary structure modelling
Eight genes held a single pathogenic mutation where the minor allele was present in at least five individuals in any given species, Table 5.
One of these was the m.3243A > G in the mt-tRNA-LEU(UUR) gene.
As Queen et al. (2017) have already investigated this mutation and gene in detail, it was not considered for further analysis. The polymorphic mutations across the remaining seven genes were taken forward for analysis. The multiple sequence alignments of each of the species containing the polymorphic mutations were subdivided into clades according to the total polymorphic variability within the gene.
Each of these clades was modelled to demonstrate the impact of the total nucleotide variability on the secondary structure of the mt-  shown as (Figures S1-S4).
F I G U R E 1 All pathogenic mutations presenting in 100% of the sequences for every species in which they are identified are classified in this study as monomorphic mutations

| m.5650G > A mt-tRNA-Ala
The m.5650G > A mutation is associated with Myopathy (McFarland et al., 2008). The MitoMaster SNV query tool revealed that the conservation index for this position is 64.44% and a single GenBank record from a disease report is available, as shown in Table 4 (Annunen- Rasila et al., 2006;Lott et al., 2013).
This mutation is present in 118 of the Sus scrofa sequences studied here. It is also present monomorphically and as a low frequency polymorphism in other species, as shown in Table 5. The

| m.8344A > G mt-tRNA-Lys
The m.8344A > G mutation is particularly interesting as this is for this mutation, one from a disease report and three from population studies, as shown in Table 4 (Kutanan et al., 2018;Mishmar et al., 2003;Neparáczki et al., 2017;Zsurka et al., 2007). Nine individuals from Ovis aries and nine individuals from Pan troglodytes verus harbour this mutation. It is also seen as a monomorphism, near monomorphism or low frequency polymorphism in other species, Table 5

| m.1644G > A mt-tRNA-Val
Macaca fascicularis exhibited the m.1644G > A mutation in 27 individuals. It is in two other species as a monomorphism and low frequency polymorphism, Table 5. This mutation is associated with MELAS (Tanji et al., 2008). The conservation index for this position is 91.11%, and no entries of this mutation are present in GenBank (Table 4; (Sonney et al., 2017) and MitoMaster SNV (Lott et al., 2013). The table also includes information on whether the variants are monomorphic and polymorphic in the named species. Importantly, information as to whether the variants are predicted to affect secondary and tertiary interactions is given, see Supplemental Tables.xlsx. interactions. In the current study, this phenomenon is seen with approximately half of the pathogenic mutations that arise in the stems (Table 3). It is thought that the disruption to the Watson-Crick like bond is responsible for the manifestation of disease rather than the specific variant (McFarland, Elson, Taylor, Howell, & Turnbull, 2004).

| D ISCUSS I ON
There are nine core tertiary interactions that are important for correct folding of the mt-tRNAs into the canonical L-shape (Helm et al., 2000). Each of the 58 pathogenic point mutations and the 41 corresponding bases in the stems were assessed for their involvement in tertiary interactions (Table 3). It is important to consider the corresponding bases in the stems as any involvement in tertiary interactions may be dominant over maintaining the secondary structure bonds. Approximately 25% of the 99 sites were involved in one of the nine interactions, just over half of which showed further changes at the other sites (Table 3) (Table 5). Three of the pathogenic point mutations were of particular of interest as they arise in species that have substantial sequence variability to define multiple clades.
Variants within mt-tRNA-Ala have been associated with isolated myopathy. Isolated myopathy presents as pure muscle weakness with variable age of onset (Lehmann et al., 2015). One pathogenic point mutation which causes this condition is m.5650G > A. As mt-tRNA-Ala is encoded on the heavy strand of the mitochondrial genome, all sequences, variants and mutations refer to the tRNA molecules complement sequence. Therefore, m.5650G > A is equal to a C > U change in the tRNA molecule itself. This mutation is polymorphic at high frequency within Sus scrofa, Table 5 (Varani & McClain, 2000). Previous studies in Escherichia coli have shown that alteration of the G:U Wobble pair to an A:U pair, as seen here (Figure 2), lowers the recognition sites binding affinity but increases the stability of the backbone (). The pathogenic mutation, m.5650G > A, gives rise to a U:G Wobble pair. The angle of U:G Wobble pairs is ~2 Å different to G:U Wobble pairs and, because these sites are adjoining, the small difference in angle may be enough to abate the loss of binding affinity (Masquida & Westhof, 2000). It is plausible that this change has allowed m.5650G > A to arise without disease.
The two most commonly seen point mutations in patients with mitochondrial disorders are m.3243A > G in the D-loop of mt-tRNA-Leu(URR) and m.8344A > G in the T-loop of mt-tRNA-Lys.
These variants are assoctiated with the MELAS and MERRF syndromes, respectively (Yarham, Elson, Blakely, McFarland, & Taylor, 2010). Myoclonic Epilepsy with Ragged Red Fibres is a highly debilitating disorder that presents primarily as ataxia, progressive spasmodic seizures and an accumulation of abnormal mitochondria under the sarcolemmal membrane of skeletal muscle fibres (Brinckmann et al., 2010;Lorenzoni et al., 2014). In all mt-tRNAs, meaning an A,G or G,A couple is always seen at these positions ( Figure 5). This suggests that the angle these nucleotides create F I G U R E 2 Phylogenetic analysis and secondary structure modelling of Sus scrofa mt-tRNA-Ala. Polymorphic variability in the Sus scrofa sequences divides the alignment into 11 clades. The phylogenetic network demonstrates the clades with and without the m.5605G > A mutation, drawn using NETWORK 4.6.0.6. As mt-tRNA-Ala is encoded on the heavy strand of the mitochondrial genome, all sequences and variants are denoted as the complement to the mtRNA molecule. These differences can be seen between the alignment of the clades and the secondary structure models here. Secondary structure analysis demonstrates m.5650G on the Human rCRS and its G > A change in group 5 of Sus scrofa. The adjoining G:U wobble pair in the rCRS and its change to an A:U Watson-Crick like pair in Sus scrofa is also noted F I G U R E 4 Phylogenetic analysis and secondary structure modelling of Pan troglodytes verus mt-tRNA-Lys. Four clades were derived from the alignment, based on polymorphic variation. The phylogenetic network demonstrates clades with and without m.8344A > G. Secondary structure modelling of the human rCRS and group 1 Pan troglodytes verus sequences demonstrate the m.8344A > G and m.8310T > C F I G U R E 3 Phylogenetic analysis and secondary structure modelling of Ovis aries mt-tRNA-Lys. Nine clades were derived from polymorphic variability within the alignment. The phylogenetic network demonstrates the clades with and without the m.8344A > G mutation. Secondary structure modelling of the rCRS demonstrates m9344A in the T-loop, This reiterates the importance of considering sequence context when looking at mtDNA mutations, particularly in understudied populations.
We noted that in these species, quite often the nearest stem pair is modified in some way (Figures 2, 3, and 5), as seen with m.3243A > G by Queen et al. (2017). Kern and Kondrashov noted also that stem pairs nearby are often modified and indirectly stabilize the pathogenic mutation where they are seen in the absence of disease (Kern & Kondrashov, 2004 complete human sequences suggesting an importance of mtDNA background, or haplogroup context in the penetrance of disease. Their data suggested disease-causing mutations were more frequent in young sequences, or lineages (Wei, Gomez-Duran, Hudson, & Chinnery, 2017).
Similar observations have been reported in the past (Howell et al., 2007); however, other papers presented evidence to suggest all branches of the human phylogeny have been subject to the same level of purifying selection (Pereira, Soares, Radivojac, Li, & Samuels, 2011). This raises the question of the speed at which purifying selection takes place at the population level. The timeframe of this process has been a long-standing area of debate with ramifications on the use of mtDNA as a molecular clock to study population histories (Howell et al., 2007;Howell, Howell, & Elson, 2008). The ages of the lineage in such studies are calculated using the number of differences seen in the sequences in question compared with the reference sequence the rCRS or revised Cambridge reference sequence. The reference sequence is a European sequence; thus, the age of lineages calculated by this method is dependent on the location of the sequence in question compared with the reference sequence. If the reference sequence had been at a different location different lineages would be deemed to be young/old, this has been highlighted by Behar et al. (2012). It was suggested that a change in reference sequence to a hypothesized most recent common ancestor (MRCA) of all modern humans to help avoid such confusion in the context of the "age" of a lineage.
Others however argued that any such change would instigate confusion in the database that would impact negatively on the medical and forensic fields (Bandelt, Kloss-Brandstätter, Richards, Yao, & Logan, 2014).
It is worth highlighting that compensatory nuclear DNA variants for mtDNA mutations of Complex I have also been seen in other species. The interdependent nature of mito-nuclear proteins means nuclear variability, particularly in the supernumerary subunits, is likely to be able to resolve stability within the protein complexes (Mimaki et al., 2012). The work of Loewen and Ganetzky (2018) is an important exemplar when considering nuclear mitochondrial interactions. Their paper showed that that the phenotypic severity of a complex 1 mutation causing Leigh syndrome phenotype varies depending on the maternally inherited mitochondrial background. Leigh syndrome is a severe disorder characterized by early, progressive neurodegeneration, with both intellectual and motor difficulties, and deficient mitochondrial respiration (Lake, Compton, Rahman, & Thorburn, 2016).
For a long time, the presence of a variant as a haplogroup marker excluded it as a candidate for disease (Schon, Bonilla, & DiMauro, 1997).
The data presented here and other data (Queen et al., 2017;Smuts et al., 2010) suggest out of place haplogroup markers sometimes called "private variants" should be considered as candidates and investigated using defined approaches (Yarham et al., 2011). In summary, studies such as the one presented here will allow us to gain a greater sense of the impact of mutations on tertiary structure and improve mechanistic understanding. They suggest there is clinical as well as anthropological motivation to continue to learn about mtDNA variation in populations where the mtDNA phylogeny is less well known. This knowledge might be essential to the diagnosis of disease (van der Westhuizen et al., 2015), which will be required if cutting edge therapies are to be offered to all population groups (Meldau et al., 2016). Certainly, this study reiterates that researchers and clinicians should not consider variants in isolation.

ACK N OWLED G EM ENTS
Tom May for careful and thoughtful proofreading of the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in GenBank at https ://www.ncbi.nlm.nih.gov/genba nk/, reference number NC_012920, NC_001643, NC_001644, NC_005089, NC_001655.2, NC_006853, NC_001323, NC_002081, NC_002082.1.