First mitogenome phylogeny of the sun bear Helarctos malayanus reveals a deep split between Indochinese and Sundaic lineages

Abstract The sun bear Helarctos malayanus is one of the most endangered ursids, and to date classification of sun bear populations has been based almost exclusively on geographic distribution and morphology. The very few molecular studies focussing on this species were limited in geographic scope. Using archival and non‐invasively collected sample material, we have added a substantial number of complete or near‐complete mitochondrial genome sequences from sun bears of several range countries of the species' distribution. We here report 32 new mitogenome sequences representing sun bears from Cambodia, Thailand, Peninsular Malaysia, Sumatra, and Borneo. Reconstruction of phylogenetic relationships revealed two matrilines that diverged ~295 thousand years ago: one restricted to portions of mainland Indochina (China, Cambodia, Thailand; “Mainland clade”), and one comprising bears from Borneo, Sumatra, Peninsular Malaysia but also Thailand (“Sunda clade”). Generally recent coalescence times in the mitochondrial phylogeny suggest that recent or historical demographic processes have resulted in a loss of mtDNA variation. Additionally, analysis of our data in conjunction with shorter mtDNA sequences revealed that the Bornean sun bear, classified as a distinct subspecies (H. m. euryspilus), does not harbor a distinctive matriline. Further molecular studies of H. malayanus are needed, which should ideally include data from nuclear loci.

DNA extracted from such material is usually highly degraded, rendering downstream analyses difficult (e.g., Pääbo, 1989). However, targeted capture coupled with high throughput sequencing enables retrieval of genetic information from such degraded sample material, with mitochondrial DNA (mtDNA) often the marker of choice, because the high copy number of mitochondria per cell benefits sequence recovery (e.g., Jones & Good, 2016;Paijmans et al., 2013).
MtDNA sequences enriched by targeted capture can then be used to resolve population genetic structure and taxonomic relationships within species, and can provide insights into the historical processes that have shaped contemporary populations (Avise, 2000).
With regard to threatened species, conservation management greatly benefits from such knowledge (Martins, Schmidt, et al., 2017;Wilting et al., 2011Wilting et al., , 2015, as observational data or phenotypic characters may not accurately reflect how genetic variation is partitioned within species Patel et al., 2017;Wilting et al., 2016), or which populations have evolved independently for long periods of time ('evolutionary significant units'). Such information is still lacking for many wildlife species, including those inhabiting ecosystems in global biodiversity hotspots, such as Southeast (SE) Asia, that are increasingly adversely affected by anthropogenic land-use and land-cover change (Nguyen et al., 2021;Tilker et al., 2020).

Such a knowledge gap also exists for the Malayan sun bear
Helarctos malayanus, which is currently listed as Vulnerable on the IUCN Red List of threatened species (Scotson et al., 2017) and is in Appendix 1 of CITES (CITES, 2022). Sun bears perform significant ecological functions throughout their range, including soil turnover, seed dispersal, and the maintenance of trophic relationships (e.g., Augeri, 2005;Mcconkey & Galetti, 1999). This monotypic species of small bears was once widely distributed across SE Asia (Figure 1), occurring in eastern India, Bangladesh, Myanmar, Thailand, Laos, Cambodia, Vietnam, Malaysia, Indonesia, and southern China (Fitzgerald & Krausman, 2002). Over the past decades, the species' distribution area has dramatically shrunk, and it is estimated that sun bears now only occur in 32%-40% of their former range (Crudge et al., 2019), with a concomitant decline in population size (e.g., Crudge et al., 2016;Jenks et al., 2011;Scotson et al., 2017). It is the sole ursid that resides in the Sundaic region, which has experienced significant forest loss in recent years, and population reductions in this region are anticipated to be the most severe (Scotson et al., 2017).
In order to describe and properly manage the remaining genetic diversity in sun bears molecular data are needed. For example, a phylogeographic approach can reveal management and evolutionary units (Avise, 1992;Moritz, 1994), help to identify conservation priorities (Goossens et al., 2013), and pinpoint suitable reintroduction locations (Apollonio et al., 2014).
Existing studies of H. malayanus using molecular markers (mtDNA sequences and microsatellites) were limited in their geographic scope (Kunde et al., 2020;Onuma et al., 2006), and we thus lack knowledge about the distribution of intraspecific genetic diversity across the species' range. To address this shortcoming, we sequenced mitochondrial genomes from samples across a large portion of the sun bear distribution, utilizing both archival material from natural history museums as well as noninvasively collected material (mucosal cells from saliva). Using these complete mitochondrial genomes (mitogenomes), we elucidated the geographic distribution of maternal lineages (matrilines) and provide insights into the species' history.

| Samples
Dried tissue remains from the skull or tissue from the nasal cavity of 29 archival sun bear specimens were collected from several natural history museums (Table S1), providing a broad geographic distribution of samples.
We also collected saliva samples from 21 sun bears, which originated from 12 provinces in Cambodia and which were housed at the 'Free the Bears' sanctuary in Phnom Tamao Zoo and Wildlife Rescue Centre. We included up to two samples per province in order to obtain a balanced geographic coverage of Cambodia.

| Archival and non-invasively collected DNA
DNA from archival samples was extracted following Wilting et al. (2016), in a laboratory designed for and limited to the use of archival material. DNA extractions were conducted in batches consisting of three samples and one negative control and verified to be of sun bear origin by amplification of a 146 bp long portion of the mitochondrial cytochrome b gene using species specific primers (Table S2).
DNA extractions from non-invasively collected samples (salivary mucosal cells) were carried out using the GEN-IAL First DNA All-tissue DNA extraction Kit (GEN-IAL GmbH) following manufacturer's instructions. This work was carried out in a separate facility, dedicated to molecular work with non-invasively collected material.

| Library preparation and hybridization capture
Illumina sequencing libraries were constructed following Fortes and Paijmans (2015), using double-indexing with 8-bp indexes. Prior to indexing we conducted qPCRs to determine the optimal PCR cycle number for each sample.
To reduce effects by target DNA degradation and external DNA contaminations, we carried out hybridisation capture for both sample types (archival, non-invasively collected).
Baits for hybridization capture were generated using long range (LR) PCR products that spanned three large overlapping regions of the mitogenome (using a fresh Helarctos malayanus sample as template). The LR-PCR primers were designed using the genbank ref- Bioline Myfi mix (Bioline GmbH), 2 μL of forward and reverse primers (10 μM), 3 μL of template DNA (100 ng/μL). PCR conditions were as follows: initial denaturation at 94°C for 3 min, followed by 40 cycles of denaturation at 94°C for 30 s; annealing at 60°C for 30 s; extension at 68°C for 7 min with the final extension at 68°C for 10 min.
LR-PCR products were sheared (target size 300 bp), pooled equimolarly and then converted into biotinylated baits following Maricic et al. (2010). Targeted capture was carried out using hybridization temperatures appropriate for sample type (Paijmans et al., 2016).
Paired-end sequencing was carried out on the Illumina MiSeq platform (Illumina) using v3 150-cycle kits.

| Bioinformatic workflow
We de-multiplexed paired-end reads using bcl2fastq v2.17.1.14 (Illumina, Inc.) and removed adapter sequences using cutadapt v1.3 (Martin, 2011). Using trimmomatic (Bolger et al., 2014), we applied a sliding window approach for quality trimming with the phred F I G U R E 1 Former and current H. malayanus distribution range (following Scotson et al., 2017). The number of samples for which we were able to retrieve mitochondrial genomes is indicated, as is the Isthmus of Kra ("IoK").

| Summary statistics
The final data set of complete or near-complete sun bear mitogenomes consisted of 15 archival and 17 non-invasively collected samples, analyzed together with three published sequences (GenBank: MN807949, FM177765 and EF196664; see Table 1 for details).
Mitogenome sequences were aligned and manually curated in Geneious v. 8.1.9 (Kearse et al., 2012). Summary statistics for the complete dataset (N = 35), as well as subdivisions based on phylogenetic analyses (below) were generated using DnaSP v.6.12.03 (Rozas et al., 2017). For these estimates we excluded sites with missing data, ambiguous data, or gaps.

| Phylogenetic analyses
To generate a time-calibrated mitochondrial phylogeny of sun bears, we first estimated the root age in a species level analysis calibrated based on the fossil ages of other representatives of the Ursidae. We then applied this root age estimate to population level analyses of sun bear mitochondrial sequences. This two step strategy was chosen to best accommodate the assumptions of the tree priors used in these respective analyses.
For the species level analysis, we aligned two divergent sun bear mitogenomes (HMA_26_TH, and HMA_35_CAMB) and one mitog-  (Lanfear et al., 2016) to identify an optimal set of partitions and substitution models among all combinations of tRNAs, rRNAs and the three codon positions of the protein coding genes, using the greedy search algorithm and considering all substitution models available in BEAST, using the Bayesian Information Criterion. The Bayesian phylogenetic analysis package BEAST v.1.8.2 (Drummond et al., 2012) was then used to estimate species phylogeny and divergence times ( Figure S1). A birth-death tree model was used, with a lognormal relaxed clock model, with a uniform prior on the mean substitution rate of 0 to 20% per million years (My). Timecalibration was achieved using informative uniform priors based on fossil and other evidence, following the approach of a previous study of the Ursidae (Kumar et al., 2017). Monophyly was enforced for all calibrated nodes. These were: (1) a prior based on the divergence of the Tremarctinae and Ursinae clades between 7 and 14 My, based on a fossil of Tremarctine bear Plionarctos (Tedford, 2001); (2) a prior for the basal divergence of the Ursidae with an upper limit of 12 My, based on a fossil of the Ailuropodinae (Abella et al., 2012) and a lower limit of 20 My, determined by a molecular dating study (Wu et al., 2015); (3) a prior on divergence of the Ursinae between 4.3 and 6 My based on the age of Ursus minimus (Gustafson, 1978); and (4) a prior on the divergence of U.
All other priors were left at default values. Initial runs showed a lack of convergence for some substitution model parameters, and so we substituted them with simpler models (Table S3) The divergence estimate recovered for the two sun bear mitogenomes (mean 0.3054 My, standard deviation 0.0371 My) was then applied as normal prior on the root height for the intraspecific analyses, and were run with a Bayesian Skyline coalescent population model with eight groups. Data partitions and substitutions were estimated using PartitionFinder as described previously. As we did not expect variation in mitochondrial substitution rate within sun bears, we used a strict clock model with a uniform prior on the per-lineage substitution rate of 0 to 20% per My. Details of the MCMC runs were as described above. The maximum clade credibility tree was then selected from the posterior sample, and node heights scaled to the median posterior age, using TreeAnnotator 1.8.2.
Clade support was assessed using 100 rapid bootstrap replicates assuming the GTR + CAT substitution model, with a subsequent thorough maximum likelihood search for the best tree assuming the GTR + G substitution model. For the species level analysis, the tree was rooted using the Giant Panda (acc. no. EF196663) as an outgroup ( Figure S2). For the intraspecific analysis, the Asian black bear (acc. no. EF196661) was used as an outgroup ( Figure S3).
We also generated a dataset combining portions of our mi- Note: Further sample details are provided in Table S1. mitogenomes. Putative haplotypes were mostly shared among sun bears from Cambodia (Table S5), however, one such putative haplotype was shared by bears from Thailand and Borneo (HMA_15_ TH, HMA_19_BOR).
The 32 mitogenomes were analyzed together with three published sequences (GenBank: acc. no. MN807949, FM177765 and EF196664; see Table 1 for details). Summary statistics for the complete mitogenome dataset (N = 35), as well as subdivisions of the data based on phylogenetic analyses (below) are presented in Table 2.

| Mitogenome phylogeny
Phylogenetic analysis of the sun bear mitogenome dataset (35 sequences) revealed two well-supported clades (posterior clade credibility 1 for each clade), henceforth referred to as "Mainland clade" and "Sunda clade" following Kunde (2017) Table S1). This clade also gave rise to two major lineages, which diverged ~128 kya (CI 95 : 170-93 kya), that correspond to mtDNA lineages previously identified : the Peninsular lineage and Sabah lineage (indicated in Figure 2). These two lineages have a large geographic overlap: both include sun bears from Sumatra, Peninsular Malaysia and Thailand.
Notably, we were only able to retrieve the mitogenome for one sample from Borneo, so we are unable to determine if both or only one of these lineages occurs on this island. Furthermore, as mentioned above, the sample from Borneo shared its putative haplotype with a sample from Thailand.
It is evident that both the Mainland and Sunda clades are present in Thailand (Figure 2; Figure S3). Unfortunately, it is not known whether these Thai sun bear samples originated from North or South of the Isthmus of Kra (5-13°N), an important zoogeographic transition zone between Sundaland and continental mainland.

| Shorter mtDNA sequences
We also considered the 35 mitogenome sequences in the context of a recent phylogenetic study on shorter mtDNA sequences of sun bears from Borneo, Peninsular Malaysia and Thailand . Phylogenetic analysis of this dataset (N = 117 sequences; alignment length 1466 bp; Figure 3) also recovered the two major clades with high support (posterior clade credibility 1 for each clade; Figure 2), but both were much younger than the previous estimate of ~1.5 Mya . This discrepancy in coalescence time estimates between the two studies likely reflects (i) our removal of the repetitive portion of the control region (which is difficult to resolve using short-read data), and (ii) the differences in choice of calibration points and dating methods between the two studies (see Section 2 for details).
We also enlarged the Mainland clade by Thailand samples T7,

| DISCUSS ION
By incorporating archival and non-invasively collected material in our sampling, and using targeted capture coupled with high throughput sequencing, we successfully retrieved data for mitochondrial genomes of sun bears from a large portion of their geographic distribution.
We identified two matrilines within the sampled distribution of  presence on the Sundaic islands is dated to the late Pleistocene (Long et al., 1996;Medway, 1964;Tougard, 2001), although an earlier presence is conceivable as the mammalian fossil record for Sumatra and Borneo is sparse prior to the Late Pleistocene (Meijaard, 2004). While molecular estimates of divergence times need to be considered with caution, our mitogenome data do suggest the presence of sun bears on the Sundaic islands by the late Middle Pleistocene.
It could be argued that the coalescence time for Sundaic sun bears Further sampling throughout the sun bear distribution may improve our understanding of the species' origin, as additional mitogenome lineages may be unsampled thus far (e.g., Mengüllüoğlu et al., 2021).
However, while the reconstructed mitochondrial phylogeny in general is very recent, historical and recent demographic processes may have resulted in loss of variation (mitochondrial lineages) that existed in the past, thereby obscuring older evolutionary processes.
Thus, the question of the sun bears' origin is likely best addressed using nuclear markers. The issue is further compounded by possible secondary contact of sun bears in the region where the species may have originated.

| Secondary contact
The geographic distribution of mitogenome clades reflects vicariance between sun bears of mainland Indochina and Sundaland, a biogeographic pattern that is observed in many other SE Asian species that are widely distributed (Woodruff & Turner, 2009 Pleistocene. If the loss of the Mainland clade occurred after colonization, this suggests secondary contact of the two lineages in the Thai-Malay Peninsula, as both major clades are found in Thailand. Such migration was possible when the exposed shelf connected the mainland and the major islands of the Sundaland during the low sea levels of the late Pleistocene (Bird et al., 2005), and has been inferred for sun bears  and other large mammals, such as leopard Panthera pardus (Wilting et al., 2016), leopard cat Prionailurus bengalensis (Patel et al., 2017), and muntjac Muntiacus muntjak .
If there was secondary contact of these two lineages following a northward migration of sun bears of the Sunda clade, we can con- Clearly, the available molecular data are not adequate to address these open questions, and further sampling is needed. Analyses should ideally also include nuclear genomic data, which would also be extremely valuable to address the taxonomic uncertainty regarding the sun bears of Borneo.

| Bornean sun bears
There are clear phenotypic differences between the sun bears on Borneo and those from elsewhere in the species' range, and currently two subspecies are recognized: the broadly distributed H.
However, several authors have suggested that further morphological and molecular data are needed to conclusively resolve the F I G U R E 3 Bayesian phylogenetic tree reconstructed using a dataset of short mtDNA sequences (N = 117). Two main branches are labeled according to clade assignment, and previously reported lineages (Peninsula and Sabah;   subspecific status of Bornean sun bears (Kitchener, 2010;.

| Human-mediated translocations
Sun bears are hunted for the illegal pet trade (Foley et al., 2011;Gomez et al., 2020;Krishnasamy & Shepherd, 2014;Lee et al., 2015;Shepherd & Shepherd, 2010), and are traded on and among the Sundaic islands (e.g., Gomez et al., 2019). Rescue and release of such trafficked bears may result in the introduction of non-endemic mtDNA haplotypes into local populations. It is worth noting that such translocations do not need to be recent. There is a history of human-mediated introduction or translocation of mammals (e.g., Long, 2003) for various reasons (e.g., cultural, commercial, pestcontrol, accidental). Such translocations included large mammals and carnivores in SE Asia, among others cervids (Rusa timorensis and R. unicolor, Martins, Schmidt, et al., 2017), pigs (Groves, 1984), the leopard cat (Prionailurus bengalensis, Patel et al., 2017), the Malay civet (Viverra tangalunga, Veron et al., 2014), and the Asian palm civet (Paradoxurus hermaphroditus, Flannery et al., 1995). Because some of these human-mediated translocations had commercial reasons (e.g., Groves, 1984), it is conceivable that there was also historical trade in bears and their derivatives (e.g., Hose, 1893), which may have included the translocation and subsequent (intentional or unintentional) release of sun bears. Investigation of Early Holocene samples using ancient DNA techniques could address the impact of human-mediated translocations. Unfortunately, sun bear fossils are rare, and the species inhabits an environment that is not conducive to ancient DNA preservation (e.g., Paijmans et al., 2013).

| CON CLUDING REMARK S
Thus far, the classification of sun bear populations has been based on geography and morphological traits, the latter having only been investigated using a small dataset. It is important to corroborate/supplement such efforts using molecular studies (Kitchener, 2010 formal analysis (equal); investigation (equal); software (supporting); visualization (supporting); writing -review and editing (supporting).

ACK N OWLED G M ENTS
We would like to thank the Kingdom of Cambodia and the Australian 13-IZW-2.

CO N FLI C T O F I NTER E S T S TATEM ENT
None.

DATA AVA I L A B I L I T Y S TAT E M E N T
We have deposited the primary data underlying these analyses as follows: Mitochondrial genome sequences: GenBank acc. no.
OQ564458-OQ564489; Sampling locations, haplotypes, and collection details are in Table S1; Cytochrome B primer sequences to verify target species are in Table S2; Sequences used in analysis of short mtDNA sequence are in Table S4.