Phylogeny and biogeography of the Japanese rhinoceros beetle, Trypoxylus dichotomus (Coleoptera: Scarabaeidae) based on SNP markers

Abstract The Japanese rhinoceros beetle Trypoxylus dichotomus is one of the largest beetle species in the world and is commonly used in traditional Chinese medicine. Ten subspecies of T. dichotomus and a related Trypoxylus species (T. kanamorii) have been described throughout Asia, but their taxonomic delimitations remain problematic. To clarify issues such as taxonomy, and the degree of genetic differentiation of Trypoxylus populations, we investigated the genetic structure, genetic variability, and phylogeography of 53 specimens of Trypoxylus species from 44 locations in five Asian countries (China, Japan, Korea, Thailand, and Myanmar). Using specific‐locus amplified fragment sequencing (SLAF‐seq) techniques, we developed 330,799 SLAFs over 114.16M reads, in turn yielding 46,939 high‐resolution single nucleotide polymorphisms (SNPs) for genotyping. Phylogenetic analysis of SNPs indicated the presence of three distinct genetic groups, suggesting that the various subspecies could be treated as three groups of populations. PCA and ADMIXTURE analysis also identified three genetic clusters (North, South, West), which corresponded to their locations, suggesting that geographic factors were important in maintaining within population homogeneity and between population divergence. Analyses of SNP data confirmed the monophyly of certain subspecies on islands, while other subspecies (e.g., T. d. septentrionalis) were found to be polyphyletic and nested in more than one lineage. AMOVA demonstrated high level of differentiation among populations/groups. Also, pairwise F ST values revealed high differentiation, particularly between South and West, as well as between North and South. Despite the differentiation, measurable gene flow was inferred between genetic clusters but at varying rates and directions. Our study demonstrated that SLAF‐seq derived markers outperformed 16S and COII sequences and provided improved resolution of the genetic differentiation of rhinoceros beetle populations from a large part of the species’ range.


| INTRODUC TI ON
The Japanese rhinoceros beetle, Trypoxylus dichotomus (Coleoptera, Scarabaeidae, Dynastinae) first described by Linnaeus (1771), is widely distributed in China, Japan, the Korean Peninsula, Vietnam, Myanmar, Laos, India, and Thailand (Adachi, 2017;Nagai, 2006Nagai, , 2007Satoru, 2014). It is typically found in broad-leaved forests in tropical and subtropical mountainous habitats. The Japanese rhinoceros beetle has been important in Chinese traditional medicine for nearly 2000 years. Recent studies have found that extracts from the beetle have antihepatofibrotic, antineoplastic, and antibiotic effects (Chung et al., 2014;Kim et al., 2007;Miyanoshita et al., 1996;Ratcliffe et al., 2011;Sagisaka et al., 2001;Yoshikawa et al., 1999). A previously undescribed lectin purified from the larvae of Japanese rhinoceros beetles may assist in the suppression of human cervical cancer (HeLa), murine fibroblast (L929), and murine L1210 leukemic cells (Kui et al., 2000). Additional studies have found that Japanese rhinoceros beetles contain compounds with possible application as immunomodulators and tumor growth inhibitors (Hee et al., 2001;Ratcliffe et al., 2011;Umetsu et al., 1984Umetsu et al., , 1985. In addition to human medical applications, the bacterium Bacillus amyloliquefaciens KB3, isolated from feces of 3rd instar larvae of Japanese rhinoceros beetles, can be used as biocontrol against certain fungal plant pathogens (Nam et al., 2016). The beetle's high protein content gives it great potential as a food source when a suitable processing method is employed (Chung et al., 2013;Kim et al., 2016). In many East Asian countries, keeping Japanese rhinoceros beetles as pets has become a personal hobby and popular due to their large size and unique body shape.
Apart from the numerous applications, the Japanese rhinoceros beetle have received widespread attention owing mainly to their remarkable sexual dimorphism and individual variation, especially in the size and shape of the male's cephalic and pronotal horn (Buchalski et al., 2019;Hosoya & Araya, 2005;Ito et al., 2013;Morita et al., 2019;Ohde et al., 2018;Warren et al., 2013;Zinna et al., 2018). Male rhinoceros beetles have robust prominent cephalic horns for intraspecific competition over reproductive opportunities and in interspecific competition (like members of the Lucanidae family) over territory and resources. Horn shapes and sizes diverge greatly even among closely related species and sometimes between populations of beetles, suggesting differential selective pressures on horns among different lineages (Buchalski et al., 2019;Emlen et al., 2007Emlen et al., , 2012Hongo, 2003Hongo, , 2007Ito et al., 2013, Johns et al., 2014Karino et al., 2005;Zinna et al., 2018). Studies on the genetic mechanism of horn development suggest that the genes underlying horn formation in rhinoceros beetles (Dynastinae) were the same genes recruited for the independent derivation of horns among the dung beetles (Scarabaeinae) (Ito et al., 2013;Ohde et al., 2018). As such rhinoceros beetles and related taxa are ideal for comparative studies of sexual selection, sexual dimorphism, and the evolution of morphological innovations.
tsunobosonis Kôno in Taiwan; T. d. tsuchiyai (Nagai, 2006) in Japan (Kuchinoerabu-jima Island); T. d. shizuae in Japan (Yakushima Island and Tanegashima Island); T. d. xizangensis (Li et al., 2015) in Tibet (China); T. d. shennongjii (Satoru, 2014) in China (Hubei Province) (Adachi, 2017;Kôno, 1931;Kusui, 1976;Li et al., 2015;Linnaeus, 1771;Nagai, 2006Nagai, , 2007Prell, 1934;Satoru, 2014); and the species T.kanamorii (Nagai, 2006) in Myanmar, India and China (Yunnan and Tibet). Species and subspecies division of Trypoxylus have been controversial, with traditional identifications mainly based on morphological characteristics, such as horn shape, body size and shape, lustrousness of elytra, and pubescence on the pygidium, pronotum, and elytra. Historically, delimitations among subspecies could be arduous because of limited divergence between the morphological characters used for subspecific identification. Later, Nagai (2006Nagai ( , 2007 proposed seven subspecies of T. dichotomus in Asia and illustrated differences among these subspecies based on morphological characteristics. Satoru (2014) and Adachi (2017) described additional T. dichotomus subspecies from Japan based on body color and pubescence on elytra. However, these studies have only employed a small number of diagnostic characters to separate the subspecies without placing the characters or taxa into a phylogenetic context to better understand character evolution within Trypoxylus. Taxonomy and distribution of T. dichotomus subspecies in Asia, especially in China, has also been challenging for many of the same reasons. As such, a molecular phylogenetic approach can provide much needed resolution in evaluating the monophyly of T. dichotomus subspecies that have been delimited with sometimes overlapping characters.
Understanding spatial distribution and species delineations is critical in clearly defining patterns of biodiversity and subsequently informing conservation decisions as well as providing lineage-based comparisons for future studies, such as trait evolution and/or identifying novel medicinal compounds. The accurate inference of genetic clustering in natural beetle populations has important implications to understand the process and pattern of evolution. For instance, K E Y W O R D S beetle taxonomy, genetic variation, island biogeography, phylogeography, population structure, SLAF-seq improved resolution of within-species patterns of genetic diversity, divergence, and clustering can in turn improve estimates of population genetic processes, such as the resolution of spatial genetic differentiation among populations and increases accuracy and precision of gene flow rates estimates (Black & DuTeau, 1997;Zhu et al. 2018), which can be useful in identifying historic corridors of migration.
Numerous different molecular marker systems have been used to infer population structure and evolutionary patterns for a variety of insect species (e.g., Batista et al., 2016;Pascoal & Kilner, 2017).
For instance, Pascoal and Kilner (2017)  With random amplified polymorphic DNA (RAPD) and mitochondrial DNA COII sequence analysis, homogeneity between small brown planthoppers from different geographic regions could be inferred (Xu et al., 2001;Wang et al. 2019). Single nucleotide polymorphisms (SNPs) from genome-wide panels are known to improve the resolution of population genetic inferences such as linkage disequilibrium (LD) and population structure because of the abundant and generally even distribution across the genome over other markers, and the well-understood models of evolution applied to nucleotide mutation analyses.
The use of SNPs in studies dissecting complex traits is also attractive because they can be used to quantify heterozygosity in biallelic loci, they can be sequence-tagged to ensure orthologous comparison of loci, and genotyping can be massively multiplexed to increase throughput and reduce cost (Blanco-Bercial & Bucklin, 2016). High-throughput next-generation sequencing (NGS) technologies are now critical tools F I G U R E 1 A map showing the distribution of Trypoxylus species in East Asia with illustrations of various subspecies in developing SNP panels and genotyping subsequent individuals in studies of the ecology and evolution of traditionally poorly resolved cryptic groups. An example of a successful NGS method for large-scale de novo SNP discovery and genotyping is SLAF-seq (specific length amplified fragment sequencing) developed by Sun et al. (2013). The SLAF-seq method has proven efficient and effective for SNP based population genetic analyses in numerous plant and animal studies (Su et al., 2017;Sun et al., 2013;Wang et al. 2019).
To better understand the evolution of morphological diversification as well as clarify genetic differentiation, gene flow, and population genetic structure of Japanese rhinoceros beetle populations, we conducted phylogenetic and population genetic analyses across 10 subspecies from a large portion of the species range in eastern Asia.
DNA sequences from the mitochondrial cytochrome c oxidase subunit II (COII) and 16S ribosomal RNA (rRNA) genes were employed in reconstructing the phylogeny of Trypoxylus species and subspecies. SLAF-seq derived SNPs were also used in population genetic and phylogenetic analyses for comparison to sequence-based datasets and for improved resolution of inference.

| Morphological data
All specimens in our study were identified based on morphological characters published in previous studies (Adachi, 2017;Kôno, 1931;Kusui, 1976;Li et al., 2015;Nagai, 2006Nagai, , 2007Prell, 1934;Satoru, 2014). Some morphological characters, including body size, lustrousness, cuticle color, cephalic and pronotal horn size, pubescence on elytra, pronotum, head, and pygidium, have provided useful criteria for species identification (Table 2). In this study, the taxonomic revisions by Nagai (2006Nagai ( , 2007, Satoru (2014), and Adachi (2017) were followed as they provided the clearest subspecies descriptions. The genus Xylotrupes was treated as an outgroup in our comparisons and thought be a more distantly related to Trypoxylus than Xyloscaptes (also used as an outgroup) based on Dutrillaux et al. (2013). Digital images of diagnostic features were taken with a Nikon D7000 camera with AF-S VR Micro-Nikkor 105mm f/2.8G IF-ED-Nikon lenses and edited with Adobe Photoshop CS4, Microsoft Paint, and Autodesk Sketchbook.

TA B L E 1 Sampling information of Trypoxylus specimens used in this investigation
Gaps were treated as missing data, and all characters were equally weighted. Trees were inferred using the heuristic search option with TBR branch swapping. The robustness of the most parsimonious trees was evaluated by 1,000 bootstrap replications (Felsenstein, 1985). Other calculated measures were the tree length (TL), consistency index (CI), retention index (RI), and rescaled consistency (RC).
Bayesian analyses (BI) were performed using MrBayes v.3.1.2 with Markov chain Monte Carlo (MCMC) and Bayesian posterior probabilities (Ronquist & Huelsenbeck, 2003). Default parameters were selected, and the evolutionary model was set to the GTR + I + G model, which was the best model predicted for the concatenated 16S and COII alignment by MrModeltest v. 2.3 (Nylander, 2004). Simultaneous Markov chains were computed for 1,000,000 generations, and the trees were sampled every 100th generation (You et al., 2019).
Maximum likelihood analysis (ML) was performed by RAxMl v. 1.3 (Stamatakis, 2006). The bootstrapping was carried out with 1,000 replicates using the GTR + I + G model. Xylotrupes mniszechi and Xyloscaptes davidis were selected as outgroup taxa.

| SLAF library construction and highthroughput sequencing
For SLAF-seq analysis, DNA concentration was quantified using a

| SLAF-seq data grouping and genotyping
Raw pair-end reads were clustered based on sequence similarity.
Sequences with over 90% identity were grouped in one SLAF tag, SLAFs with low-depth coverage were filtered out Sun et al., 2013;Wang et al. 2019). Only groups with higher depth and four tags or fewer were identified as high-quality SLAFs with SLAFs possessing two, three, or four tags identified as polymorphic.
In this study, depth was 17.63× on average, and a total of 1,374,985 high-quality unique SLAF tags were obtained with 330,799 of those tags considered polymorphic.
Development of SNP markers was based on reference sequence with very high depth in each SLAF tag. SAMtools and GATK were used for mapping and SNP calling (Li et al., 2009;McKenna et al., 2010;Wang et al. 2019). A total of 46,939 SNPs with minor allele frequencies (MAF) of ≥0.05 and an integrity score of ≥80% were employed in downstream analyses.

| Population genetics analyses
Phylogenetic trees based on a polymorphic SNP matrix (53 individuals × 46,939 SNPs, also used in all subsequent analyses unless noted otherwise) were constructed using the neighbor joining (NJ) method employed in MEGA X with p-distance and pairwise deletion option parameters (Kumar et al., 2018;Saitou & Nei, 1987;Tamura et al., 2011).  (Nei, 1978). The polymorphism information content (PIC) value is commonly used as an estimate of the probability of finding polymorphism between two random samples (Shete et al., 2000).

| Genetic differentiation and gene flow between genetic clusters
Genetic differentiation was calculated using an analysis of molecular variance (AMOVA) as applied in ARLEQUIN v. 3.5.1.2, which is a method of quantifying at what level (individual, population, or total) genetic variation is greatest (Excoffier & Lischer, 2010;Peakall & Smouse, 2005;Tsui et al., 2012Tsui et al., , 2014. The divergence index and F-statistics (F ST ) were used to measure the level of genetic differentiation among T. dichotomus populations based on genetic polymorphism in the SNP data (Hudson et al., 1992), and F ST was calculated via the PopGen 32 package in BIOPERL (Yeh et al., 1997) using a 100-kb sliding windows in 10-kb steps. Pairwise F ST values were evaluated using a randomization test with 1,000 iterations using ARLEQUIN v. 3.5.1.2 (Excoffier & Lischer, 2010;Tsui et al., 2012Tsui et al., , 2014. The direction and rate of migration between genetic clusters (derived from the results of ADMIXTURE) were inferred using MIGRATE-N v. 3.6.4 (Beerli & Palczewski, 2010), which uses an expansion of the coalescent theory to estimate migration rates. Four possible models for migration between each genetic cluster were evaluated (Yang et al., 2018). The number of recorded steps in the chain was set to 500,000, and the models were run three separate times to confirm convergence of parameter estimates. The marginal likelihoods of all models were compared to infer the direction and rate of gene flow, and only the results of the run that yielded the highest Bezier approximation score (1b) value were presented (Han et al., 2016;Tsui et al., 2012Tsui et al., , 2014Yang et al., 2018).

| Morphology
External morphology of all Trypoxylus subspecies were observed and recorded (

| Phylogenetic analysis based on 16S and COII
To assess the monophyly of the Trypoxylus subspecies, a phylogenetic analysis was performed based on combined 16S and COII sequence

| SLAF sequencing and SNP discovery
Using the genome of mountain pine beetles D. ponderosae

| Phylogenetic analysis and genetic structure based on SLAF-seq derived SNPs
Phylogenetic analyses were conducted to determine the relation-  (Figure 4b). Considerable divergence was also illustrated within group South, in which samples from Taiwan and southern Japan (Okinawa and Kumejima) could be further divided into two well-supported clades. The North population was slightly differentiated from the West population along the third PCA axis (7.89% of the variation; however, they were apparently separated along the second PCA axis (10.4% of the variation).
ADMIXTURE analysis generated similar genetic clustering patterns to those based on phylogenetic analysis of SNPs and PCA.
The estimated membership fractions of 53 specimens for different values of K were assessed across a range from 1 to 10, with the optimum cluster value of 3 (K = 3) determined using the cross-validation error rate ( Figure 5), which indicated that the sampled specimens could be categorized into three genetic groups. Group North contained 14 specimens, which were from Northeastern China

| Genetic diversity among populations
The genetic diversity of the three genetic clusters inferred by genomewide SNP data is presented in Table 5. Expected heterozygosity in the South genetic cluster was the highest (H e = 0.338), followed by genetic cluster North (H e = 0.356) and lowest in the West genetic cluster (H e = 0.29). However, observed heterozygosity (South = 0.097, North = 0.197, and West = 0.177) was lower than expected in all genetic clusters suggesting past genetic bottlenecks or some other demographic or selective process is common in Trypoxylus beetles.
The mean PIC also revealed that group South was highly polymorphic (0.281), whereas group West exhibited the lowest PIC (0.240).

| Population divergence and gene flow
Analysis of molecular variance (AMOVA) indicated that 35.89% of the total variation attributed to variation among populations, while 28.29% and 35.82% were attributed to among individuals within populations and within individual, respectively (Table 6).
Pairwise F ST revealed differing levels of divergence between North, West, and South genetic clusters. The highest differentiation was between genetic clusters North and South (F ST = 0.345), while the least differentiation was between West and North (F ST = 0.094) (Table 7), which was corroborated with the results based on SNPs.
In addition, the gene flow and migration rates among populations were estimated using MIGRATE-N. Based on the Bezier was also inferred to be the highest among the clusters we compared. As such, a strong barrier to gene flow was inferred between the two population groups ( Figure 6).

| Population structure and biogeography of Trypoxylus
Resolving patterns of intraspecific variation such as genetic diversity, populations genetic structure, and gene flow within and among populations are essential to understand historic processes such as migration, isolation, and adaptive radiation (Mendelson & Shaw, 2005 = 2 to 4). Population codes are given in Table 2 The phylogenetic branching pattern within this genetic cluster is consistent with the pattern expected for an island radiation wherein individuals from the same island cluster together and the distance of islands from one another is reflected in branching order (i.e., ancestors from islands nearby to each other coalesce before coalescing to ancestors from more distant islands) (MacArthur & Wilson, 1967;Wilson, 1959;Zimmerman, 1970). From the branching pattern, it appears that Taiwan was colonized first, and then Okinawa followed by Historic and ongoing gene flow reduces differentiation between populations (Ellstrand & Elam, 1993 on Okinawa than they are to individuals on mainland China, suggests that dispersal from the mainland to islands is rare and may be more strongly influenced by ocean currents (Karl, 1999) than geographic proximity. While island dispersal has occurred in both North (Japan and associated islands and Jeju, Korea) and West (Hainan island, China) genetic clusters, these dispersal events are inferred to be more recent and/or include ongoing migration as individuals on these islands did not resolved in separate genetic clusters. However, branching patterns in the phylogenetic analysis ( Figure 3) suggests subclustering of island lineages within North and West genetic clusters. Additionally, gene flow between North and West genetic clusters is more frequent, possibly because of fewer geographic barriers.
In all genetic clusters, the observed heterozygosity was lower than expected heterozygosity suggesting that demographic and/ or behavioral processes might be decreasing genetic diversity in Trypoxylus populations. It is well known that male Trypoxylus beetles compete against one another for access to females (Hongo, 2003(Hongo, , 2007 Despite the geographic isolation of various subspecies/population, some individuals were inferred to be of admixed origins (Figures 3 and 4). Given the popularity of keeping Trypoxylus beetles as pets and importance in medicine, they have been translocated throughout Asia. Beetles kept as pets or used in breeding are often released back into the wild (Nagai, 2007). Through such actions, beetles from divergent lineages may be brought into contact and reproduce.
Translocation and release into the wild may explain the presence of a

| Taxonomic implications
Ten recognized subspecies of T. dichotomus and closely allied species T. kanamorii have been described from different parts of Asia, including the countries China, Japan, Korea, Myanmar, Thailand, Laos, and India (Adachi, 2017;Kôno, 1931;Kusui, 1976;Li et al., 2015;Linné, 1771;Nagai, 2006Nagai, , 2007Prell, 1934;Satoru, 2014). Taxonomic boundaries among these subspecies were not clearly defined in the past. We attempted to resolve the phylogenetic relationships among Trypoxylus subspecies based on 16S and COII sequences, which have provided phylogenetic resolution for delineating Coleoptera genera in past studies (Lee et al., 2015). However, 16S and COII did not resolve Trypoxylus subspecies delineations with high support.
Using broadly distributed SNP markers, subspecies were resolved in three well-supported genetic clusters that generally  genetic, environmental conditions (e.g., temperature or latitude), and/or biotic factors seen in other species (Davis et al., 2008;Gross et al., 2004). However, according to Buchalski et al. (2019),

T. dichotomus populations in Kyoto and Hokkaido (in central and
Northern Japan, corresponding to North group in this study) had significantly longer horns for a given body size, than those specimens in Taiwan (Nagai, 2007). Compared with T. d. septentrionalis, they were both smaller in body size and possessed smaller cephalic/pronotal horns and were darker in body color in male individuals. Differences between these two subspecies were indistinguishable in smaller individ-  (Nagai, 2006). The clade containing T. kanamorii is paraphyletic with respect to T. d. politus. As such the designation of "species" does not appear to be valid for T. kanamorii. Furthermore, if strictly applying a phylogenetic approach to taxonomic designations (de Queiroz & Gauthier, 1990), the recognition of clades (like that containing T. kanamorii) that result in the creation of basal paraphyletic grades does not adhere to the goal of species delineations recognizing monophyletic groupings.

| CON CLUS ION
Our study demonstrated that SLAF-seq derived markers outperformed 16S and COII sequences in beetle taxonomy and provided improved resolution of the genetic differentiation of rhinoceros beetle populations from a large part of the species' range. Phylogenetic analysis of SNPs indicated the presence of three distinct genetic groups, suggesting that the various subspecies fall into three distinct and well-supported lineages. PCA and ADMIXTURE analysis also identified three genetic clusters (North, South, West), which corresponded to their origin, suggesting that geographic factors were important in maintaining within population homogeneity and between population divergence. Further study including more detailed morphological work, increased taxon sampling, and taxonomic circumscription is needed to improve the current systematics of Trypoxylus beetles. The current study provides an important dataset, analyses, and method for genotyping (SLAF-seq) that can be used and built upon to answer unresolved questions of systematics, character evolution, chemical and immunological chemistry, and phylogeography in Trypoxylus and related taxa.