Phylogeography of Dendrolimus punctatus (Lepidoptera: Lasiocampidae): Population differentiation and last glacial maximum survival

Abstract Although the Masson pine moth, Dendrolimus punctatus, is one of the most destructive forest pest insects and is an endemic condition in China, we still do not fully understand the patterns of how its distribution range varies in response to Quaternary climatic oscillations. Here, we sequenced one maternally inherited mitochondrial gene (COI) and biparentally inherited nuclear data (ITS1 and ITS2) among 23 natural populations across the entire range of the species in China. A total of 51 mitotypes and 38 ribotypes were separately obtained using mtDNA and ITS1 data. Furthermore, significant phylogeographical structure (N ST > G ST, p < 0.01) were detected. The spatial distribution of mitotypes implied that two distinct groups existed in the species: one in the southwest distribution, including 10 locations, and the other located in the northeast region of China. It is suggested, therefore, that each group was derived from ancestors that occupied different isolated refugia during previous periods, possibly last glacial maximum. Mismatch distribution and Bayesian population dynamics analysis suggested the population size underwent sudden expansion, which is consistent with the results of ecological niche modeling. As a typical phytophagous insect, the history of population expansion was in accordance with the host plants, providing abundant food resources and habitat. Intraspecific success rate of barcoding identification was lower than interspecific ones, indicating a level of difficulty in barcoding individuals from different populations. However, it still provides an early insight into the pattern of genetic diversity within a species. OPEN RESEARCH BADGES This article has been awarded an Open Data and Open Materials. All materials and data are publicly accessible via the Open Science Framework at https://doi.org/10.5061/dryad.2df87g2. Learn more about the Open Practices badges from the Center for Open Science: https://osf.io/tvyxz/wiki.


| INTRODUC TI ON
It has been widely accepted that the distribution and genetic structure of current organism have been greatly affected by environmental factors and their historical processes (Hewitt, 2000). The oscillations of Quaternary climate changes played critical roles in multiple contraction-expansion processes, which have greatly shaped current geographical distributions and genetic differentiations of many species in temperate zones of East Asian (Hickerson et al., 2010;Qiu, Fu, & Comes, 2011). However, compared with the numerous studies focused in the Qinghai-Tibet Plateau and Mts.
Hengduan (Lei, Qu, Song, Alström, & Fjeldså, 2015;Liu et al., 2016;Wang et al., 2018), such investigations have just started in the subtropical regions of China. During the Last Glacial Maximum (LGM) period, the climate of this region became much cooler and dryer, which drove the southward migration of the warm-temperate evergreen forests as far as c.24°N (Harrison, Yu, Takahara, & Prentice, 2001;Qiu et al., 2011). Moreover, the vast continental hilly areas in the subtropical China provided potential refugia for temperate deciduous forests and boreal conifer trees Lei et al., 2015).
Previous research implied that much of the genetic variety of phytophagous insects was motivated by the diverging specialization processes onto different host plants (Peterson & Denno, 1998).
More detailed studies based on population-level variations of host plants implied that the diversification of host could have an influence on the differentiation of insects (Friberg, Schwind, & Roark, 2014;Friberg, Schwind, & Thompson, 2016;Matsubayashi, Ohshima, & Nosil, 2010;Singer, Ng, & Moore, 1991). Dendrolimus punctatus (Walker, 1985) distributed in subtropical China was adopted in our study to identify whether the genetic differentiation of host plants, caused by climate changes during the ice age, affects genetic variation of their predators, the phytophagous insects. Dendrolimus species (Lepidoptera: Lasiocampidae) are one of the most serious phytophagous pests worldwide and caused extensive forest damage (Dai et al., 2012;Zhao, Wu, Lv, Chen, & Lin, 1999;Zhang, Wang, Tan, & Li, 2003;Zhang, Li, Chen, & Zhang, 2004) (Figure 1). Dendrolimus punctatus is endemic to southern Asia and, in recent years, has become the most serious and economically damaging insect pest in the south China forests. Pinus massoniana, which is a typical warm-temperate evergreen coniferous in south mainland China, is the main host plant of D. punctatus (Cai, 1995;Hou, 1987). Based on a phylogeographic analysis, P. massoniana experienced a recent expansion of its smaller populations during the Quaternary period (Ge et al., 2012). The increasing population density, which increased rapidly after a few generations, posed a great threat to forestry production.
Consequently, great efforts were devoted to keep the size of population at a relatively low and stable level. Notably, low-density populations are difficult to detect in the field, which makes it extremely arduous to collect samples.
Here, we examined the genetic variation across the distribution areas of D. punctatus with the barcode fragment of the COI gene as well as two nuclear loci: ribosomal first internal transcribed spacer (rDNA ITS1) and second spacer (rDNA ITS2).
Two different inherited molecular markers were included in this study: mtDNA marker (COI) represented maternal inheritance; the ITS fragments represented the biparental inheritance. In addition, these three molecular markers shared barcoding sequences well and proved to be excellent methods to help explain the interspecific relationship among the sibling species of D. punctatus (Dai et al., 2012). Nevertheless, although DNA barcoding has been widely used as a concept to facilitate biological identifications at the species level (Hebert, Cywinska, Ball, & deWaard, 2003), there has been limited use related to lower categories such as subspecies and populations (Huemer & Hebert, 2011;Valade et al., 2009). These lower categories, defined by evidence of geographical, morphological, and ecological criteria, can be potential or cryptic species (DeSalle, 2006). It would be particularly desirable to test whether these widely accepted interspecific barcodes could also be adopted as population identification. Although the typical DNA barcoding was not sufficient to identify geographical clusters within species (Hajibabaei, Singer, Hebert, & Hickey, 2007), the limited geographical isolation was observed between different regions (Rach, DeSalle, Sarkar, Schierwater, & Hadrys, 2008;Zhao et al., 1999). The success rate of barcoding at lower categories largely depends on the phylogeographic structure of the taxon (Zhang et al., 2003).
Factors such as the geographic distribution of the populations, time and degree of isolation, effective population sizes, and particularly gene flow showed fundamental effects. Therefore, our present work seeks to address the following points: 1. to investigate the phylogeographical structure of the masson pine pest; 2. to test whether the demographic history of this moth corresponds to the "contraction-expansion" mode similar to its host plant; 3. to explore the potential availability of population identification using the DNA barcode.
F I G U R E 1 Photograph of Dendrolimus punctatus (Walker, 1985) (by Dr. Chong-Hui Yang)

| Specimen sampling
The specimens of D. punctatus were collected as larvae, which were gathered with tweezers during the day. The larvae of D. punctatus displayed an aggregated distribution pattern. Hence, one larva of pine caterpillar was collected every 5 m. The sampling scheme, including a total of 236 specimens from 23 localities, covered the major geographic distribution of this species (Figure 2; Table 1). The specimens were collected in the natural forests. In order to preserve DNA sequence integrity, the larvae collected in natural forests were immediately placed in 95% ethanol and ultimately stored at 4°C. Voucher specimens were deposited at the Capital Normal University, Beijing, China.

| DNA extraction, PCR amplification, and sequencing
DNA samples were isolated from the legs of adult individuals using DNeasy Blood and Tissue kit (Biomed) following the manufacturer's protocol. Sequences of mitochondrial gene COI and nuclear ITS were amplified using rTaq (Takara). The same PCR primers and amplification reaction of Dai et al. (2012) were used in our study. The amplification products were subjected to electrophoresis in a 2% (w/v) agarose gel in TAE buffer (0.04 M Tris-acetate, 0.001 M EDTA) to check whether the amplification reactions were successful. Then, the amplification products were used for sequencing, which was performed with an ABI3130 sequencer at Biomed Biotechnology Co., Ltd. All sequences have been deposited in GenBank (accession numbers: KF366914-KF367149 for COI, KF367150-KF367271 for ITS1, and KF367272-KF367452 for ITS2), and the COI barcodes were also submitted to the BOLD system (accession numbers: GBGL14669-GBGL14904).

| Phylogenetic analysis and network reconstruction
The raw DNA sequences were all checked manually and aligned with software MUSCLE (Edgar, 2004). Aligned COI sequences were translated into amino acid sequences using the MEGA program (Kumar, Tamura, & Nei, 2004). The phylogenetic relationships within D. punctatus were reconstructed using maximum likelihood (ML) inference and Bayesian inference (BI). Two specimens from species Dendrolimus superans (Butler, 1877) were used as outgroups. The best-fit model of nucleotide substitution was selected from 88 models using the Akaike information criterion (AIC) with jModel-Test 2 (Darriba, Taboada, Doallo, & Posada, 2012). The ML analysis was conducted using the program RAxML v.7.0.4 with 1,000 bootstraps conducted (Alexandros, 2006). The GTR+G model was used for all genes. The Bayesian analysis was performed using MrBayes v3.1.2 (Ronquist & Huelsenbeck, 2003). The MCMC analysis was run for 10,000,000 generations, following a burn-in series of 1,000 generations. Haplotype networks for COI sequence and each ITS dataset were constructed to better visualize the reticular relationships of D. punctatus. The median-joining (MJ) haplotype networks were constructed to draw an unrooted network with Network 5 (Fluxus Technology Ltd.) based on COI gene, as well as ITS1 and ITS2.

| Population genetic analysis
All analyses were executed separately for the COI gene and the ITS fragments. We used DnaSP 5.10 (Librado & Rozas, 2009) to assess F I G U R E 2 A map of the sampling sites and the geographic distribution of Dendrolimus punctatus mtDNA haplotypes. Pie charts show the proportion of mitotypes within each population, and colors represent different lineages. The numbers beside the circles represent population numbers listed in Table 1 population genetic diversity, including the number of sequences (N), number of haplotypes (Nh), haplotype diversity (h), and nucleotide diversity (π). The program PERMUT (available at www.pierr oton. inra.fr/genet ics/labo/Softw are/) was applied to estimate the average gene diversity within a population (H S ), total gene diversity (H T ), and between-population divergence (G ST , N ST ) with 1,000 permutations tests.

| Demographic history analysis
To evaluate the population history of D. punctatus, we calculated Tajima's D (1989) and Fu's Fs (1997) neutrality test statistics based on the mitochondrial DNA dataset and calculated the significance with 10,000 simulations (Rogers & Harpending, 1992;Schneider & Excoffier, 1999). Mismatch distribution analysis was also conducted to detect the population expansion of the D. punctatus with Arlequin V3.5 software (Excoffier & Lischer, 2010); moreover, 1,000 parametric bootstrap replicates were used to test the suitability of observed mismatch distributions to the theoretical distribution under a sudden expansion model (Rogers & Harpending, 1992). When a prominent expansion event was identified, the parameter value for the mode of the mismatch TA B L E 1 Locations of populations of Dendrolimus punctatus sampled, sample sizes (N), frequencies of mtDNA haplotypes per population, and estimates of haplotype diversity (h), and nucleotide diversity (π) for mitotypes distribution (τ) was applied to estimate time since expansion (in generations) using the equation t = τ/2u (Rogers, 1995). In this study, u was calculated as u = μ × k × g, where μ represents the number of substitutions per site per year (s s −1 y −1 ), k represents the average DNA sequence length, and g represents the generation time in years.
Bayesian skyline plots (BSPs) implemented in BEAST v2.0 were also used to evaluate the timing of the population expansion (Bouckaert et al., 2014). For both COI and ITS fragments, the program was run with three independent runs steps (30 million simulations), and we discarded the burn-in of the first 20% steps. The evolutionary models using AIC were selected for COI (GTR) and ITS1 (TVMef+I) using jModeltest2 (Darriba et al., 2012). The proposed conventional mutation rates for insect mitochondrial COI gene 2.3% per million years were used to estimate coalescent time (Brower, 1994). After multiple runs, similar results and convergence to stationarity were reached with effective sample size (ESS) of at least 400. The Bayesian skyline plot (BSP) was constructed for each data set using the program TreeAnnotator included in BEAST v 2.0 (Bouckaert et al., 2014).  (500)) was executed in ten replicates with default settings; 70% of the sites were set to train the model, and 30% of the sites were set to test the model predictions. Model performance was evaluated by both the area under the curve (AUC) of the receiver operating characteristic (ROC) plot and the binary omission rate over ten replicate runs. Area under the curve is a composite measure of model performance, which can equally weigh the omission error and commission error, and has an index of suitability between 0 and 1. The value of AUC that is >0.5 indicates the predicted results of the model are better than those predicted stochastically, and an AUC > 0.9 usually indicates excellent predictive power (Ye, Zhu, Chen, Zhang, & Bu, 2014).

| Populations identifying with distance-based barcoding methods
The distance between intraspecific variation (the DNA barcoding gap) is considered an important criterion in DNA barcoding. To explore the feasibility of populations that identify within D. punctatus, DNA barcoding gap analysis was performed based on distance-and character-based methods (Dai et al., 2012). The "best close match" (BCM) (Meier, Shiyang, Vaidya, & Ng, 2006) and the minimum distance (MD)  are two distance-based methods that proved to be effective in species identification in DNA barcoding studies. To examine whether the correct local populations could be identified with these methods, we therefore performed the two protocols with "single-sequence-omission" or "leave-one-out" simulations. Here, we removed one sequence at a time and used it as a query, with all other sequences remaining as the reference database.
We performed 500 random replications for each dataset. Moreover, several species of Dendrolimus, D. superans, D. kikuchii, and D. houi were chosen for interspecific level identification. We fully investigated success rates of barcoding identification at both interspecific and intraspecific levels with each single barcode, with their corresponding 95% confidence intervals (CIs) computed using equation (20) from Zhang et al. (2012). The BCM and MD were performed following the protocol of the program TaxonDNA and package MD (Meier et al., 2006;Zhang et al., 2012).

| Phylogenetic relationships and lineages divergence
Phylogenetic analyses were performed using the ML and BI methods. The

| Population structure analysis
Significant genetic differentiation was identified in D. punctatus samples based on both mitochondrial and nuclear sequences (Table 1).
Further examinations using permutation tests showed a prominent TA B L E 2 Estimate of nucleotide diversity (Pi), haplotype diversity (Hd), total gene diversity (H T ), average gene diversity within populations (H S ), interpopulation differentiation (G ST ), and the number of substitution types (N ST ) (mean + SE in parentheses) as indicators of mtDNA haplotypes and ITS ribotypes diversity   respectively. This suggested high genetic differentiation between these two geographic groups.

| Demographic history of D. punctatus
As previously mentioned, a star-like structure in a network analysis is generally interpreted as a sign of population expansion/rapid growth (Rogers & Harpending, 1992 Figure 6). However, the raggedness indices and observed variance (SSD) were not significantly different from the expectation that suggested strong and wide range gene flow occurred in the population demographic history. In addition, Tajima's D and Fu's Fs were also significantly negative (Table 3), which indicated that the population of D. punctatus underwent rapid expansion. The estimated time of this expansion was 0.232 Mya (Table 3). The population expansion of haplotype Clade I and Clade II were also detected based on mismatch distribution analysis, as well as the Bayesian population dynamic analysis (Table 3)

| Species distribution modeling
For this study, ecological niche modeling was used to explore the re-

| DNA barcoding in population identification
In general, a large DNA barcoding gap makes species/population dis-  Note: Related value of Clade IV was not calculated due to insufficient individuals (2 sequences). Abbreviations: g, estimate of population exponential growth rate; NS, not significant; RAG, the Harpending's raggedness index; SSD, sum of squared deviations; t1, time in million years derived from 0.0115 s s −1 y −1 ; t2, time in million years derived from 0.0177 × 10 −6 s s −1 y −1 ; τ, time in number of generations elapsed since a sudden expansion episode. *p < 0.05. **p < 0.01. NS p > 0.05.

| Genetic diversity and population structure in D. punctatus
ITS2: G ST = 0.331, N ST = 0.301), as within-population diversity was also decreased compared with mtDNA (H S = 0.595 for ITS1 and H S = 0.587 for ITS2). The primary cause of this discrepancy is due to the different modes of inheritance, maternal for mtDNA and biparental for ITS. The latter method leads to higher levels of gene flow. Compared with the protein-coding COI region, the noncoding ITS marker was more sensitive at detecting gene flow among interbreeding populations. Similar inconsistencies were shared in many other Lepidopteran insets, for example, Biston suppressaria Cheng, Jiang, Yang, et al. 2016), Apocheima cinerarius , Biston panterinaria Cheng, Jiang, Yang, et al., 2016), and Hyles euphorbiae (Mende & Hundsdoerfer, 2013).
The AMOVA analysis of the mtDNA data revealed that withinpopulation variation was greater than among-region variation.
Nevertheless, D. punctatus is split into two major genetic lineages based on the analysis of mtDNA and ITS1 (F CT = 0.127 and 0.313, separately) (Table 4). This phenomenon could be explained as re- have been reported in many pervious researches, including the Leucodioptron canorum (Li et al., 2009), Squalidus argentatus (Yang et al., 2012), and Microvelia horvathi (Ye et al., 2014).
Meanwhile, present limited gene flow between southwest and east regions was also caused by the decreased ability of long-distance dispersal by D. punctatus (Hou, 1987).   (Cai, 1995;Hou, 1987). The extensive hilly areas of subtropical China (especially between 22°N and 34°N) provide advantages to harbor temperate deciduous forests and coniferous (Yu et al., 2000). Many researchers have argued that lots of local plant species experienced long-term fragmentation and refugial survival, such as Cathaya argyrophylla (Pinaceae) and Eurycorymbus cavaleriei (Sapindaceae), and several postulating separate refugia (such as Mts Dayao, East Yungui Plateau, and Mts Nanling) in subtropical China (Qiu et al., 2011;Wang, Gao, Kang, Lowe, & Huang, 2009

| Applicability of DNA barcoding at population level
To assign unknown individuals to certain populations within a species or to identify population from DNA sequences is not a new approach (Lou & Golding, 2010). Here, we present a study in which we systematically investigated the population structure and dynamic of D. punctatus with a variety of DNA barcoding methods, based on both commonly used COI gene, and two alternatives genes, ITS1 and ITS2. The intraspecific success rate of barcoding identification was found to be significantly lower than the interspecific ones, indicating the difficulty in barcoding populations ( Figure 9). For the COI gene, among 23 populations of D. punctatus, only one single local population (pop23, HBCD) formed a monophyletic clade ( Figure 3).
Moreover, compared with the COI gene, less population structure was also found for ITS genes on the phylogenetic trees (Figure 4,5).
The results may not be surprising since the coalescence time is generally shorter than the time of different species to their most common ancestor (TMCA), therefore, no large enough variation or mutations accumulated during this relatively short time period. Furthermore, The AMOVA analysis of the mtDNA data suggested that strong gene flow existed in the D. punctatus populations, which also impaired the success rate of population assignments and resulted in no positive barcoding gaps being detected (Figure 9).
Despite the fact that the sequence information collected from DNA barcoding was not sufficient to solve population-level questions, it reflected genetic diversity among different population (Bazin, Glémin, & Galtier, 2006;Moritz & Cicero, 2004). It should be noted, however, that DNA barcoding markers have the valuable potential to investigate the historical population dynamics of D. punctatus. It would be interesting to investigate further the applicability of barcoding information in the study of genetic diversity of other species.

| CON CLUS ION
We investigated the population structure and dynamics of D. punctatus based on the genetic variation of mtDNA as well as nuclear markers. A history of postglacial recolonization of the populations and multiple refugia during the ice age of the Quaternary was inferred.
Many previous phylogeographic studies have paid close attention to the effects of very large mountainous systems such as the Mts Hengduan and Mts Qinling, but less attention was paid to the effect of the smaller mountain regions in the subtropical regions of China.
The findings of this study proved that several potential refugia and geographic barriers existed in these hilly areas. These results provide a phylogeographical hypothesis for further testing; however, given the small sample size and limited molecular markers, many low-copy nuclear genes and mitochondrial genomes are required to further examine the genomic divergence within D. punctatus.

CO N FLI C T O F I NTE R E S T
None declared.

O PE N R E S E A RCH BA D G E S
This article has been awarded an Open Data and Open Materials.