Diversity investigation by application of DNA barcoding: A case study of lepidopteran insects in Xinjiang wild fruit forests, China

Abstract To investigate the species diversity of lepidopteran insects in Xinjiang wild fruit forests, establish an insect community monitoring system, and determine the local species pool, we tested the applicability of DNA barcoding based on the cytochrome c oxidase subunit I (COI) gene for accurate and rapid identification of insect species. From 2017 to 2019, a total of 212 samples with ambiguous morphological identification were selected for DNA barcoding analysis. Five sequence‐based methods for species delimitation (ABGD, BINs, GMYC, jMOTU, and bPTP) were compared with traditional morphology‐based identification. In total, 2,422 samples were recorded, representing 143 species of 110 genera in 17 families of Lepidoptera. The diversity analysis showed that species richness was highest for Noctuidae (54 species) and lowest for Pterophoridae, Cossidae, Limacodidae, Lasiocampidae, Pieridae, and Lycaenidae (1 species each). The Shannon–Wiener species diversity index (H′) and Pielou's evenness (J′) of lepidopteran insects first increased and then decreased across the 3 years, while the Simpson diversity index first decreased and then increased. For molecular‐based identification, 67 lepidopteran species within 61 genera in 14 families were identified through DNA barcoding. Neighbor‐joining (NJ) analysis showed that conspecific individuals clustered together and formed monophyletic groups with high support values, except for Lacanobia contigua (Denis & Schiffermüller, 1775) (Noctuidae: Hadeninae). The 67 morphospecies were classified into 70, 96, 2, 71, and 71 MOTUs by ABGD, BINs, GMYC, jMOTU, and bPTP, respectively. In Xinjiang wild fruit forests, the family with the largest number of species is Noctuidae, followed by Geometridae, Crambidae, and the remaining families. The highest Shannon diversity index is observed for the family Noctuidae. Our results indicate that the distance‐based methods (ABGD and jMOTU) and the character‐based method (bPTP) outperform GMYC. BINs is inclined to overestimate species diversity compared to the other methods.


| INTRODUCTION
Xinjiang wild fruit forests are located in the eastern end of the Tianshan belt in central Asia and are distributed mainly in the Yili River Valley region (Lin et al., 2006). They form a special broad-leaved forest type comprising a mixture of paleotemperate broad-leaved forests and northern forest meadows, and are a priority conservation ecosystem in China (Cheng et al., 2020; Xu et al., 2006). This region has high species diversity and a relatively complex community structure, and is an important component of Xinjiang's biodiversity (Li et al., 2011). Owing to the uniqueness of the community, the Xinjiang wild fruit forests have been included in the list of China's priority ecosystems (Yang et al., 2003) and contain a natural gene pool with extremely high genetic and species diversity. However, due to the unique local climate, geographical conditions, and characteristics of vegetation evolution, coupled with human factors including agricultural reclamation, tourism development, and overgrazing, the area of Xinjiang wild fruit forests has decreased sharply and this ecosystem has been seriously damaged. In particular, the self-regulation and recovery abilities of the Xinjiang wild fruit forest ecosystem are weak (Ding, 2007; Fang et al., 2019).
Insects are an important constituent of biodiversity and play a very significant role in maintaining the structure and function of forest ecosystems (Springett, 1978). Lepidoptera is the second largest order of Insecta, which makes it an extremely important indicator in monitoring and assessing biodiversity. About 130,000 to 160,000 species of butterflies and moths are known in the world, accounting for about 15% of insect diversity (Blair & Launer, 1997; McKinney, 2008). In recent years, wild fruit forests have also been degenerating due to major threats other than human disturbance. For example, the larvae of Agrilus mali Matsumura (Coleoptera: Buprestidae) are inflicting serious damage on the trunk cortex of Malus sieversii (Ledeb.) M. Roem. (Rosales: Rosaceae), which is an endangered and key protected apple tree species in China (Cui et al., 2015; Wang, Zhang, Yang, et al., 2013; Zhang et al., 2018). Additionally, lepidopteran pests that feed extensively on various plants and crops in Xinjiang wild fruit forests cause serious damage to the wild fruit forest ecosystem (Hei et al., 2016; Zhang et al., 2019; Zhou, Dong, et al., 2020). Therefore, the study of the diversity of lepidopteran insects in Xinjiang wild fruit forests plays an important role in monitoring changes in the wild fruit forest ecosystem. At the same time, it is vital to explore the effects of environmental factors (e.g., temperature and precipitation) and external disturbance factors (e.g., pests, tourism, and animal husbandry) on the biodiversity of the Xinjiang wild fruit forests.
Traditional morphological classification methods have been used to describe the diversity of life on Earth (Packer et al., 2009). However, these methods require a great deal of expertise and are laborious. In addition, it may be difficult to identify damaged specimens or to distinguish closely related species that are morphologically similar. DNA barcoding is an identification method complementary to morphological identification, and assists in distinguishing morphologically similar species (Batovska et al., 2016). In other words, DNA barcoding has the distinct advantage of being independent of morphological characteristics, because it uses one or several short, standardized DNA regions for species identification (Hebert, Cywinska, et al., 2003; Kress et al., 2005; Ren et al., 2019; Yang et al., 2018). It has become an important scientific approach for understanding world biodiversity (Elías-Gutiérrez & León-Regagnon, 2013). Some early critics proposed limits to DNA barcoding, attributed to the variation in the standard threshold for species discrimination in some groups, the presence of nuclear mitochondrial pseudogenes (Numts), and incomplete lineage sorting (Hickerson et al., 2006; Meyer & Paulay, 2005; Wheeler, 2005). Others view this approach as an additional and useful tool for species identification (Schindel & Miller, 2005; Tautz et al., 2003) and an effective method for a variety of applications that involve the identification of species (Boissin et al., 2017; Bozorov et al., 2019; Liu et al., 2014; Yang, Zhai, et al., 2016; Yang, Landry, et al., 2016).
In this study, we employ both the traditional morphological method and DNA barcoding, through analysis of the mitochondrial cytochrome c oxidase subunit I (COI) gene, to determine the species diversity of the Xinjiang wild fruit forests over 3 consecutive years. The species diversity was analyzed and the influence of external factors on the wild fruit forest ecosystem was explored in order to illuminate the composition and dynamic changes of the insect community in this special ecosystem. Our study not only establishes a preliminary insect monitoring system and local species gene pool for the Xinjiang wild fruit forests but also provides a scientific basis for future ecological recovery and conservation measures, as well as a basis for the rational development and utilization of wild fruit forest resources today.

KEYWORDS
DNA barcoding, lepidopteran insects, species diversity, species identification, Xinjiang wild fruit forests

TAXONOMY CLASSIFICATION
Biodiversity ecology

| Study area
The study site is located at the southern edge of the Tianshan Mountains. Samples were collected in a chessboard-type pattern on the north slope of the wild fruit forest of Xinyuan County in Xinjiang.

| Sample collection and identification
Samples were collected by light trapping and net catching in 25 sample plots during July from 2017 to 2019. To avoid the interference of subjective factors, two 20 W ultraviolet lamps were used to collect samples from 22:00 to 02:00 at fixed collection sites in every collecting year. During the day, samples were collected in the checkerboard pattern using sweeping nets in a 100 × 100 m area. The specimens were sorted, collection information was recorded, and a subset of samples of each species was placed in triangular collection papers. These samples were carried back to the laboratory, where pinned dry specimens were used for morphological identification with the available taxonomic references. The remaining specimens were preserved in 100% ethanol and stored at −20°C for molecular identification based on the mitochondrial COI gene. In this study, 212 samples with ambiguous morphological identification were selected for DNA barcoding analysis. All specimens were deposited in the Entomological Museum, Northwest A&F University (NWAFU), Yangling, Shaanxi, China.
Species richness was defined as the number of species observed in a sample area. Pielou's evenness index was used to determine the evenness of species in the community (Subedi et al., 2021).
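The indices named in this study (species richness, the Shannon–Wiener index H′, the Simpson index, and Pielou's evenness J′) can be illustrated with a minimal sketch. The formulas below are the commonly used textbook forms and are an assumption here, since the exact expressions used in the study are not given in this text; in particular, the Simpson index is computed as 1 − Σp², which is only one of several conventions. The function name and the counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Compute richness, Shannon-Wiener H', Simpson (1 - D), and Pielou's J'.

    `counts` holds the number of individuals recorded per species.
    Assumed textbook forms, not necessarily those used in the study:
      H' = -sum(p_i * ln p_i),  Simpson = 1 - sum(p_i^2),  J' = H' / ln(S)
    """
    counts = [c for c in counts if c > 0]  # ignore species with no records
    if not counts:
        raise ValueError("at least one species must have a positive count")
    total = sum(counts)
    props = [c / total for c in counts]
    richness = len(counts)
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)
    # Evenness is undefined for a single species; report 0.0 in that case.
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
        "pielou": pielou,
    }


# Hypothetical community: four species with ten individuals each.
example = diversity_indices([10, 10, 10, 10])
```

In a perfectly even community such as this hypothetical one, J′ reaches its maximum of 1, which is why evenness is a useful complement to richness when comparing years or families.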

| DNA extraction, PCR, and sequencing
Genomic DNA was extracted from one or two legs of adult specimens with the DNeasy DNA Extraction kit (TransGen Biotech, Beijing, China), following the manufacturer's protocol.
A fragment of the mitochondrial COI gene was amplified using the primers LCO1490 (5′-GGCTCAACAAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) (Folmer et al., 1994). PCR reactions were performed in a total volume of 25 μl using 2 μl of DNA extract, 1 μl each of forward and reverse primer, 12.5 μl Green-Mix, and 8.5 μl ddH2O. The reaction cycle consisted of an initial 1 min at 94°C, followed by a preamplification step of 5 cycles of 94°C for 1 min, 94°C for 1.5 min, 45°C for 1.5 min, 72°C for 1 min, an amplification step of 30 cycles of 94°C for 1.5 min, 51°C for 1.5 min, and 72°C for 1 min, with a final extension at 72°C for 5 min. PCR products were separated by electrophoresis in a 1% agarose gel, and sequencing was performed at AuGCT Biotech (Beijing, China) using the same primers as in the PCR.

| Sequence analysis
Sequences generated in this study were assigned through a similarity search against the GenBank public database (https://www.ncbi.nlm.nih.gov/) and the BOLD system (Barcode of Life Data System) (http://www.boldsystems.org/). A reference sequence library was then constructed for each species from sequences with 98%-100% similarity (Hosein et al., 2017; Larranaga & Hormaza, 2015). Contaminated sequences were excluded from this study.
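The similarity filter described above can be sketched as follows. This is only an illustration of the 98%-100% criterion: the record layout, the function name, and the species and accession labels are hypothetical, and the sketch does not reproduce how GenBank or BOLD report their search results.

```python
def build_reference_library(hits, min_identity=98.0):
    """Group database hits by species, keeping hits at or above min_identity.

    Each hit is a dict with hypothetical keys 'species', 'accession',
    and 'identity' (percent similarity to the query sequence).
    """
    library = {}
    for hit in hits:
        if min_identity <= hit["identity"] <= 100.0:
            library.setdefault(hit["species"], []).append(hit["accession"])
    return library


# Hypothetical search results for illustration only.
hits = [
    {"species": "Species A", "accession": "ACC001", "identity": 99.5},
    {"species": "Species A", "accession": "ACC002", "identity": 96.0},
    {"species": "Species B", "accession": "ACC003", "identity": 98.0},
]
library = build_reference_library(hits)
```

Here the second hit for the hypothetical Species A falls below the threshold and is left out of the library.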

| Species diversity and abundance of lepidopteran insects
The results of the 3 years of analysis of diversity (Table 2)

| Genetic distance analysis
The overall mean genetic distance was 15.70%, and pairwise genetic distances ranged from 0% to 35.15% (Appendix S3
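Pairwise genetic distances of this kind are computed between aligned COI sequences. The sketch below shows two of the metrics named in this study, the uncorrected p-distance and the Kimura two-parameter (K2P) distance, for a single pair of sequences. It is a minimal illustration under stated assumptions: the sequences are already aligned and of equal length, sites with gaps or ambiguous bases are skipped, and the function name and example sequences are hypothetical. It is not the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_and_k2p(seq1, seq2):
    """Return (p-distance, K2P distance) for two aligned DNA sequences.

    K2P: d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    math.log raises ValueError if the sequences are too divergent (saturated).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p_ts = transitions / sites
    q_tv = transversions / sites
    p_distance = p_ts + q_tv
    k2p = -0.5 * math.log((1 - 2 * p_ts - q_tv) * math.sqrt(1 - 2 * q_tv))
    return p_distance, k2p


# Hypothetical 10-bp fragments differing by one transition (C -> T).
p_dist, k2p_dist = p_and_k2p("ACGTACGTAC", "ACGTACGTAT")
```

The K2P value is slightly larger than the p-distance because the model corrects for multiple substitutions at the same site.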

| Clustering analysis of NJ tree
The neighbor-joining (NJ) analyses of the 196 COI sequence dataset (Appendix S2) revealed that conspecific individuals clustered together in most cases with high bootstrap support (more than

| Species identification and delimitation
Five methods examined in our study for species delimitations are shown in Figure 3. ABGD analysis results based on JC, K2P, and p-distance showed significant DNA barcoding gaps (genetic distance thresholds were 0.01-0.03, 0.01-0.03, and 0.02-0.03, respectively), suggesting that the minimum interspecific genetic distance was

FIGURE 1 Analysis of diversity, species richness, and the proportion of total species of lepidopteran insects in Xinjiang wild fruit forests (2017-2019)

TABLE 2 Descriptive measures of analysis (Shannon-Wiener diversity index, Simpson diversity index, species richness, and Pielou evenness) calculated for each family of lepidopteran insects observed in Xinjiang wild fruit forests, as well as the overall values when data from all families were pooled together (2017-2019)

| DISCUSSION
In this study, a total of 2,422 individuals representing 143 species of 110 genera in 17 families were recorded from Xinjiang wild fruit forests, China. We found that the lepidopteran insects in Xinjiang services (Ayres & Lombardero, 2000; Bradford et al., 2002; Kremen et al., 2007). Therefore, climate change is closely related to the biodiversity of lepidopteran insects and their ecological features (Wilson & Maclean, 2011). We also found that the species diversity index of lepidopteran insects in wild fruit forests was higher in 2018 than in the other 2 years. This is likely related to the increase in precipitation during that year, which may have raised the diversity and abundance of host plants in the area and thus provided sufficient food sources for lepidopteran insects. However, due to the lack of data on plant diversity changes in the sample areas during the survey, further investigations combined with an analysis of climate and other factors are needed in the future.
Additionally, exogenous disturbances (e.g., tourism and animal husbandry) could be an important factor affecting the change in community composition and the diversity index in 2019. It is speculated that the area of wild fruit forest decreased and germplasm resources continued to disappear mainly due to ice and snow damage, plant diseases, insect pests, overgrazing, and frequent human activities in the mountainous areas (Cao et al., 2016; Cui et al., 2018; Fang et al., 2018, 2019), resulting in a decline in plant community diversity in wild fruit forests. This in turn leads to a decline in local insect community diversity. Recent studies indicate that extreme climate and geological disasters such as landslides, which occur frequently in the wild fruit forest and surrounding areas, pose a serious threat to the ecological environment of the region (Shan et al., 2021).
The existence of DNA barcode gaps between species is a strong guarantee for successful DNA barcoding for species identification (Čandek & Kuntner, 2015;Zhao et al., 2014). Ideally, DNA barcodes should be characterized by short fragments, large interspecific variation, and high identification efficiency (Kress et al., 2005;Ren & Chen, 2010;Taberlet et al., 2007). Our analysis using molecular identification based on COI barcodes indicates that the mean genetic distance between species is 10 times greater than the mean genetic distance within species, which is consistent with the "10 times rule" of DNA barcoding (Hebert, Ratnasingham, et al., 2003).
The existence of a significant gap between intraspecific and interspecific distances indicates that most species can be distinguished using the sequences generated in this study. Similarly, our NJ tree showed that the majority of the 67 barcoded species formed distinctive clusters, confirming the utility of this DNA barcoding method in lepidopteran insect surveillance in the wild fruit forests in Xinjiang. As a rapid and accurate species identification method, DNA barcoding becomes more efficient as additional species and populations are included in biodiversity surveys (Sonet et al., 2019). However, the small sample sizes and low level of species coverage for barcoding indicate that the DNA barcode library established in this study for local species identification is far from comprehensive. Therefore, more material and further study are needed in the future.
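The barcode gap and the "10 times rule" discussed above can be checked with a short calculation over a pairwise distance matrix. The sketch below is illustrative only: the function name, the species labels, and the distance values are hypothetical, and the gap is taken here as the smallest interspecific distance minus the largest intraspecific distance, which is one common way to express it.

```python
from itertools import combinations
from statistics import mean


def barcode_gap_summary(species, dist):
    """Summarize intra- vs interspecific distances.

    `species[i]` is the species label of sequence i and `dist` is a
    symmetric matrix of pairwise distances. Requires at least one
    conspecific pair and one heterospecific pair.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(species)), 2):
        target = intra if species[i] == species[j] else inter
        target.append(dist[i][j])
    mean_intra = mean(intra)
    mean_inter = mean(inter)
    return {
        "mean_intra": mean_intra,
        "mean_inter": mean_inter,
        # Positive gap: every interspecific distance exceeds every intraspecific one.
        "gap": min(inter) - max(intra),
        # "10 times rule": mean interspecific >= 10 x mean intraspecific distance.
        "ten_times_rule": mean_inter >= 10 * mean_intra,
    }


# Hypothetical distances for two species with two sequences each.
species = ["sp1", "sp1", "sp2", "sp2"]
dist = [
    [0.00, 0.01, 0.12, 0.13],
    [0.01, 0.00, 0.11, 0.12],
    [0.12, 0.11, 0.00, 0.00],
    [0.13, 0.12, 0.00, 0.00],
]
summary = barcode_gap_summary(species, dist)
```

In this hypothetical matrix the mean interspecific distance (0.12) is well over ten times the mean intraspecific distance (0.005), and the gap is positive, so both criteria are met.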
Identification success rate analysis is often used to evaluate the performance of DNA barcode identification (Yang et al., 2018). How to classify species and which method should be used are common dilemmas in species identification, although diverse methods have been described in previous studies (Gao et al., 2021; Hausdorf & Hennig, 2010; Hu, 2019; Marshall et al., 2006; Monaghan et al., 2009; Pons et al., 2006; Sites & Marshall, 2004; Wiens, 2007). In the present study, the usefulness of DNA barcoding identification was tested using five methods: ABGD, BINs, jMOTU, GMYC, and bPTP. Our ABGD results show that the initial partition yields fewer MOTUs than the recursive partition and is closer to the classification based on morphology. This result is consistent with the theory that a recursive partition is expected to better handle heterogeneity in the dataset, while an initial partition is usually stable over a wider range of prior values and is usually closer to the morphology-based species identifications (Puillandre et al., 2012).
TABLE 3 Automatic partition produced by ABGD with three metrics (JC69, K2P, and p-distance)
According to Fujisawa and Barraclough (2013), the GMYC single-threshold model is more reliable than the multiple-threshold model, which could lead to an overestimation of the number of OTUs. However, we found that a considerably smaller number of MOTUs was produced by sGMYC than by mGMYC, which is most likely due to the small sample size in our dataset. In addition, 97 OTUs were generated by the BIN system, suggesting that the BIN system tends to overestimate the number of species designated for our dataset, which is consistent with many previous studies (Song et al., 2018; Yang, Landry, et al., 2016; Zhou et al., 2019). In contrast to BINs, both jMOTU and bPTP yielded 71 MOTUs, which is very similar to the ABGD designations. In general, our findings indicate that the distance-based methods (ABGD and jMOTU) and the character-based method (bPTP) outperform GMYC. BINs is inclined to overestimate species diversity compared to the other methods.
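To make the idea behind distance-based delimitation concrete, the following sketch groups sequences into MOTUs by single-linkage clustering at a fixed distance cutoff. This is a simplified stand-in for illustration, not the actual algorithm of ABGD or jMOTU; the function name, the distance matrix, and the cutoff values are hypothetical.

```python
def cluster_by_threshold(dist, threshold):
    """Single-linkage clustering: link any two sequences whose pairwise
    distance is at or below `threshold`, then return the resulting groups
    of sequence indices (one group per MOTU)."""
    n = len(dist)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical distance matrix for five sequences.
dist = [
    [0.000, 0.010, 0.020, 0.150, 0.160],
    [0.010, 0.000, 0.025, 0.140, 0.150],
    [0.020, 0.025, 0.000, 0.130, 0.140],
    [0.150, 0.140, 0.130, 0.000, 0.005],
    [0.160, 0.150, 0.140, 0.005, 0.000],
]
motus_loose = cluster_by_threshold(dist, 0.03)
motus_strict = cluster_by_threshold(dist, 0.015)
```

With these hypothetical values, the looser cutoff gives two MOTUs and the stricter cutoff gives three, which illustrates why the number of MOTUs reported by a distance-based method depends on the threshold it infers or is given.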
The lepidopteran DNA barcode database established in this study not only provides scientific data for the prevention and control of agricultural and forestry pests in the Xinjiang wild fruit forests but is also important for monitoring changes in local species in the future. However, future applications of this approach should involve barcoding more species and adding other genetic markers that will increase the discriminatory power of this identification method. DNA barcoding could also be combined with next-generation sequencing (NGS) to identify large numbers of species at one time (i.e., bulk samples), thereby significantly lowering the processing time involved in species identification (McCormack et al., 2013). Yet several factors, for example Numts, mitochondrial heteroplasmy, or phylogeography, can affect the accuracy of species identification based on mitochondrial COI DNA barcodes and metabarcoding, so these approaches should be used with caution in biodiversity surveillance for some groups (Li et al., 2021).

| CONCLUSION
In this study, 2,422 specimens belonging to 143 species of 110 genera representing 17 families were counted and recorded in Xinjiang wild fruit forests. The most species-rich family was Noctuidae (54 species), followed by Geometridae (23 species) and Crambidae (13 species). The Shannon diversity index was also highest for Noctuidae in 2018 (2.5198). The majority of the 67 barcoded species formed distinctive clusters, confirming the utility of the DNA barcoding method for insect biodiversity surveillance. We establish a preliminary DNA barcode library for the local species, which is clearly not yet complete and needs to be expanded into a comprehensive barcode reference library. This would then serve in monitoring insect community dynamics, with DNA barcoding as an additional tool for accurate and quick species identification. In this study, five methods (ABGD, BINs, jMOTU, GMYC, and bPTP) were used to classify species. The distance-based methods (ABGD and jMOTU) and the character-based method (bPTP) outperformed GMYC. BINs overestimated the species diversity compared to the other methods. Our results reveal that the diversity of lepidopteran insects is highly susceptible to ecological impacts in the Xinjiang wild fruit forest ecosystem.

ACKNOWLEDGMENTS
We thank John Richard Schrock (Emporia State University, Emporia, KS, USA) for reviewing the manuscript. This research was supported by the National Natural Science Foundation of China (31772508) and the National Key Research and Development Program of China (2016YFC0501502).

CONFLICT OF INTEREST
All authors have no conflicting interests.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov (DNA sequences, GenBank accessions MZ686723-MZ686918). Appendices in the study are available in Dryad (https://doi.org/10.5061/dryad.hmgqnk9jv).