Metabarcoding reveals massive species diversity of Diptera in a subtropical ecosystem

Abstract Diptera is often considered to be the richest insect group due to its great species diversity and broad ecological versatility. However, data on dipteran diversity from subtropical ecosystems have hitherto been scarce, due to the lack of studies conducted at an appropriate large scale. We investigated the diversity and composition of Diptera communities on Tianmu Mountain, Zhejiang, China, using DNA metabarcoding technology, and evaluated their dynamic responses to the effects of slope aspect, season, and altitudinal zone. A total of 5,092 operational taxonomic units (OTUs) were discovered and tentatively assigned to 72 dipteran families, including 2 family records new for China and 30 family records new for the locality. Cecidomyiidae, Sciaridae, and Phoridae were the predominant families, representing 53.6% of total OTUs, while 52 families include >95% unidentified and presumed undescribed species. We found that the community structure of Diptera was significantly affected by aspect, seasonality (month) and elevation, with richer diversity harbored in north‐facing than south‐facing slopes, and seasonality a more profound driver of community structure and diversity than elevation. Overall, massive species diversity of Diptera communities was discovered in this subtropical ecosystem of east China. The huge diversity of potentially undescribed species only revealed by metabarcoding now requires more detailed taxonomic study, as a step toward an evolutionary integration that accumulates information on species’ geographic ranges, ecological traits, functional roles, and species interactions, and thus places the local communities in the context of the growing knowledge base of global biodiversity and its response to environmental change.


| INTRODUC TI ON
Insects are the most diverse group of invertebrates on Earth, accounting for more than 50% of all known species in terrestrial and freshwater ecosystems (Schowalter, 2016;Stork, 2018). Although insects constitute only a small proportion of the total biomass, their importance for ecosystem functions and services is profound (Albrecht et al., 2020;Bar-On et al., 2018;Kunin, 2019;Price et al., 2011;Yang & Gratton, 2014). However, widespread declines in insect diversity and abundance have raised alarm about the future of global biodiversity, after a tremendous biomass reduction was detected across continents during the past few decades, particularly in insect groups that play key roles for various ecosystem processes including pollination, natural pest control, and decomposition (Powney et al., 2019;Seibold et al., 2019;Soroye et al., 2020;Wagner, 2019). However, no net insect abundance and diversity declines were observed in long-term ecological research sites (Crossley et al., 2020;Outhwaite et al., 2020). Despite the general agreement that Anthropocene forces threaten biodiversity, evidence of wholesale declines of insects remains elusive (Outhwaite et al., 2020;Van Klink et al., 2020). There is an urgent need to develop novel strategies for biodiversity monitoring, which often rely on detailed local inventories for recognizing the species under observation (Creedy et al., 2019;Ji et al., 2020). Thus, in selecting target taxa for inventory, attention should be paid to the groups that play important roles in the structure and function of an ecosystem or geographic area (García-López et al., 2011).
The Diptera was suggested as an excellent indicator group due to their great diversity of ecological adaptations, life histories, and morphological divergence Engels et al., 2020;Yeates et al., 2007). They occur abundantly and display great species diversity with 160,000 named species worldwide, which comprises about 10% of the global described biodiversity (Marshall, 2012;Smith & Mayfield, 2015). Estimates of total Diptera species diversity have been ranging from 400,000 to 800,000 Smith & Mayfield, 2015;Srivathsan et al., 2019), but this is likely to be a great underestimate. Recent studies based on DNA barcoding of temperate and tropical ecosystems predicted there are 1.8 million species of Cecidomyiidae alone Hebert et al., 2016). In addition to their high diversity, Diptera have important ecological functions in maintaining terrestrial ecosystems, and great impact on mankind such as human health and agricultural and forestry management, or as models in the study of genetics and forensic science (Skevington & Dang, 2002). All of these roles contribute to the value of Diptera as a suitable indicator of biodiversity.
However, compared to beetles and moths which attract many insect enthusiasts, most Diptera are relatively small, dull-colored, and soft-bodied species, and thus more difficult to preserve and less attractive to study (Hebert et al., 2016;Morinière et al., 2019). Thus, while representing the potentially largest and taxonomically most challenging insect order, a relatively large proportion of the dipteran species remains unidentifiable and unnamed (Marshall, 2012), in particular in subtropical and tropical areas.
DNA barcoding overcomes some of the limitation of taxonomic identification with classical methods, and it is now widely used in species identification and biomonitoring (Hebert & Cywinska, 2003;Kress & Erickson, 2008;Morinière et al., 2019). Yet, cost limitations of traditional Sanger sequencing or high-throughput sequencing (HTS) of each individual have severely restricted the possibilities of characterizing millions of specimens required for the study of highly diverse insect groups (Gautier et al., 2013;Polz & Cavanaugh, 1998;Shokralla et al., 2014). Instead, metabarcoding using HTS on the Illumina platform provides a reliable and cost-effective method for monitoring biodiversity at large scale (Noguerales et al., 2021;Wang et al., 2018). It has proven particularly useful for the study of mixed environmental samples from Malaise traps that capture huge species diversity of Diptera (Arribas et al., 2016;Serrana et al., 2019;Shokralla et al., 2015;Voelkerding et al., 2009). Already, metabarcoding has been widely applied to the study of microbial communities of soil and animal guts, macrozoobenthos in marine and fresh water, and invertebrates from soil and Malaise traps, emerging as a promising way of advancing biodiversity research (Arribas et al., 2016;Elbrecht & Leese, 2015;Ji et al., 2013;Morinière et al., 2019;Orgiazzi et al., 2015;Zhang et al., 2016). It is an efficient method for detecting cryptic species and micro-insects, which has revealed a huge species diversity of usually neglected species-rich groups and species only known from sequence data that often are referred to as "dark taxa" (Janzen et al., 2017;Morinière et al., 2019;Page, 2016).
Here, we use metabarcoding to estimate the species diversity and distribution of Diptera in a local biodiversity hotspot, the Tianmu Mountain, located in a subtropical zone of East China at the transition of the Oriental and Palearctic regions (Holt et al., 2013;Wallace, 1958). The flora and insect fauna of this mountain range have been relatively well studied, with 924 species in 51 families of Diptera hitherto recorded (Yang et al., 2016a(Yang et al., , 2016b, out of 17,827 species in 113 families known from China (Yang et al., 2020a(Yang et al., , 2020bYang et al., 2018). Using metabarcoding based on the mitochondrial cytochrome oxidase subunit I (COI) marker, we applied HTS techniques to investigate the diversity of Diptera to answer the following questions: (i) What is the total species diversity and how diverse are local communities in the Tianmu Mountain? (ii) How does the community composition vary with environmental factors, that is, between the different slope aspects and across elevation zones and months of the year (seasonality)? The study will add to a small number of such inventories conducted to date, none of which in the subtropical zone, which will contribute to an improved knowledge of global species diversity of insect and a baseline for studying insect decline.

| Study sites
Tianmu Mountain is located in Zhejiang Province, East China (119°24′-119°27′E, 30°18′-30°21′N). The site is strongly influenced by typical seasonal monsoon climate and represents a transition from the mid-subtropics to the northern subtropics (Wang et al., 2017). The annual mean temperature was 15.8°C during the period 2004-2013, with annual precipitation of 1,390-1,870 mm and a relative humidity of 76-81% (Li et al., 2017). This site is favorable for tree growth, and natural forest cover exceeds 95%. The elevation of the highest peak of the site is 1,587 m, with a vertical distribution of the vegetation varying with elevation that consists of evergreen broadleaf forests, mixed evergreendeciduous broadleaf forests, deciduous broadleaf forests, and shrubs around the peak. The insect fauna of the region is considered to be extremely diverse and community composition to be unique (Wu & Pan, 2001).

| Sample collection
Slope aspect (slope orientation facing north or south), elevation, and month periods were selected as environmental factors in this study.  Figure S1) and collecting information was presented in Table S1. Therefore, there are totally 81 samples in this study, that is (5 elevation zones in southfacing slope + 4 elevation zones in north-facing slope) × 3 month periods × 3 Malaise traps (each trap as a biological replicate).

| Sample preparation and Illumina sequencing
For each sample, all adult dipterans were selected from the mixed invertebrates. Individuals were sorted into three size classes: <0.5, 0.5-1.5, and >1.5 cm, referred to as small, medium, and large, respectively (Elbrecht et al., 2017). Samples for bulk DNA extraction were prepared using one leg of large specimens, the thorax of medium-sized specimens that equal to the tissue of a fly of 0.5 cm, and whole bodies of small specimens (Zhang et al., 2016).
The mixed materials of each sample were dried at room temperature to remove the ethanol and then proteinase K digested overnight at 56°C in lysis buffer. DNA was obtained by a destructive extraction method using the Phenol/Chloroform technique (Green & Sambrook, 2017). PCR was conducted on the DNA pools for 418 bp of the COI barcode region using the primers III_B_F (5′. CCIGAYATRGCITTYCCICG. 3′; Shokralla et al., 2015) and Fol_degen_ rev (5′. TANACYTCNGGRTGNCCRAARAAYCA. 3′; Yu et al., 2012).
Each sample was amplified for triplicates on an ABI GeneAmp ® 9700 (Applied Biosystems, USA), following the PCR protocol of Arribas et al.

| Sequence data processing
Raw sequencing reads were demultiplexed, quality-filtered using Trimmomatic v. 0.32 (Bolger et al., 2014) and merged with FLASH v.
1.2.11 (Fast Length Adjustment of Short reads; Magoč & Salzberg, 2011) under the following criteria: (i) The reads were truncated at any site receiving an average quality score <20 over a 50-bp sliding window, discarding truncated reads shorter than 50 bp and reads containing ambiguous characters; (ii) exact barcode matching and primer mismatch of no more than 2 nucleotide were required; and (iii) sequences whose overlap being longer than 10 bp were merged according to their overlap with mismatch of no more than 2 bp (Table   S2). Only contigs with the expected length of 418 bp and unique sequences with ≥5 copies were retained (Creedy et al., 2019). The resulting high-quality sequences were clustered into operational taxonomic units (OTUs) at 95% (Hebert et al., 2016) similarity cutoff using UPARSE v. 7.1 (http://drive5.com/upars e/) with a greedy algorithm that performs chimera filtering (Edgar, 2013). The taxonomy of each COI gene sequence was analyzed with the RDP Classifier algorithm (http://rdp.cme.msu.edu/) against the NCBI nt database (see Arribas et al., 2016). For the OTUs placed to genus level, the species assignment was further confirmed by sequence similarity in the NCBI database. Those OTUs with >95% similarity to a fully identified entry were assigned with the species name of the given record.
Based on the ratio of observed number of OTUs to the number of individuals of Diptera in each sample, two outliers were replaced by average values (Table S3). After discarding low-abundance OTUs (total counts <5 reads), the BioConductor package EdgeR was used to normalize the resulting OTUs by the trimmed means of M values method (Robinson et al., 2010). OTU relative abundance in each sample was determined from the result of the normalization and applied in the subsequent analysis.

| Data analysis and measures of diversity
In order to gain a deeper understanding of the dipteran diversity of Tianmu Mountain, the proportion of OTU number exceeding the species number recorded from the local ("excess taxa") and the proportion of OTUs not identified to specific species according to NCBI database ("unidentified taxa") were calculated. The "unidentified" category includes species that may be known to science, but no relevant sequence data are available to be recognized here, or they may be species not known and only recognized here as distinct sequences, that is, constituting "dark taxa" (Page, 2016). In a few cases, a close hit was recorded to a sequence from an unidentified specimen. Rarefaction curves of the Shannon-Wiener index of each sample reached a plateau, which indicated that the amount of sequencing data is large enough to capture most of the dipteran diversity represented in the samples ( Figure S2). The distribution patterns of OTUs among different groups were visualized through Venn diagrams. Diversity indices (Richness, Shannon-Wiener diversity, and Pielou's evenness) of the total communities were calculated using the R package "vegan" (Oksanen et al., 2019;Pielou, 1966;Shannon, 1948). Community composition structure was estimated using Bray-Curtis dissimilarities between samples, and canonical analysis of principal coordinates (CAP) was performed to visualize the differences among samples across seasons and elevation zones (Lozupone & Knight, 2005). Additionally, permutational multivariate analysis of variance (PERMANOVA) was used to test the effect size and significance on the community composition of aspects and month periods (Edwards et al., 2015). Plots and calculations were performed using R v.3.6.0 (R Core Team, 2019). A nonparametric test was performed using SPSS version 20 (Chicago, USA).
The ratios of unknown taxa and median body size were negatively correlated: The smaller the average body size of a family, the higher both the ratio of "excess taxa" and "unidentified taxa" ( Figure S5).
Also remarkable was the detection of two families new to China (Rangomaramidae and Aulacigastridae) and thirty families new to Tianmu Mountain. In addition, 151 OTUs could not be identified to the family level (Table S4).

| Endemic and shared taxa
There were noteworthy overlaps in widespread OTUs among different treatments, with 54.6% OTUs shared between south-facing to June (MayJ) and July to August (JulA) showed noticeable overlap (62.0% on the south-facing slope; 43.1% on the north-facing slope; Figure 2b). All of the families were detected in these periods, except for Rangomaramidae, which was exclusive to September to October (SepO) (Figure 2b,d). Most OTUs were shared in the five elevation zones, and the endemic OTUs were richest in elevation zone A (320-500 m) and E (1,000-1,150 m) on both aspects (Figure 2c,e).

| Diversity indices
The community structure of flies in Tianmu Mountain differed significantly between the slope aspects, seasons, and elevation zones.

| Community composition
Both CAP and PERMANOVA analysis showed the dipteran community to be affected by aspect, which explained 7.4% and 8.4% of the variation, respectively (p = .001) (Table 1 and Table S6, Figure 4a).
The cluster dendrogram notably separated all samples into three month-long periods for both aspects ( Figure S6). Month (season) accounted for 28.8% and 33.5% of explained variance by CAP on the south-facing and north-facing slopes, respectively ( Figure S6a,c), while in the PERMANOVA month accounted for 21.5% of community variation (p = .001) ( Table 1). CAP analysis also suggested that elevation accounted for 32.8% and 19.4% of explained variance, respectively, for both aspects ( Figure S6b,d), while PERMANOVA results showed that elevation explained 18% of the variation in the community (p = .001) ( Table 1).
The results of CAP and PERMANOVA also indicated significant interactive effects of aspect/month (26.5% and 8.5%) and aspect/ elevation (23.8% and 7.7%) on the variation within the dipteran community (Figure 4b, c, Table 1). Additionally, the interactive effect of aspect/month/elevation explained 10.6% of the dipteran community variation (Table 1).

| Massive diversity of Diptera in subtropical ecosystem
Metabarcoding revealed an unexpectedly high level of dipteran species diversity in Tianmu Mountain. A total of 5,092 OTUs representing 72 families were detected based on HTS, equivalent to 54% of the dipteran families and 29% of the species recorded in China to date (Yang et al., 2020a(Yang et al., , 2020b, and exceeding the diversity previously recorded in Tianmu Mountain by 138% at the family level and 552% at the species level (Yang et al., 2016a(Yang et al., , 2016b. The findings indicate that existing dipteran species diversity surveys have been inadequate in spite of the great efforts invested in the former morphological studies (Wu & Pan, 2001;Yang et al., 2016aYang et al., , 2016b.
Traditional morphological taxonomic methods are not well suited for generating complete inventories of minute dipterans, while metabarcoding makes it easier to portray the complete diversity in complex Malaise samples.
However, as most OTUs could not be identified morphologically, the OTU or Barcode Index Number (BIN) (Ratnasingham & Hebert, 2007) assignments remain to be validated. OTUs in this study were clustered at the 95% similarity level, following barcoding data of Canadian insects that Diptera had mean nearest-neighbor divergences averaging 5.44 ± 0.26% (mean ± SD), with the Cecidomyiidae (5.06 ± 0.04%) and Sciaridae (4.82 ± 0.07%) showing no evidence of unusually low sequence divergences (Hebert et al., 2016).
Consequently, the 1,529 and 698 OTUs placed into Cecidomyiidae and Sciaridae are considered a good approximation of the actual species present in the sample. In fact, these threshold values are conservative, given that most metabarcoding studies of invertebrate diversity have applied a 97% similarity threshold, for example, when characterizing arthropod communities from the tropical forest canopy (Creedy et al., 2019;Zhang et al., 2016), the soil mesofauna (Arribas et al., 2016), and stream macroinvertebrate biodiversity (Serrana et al., 2019). In addition, individual barcoding and largescale integrative taxonomy of Sweden scuttle flies (Phoridae) suggested a threshold of 3% for most species of this species rich family (Hartop et al., 2021;Meier et al., 2006). Malaise traps also showed a huge BIN count of Diptera Hebert et al., 2016;Srivathsan et al., 2019). Based on these kinds of studies using samples from passive traps and metabarcoding, it is now widely accepted that the actual species diversity in Diptera in tropical and temperate regions has been greatly underestimated and the Diptera may constitute the most species-rich insect order, exceeding the Coleoptera  In part, this is explained by their small body sizes, which tends to delay species description (Stork et al., 2015) and is correlated with a greater proportion of unknown taxa and a greater "excess" over the number of known species recoded from Tianmu previously ( Figure   S5; also see Morinière et al., 2019).
While metabarcoding provides a fast, reliable, and cost-effective method for approximating species diversity and for monitoring community structures without specific taxonomic expertise (Arribas et al., 2016;Bush et al., 2017;Zhang et al., 2016), the resolution of the identification using this approach depends on the available reference databases and method of classification. Using the RDP classifier, OTUs could generally be placed at the family level, but this is fraught with uncertainties, given that existing morphospecies libraries with linked barcode sequences usable for taxonomic assignment have large gaps, in particular in tropical species (Magdalena et al., 2016;Sinniger et al., 2016), and a total of 151 OTUs in our study could not be assigned even to family level. Methods of sequencebased classification are notoriously sensitive to the algorithm and the representation in databases, and these assignments therefore can only be used as a preliminary taxonomic placement.
Only the comprehensive analysis of these groups from metabarcoding of many more sites, combined with phylogenetic studies of the major lineages and their species diversity, will refine our evolutionary hypotheses about what causes the richness at Tianmu Mountain, and in particular the apparently unrivaled species diversity in Cecidomyiidae, Sciaridae, and Phoridae. All three families are known as typical dominant groups in forest ecosystems Brown, 2005;Morinière et al., 2019;Ševčík et al., 2016).
Together with their extremely high diversity, Cecidomyiidae and Phoridae show various feeding traits, such as phytophagy, parasitoidism, saprophagy, fungivory, and predatory lifestyle, and occur in complex habitats including organic detritus, living plants, and even nests of social insects (Gagne & Moser, 2013;Marshall, 2012;Tokuda, 2012). Sciaridae are mainly phytosaprophagous or mycetophagous and are abundantly present in moist habitats ( shown for the Diptera in a Bornean rainforest that also demonstrated the overwhelming diversity of Sciaridae and Cecidomyiidae, and small-bodied lineages generally, while providing the phylogenetic framework for placement of the local assemblage relative to samples from elsewhere (Choo et al., 2017). Short metabarcoding sequences can be placed in a mitogenome tree with some confidence, in particular as these trees are increasingly densely sampled.
Detailed phylogenetic trees are needed to address critical questions about, for example, the biogeographic origin of lineages or their rate of radiation. The Tianmu Mountain study site at the boundary of the Oriental and Palaearctic zoogeographic regions also will allow the use of phylogenetic methods to establish the affinities to mostly tropical vs. mostly temperate lineages and their respective TA B L E 1 Effects of slope aspect, month, elevation, and their interaction on dipteran fauna by PERMANOVA Abbreviations: df, the degree of freedom (statistics); F. Model, F-test value; MeanSqs, quadratic mean deviation; p < .05, significant difference between groups; R 2 , the ratio of the variance of the group to the total variance that indicates the degree of interpretation of sample differences in different groups; SumsOfSqs, total variance.

F I G U R E 4
Constrained effects of slope aspect, seasonality, and elevation on dipteran diversity, by partial canonical analysis of principal coordinates using the Bray-Curtis distances. (a) Constrained to the aspects and conditioned by month periods and elevations; (b) constrained to the aspects and month periods, while conditioned by elevations; (c) constrained to the aspects and elevations, while conditioned by month periods species diversity. This type of analysis goes beyond the placement conducted with k-mer-based classifiers used here, which produces potentially over-confident placement, and in any case does not go beyond the family level for most OTUs, that is, has limited resolution of important life history or biogeographic characters that vary greatly within these diverse families.

| Higher species richness and diversity in the north-facing slope
Prior to the availability of a phylogenetic tree, the Tianmu species diversity can be studied for overall trends in OTU numbers and spatial and temporal turnover at the local site. Total species richness (gamma diversity) at a site or region will be affected by the proportion of turnover. Our study suggests that turnover of Diptera communities is strongly structured by three abiotic factors, aspect, season, and elevation. Significantly higher species richness and diversity per sampling plot were observed on the north-facing compared to the south-facing slope, presumably related to variation in sun exposure that influence environmental variables differently by altering the light, temperature, humidity, and food resources of the habitat (Bartlett et al., 2020;Giersch et al., 2017;Jeffries et al., 2006;Salgado et al., 2020;Weiss et al., 2013). In this study area of subtropical secondary broadleaf forest, the south-facing slope receives more solar radiation and has closed canopy cover with a moist forest Finally, the altered mesoclimate along the altitudinal gradient is another driving force of insect community composition (Hodkinson, 2005;Hopkins et al., 2018;Mani, 1962;Mckie et al., 2005). The activity periods of Diptera and their diversity have been found to be greatly affected by elevation, such as the delayed appearance of the fruit fly, Rhagoletis cerasi, correlated with increased elevation in Turkey (Kovanci & Kovanci, 2006), and the diversity increase of Empidinae dance flies (Empididae) along elevation gradients in the tropics (Chatelain et al., 2018). However, our results suggest that the community variation of Diptera was affected more significantly by seasonality (month periods) than elevation (Table 1, Figure S7).

| CON CLUS ION
Metabarcoding approaches can reveal trends of turnover and correlations with spatial, environmental, and phenological parameters, on the scale of entire communities, to tease apart the most important ecological determinants of alpha and beta diversity in unstudied areas. Subsequent analyses based on evolutionary placements of OTUs will refine these conclusions in regard to lineage-specific responses. This study of a subtropical ecosystem of Tianmu Mountain in eastern China demonstrates the extraordinary species and phylogenetic (family level) diversity of Diptera, particularly in the families Cecidomyiidae, Sciaridae, and Phoridae that combined represented 53.6% of total OTUs, while 52 families each contained more than 95% unknown or unidentifiable species, and many more species than recognized at these sites in morphology-based studies ("excess taxa"). The community structure of Diptera was significantly affected by aspect, seasonality, and, to a lesser extent, elevation. As the metabarcoding technology matures, an increasingly comprehensive ecological and evolutioary assessment of even the most diverse insect groups can be attempted. A key challenge is the integration of data from different sites, ecosystems, and biogeographic regions that links the information from individual site-based studies for a global perspective on insect biodiversity.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest for the publication of this study.

DATA AVA I L A B I L I T Y S TAT E M E N T
The sequencing data supporting the results of this study are available in Dryad Digital Repository (https://doi.org/10.5061/dryad.8kprr 4xq2).