Anticipating where are unknown aquatic insects in Europe to improve biodiversity conservation

Understanding biodiversity patterns is crucial for prioritizing future conservation efforts and reducing the current rates of biodiversity loss. However, a large proportion of species remain undescribed (i.e. unknown biodiversity), hindering our ability to conduct this task. This phenomenon, known as the ‘Linnean shortfall’, is especially relevant in highly diverse, yet endangered, taxonomic groups, such as insects. Here we explore the distributions of recently described freshwater insect species in Europe to (1) infer the potential location of unknown biodiversity hotspots and (2) determine the variables that can anticipate the distribution of unknown biodiversity.


Methods: Spatial units were defined (level 6 of HydroBASINS) and associated with a set of socioeconomic, environmental and sampling effort descriptors. A zero-inflated Poisson regression approach was used to model the richness of newly described species within each spatial unit.
Results: Nine hundred and sixty-six recently described species were found: 398 Diptera, 362 Trichoptera, 105 Coleoptera, 66 Plecoptera, 28 Ephemeroptera, 3 Neuroptera, 2 Lepidoptera and 2 Odonata. The Mediterranean Basin was the region with the highest number of recently described species (74%). The richness of recently described species per spatial unit across Europe was highest at mid-elevation areas (between 400 and 1000 m), latitudes between 40 and 50° and in areas with yearly average precipitation levels of 500-1000 mm, a medium intensity of sampling effort and low population density. The percentage of protected areas in each study unit was not significantly related to the richness of recently described species. In fact, 70% of the species were found outside protected areas.

Main conclusions: The results highlight the urgent need to concentrate conservation efforts in freshwater ecosystems located at mid-altitude areas and outside protected areas across the Mediterranean Basin. The highest number of newly described species in those areas indicates that further monitoring efforts are required to ensure that aquatic biodiversity is adequately known and managed within a context of growing human impacts in freshwater ecosystems.

KEYWORDS
aquatic ecosystems, biodiversity loss, conservation priorities, protected areas, species distribution, vulnerability

| INTRODUCTION
Knowing where species occur is vital for setting priorities for biodiversity and ecosystem conservation (Whittaker et al., 2005).
Incomplete or biased information on the distribution of biodiversity limits our capacity to effectively prioritize where conservation efforts should be allocated (Hermoso, Kennard, & Linke, 2015; Meyer et al., 2015), and thus to maintain healthy ecosystems and the services they provide (Cardinale et al., 2012; Hermoso, Filipe, et al., 2015). This is an urgent task, as climate change and other anthropogenic impacts, such as habitat loss and degradation, are causing an unprecedented biodiversity loss, and many species could disappear even before they are collected, identified and formally described (Costello, 2015).
Even though our knowledge about the number of current species is growing, the vast majority of species are not formally described yet, at least for some lineages (i.e. 'Linnean shortfall', Brown & Lomolino, 1998). Also, for many described species, there are several knowledge gaps related to their geographical distribution (i.e. 'Wallace shortfall', Lomolino, 2004), biology or ecological requirements (Bini et al., 2006; Hortal et al., 2015). Improving the information on the distribution of biodiversity is especially urgent in the case of freshwater ecosystems as they are particularly affected by global change, even more than their terrestrial or marine counterparts (Hermoso et al., 2012; Reid et al., 2019; WWF, 2020). Compared with terrestrial invertebrates, freshwater species have smaller geographic ranges, lower dispersal abilities and higher endemism levels (Dudgeon et al., 2006). Moreover, freshwater ecosystems are very sensitive to human disturbances, mainly because they are not only receivers of disturbances (e.g. pollution or biological invasions) but also transmitters, meaning that disturbance effects are transported downstream to the whole drainage basin (Conti et al., 2014; Dudgeon et al., 2006).
Insects represent a large proportion of the world's total biodiversity and are key to ecosystem functioning because they control and maintain vital processes such as pollination, pest control and decomposition (Losey & Vaughan, 2006; Noriega et al., 2018; Schuldt & Assmann, 2010). However, around 80% of the expected insect species have not been formally described (Stork, 2018) and, moreover, many other species are declining at an alarming rate (Cardoso et al., 2020; Wagner, 2020). Although taxonomists continue to describe new species, even in regions where taxonomic studies are abundant, incomplete taxonomic knowledge and declining trends are of particular concern for aquatic insects, as they occupy many trophic niches and are found in almost all freshwater ecosystems (Fenoglio et al., 2014; Múrria et al., 2018). Finally, the ecology, evolutionary biology and taxonomy remain poorly known for many groups of aquatic insects, especially for those with different larval (mostly aquatic) and adult (mostly terrestrial) habitat requirements (Dijkstra et al., 2014; Tierno de Figueroa et al., 2013).
Several reasons, including the scarcity of taxonomic studies, the lack of experts, low sampling efforts or limited research funding (but see Meyer et al., 2015), may explain why in most countries there is still a large proportion of insect species to be described (Fontaine et al., 2012). To find new species, taxonomists commonly survey regions that are already known for having a high biodiversity, leaving regions that are expected to be poor in species unexplored (Sánchez-Fernández, Lobo, et al., 2008; Sastre & Lobo, 2009). For example, protected areas and pristine regions tend to be more explored than areas impacted by human activities (Sastre & Lobo, 2009). Societal preferences also affect the priorities in research investments and, therefore, funds are commonly devoted to studying charismatic species such as birds or mammals, while insects (less charismatic) remain largely under-studied (Troudet et al., 2017). Lepidoptera (Macrolepidoptera), Orthoptera and Odonata are exceptions, with more species listed as being of conservation concern than other insects, probably because of their size and vivid colouring (Leandro et al., 2017). Medically important groups, such as mosquitoes or black flies (Diptera), are also well-studied insects.
Here, the aim was to explore the distribution of recently described aquatic insect species in Europe to (1) infer the location of unknown biodiversity hotspots and (2) determine the variables that explain their distribution. It was assumed that areas where more species have been described in the last 20 years may indicate that more species await description. Therefore, the results should indicate areas where more monitoring effort is still required, as their biodiversity could be higher than what we currently know. The first hypothesis was that unknown biodiversity hotspots would be found in southern Europe, that is, the Mediterranean Basin, a freshwater biodiversity hotspot (Tierno de Figueroa et al., 2013). The second hypothesis was that the location of the unknown biodiversity hotspots could be anticipated by a combination of socioeconomic, environmental and sampling effort variables. For instance, areas with less investment in research should have fewer descriptions in the time period studied than other areas, since the funds dedicated to research are low. Similarly, areas with high environmental variability (e.g. landscape heterogeneity) would show the highest number of recently described species because they harbour more habitat types (Nichols et al., 1998). Regarding sampling effort, regions that have been sampled more intensively would have more complete taxonomic inventories than other regions, and therefore the probability of new species descriptions there would be lower. Knowing where unknown biodiversity hotspots are located will help anticipate where conservation actions need to be implemented before unknown species are lost to direct and indirect human impacts.

| Study area
The study focused on the European continent, including western Russia, Cyprus and Turkey (Figure 1), and comprised an extension of 11,324,000 km² across several bioclimatic regions from the Mediterranean to the polar Arctic. Despite being part of Europe, the Macaronesian islands were not included given their unique biogeographical history. For the whole study area, level 6 of HydroBASINS (Lehner & Grill, 2013) was used as the spatial unit for summarizing the spatial information and carrying out statistical analyses. HydroBASINS portrays the watershed boundaries and subbasin delineations at a global scale (Lehner & Grill, 2013) using the Pfafstetter coding system. Level 6 was selected because larger or smaller spatial units were impractical: the former would dissipate environmental and socioeconomic variation, and the latter could have increased the number of spatial units with no data. This resulted in a total of 1381 spatial units (Figure 1).

| Species data
A database with information on species of aquatic insects described between 2000 and 2020 was compiled (see Table S1 in the supplementary materials). Subspecies or species groups were discarded. The list of monophyletic freshwater lineages in Múrria et al. (2018) was used to select the target taxonomic groups (orders and families). A first search on newly described species was conducted in taxonomic and biodiversity web pages, including the Taxa and Autecology Database for Freshwater Organisms (freshwaterecology.info), the Index to Organism Names (organismnames.com/query.htm), PESI (eu-nomen.eu) and the Barcode of Life Data System (boldsystems.org). A second search was focused on specialized journals (e.g. Aquatic Insects, Braueria, Graellsia, ZooKeys, Zootaxa) and order-specific web portals, such as Ephemeroptera of the world (insecta.bio.spbu.ru/z/Eph-spp/index.htm), Trichoptera World Checklist (entweb.sites.clemson.edu/database/trichopt/index.php), Systema Dipterorum (diptera.org), the Chironomid home page (chironomidae.net), DragonflyPix (Odonata; dragonflypix.com/checklist_en.html) and the Plecoptera species file (Plecoptera.SpeciesFile.org). The scientific names, locality where the species was first recorded and authorship of all species described between 2000 and 2020 were retrieved from the original manuscript. In particular, the geographical coordinates of the holotype locality were preferably used, even when paratypes or other specimens were collected in other places that were usually close. When the coordinates were not available in the original manuscript, corresponding authors were contacted to get details on the locality and coordinates were retrieved using Google Maps.
In addition, to ensure that all recently described species were included in the study, the database was reviewed, corrected and expanded by taxonomic experts (see details ...

FIGURE 1 Map showing the extent of this study and the spatial units considered by the level 6 of HydroBASINS (Lehner & Grill, 2013) and all of the recently described species (2000-2020) of aquatic insects in Europe separated by taxonomic orders.

| Potential explanatory variables
A preliminary list of socioeconomic, environmental, sampling effort and local variables that could potentially explain the distribution of recently described species, such as elevation, temperature, precipitation or extent of ice sheets at the last glacial maximum, was compiled (see Table S2

| Statistical analysis
Multicollinearity between pairs of predictive variables can lead to errors when estimating the effects of predictors in the model (Alin, 2010). Therefore, a correlation matrix with all pairwise combinations of the predictive variables was checked. In those pairs with an R-square value over 0.6, one variable was randomly selected and kept, to ensure that only independent variables were used in the modelling procedure. Because these independent variables covered different ranges and magnitudes, they were scaled accordingly. Since elevation and species richness tend to have a quadratic relationship, with species richness peaking at mid-elevations (Sanders & Rahbek, 2012), models considering elevation as a quadratic term were also tested.
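The screening step described above can be illustrated with a minimal sketch. It is not the authors' code (the study used the Hmisc package in R); the function names, the example values and the use of z-score standardization as the scaling method are assumptions made here for illustration only.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def drop_collinear(data, threshold=0.6, seed=0):
    """For every pair with R-square above the threshold, randomly discard one variable."""
    rng = random.Random(seed)
    kept = dict(data)
    for a, b in combinations(list(data), 2):
        if a in kept and b in kept and pearson_r(data[a], data[b]) ** 2 > threshold:
            del kept[rng.choice([a, b])]
    return kept


def standardize(values):
    """Scale a variable to zero mean and unit standard deviation (assumed scaling method)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical values for six spatial units (illustration only, not study data).
example = {
    "elevation": [100, 400, 700, 1000, 1300, 1600],
    "temperature": [18.0, 16.1, 14.2, 12.0, 10.1, 8.0],
    "precipitation": [800, 500, 900, 600, 1000, 550],
}
kept = drop_collinear(example, threshold=0.6, seed=1)
scaled = {name: standardize(values) for name, values in kept.items()}
# A quadratic elevation term, if elevation survived the screening.
if "elevation" in scaled:
    scaled["elevation_sq"] = [z ** 2 for z in scaled["elevation"]]
```

In this made-up example, elevation and temperature are almost perfectly correlated, so only one of them is retained, while precipitation is kept because it is nearly uncorrelated with both.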
A Shapiro-Wilk normality test on the dependent variable (i.e. the number of described species from 2000 to 2020 per spatial unit) showed that the data were not compatible with a normal distribution (p-value < 2.2e-16), likely because most of the spatial units did not have species described from 2000 to 2020.
A zero-inflated Poisson regression approach was used to model the richness of recently described species within each spatial unit (dependent variable) as a function of a combination of potential explanatory variables (predictive variables). This approach assumes that the excess zeros are generated by a process separate from the one generating the richness values, so that the zeros can be modelled independently (Long, 1997).
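To make the two-process assumption concrete, the sketch below writes out the zero-inflated Poisson probability mass function: a structural zero occurs with probability `pi`, and otherwise the count follows a Poisson distribution with mean `lam`. The function names and parameter values are illustrative assumptions; in a regression setting such as `zeroinfl` in the pscl package, `lam` and `pi` would each depend on the predictors (by default through a log and a logit link, respectively), which this sketch does not attempt.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of observing count k under a zero-inflated Poisson model.

    pi  : probability of a structural (excess) zero
    lam : mean of the Poisson count process
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either process.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of a sample of counts for fixed lam and pi."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


# Hypothetical richness values per spatial unit, dominated by zeros (illustration only).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 2, 0]
ll_zip = zip_log_likelihood(counts, lam=2.0, pi=0.7)
ll_poisson = zip_log_likelihood(counts, lam=0.5, pi=0.0)  # pi = 0 is a plain Poisson
```

Setting `pi=0` recovers the ordinary Poisson model, which is a convenient way to compare the two likelihoods on zero-heavy data.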
The zero-inflated Poisson regression models were run using all individual non-correlated predictive variables and also all their possible combinations. The Akaike information criterion (AIC) was used to determine which combination of predictive variables best fitted the distribution of the richness of insect species described between 2000 and 2020. Following this criterion, the models with the lowest AIC were considered the most adequate to explain the data since they had more statistical support (Burnham & Anderson, 2002). In addition, all models with an AIC increase equal to or less than seven units in relation to the model with the lowest AIC value were also considered to be statistically supported (Burnham & Anderson, 2002; Hermoso et al., 2011).
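The selection rule can be sketched as follows, assuming each fitted model is summarized by its maximized log-likelihood and its number of estimated parameters (for a zero-inflated model, the coefficients of both components). The model names and numbers below are hypothetical and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def supported_models(fits, max_delta=7.0):
    """Rank models by AIC and keep those within max_delta units of the best one.

    fits maps a model name to (log_likelihood, n_params).
    Returns a list of (name, AIC, delta AIC) sorted by increasing AIC.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(name, score, score - best) for name, score in ranked if score - best <= max_delta]


# Hypothetical fits (illustration only).
fits = {
    "elevation": (-520.0, 4),
    "elevation + universities": (-505.0, 6),
    "all non-correlated variables": (-500.0, 12),
    "latitude": (-560.0, 4),
}
selected = supported_models(fits, max_delta=7.0)
```

With these made-up values, two models fall within seven AIC units of the best one and the other two are discarded.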
The three orders with the highest number of species described between 2000 and 2020 (i.e. Diptera, Trichoptera and Coleoptera) were also analysed separately. The main reason for this additional analysis was the differing ecological features of these groups. For instance, Diptera and Coleoptera can tolerate a wide range of environmental conditions, whereas most Trichoptera require clean, cool and well-oxygenated waters (Resh & Cardé, 2009).
Furthermore, a considerable number of Diptera and Coleoptera are found in the ecotone between land and inland waters, a habitat known for its rich biodiversity and sensitivity to environmental changes (Ribera, 2000; Tachet et al., 2002; Resh & Cardé, 2009; Millán et al., 2014). The models for each individual order were fitted following the same procedure described above for all orders together.
The spatial analyses were conducted using ArcGIS (Environmental Systems Research Institute (ESRI), 2017) and the statistical analyses using the R programming language (R Core Team, 2021). The Hmisc package (Harrell Jr., 2021) was used to compute the correlations between explanatory variables, and the pscl package (Jackman, 2020) to run the zero-inflated Poisson models. All graphics were produced using the ggplot2 package (Wickham, 2016).
TABLE 2 Socioeconomic (SE), environmental (E), sampling effort (S) and spatial (SP) variables selected after checking for multicollinearity of a longer list of potential variables (

| Species database
The initial database included 1003 species described between 2000 and 2020. However, 37 species were discarded because the geographical coordinates of the holotype could not be obtained.

| Variables influencing the distribution of recently described species
Of the 45 models tested (individual models, additive model with all pairs of non-correlated variables and two-way additive models with all possible combinations of non-correlated variables), the one with the lowest AIC was the additive model with all non-correlated variables (see Table S4 in the supplementary materials for all the remaining models). The Poisson count component of the models explaining the distribution pattern of richness of recently described species showed that the two variables with the highest weight were elevation and number of universities; that is, higher elevation and a higher number of universities were associated with a higher number of descriptions (Table 3), with a large share of the descriptions peaking at mid-elevation areas (400-1000 m; Figure 4). Latitude, the number of GBIF occurrences and longitude also had a significant effect on the dependent variable, although a weaker one than the previous two variables, as shown by their lower standardized regression coefficients (Table 3).

... (Illies, 1967). Despite the fact that the Mediterranean Basin is a well-known biodiversity hotspot (Ivković & Plant, 2015; Moubayed-Breil, 2020; Myers, 1990), including aquatic insects (Bonada & Resh, 2013 ... environmental conditions (strong seasonal and predictable hydrological fluctuations, including dry/wet phases) and the high landscape heterogeneity in this region, which has led to a higher spatial and temporal taxonomic and functional diversity (Bonada & Resh, 2013; Múrria et al., 2020; Tonkin et al., 2017). Although the largest accumulation of new descriptions was found in the Mediterranean Basin, new species of Diptera were described all over the studied area.
This finding shows that the discovery of new Diptera species follows a unique pattern, most likely because of their complex taxonomy, and suggests that a number of future Diptera descriptions could be expected across Europe.
In agreement with the second hypothesis, socioeconomic, environmental, sampling effort and spatial variables explained the distribution of recently described species. First, the majority of the new species descriptions were found in spatial units with low population density (see Figure 4), suggesting that (i) human impacts associated with highly populated areas, such as habitat degradation and fragmentation, could have reduced diversity and, therefore, led to impoverished communities (Newbold et al., 2015) and ...
Second, a high number of new descriptions were found at mid-elevations, ranging between 400 and 1000 m. Low-elevation areas (below 400 m) tend to be heavily impacted by human activities and, therefore, as explained above, could host either an impoverished or a well-studied biodiversity. On the other hand, high-elevation areas have been recurrently surveyed in the past (Sánchez-Fernández, Lobo, et al., 2008), which could explain the low number of new descriptions at high altitudes. Finally, the habitat of mid-elevation ranges provides an ideal set of conditions to harbour a large number of species, since they have the potential to be colonized by species from both lower and higher elevations (Bertuzzo et al., 2016). This is reinforced by the refugia effect of mountain areas, because their intricate topography increases isolation with elevation (Elsen et al., 2018; Finn et al., 2011; Perrigo et al., 2020). Furthermore, the presence of aquatic insects at mid-elevations, which very often correspond to mid-order sections, could also be supported by the River Continuum Concept (RCC).
The RCC postulates high alpha diversity in mid-order sections [but see ... depth, flow characteristics, temperature and the complexity of the water from headwater to mid-order sections (Vannote et al., 1980).
... and 2020 were found outside the limits of protected areas. There are several potential explanations for this pattern. On the one hand, the acquisition of sampling permits is usually administratively complex, which discourages researchers from conducting sampling campaigns. Also, most new species in protected areas could have been already discovered because protected areas usually report better species inventories (promoted by local projects) than the surrounding areas; for example, an extensive inventory is a necessary condition for those countries that have signed the Ramsar Convention (Dudley, 2008). Another reason to explain the observed pattern could be that freshwater ecosystems and aquatic insects are seldom considered when conceiving conservation plans (Ivković & Plant, 2015), and current protected areas fail to cover the distribution of freshwater biodiversity (Guareschi et al., 2015; Hermoso, Filipe, et al., 2015; Sánchez-Fernández et al., 2021). As a result, protected areas are not designed considering aquatic insects and, therefore, it is not surprising that an important part of the recently discovered species was recorded in unprotected areas (Ivković & Plant, 2015; Payo-Payo & Lobo, 2016). The design of protected areas tends to be biased towards less economically profitable regions, such as high mountainous areas, because the economic profits of farming in those areas are non-existent (Pressey, 1994; Pressey et al., 2002).
Therefore, by establishing protected areas in regions with these characteristics, managers are leaving areas that could harbour more biodiversity without protection.
Despite the conservation effort implemented in the last decades, we still need more initiatives to study and protect freshwater ecosystems. Sadly, the Iberian Peninsula is one example of the poor protection of freshwater habitats and the diversity that they harbour (Hermoso, Filipe, et al., 2015; Sánchez-Fernández, Bilton, et al., 2008). The lack of specific legislation to protect invertebrates (including aquatic insects) and their poor representation under current policy such as the Habitats Directive (Hermoso et al., 2019) is also a critical obstacle to ensuring the conservation of freshwater biodiversity (Schuldt & Assmann, 2010). Therefore, the results suggest that future biodiversity conservation plans should extend the current network of protected areas towards areas that hold a high diversity of currently underrepresented taxa, and also towards areas that could still hold unknown and highly vulnerable species. The designation of entomological (micro)reserves in areas where insect hotspots have been found could be a promising approach to also conserve unknown freshwater biodiversity. For example, this designation was used in Portugal to create (micro)reserves to protect Eurypha contentei (Insecta, Hemiptera, Cicadoidea), and through the Spanish Entomological Association (AEE: Asociación Española de Entomología) five entomological (micro)reserves have been recently created in Spain (Galante et al., 2015).

| CONCLUSIONS
The database generated in this study will be a useful source of information to complete freshwater biodiversity inventories in Europe, and to identify where the unknown biodiversity hotspots of aquatic insects in Europe are located. Based on these results, and assuming that new species of aquatic insects will be described in the coming years (in particular with the boost of molecular approaches), taxonomic efforts to find new species should be directed towards southern and eastern European areas at mid-elevations. Future protected areas should also prioritize these areas, where freshwater biodiversity inventories are still incomplete and ecosystems suffer from heavy human impacts.

ACKNOWLEDGEMENTS
We thank Adolfo Cordero-Rivera and KD Dijkstra for contribut-

CONFLICT OF INTEREST STATEMENT
All authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article. In addition, the database (