Identifying fine‐scale habitat preferences of threatened butterflies using airborne laser scanning

Light Detection And Ranging (LiDAR) is a promising remote sensing technique for ecological applications because it can quantify vegetation structure at high resolution over broad spatial extents. Using country‐wide airborne laser scanning (ALS) data, we test to what extent fine‐scale LiDAR metrics capturing low vegetation, medium‐to‐high vegetation and landscape‐scale habitat structures can explain the habitat preferences of threatened butterflies at a national extent.


| INTRODUC TI ON
Butterflies and other invertebrates have declined severely in recent decades, especially in parts of Europe where structured monitoring schemes have revealed long-term population declines (Hallmann et al., 2017;van Swaay et al., 2006). The specialized niches of many butterflies in terms of habitat and food plant requirements make them vulnerable to ongoing habitat modification and other global change drivers (Thomas et al., 2004). Butterflies are generally a wellstudied organism group; they are diverse and often bound to specific habitats and hence a very good indicator and umbrella taxon for invertebrate conservation Thomas, 2005).
Comprehensive survey efforts have especially revealed severe population declines and extinctions of specialist species, for example in the Netherlands (Bos et al., 2006;van Strien et al., 2019), Flanders (Maes & Van Dyck, 2001), Denmark (Eskildsen et al., 2015) and Great Britain (Fox et al., 2015). In the Netherlands, butterflies have declined by 50% since 1992 and over 80% since 1890(van Strien et al., 2019. The major causes of these declines have been the intensification of human land use, the modification of heterogeneous (semi-)natural landscapes and an increase in habitat fragmentation (e.g., Aguirre-Gutiérrez et al., 2017;Thomas et al., 2004;van Strien et al., 2019;van Swaay et al., 2006). Although a reduction of landscape conversion and an increase in conservation efforts have slowed down butterfly declines since 1990 (Carvalheiro et al., 2013;van Strien et al., 2016), a large part of the Dutch butterfly species remain highly vulnerable and are still declining (van Swaay, 2019;van Strien et al., 2019). This shows the urgent need of sustaining and increasing efforts to preserve butterflies and their habitats.
The preservation of habitats is of critical importance to prevent further losses and declines of butterflies and other invertebrates van Strien et al., 2019). As most invertebrates depend on specific habitat elements that provide food resources, nesting sites and shelter, understanding how the fine-scale structure and distribution of habitats determine species distributions is crucial for biodiversity science and conservation (Dennis et al., 2003(Dennis et al., , 2006Thomas, 1995). Habitat structure has also many indirect effects on invertebrates, for example by influencing microclimate, light availability and floristic composition (Aguirre-Gutiérrez et al., 2017;Davies & Asner, 2014;Müller et al., 2014). The fine-scale habitat suitability of invertebrates is typically driven by various aspects of vegetation structure, including vertical vegetation complexity (e.g., the density of specific strata), horizontal heterogeneity (e.g., canopy roughness) or the horizontal structure of vegetation at the landscape scale (e.g., the extent of edges and open spaces; Bakx et al., 2019; Davies & Asner, 2014;Glad et al., 2020;Simonson et al., 2014).
Despite many local field studies on butterfly-habitat relationships, the generality of these relationships remains unclear because quantifying vegetation structure across broad spatial extents has often been limited by the difficulty to obtain detailed, high-resolution data in a standardized, comparable and spatially contiguous way (Davies & Asner, 2014;Kissling et al., 2017;Valbuena et al., 2020). Moreover, the development of standardized and spatial contiguous variables and datasets of ecosystem height, cover and vegetation structural complexity covering broad spatial extents is only recently becoming an important focus of biodiversity science and monitoring, for example in the context of essential biodiversity variables (EBVs; Valbuena et al., 2020).
LiDAR data derived from country-wide airborne laser scanning (ALS) are also increasingly becoming available from free and open sources (Valbuena et al., 2020). LiDAR uses short-range laser pulses to measure the x,y,z-coordinates of reflective objects, often from aircrafts. As the exact timing and position of the sensor on the airplane are known, the distance to each point can be calculated and a 3D point cloud with high precision can be derived, from which a large number of vegetation structure parameters can be calculated (Bakx et al., 2019;Davies & Asner, 2014). These parameters-often referred to as LiDAR metrics-are statistical properties of the point cloud describing the mean, variability or proportions of returns for vertical strata. They can capture information on vegetation structure at a local scale (e.g., for a high-resolution grid cell or a radius around a focal observation point) or at the landscape scale (e.g., measuring habitat patches and edges based on grid cells that capture LiDAR-derived vegetation height; Bakx et al., 2019). LiDAR metrics can thus directly be used to quantify ecological niches and habitat requirements of species, for example tree lines and other linear vegetation elements in open landscapes (Lucas et al., 2019) or climatic and other environmental gradients controlled by microtopography and vegetation structure (Zellweger et al., 2019). This makes LiDAR a transformative resource for ecological studies, enabling a detailed understanding of the specific and scale-dependent habitat preferences of species across broad spatial extents, and with direct insights for management, policy and conservation (Davies & Asner, 2014;Moeslund et al., 2019;Müller & Brandl, 2009;Simonson et al., 2014;Valbuena et al., 2020). thus offers great potential for predictive habitat distribution modelling and other studies on ecological niches and invertebrate-habitat relationships.

K E Y W O R D S
active remote sensing, ecological niche, ecosystem structure, environmental heterogeneity, essential biodiversity variables, habitat suitability, insects, landscape ecology, microhabitat Few LiDAR studies have so far focussed on invertebrates (Davies & Asner, 2014). Such studies often show that species diversity increases with an increase in vegetation structural diversity and landscape heterogeneity (Davies & Asner, 2014). Moreover, they also show that individual taxa respond differently to specific habitat characteristics and that the performance of habitat models can increase when LiDAR metrics are included (Davies & Asner, 2014;Hess et al., 2013;Vierling et al., 2011). As most LiDAR studies have focussed on forests and woody habitats (Bakx et al., 2019;Davies & Asner, 2014), it remains open to what extent LiDAR can capture vegetation structure of low-stature habitats such as grasslands, dunes and wetlands. Some previous studies show promising results for measuring 3D vegetation structure in grasslands and wetlands (e.g., Alexander et al., 2015;Koma et al., 2020;Zlinszky et al., 2014). However, country-wide LiDAR surveys are often conducted in the leaf-off season to optimize terrain mapping (Reutebuch et al., 2005). As a consequence, such LiDAR data may contain little information for quantifying the vertical structure of vegetation within low-stature habitats (Alexander et al., 2015). On the other hand, measuring vegetation structure with leaf-on data in woodlands can also be challenging because laser returns might predominantly be recorded from the canopy, especially with discrete return data (Anderson et al., 2016).
Comparing the explanatory power and information content of a suite of LiDAR metrics in open and woody habitats is thus important to better understand the potential of LiDAR data for ecological research and nature management (Davies & Asner, 2014).
Here, we analyse to what extent specific LiDAR metrics derived from country-wide ALS data can explain the fine-scale habitat preferences of threatened butterflies in the Netherlands. We focus on four species that are all of conservation concern (van Swaay, 2019) and which are bound to specific habitats, representing grassland and woodland habitats in wet or dry conditions (Table 1). We build species distribution models (SDMs) with LiDAR metrics that capture the vertical complexity and horizontal heterogeneity of vegetation, and use species presence-absence data derived from a national butterfly monitoring scheme  as the response variable. We expect that (H1) LiDAR metrics reflecting low vegetation (e.g., forest understorey or grasses and herbs in open habitats) show little importance in explaining habitat preferences of butterflies because of the limitations of LiDAR data in dense forests (due to the low penetrability of the canopy) or in grasslands (due to the leaf-off acquisition of LiDAR data), (H2) metrics reflecting medium-to-high vegetation (e.g., the density or heterogeneity of shrub and tree layers) are especially important to explain habitat preferences of wood-  Note: Information on habitat preferences is derived from ecological field studies (Table S1) and butterfly presences on specific soil types (Table S2). Status on the Dutch Red List of threatened butterflies (van Swaay, 2019) reflects the categories "Endangered" (EN), "Vulnerable" (VU) and "Critically Endangered" (CR). Presence and absence reflect the number of records included in the species distribution models. The specific LiDAR metrics included in the species distribution models of each butterfly species are provided in Table 2.

| Butterfly data
We focus on four threatened butterfly species with different habitat preferences (Table 1): (1) the small pearl-bordered fritillary (Boloria selene), a specialist of wet grasslands with a low and flower-rich vegetation (Bergman et al., 2008;Bos et al., 2006;Cozzi et al., 2008; van Swaay, 2019); (2) the grayling (Hipparchia semele), a species inhabiting dry open habitats with a heterogeneous cover of bare sand, low grasses, nectar plants and scattered woody vegetation (Bos et al., 2006;van Swaay, 2019;Vanreusel et al., 2007); (3) the white admiral (Limenitis camilla), a butterfly of moist woodlands with open patches providing sunlight throughfall (Bos et al., 2006; van Swaay, 2019); and (4) the heath fritillary (Melitaea athalia), a dry woodland species which is mostly found on sheltered open spaces with a flower-rich herb vegetation near woodland edges (Bergman et al., 2008;Bos et al., 2006;van Swaay, 2019;Warren, 1987aWarren, , 1987b. Details on their specific habitat preferences as derived from (mostly local) ecological field studies are summarized in Table S1. All   wandering individuals. For the analyses, we only included absences in a 10 km buffer around presence points to account for the limited mobility of the species (Essens et al., 2017). This selection was done using QGIS 3.4 (QGIS Development Team, 2009). We further identified the soil type of each transect section-a key determinant of vegetation and thus an indirect driver of butterfly distributionsusing a national soil classification (Wösten et al., 1988). This soil classification distinguishes 21 soil types and additionally includes three classes that contain either information on land cover ("water" and "urban") or no information ("zero," i.e., information not available). We used the geospatial layer of this national soil classification (provided by Wösten et al., 2013) and included, for each butterfly species, absence points from those soil types that also hosted presence points (Table S2). We further excluded the land cover classes "water" and "urban" as well as undefined grid cells ("zero") because they do not represent specific soil types or key habitats of the focal species.
Selecting absence points that are in principle reachable for the focal species (10 km radius) and potentially suitable given the abiotic environment (soil conditions), but yet remain unoccupied, enables to identify the effect of vegetation structure on the presence-absence of the species (Zellweger et al., 2013). We note, however, that vegetation and soils might not be the only drivers of species absence, especially for rare species (e.g., M. athalia) where a low population size may not allow individuals to occupy all suitable habitats.
To reduce the spatial clustering of data points induced by the transect sampling design, we discarded presence and absence points that were located within 100-m distance from their nearest neigh-

| LiDAR data
We used LiDAR data from the third country-wide ALS campaign From the LiDAR point clouds, we derived 12 LiDAR metrics that captured the vertical complexity and horizontal heterogeneity of the vegetation (Table 2). This was done using the R package "lidR" (Roussel & Auty, 2019). Each metric was chosen to reflect vegetation structure-related habitat preferences of the focal species as reported in the ecological literature, either from field or from LiDAR studies (see Table 2). A total of six LiDAR metrics reflected the vertical complexity of vegetation. Those were directly derived from the LiDAR point cloud using a 25 m radius around each centroid (Table 2) LiDAR metrics excluding points <5 cm above ground were highly correlated (r ≥ 0.99) with LiDAR metrics including all vegetation points ( Figure S1). We therefore used the LiDAR metrics including all vegetation points in all statistical analyses below.

| Statistical analysis
We build species distribution models (SDMs) to analyse whether and how specific LiDAR metrics (Table 2) can explain the presence-absence of the four butterfly species. We used the ODMAP protocol (Zurell et al., 2020) to document the modelling objectives and decisions (Supporting Information ODMAP protocol). We carefully explored multicollinearity among the LiDAR metrics with Spearman rank correlations ( Figure S2). Metrics that showed high pairwise Spearman rank correlations (r > |0.70|) were discarded in the SDMs by first removing the metric with the largest variance inflation factor (VIF) using the function vifcor in the R package "usdm" (Naimi et al., 2014).
For conceptually related metrics that were highly collinear with each other or other metrics (e.g., open area, open patches and edge extent), we kept the metric that best reflected the ecology and habitat preferences of a specific species (compare Table S1). All metrics in the final SDMs were not strongly correlated (r < |0.70|) and had VIF < 3, as suggested for model implementation (Naimi et al., 2014).
To implement the SDMs, we used the R package "sdm" (Naimi & Araújo, 2016) which depends on the R package "stats" for GLMs, the R package "randomForest" for RF and the Java software for Maxent  Table S3), we mainly focus on the results of the RF. The RF algorithm is a machine-learning method which implicitly deals with nonlinear relationships. We build RF models using 500 decision trees and default settings (e.g., maxnodes = 20). Model calibration was performed on 100 random bootstrap subsets of 70% of the data, and predictive performance was then validated with the remaining 30% of the data in each run. Deviance D was calculated with a loss function (for binomial data) that represents the loss in predictive performance due to a suboptimal model, as implemented in the R package "sdm" (Naimi & Araújo, 2016). In addition to the RF results, we also present in the appendix the ROC curves for all SDM algorithms ( Figure S3-S5) and the response curves averaged across the three SDM algorithms ( Figure S6-S8). The latter was done using the "rcurve" function in the R package "sdm" which calculates the mean and confidence interval over the individual responses from the three fitted models (RF, GLM, Maxent).
To test our three hypotheses, we assessed whether and to what extent specific LiDAR metrics reflecting low vegetation (H1), medium-to-high vegetation (H2) and landscape-scale habitat structure (H3) can explain the presence-absence of the four butterfly species (Table 2). Specifically, we used the relative variable importance and the response curves of each metric as implemented in the R package "sdm" (mean ± SD over 100 model runs for each species) to interpret the role of LiDAR metrics in explaining butterfly habitat preferences. Relative variable importance-measured by AUC improvements of model performance due to inclusion of the focal variable-was obtained using the function "getVarImp" (Naimi & Araújo, 2016). Response curves for the RF algorithm were visualized using the function "getResponseCurve" (Naimi & Araújo, 2016

| Metrics selection
Spearman rank correlations were high (r = 0.7-0.9) between several LiDAR metric pairs ( Figure S2). For instance, total vegetation roughness and low vegetation roughness and most density metrics of adjacent strata were discarded from the models based on the VIF.

| Model performance
The RF algorithm outperformed the GLMs and Maxent models for all species in terms of AUC, TSS and D (Table S3). The ROC curves of the RF models ( Figure S3) revealed a good fit of the test data for B.  (Table S3).

| Effects of low vegetation
Low vegetation metrics were of minor importance in most SDMs (green coloured bars and lines in Figures 2 and 3).  Figure S9). The response of H. semele to vegetation density between 0.2 and 1 m (reflecting tall herbs and low shrubs) was generally weak (green coloured bars in Figures 2b and S9). For woodland butterflies, vegetation density <0.2 m in moist woodlands was unimportant for L. camilla (green coloured bar in Figure 3b), but important for M. athalia in dry woodlands (green coloured bar in Figure 3b).

| Medium-to-high vegetation
Medium-to-high vegetation metrics were of major importance for both woodland species (orange coloured bars and lines in Figure 3b,c) and for the wet grassland species B. selene (orange coloured bars and lines in Figure 2b,c). L. camilla was associated with a high density of >20 m tall trees whereas M. athalia was most strongly associated with a high 5-20 m vegetation density (orange coloured lines in Figure 3c). For the wet grassland species B. selene, vegetation height <10 m strongly increased its probability of occurrence (orange coloured line in Figure 2c).
The dry grassland species H. semele was only weakly associated with medium-to-high vegetation metrics (orange coloured bars and lines in Figure 2b,c), but the importance of vegetation height increased in coastal habitats ( Figure S9). The density of 1-5 m shrubs was unimportant in all SDMs and weakly associated with the species' probability of occurrence, but M. athalia and H. semele in inland habitats responded positively to a low density of 1-5 m tall vegetation (Figures 3 and S9).

| Landscape-scale habitat structure
Metrics related to landscape-scale habitat structure were of key importance in all SDMs, but the specific metrics partly differed among species (purple coloured bars and lines in Figures 2b,c and 3b,c)

| D ISCUSS I ON
Our LiDAR-based analysis of four threatened butterfly species in the Netherlands provides detailed insights into how vegetation structure shapes invertebrate-habitat relationships at a national extent and thus generalizes beyond local and landscape-scale studies that only analyse invertebrate habitat use at a particular study site.
Using high-quality butterfly presence-absence data derived from a national monitoring scheme, we show that landscape-level habitat structures are especially important for both grassland and woodland species and that medium-to-high vegetation structures (and to some extent low vegetation density) are crucial for woodland species.
LiDAR metrics capturing low vegetation structure were generally of minor importance across all four species.
The weak association of low vegetation LiDAR metrics with grassland butterflies confirms our hypothesis (H1) that low vegetation elements are difficult to capture with leaf-off ALS data.
Ecological field studies show the critical importance of such low vegetation elements for butterflies in grassland habitats, for example in terms of food plants for both adults and larvae (Bos et al., 2006;van Swaay, 2019 Figure S9). Heather is abundant and an important nectar source in these habitats (mainly heathlands), and a patchy cover is preferred over monotonous heather fields (Bos et al., 2006;van Swaay, 2019;Vanreusel et al., 2007).  (Bos et al., 2006).
We expected that LiDAR metrics reflecting medium-to-high vegetation (e.g., density or heterogeneity of shrub and tree layers) are especially important to explain habitat preferences of woodland butterflies (H2). This may be particularly important in woodland habitats if species differ in their vertical habitat niches and use of different vegetation strata. As a moist woodland species, the white admiral (Limenitis camilla) is thought to primarily use the forest canopy layer (Bos et al., 2006), for example as gathering places of territorial males or as resting places (Lederer, 1960). Our analysis supports the importance of the forest canopy for this species by showing that sites with >20% density of tall (>20 m high) trees are preferred, while sites where trees of this height are absent are avoided. In contrast, the heath fritillary (Melitaea athalia), a dry woodland species, flies low to the ground and needs trees to provide sheltered conditions and to support its main host plant, the parasitic cow-wheat (Melampyrum; Bos et al., 2006;Warren, 1987a). This association is reflected by a strong preference for sites with a high vegetation density of 5-20 m tall shrubs and trees as derived from LiDAR. Dense scrub vegetation is unfavourable for M. athalia and may even act as a barrier to its dispersal (Warren, 1987b), reflected in its preference of a low vegeta- with the roughness of low vegetation ( Figure S2). For H. semele, this might reflect the amount of bare sand patches in these habitats (Maes et al., 2006;van Swaay, 2019 Warren, 1978aWarren, , 1978b. Woodland edges can also provide suitable habitat features for L. camilla (e.g., sunny conditions within woodland) and shelter for inland populations of B. selene (Bos et al., 2006;van Swaay, 2019), but effects were only weakly reflected in the SDMs.
Most SDMs use macroclimate, land cover and topography to map species distributions over broad spatial extents (Guisan et al., 2017).
However, our study shows that vertical complexity (e.g., vegetation density, cover and height) and horizontal heterogeneity (e.g., vegetation openness and woodland edge extent) are key determinants of species distributions that need to be taken into account when predicting the probability of invertebrate occurrences at fine res-  Figure S2). Such habitat structures may allow these butterflies (or their larval host plants) to persist, despite general predictions of strong northward range shifts based on macroclimate . Country-wide or regional ALS datasets derived from massive LiDAR point clouds (Meijer et al., 2020) will thus provide valuable information for modelling and mapping species distributions under climate change because they provide spatially contiguous, fine-scale information to quantify microtopography, vegetation structure and microclimate (Zellweger et al., 2019).  (Alexander et al., 2015;Zlinszky et al., 2014). Additionally, full-waveform LiDAR could also improve the ability to quantify understorey vegetation in forests and woody habitats (Anderson et al., 2016). The availability of such LiDAR data over broad spatial extents would greatly enhance analyses of species distributions, ecological niches and habitat preferences of invertebrates and other taxa. Moreover, synergies with other remote sensing data such as spectral imagery or information obtained from synthetic aperture radar (SAR) could improve the quantification of habitat aspects in grasslands that are not captured by LiDAR, for example the identification of specific grasses or host plant species (Marcinkowska-Ochtyra et al., 2018), the quantification of seasonal growth dynamics in grasslands (Metz et al., 2014) or the extent of bare ground patches. Furthermore, SDMs of host plant species could be used to improve butterfly SDMs through an assessment of host plant distribution and availability, a crucial habitat factor for most butterfly species.

| CON CLUS IONS
Our study shows how the distribution of grassland and woodland butterflies depends on different aspects of vegetation structure, including the vertical variability of vegetation and the horizontal heterogeneity of vegetation and microtopography at the landscape scale. As vegetation structure is a key habitat determinant for many invertebrate species and their host plants (Dennis et al., 2003(Dennis et al., , 2006, the ability of LiDAR to quantify vegetation structure offers promising new ways to gain insights into invertebrate-habitat relationships (Davies & Asner, 2014;Moeslund et al., 2019;Zellweger et al., 2013). Such information can be beneficial for improving the management and conservation of threatened species by identifying and quantifying habitat preferences and specific habitat thresholds. Our study demonstrates that LiDAR metrics are not only informative for species inhabiting woody habitats, but also for invertebrates occurring in low-stature habitats. The efficient, scalable and distributed processing of large, multi-terabyte LiDAR datasets and the development of dedicated algorithms and software will facilitate and enhance applications of ALS data in ecology and biodiversity science (Meijer et al., 2020;Roussel et al., 2020). This offers ample new opportunities for developing ecosystem structure EBVs (Valbuena et al., 2020) and for applying LiDAR-based habitat analyses to invertebrates, not only in grasslands but also in dunes, heathlands and wetlands, and other non-forest habitats that are threatened throughout Europe and other parts of the world.

ACK N OWLED G EM ENTS
This work is part of the project "eScience infrastructure for

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13272.

DATA AVA I L A B I L I T Y S TAT E M E N T
All scripts for processing the raw LiDAR data (point clouds) into