Capturing uncatalogued distribution records to improve conservation assessments of data‐deficient species: a case study using the glossy grass skink

Effective conservation planning is often predicated on detailed and current information about a species' geographical distribution. However, traditional sources of occurrence data (e.g., online biodiversity databases) may be insufficient for estimating the range of rare, poorly understood species that are readily misidentified. Here, we demonstrate a more holistic approach to this problem, using the poorly known glossy grass skink (Pseudemoia rawlinsoni) as a case study. We first compared the relative contribution made (to our geographical knowledge of the species) by online database records, with that of photo‐substantiated records obtained via personal communication (PC). We used ecological niche modelling (ENM) to predict the species' distribution, then performed field surveys at both historical and predicted suitable sites to further clarify its occurrence. 20% of all known records came from the PC method, which resulted in 35 new sites and increased the species' area of occupancy (AOO) by 176 km2. Most records obtained via PC came from the past decade, demonstrating that this method is more effective at elucidating the current distribution. ENM revealed that P. rawlinsoni has a disjunct range, and is mostly a low‐elevation coastal species, with the exception of suitable habitat in parts of the high‐elevation Australian Alps bioregion. The species' AOO has likely declined over recent decades owing to anthropogenic disturbance, given that 38% of the species' predicted range is now cleared agricultural land, and our field surveys failed to detect the species at 52% of historical record sites. Together, these findings provide a robust foundation of geographical knowledge on which to develop strategic conservation actions for the species.


Introduction
With future species extinction rates expected to greatly surpass background rates (Pereira et al., 2010;Barnosky et al., 2011;Ceballos, Ehrlich, & Dirzo, 2017), there remains no doubt that biological conservation is a science with urgency.In the context of this escalating conservation crisis, the IUCN Red List of Threatened Species (herein, Red List) plays a key role; it monitors biodiversity and frequently updates its listings of a species' threat status, providing a foundation for conservation decisions (Hoffmann et al., 2008;Cazalis et al., 2022;IUCN, 2022).The problem, however, is that many species remain unassessed against the Red List, or are too poorly understood to assess (Cazalis et al., 2022;IUCN, 2022).These latter species are considered 'Data Deficient'-a Red List category given to species for which a lack of biological data precludes confident assignation to a threat category (IUCN, 2022).Data-deficient species typically are either taxonomically uncertain species, are recently described, have few or very old occurrence records, or lack information on population status, geographical distribution and threats (Bland et al., 2017).Until these species are assigned a conservation status, they may be excluded from global and local conservation priorities.This would be especially problematic given that many datadeficient (DD) species are likely already threatened by extinction (Howard & Bickford, 2014;Bland et al., 2015;Meiri et al., 2017;Borgelt et al., 2022;Caetano et al., 2022).
Several Red List criteria are used to designate a threat status to species, but most criteria (criteria: A, population size reduction; C, small population size and decline; D, very small population or restricted distribution and E, quantitative analysis of extinction risk) require some form of information on population size and rates of population decline-data which are seldom available, even for relatively well-studied species (Santini et al., 2019).Consequently, most threatened species are listed under criterion B (geographical range size), because usually there is at least some occurrence data available on a species from which to derive metrics of its distribution (Roberts et al., 2016;Kass et al., 2021a).These metrics include extent of occurrence (EOO, the area contained within the shortest continuous boundary that can be drawn to encompass all records), and area of occupancy (AOO, the area within the EOO where the species is found; IUCN, 2022).Crucially, however, our distributional knowledge is incomplete for most species (i.e., the Wallacean Shortfall; Whittaker et al., 2005;Hortal et al., 2015) and this is especially true for DD species (Roberts et al., 2016;Meiri et al., 2017).Clearly, obtaining a detailed and current picture of poorly known species' distributions is essential if we are to adequately assess their risk of extinction.
The conventional and most straightforward method of examining distribution is to download occurrence records of a species from online biodiversity data repositories, such as those curated by natural history museums and government research institutions.But, geographical ranges of DD species are, virtually by definition, poorly recorded and understood (Roberts et al., 2016).This raises an important question when assessing extinction risk for DD species: how confident can we be in the distributional picture portrayed by records from online databases?There are two key problems with these databases: (1) they often lack photographic evidence of the organism, preventing us from querying its identification, and (2) they do not capture unreported observations of the species made by naturalists, which could account for a non-trivial proportion of the collective knowledge of a species' distribution.
It may be possible to rapidly and significantly increase our knowledge of a DD species' distribution by searching for photographs (which represent undocumented occurrences) of the focal species on the internet and via personal communications (PCs) with naturalists (e.g., Farquhar, Pili, & Russell, 2022).This method of data acquisition falls under the label of citizen science, wherein citizensincluding, but not limited to, naturalistsparticipate in biodiversity monitoring to generate scientifically useful occurrence data (Silvertown, 2009;Bonney et al., 2014;Brown & Williams, 2019).Models of species' distributions (e.g., ecological niche models; ENMs) that include citizen science data often make predictions that are congruent with models using more professional data (e.g., Souther, Randall, & Lyndon, 2021).Furthermore, there is evidence that combining these multiple sources of occurrence data can significantly improve predictions of species distributions (Grabow et al., 2022).Thus, integrating citizen science data may be useful for accurately predicting the environmental constraints on species' distributions and estimating range sizes and declines (Schmeller et al., 2009;van Strien, van Swaay, & Termaat, 2013;Maes et al., 2015;McKinley et al., 2017).
One Australian lizardthe glossy grass skink (Pseudemoia rawlinsoni)aptly exemplifies the DD problem; it is a small (adult snout-vent length 65 mm), cryptic lizard which is likely often misidentified with morphologically similar species.It occurs in densely vegetated microhabitats, typically those found near creeks, swamps, lakes and saltmarshes (Robertson & Coventry, 2019;Wilson & Swan, 2021).Individual Australian States have listed the species as endangered (in Victoria), vulnerable (in South Australia) and rare (in Tasmania).Yet, it was considered DD during the most recent Red List assessment (Gillespie et al., 2018;Chapple et al., 2019) because very little about its biology and ecology, given its surprising lack of research attention.It has a relatively large EOO, spanning southern Victoria, southeastern South Australia, south-eastern New South Wales, Australian Capital Territory (ACT) and north-eastern Tasmania (Chapple et al., 2019;Wilson & Swan, 2021).Given this large range, there likely exists many opportunistic observations of the species made by naturalists, which have not been formally recorded and are therefore inaccessible.Despite its large range, currently available records suggest that the species is highly patchily distributed across this range, with a relatively small AOO of 732 km 2 (Chapple et al., 2019).This small, fragmented AOO could be symptomatic of a decline.Indeed, it has been hypothesised that the species' AOO is declining, owing to the reduced quality and extent of swampy habitats driven by agricultural and urban developments (Gillespie et al., 2018;Chapple et al., 2019;Robertson & Coventry, 2019).While this may be the case, it is unclear whether gaps in the species' distribution reflect, at least in part, a lack of survey effort in these regions, or the species being mistaken with similar looking species, such as the eastern three-lined skink (Acritoscincus duperreyi) or other members of the Pseudemoia genus.
In the present study, our main objective is to offer a strategic framework to overcome uncertainty regarding the geographical distribution of a poorly known, DD species.First, we explore the relative contribution of accessible database records (i.e., online) and PC records (i.e., unpublished 'grey data') to our knowledge of the AOO of Pseudemoia rawlinsoni.In doing so, we provide the most comprehensive and current picture of the species' AOO to date.Second, we use ENM to predict the spatial occurrence of suitable habitat within the species' range, and explore the factors that may be driving its patchy distribution.Finally, we performed field surveys with the objective of discovering the species at new locations in south-west Victoria (where the species' occurrence is most uncertain), and surveying at historical record locations to determine the proportion of record sites that still contain extant populations.Together, these multiple lines of inquiry proved an updated understanding of (i) the species' geographical range, by identifying its AOO and suitable habitat, and (ii) its resilience to anthropogenic disturbance, by identifying the degree to which it has persisted at historical record sites.

Collecting occurrence records
We collected records of Pseudemoia rawlinsoni using two different methods.First, we downloaded all the publicly available occurrence records of P. rawlinsoni from the Atlas of Living Australia, ACT Wildlife Atlas, BioNet Atlas, Natural Values Atlas and Victorian Biodiversity Atlas.We then supplemented this publicly accessible online dataset with a second dataset compiled from records obtained via PCs.To build this second dataset, we searched for the keywords 'Pseudemoia rawlinsoni' and 'glossy grass skink' on Google Images, Facebook, Instagram and iNaturalist to find photographs of the species and, where possible, we contacted the photographer to obtain the relevant spatial and temporal data (coordinates, date and time of occurrence, coordinate uncertainty in metres).Because P. rawlinsoni is listed as endangered in Victoria, records of the species on iNaturalist are generalized to ~30 km, and hence, the exact location was not publicly available.To maximise spatial accuracy of iNaturalist-derived data, we directly communicated with the photographer.Thus, in this study, we consider records obtained from iNaturalist as obtained via PCs.We suspected that the species is readily misidentified with similar skink species, particularly Acritoscincus duperreyi, Pseudemoia pagenstecheri and Pseudemoia entrecasteauxii therefore, we searched the aforementioned websites using these binomial names as keywords to locate additional P. rawlinsoni observations by validating the identification of each photographed specimen.We also contacted people in our personal network (e.g., citizens, amateur naturalists), because these individuals had records of the species which they had not made publicly available.Spatially suspect records were removed from the dataset, such as records in the ocean or records with high coordinate uncertainty (>5 km).The complete set of all occurrence records were used for calculating range size metrics (EOO and AOO).

Ecological niche modelling
We predicted the distribution of P. rawlinsoni using the Maximum Entopy ENM algorithm (herein 'MaxEnt'; Phillips, Anderson, & Schapire, 2006).MaxEnt is a regressionbased modelling approach used to estimate habitat suitability for a species across a landscape, by contrasting environmental conditions of occupied sites (i.e., presence locations) with those of potentially unoccupied sites (i.e., locations of the environmental background).

Defining the modelling domain
How the modelling domain is defined has considerable influence on model predictions (Elith et al., 2011;Merow, Smith, & Silander Jr, 2013).Our primary aim was to model habitat suitability within a geographical area approximating the potential range of P. rawlinsoni to understand why its distribution is fragmented.We therefore set the modelling background as the extent of all biogeographic regions (Interim Biogeographic Regionalisation for Australia (IBRA) v7.0) occupied or likely to be occupied by P. rawlinsoni.We considered that unoccupied neighbouring bioregions may be ecologically relevant areas to include because some occurrence records exist very close to their boundaries, indicating that these neighbouring landscapes are potentially reachable by the species.To capture such regions, we set a 30 km buffer around each occurrence point (to account for dispersal potential and uncertainty in site occupancy) and selected all bioregions that intersected with these point buffers.The outer boundary of all combined bioregions was then used as the extent of sampling (Figure S1a).

Environmental variables
We pre-selected a set of 27 candidate predictor variables likely affecting the species' distribution.These variables relate to climate, topography, hydrology, soil moisture, vegetation and human influence (see Table S1 for the full set of variables).We primarily included bioclimatic variables because, as an ectotherm, the distribution of P. rawlinsoni is likely influenced by temperature and precipitation regimes, as is the case for many ectothermic species (Ara ujo & Guisan, 2006;Powney et al., 2010;Dervo et al., 2016).The species is thought to be associated with swamps, lakes, watercourses or otherwise moist habitats (Robertson & Coventry, 2019;Wilson & Swan, 2021), hence, we included soil moisture, topographic wetness and Euclidean distance to both waterbodies and watercourses as candidate predictor variables.All the environmental layers were downloaded at, or resampled to, a spatial resolution of ~1 km 2 .All analyses were performed in the R statistical environment (R Development Core Team, 2022) unless stated otherwise.
To reduce multicollinearity between variables, we visually inspected the relationship of all predictor variables in principle component (PC) environmental space (Figure S2).We also calculated Pearson correlation coefficients between each pair of variables.And as a final measure, we calculated the variance inflation factor (VIF) of all predictor variables using the 'vifstep' function of the usdm package for R (Naimi et al., 2014) and retained those with VIF < 5.In the end, we retained the variable with greater ecological relevance from each set of highly correlated variables (q ≥ 0.7; Dormann et al., 2013): maximum temperature of the warmest month (Bio5), minimum temperature of the coldest month (Bio6), precipitation seasonality (Bio15), Normalized Difference Vegetation Index (NDVI), Thornthwaite Aridity Index, topographic wetness and total available soil moisture (Figure S3).

Accounting for sampling bias and spatial nonindependence
To account for uneven sampling effort across the study region and spatial non-independence of species occurrence records, we first removed duplicate records and reduced spatial clustering by thinning P. rawlinsoni occurrence records to 1 km using the 'thin' function in the R package spThin (Aiello-Lammens et al., 2015; see Figure S4a,b for more detail).This resulted in 230 occurrence points used for modelling.Second, we sampled background points from a spatially weighted raster file (i.e., a 'bias file', Figure S1b).Here, we estimated the sampling effort of species occurrence records by smoothing the cleaned occurrence data using gaussian kernel density algorithm (bandwidth = 50 km).We then sampled 10 000 background points (Phillips & Dud ık, 2008) across the modelling background weighted against the probability density values of the smoothed occurrence data.Therefore, selection of background points was subjected to an approximately similar sampling bias as the occurrence data (Phillips & Dud ık, 2008;Kramer-Schadt et al., 2013).

Model fine-tuning and evaluation
MaxEnt model fine-tuning was achieved using the 'ENMevaluate' function in the ENMeval package (v2.0.3;Muscarella et al., 2014;Kass et al., 2021b), which runs MaxEnt across various combinations of feature classes and values of the regularisation multiplier (described below) to enable comparisons of model performance.We then selected the MaxEnt settings that balanced model fit and predictive ability.
Models were built with regularization multiplier options set to 1, 2 and 3 and with five different feature classes combinations (1.Linear; 2. Quadratic; 3. Linear + Quadratic; 4. Linear + Quadratic + Hinge; 5. Hinge); this resulted in 15 parameter combinations (and hence 15 candidate models).Feature classes determine the flexibility of the shape of the relationship of covariates to response.The regularization multiplier dictates the penalty for model complexity, with lower values resulting in a model more fitted to the presences (over-fitted), and higher values in more spread out (under-fitted) model predictions (Phillips & Dud ık, 2008;Elith et al., 2011;Merow, Smith, & Silander Jr, 2013).Notably, as one of our aims was to predict new areas of potential occurrence, we did not use values of the regularization multiplier lower than 1.Together, these settings influence model complexity by controlling the number and complexity of features in the final output.See Figure S5 for a visualization of the complexity of the 15 alternative MaxEnt models.
We evaluated models using a checkerboard2 crossvalidation approach (Kass et al., 2021b).This approach employs a masked geographically structured data partitioning sampling to achieve (quasi) spatially independent partitions of presence and background data (sensu Radosavljevic & Anderson, 2014).Here, the modelling domain is aggregated into two checkerboard-like grids; calibration data are first partitioned into two groups based on a 'fine-grain' checkerboard, then each group is further subdivided into two groups based on a 'coarse-grain' checkerboard, ultimately yielding 4 partitions (i.e., k-folds) with an approximately equal sample in geographic (and presumably environmental) space.For each parameter combination, five MaxEnt models were run iteratively: four models were run using three of the four (i.e., kÀ1) partitions of the calibration dataset for training and the remaining partition for testing; a fifth model (herein 'full model') was run using all records (unpartitioned) for both model training and testing.
All the models were evaluated using the metrics AICc, AUC TEST , AUC DIFF , OR mtp , OM 10 and Boyce Index (see Table S2 for definitions and explanations of these metrics).We then summarized the evaluation metric scores across five models of each parameter combination.We visualized the predictions of each parameter combination's full model, and we qualitatively evaluated their plausibility against our knowledge of the species' ecology and of the region's biogeography (following Petitpierre et al., 2017).We ultimately chose the parameter combination that yielded MaxEnt models with the lowest AIC score (corrected for small sample sizes; AICc; Warren & Seifert, 2011), that performed well for multiple other metrics, was relatively parsimonious compared to other models (Merow, Smith, & Silander Jr, 2013;Coelho, Diniz-Filho, & Rangel, 2019), and yielded ecologically plausible predictions.
After identifying the optimal parameter combination, the significance of its evaluation metrics' scores was evaluated by comparing them to the distribution of the evaluation metrics' scores of 99 null models.Null models were fitted with the same 10 000 background points used in empirical models, and random 'occurrences' sampled from the modelling domain.The random occurrences were equal to the number of occurrence data used to train the empirical models (Bohl, Kass, & Anderson, 2019).The performance of null models was then evaluated against the occurrence and background data used to train empirical models.We found that the optimal model performed significantly better than all null model distributions (P < 0.05) in terms of AUC and Boyce Index, increasing confidence in the evaluation metrics of the models of the optimal parameter combination.
We then constructed response curves of predictor variables from the optimal parameter combination's full model to investigate how environmental predictors affected the predicted probability of habitat suitability, by changing the variable of interest while holding the other variables at their mean.
We recognize that different algorithms fitted with the exact same data can produce variable predictions (Araujo and New, 2007).However, MaxEnt (a regression-based model) has been consistently shown to perform better than most species distribution model algorithms (Elith et al., 2011;Valavi et al., 2022).Nonetheless, we still inspected this caveat by modelling the species' ecological niche using surface range envelopes (similar to the Bioclim algorithm).Notably, MaxEnt models produced statistically better and more plausible predictions (see surface range envelope predictions in Figure S6).

Binary transformation of model prediction
Using the optimal model, we delineated suitable versus unsuitable areas by converting continuous values into a binarized map.In doing so, we used the 'SS=SP' (sensitivity = specificity) criterion (as suggested by Guillera-Arroita et al., 2015), which identifies a threshold at which occurrences are just as likely to be wrongly predicted as background points (using the 'Optimal.thresholds'function in the PresenceAbsence package; Freeman & Moisen, 2008).This criterion resulted in a threshold value of 0.55.We then intersected this binary output layer in QGIS with land cover categories obtained from the Dynamic Land Cover Dataset (DLCDv2.1;Lymburner et al., 2011), to calculate the area of different land cover categories within the species' modelled distribution.

Field surveys for Pseudemoia rawlinsoni
To determine whether P. rawlinsoni has become extirpated from historical sites, we performed visual encounter surveys across the Victorian distribution of the species in locations (n = 25) at which the species has not been detected for over 15 years (i.e., approximately three generations for this species) (Wotherspoon, Chapple and Farquhar, unpublished data).Records of the species are particularly scarce in southwest Victoria, despite the occurrence of numerous potentially high-quality coastal wetlands in this region (as revealed by MaxEnt model predictions).We, therefore, performed additional surveys at these sites with the objective of clarifying the species' presence in this uncertain part of its range (Figure 1).Surveys involved two observers walking randomly within a 1.5 ha area over two hours.Each observer visually scanned the habitat for lizards using both the unaided-eye and binoculars (Vortex Diamondback HD 10 9 42) to spot individuals from a distance (thus confirming identification before lizards flee).If P. rawlinsoni was detected, surveys were stopped and the species was considered present at that site.Otherwise, sites were fully surveyed twice, with the second survey occurring on a separate day.Absence was inferred after failing to detect the species following two independent full surveys (i.e., after a total search effort of 8 person-hours in a 1.5 ha area).All the surveys took place within the species' activity season, between October 2021 and February 2022, and were performed under weather conditions appropriate for skink activity (see Table S4 for survey data).

Compilation of occurrence records
Considering only publicly accessible databases, there are 421 records of Pseudemoia rawlinsoni, and its EOO is 359 845 km 2 and AOO is 772 km 2 .Including 103 new records obtained via PCs increases the species' AOO to 948 km 2 .That is, a net increase of 176 km 2 and 35 new sites simply by collating unpublished observations of the species (Table 1).There was no increase in EOO given we did not find any new occurrence localities outside the known range of the species.Most records obtained from PCs occurred over the last decade (2011-2021), demonstrating that this method is more effective at elucidating the species' distribution in the present/recent past (Figure 2).Noteworthy new occurrence localities discovered via PCs include Discovery Bay Coastal Park (first record in far south-west Victoria), Bryan Swamp (first record in the southern Grampians) and several sites in Melbourne's south-eastern suburbs (e.g., Dingley, Waterways, Seaford, Cranbourne; Figure 3).

Model performance
The optimal model (i.e., with the lowest AICc value) performed well with an AUC of 0.71 and included a regularization multiplier of 1 and LQ feature class.The difference between the training and testing AUC (i.e., AUC DIFF ) was very low (0.018), indicating that the model predicts new data well and is not overfitted (Warren & Seifert, 2011).Model Figure 1 Map of field survey sites for Pseudemoia rawlinsoni.Green squares are sites at which the species had not been recorded for 15 years (i.e., approximately three generations) or more.Red squares are sites at which the species has never been detected; these sites were predicted as potentially suitable and were surveyed with the intention of clarifying the species' distribution in southwest Victoria, but we did not detect the species at these sites.
performance metrics of the candidate models (including the null model) are provided in Table S3.The geographical prediction of the optimal model is shown in Figure 4.The addition of PC records did not increase the species' extent of occurrence (EOO, minimum convex hull), but the discovery of an additional 35 occupied sites within the species' known range increased its AOO (2 9 2 km grid cells) by 176 km 2 .

Predicted distribution of Pseudemoia rawlinsoni
The total area of predicted suitable habitat for the species is disjunct across a 42 282 km 2 area, of which 58% is natural land cover (intact or mildly disturbed), 38% is some form of agricultural land (moderately to highly disturbed) and 4% is urban (habitat completely removed).MaxEnt modelling identified several noteworthy locations of potentially suitable habitat at which the species has never been recorded.These include Falls Creek (a high-elevation alpine plateau in NE Victoria), on Flinders Island (in the Bass Strait, northeast of the island of Tasmania) and in the Central Plateau region of Tasmania (Figure 4).
The most important predictors of habitat suitability for P. rawlinsoni (Figure S7) are minimum temperature of the coldest month (bio6; ~55% contribution) and NDVI (~25% contribution).The model predicted high suitability for both high and low (but not intermediate) values of bio6.This is because the species mostly occurs at low elevations where bio6 values are relatively high (warm), but the occurrence of alpine populations (with very low winter temperatures) results in the prediction of high suitability at these very low values of bio6.Its high probability of occurrence at intermediate NDVI values suggests that the species generally requires at least some vegetation, but is absent from areas with high-NDVI values such as dense forests.The species tends to avoid areas with high precipitation seasonality (bio15) and high summer temperatures (bio5), though each of these variables contributed only ~10% to the model.Soil moisture, aridity and topographic wetness made minimal contributions to the model (each <5%).Response curves of the environmental variables from the Maxent model are provided in Figure S8.

Field surveys for Pseudemoia rawlinsoni
We detected P. rawlinsoni at 13 of the 25 historical record sites (52% of sites).The species was found in a broad range of habitats from remnant coastal saltmarsh, alpine swamps and Xanthorrhoea fields, to anthropogenically disturbed sites dominated by invasive plant species and surrounded by infrastructure (Table S4).Of the 13 sites at which P. rawlinsoni was detected, 8 sites (61.5%) yielded the species in the first site visit, and detections were often (76.9%) made in less than 60 min, with a mean time-until-detection of 34.67 min (Table S5).We did not detect P. rawlinsoni at any of the apparently suitable (as predicted by MaxEnt modelling) wetlands in south-west Victoria (Table S6).

Discussion
This study addresses a key question in the field of conservation biology: how do we improve conservation assessments for DD species?We show here that, in the case of the poorly known glossy grass skink, much can be learned by combining multiple sources of occurrence data with ENMs to better understand distribution.This is then used to improve accuracy when estimating the overlap between the species' habitat and anthropogenically disturbed landcovers.By combining this information with field surveys, we have found that almost half of the species' distribution has been removed or is highly degraded, implicating a substantial decline in AOO of the species in the recent past.Together, these findings provide a robust foundation of geographical knowledge on which to develop strategic conservation actions for the species.

The untapped value of opportunistic occurrence data
In the context of accelerating urbanization and habitat removal, conservation scientists require a comprehensive understanding of species' distributions in order to properly estimate extinction risks and form strategic conservation interventions (Maes et al., 2015;Roberts et al., 2016;Kass et al., 2021a).Our currently limited geographical data for small, rare and poorly understood species precludes us from making any strong inference about their threat status (Roberts et al., 2016;Meiri et al., 2017).The traditional method of relying heavily on data sourced from institutional databases may offer an incomplete distributional picture for these DD species.As the present case study has shown, integrating opportunistic occurrence records can considerably improve our knowledge of a species' occurrence, especially in the recent past (Figure 2).Institutional database records will remain invaluable, as they often provide a greater number of records compared with citizen science data; such is the case in this study (Table 1) and in others (e.g., Souther, Randall, & Lyndon, 2021).Yet, the quality and reliability of the two are often equivalent (e.g., Kosmala et al., 2016;Johnston et al., 2020).So long as we are careful in selecting the appropriate research questions we are applying these data to, and correct for biases, opportunistic citizen science observations are invaluable data for those investigating patterns of occurrence in organisms (van Strien, van Swaay, & Termaat, 2013;McKinley et al., 2017;Brown & Williams, 2019).
The capacity for record vetting in citizen science is a further clear advantage.A clear photograph of an organism is incontrovertible evidence of its occurrence at a site, hence, these records can be safely included in species distribution models (provided that the accompanying location data are reliable).Conversely, investigators relying solely on nonphotographic occurrence data may be understandably sceptical of including outlying records in their analyses.The method of seeking publicly inaccessible occurrence records is not without its drawbacks; scanning the internet for photographs of a rare species, then contacting the owners for associated occurrence data, may require considerable time investment from the researcher.Nevertheless, we urge that this can be a worthwhile investment.For instance, our PC efforts increased the AOO of P. rawlinsoni by 176 km 2 and discovered 35 new occurrence sites (Table 1; Figure 3); this is a major improvement to the knowledge of a rare species' distribution.

Consequences of misidentifying a threatened species
Through our data collection activities, we observed frequent misidentification of P. rawlinsoni by both casual wildlife observers and experienced ecologists.It is a small scincid lizard which, from a distance, is almost indistinguishable from several other co-occurring skink species.The problem is compounded by the fact that P. rawlinsoni almost exclusively occurs near dense vegetation, and its eagerness to flee to such vegetation when approached makes close inspection of diagnostic features difficult.Pseudemoia rawlinsoni is most frequently confused with Acritoscincus duperreyi, given their similar general colour pattern and broadly overlapping geographical range.There are significant conservation concerns that can arise from this mistaken identity.For example, Pseudemoia rawlinsoni is listed as endangered under the Victorian Flora and Fauna Guarantee Act 1988, and thus, the species (and its habitat) is afforded theoretical conservation benefits of legislative protection.However, if observations of P. rawlinsoni are mistakenly recorded (or dismissed entirely) as Acritoscincus duperreyi (a common species), then we are at risk of underestimating the range of a threatened species.
A further corollary of misidentification is that sites containing populations of P. rawlinsoni may be mismanaged or destroyed for human developments under the erroneous assumption that the species is absent from a site.The problem, however, is that most environmental consultants are unlikely to be taxon specialists that are familiar with the often-nuanced diagnostic features in some taxonomic groups such as skinks.Yet, such generalists are often tasked with advising their clients on whether a threatened species occurs at a site targeted for modification or destruction (Clemann, 2015).Therefore, if ecologists can become familiar with the identifying features of these two species when performing pre-impact surveys, we can minimise the risk of P. rawlinsoni populations being mismanaged or extirpated as a consequence of incompetence.To that end, we have composed a series of diagrams comparing the key diagnostic features of the two species which can be quickly assessed in the field to help with identification (Figure S9).

The predicted distribution of P. rawlinsoni habitat
Habitat suitability modelling, using our comprehensive dataset of P. rawlinsoni occurrences, offers novel insights into this poorly known species' potential geographical distribution.The species is largely restricted to coastal areas of southeast Australia, with low-predicted habitat suitability towards more inland (i.e., more arid) regions.Furthermore, regions of predicted suitable habitat are disjunct across the species' distribution (Figure 4).There are several important inferences relating to the species' ecology and conservation that we may deduce from these finding.
First, it suggests that climatic factors have some degree of influence on the species' range limits.This may be either through the direct effect of climate on the species' physiological tolerances (Kearney & Porter, 2004;Kearney et al., 2018), or the influence of climate on the distribution of critical vegetation for the species (i.e., moist, densely vegetated microhabitats), which may then indirectly influence the species' range limits (sensu Woods, Dillon, & Pincebourde, 2015).Future investigators should elucidate the specific drivers of the species' climatic range limits by investigating its ecophysiology.We hypothesise that P. rawlinsoni is intolerant of high-desiccation rates and/or hightemperature regimes, which may explain why it appears to be mostly restricted to the cool and moist microenvironments offered by swamp ecosystems and associated vegetation in the coastal areas.Indeed, susceptibility to desiccation is a factor known to constrain activity times and geographical ranges of lizards (Heatwole & Veron, 1977;Carneiro et al., 2017;Kearney et al., 2018;Huang et al., 2020).
The second major inference from our modelling relates to the patchy occurrence of predicted suitable habitat across its range.Prior to this study, our view of the glossy grass skinks' range was that of a poorly known species, whose patchily distributed occurrence records reflected a failure of humans to identify, and record observations of, this small cryptic lizard; indeed, this was our motivation for seeking out additional records of the species.But while this is true to some extent, our modelling, which corrected for human sampling bias, shows that there may be a climatic basis to the species' disjointed distribution, and that gaps in its range may indeed represent ecologically unsuitable areas.
Conservation implications for P. rawlinsoni While P. rawlinsoni has a relatively large EOO, it has a limited AOO of 948 km 2 ; i.e., below the threshold for being considered vulnerable against the IUCN Red List Categories and Criteria (IUCN, 2022).Its distribution is considered severely fragmented and subject to recent decline, given that 42% of its predicted range has been converted to agricultural (38%) or urban (4%) land.Indeed, the sparse and fragmented occurrence of the species in south-west Victoria is perhaps best explained by the fact that this region has been almost entirely converted to heavily modified agricultural land.The IUCN Red List Categories and Criteria (IUCN, 2022) state that continuing declines in geographical distribution should be calculated over a 10-year period or three generations for a species (whichever is longer).P. rawlinsoni is expected to have a generation time of approximately 5 years (Wotherspoon, Chapple and Farquhar, unpublished data).Given that we surveyed sites at which the species has not been recorded for 15 or more years, our finding that P. rawlinsoni has potentially disappeared from 48% of such sites (12/25 sites) provides a basis for inferring a substantial decline in AOO, extent of habitat and number of subpopulations over the last three generations.Thus, P. rawlinsoni meets the relevant elements of IUCN Red List Criterion B [subcriterion B2ab (ii, iii, iv)] to make it eligible for listing as vulnerable.
Halting this decline will require a concerted effort from conservation practitioners and urban planners.As we have here shown, almost half of the species' entire habitat is anthropogenically modified.Until a detailed investigation of the species' specific habitat requirements takes place, our efforts to prevent or minimise the impact of further habitat modification may be ineffective.Research that explores the differences in abundance between remnant and adjacent cleared sites, and the microhabitat characteristics driving such differences, would yield invaluable new information that can be incorporated into effective urban planning and land management strategies for the species.

Conclusion
The myriad processes driving global biodiversity decline (e.g., wildfire, climate change, urban expansion) are expected to increase in frequency and intensity in the future.As conservation scientists, then, we must have access to detailed, up-to-date information on a species' biology, ecology and distribution if our priority is to make accurate forecasts of how species will respond to these threats.However, our ability to predict and manage biodiversity decline may be hindered by our incomplete and outdated knowledge of geographical distributions for many species.We have demonstrated how a combined data-integration approach can improve knowledge of a species' geographical range and AOOfundamentally important conservation metrics that are routinely used to assess a species' threat status.Combining accessible database records (i.e., online) with PC records (i.e., unpublished 'grey data') gives us an in-depth and current picture of a species' distribution, resulting in more reliable metrics of its vulnerability, and thus, a well-informed designation of the species to a particular threat category.While being broadly applicable, we suggest this method is particularly useful for investigators wanting to rapidly make an accurate assessment of the conservation status of DD and readily misidentified species with uncertain geographical distributions.Figure S6.Binary maps of distribution of Pseudemoia rawlinsoni predicted using surface range envelopes under a (a) 90% (b) 95%, and (c) 100% threshold.
Figure S7.Estimates of relative (%) contributions of environmental predictor variables to the optimal MaxEnt model.
Figure S8.Response curves characterizing how each environmental predictor variable affected the MaxEnt predictions for the best performing model.
Figure S9.Visualisation of the relevant differences between Pseudemoia rawlinsoni and Acritoscincus duperreyi.Table S1.Full set of environmental predictor variables considered in variable selection for modelling the distribution of Pseudemoia rawlinsoni.
Table S2.Metrics used to evaluate the performance of models.
Table S3.Performance of the MaxEnt models.Table S4.Details of the 25 historical Pseudemoia rawlinsoni sites that were surveyed between October 2021 and February 2022.
Table S5.List of the 13 historical record survey sites at which we found Pseudemoia rawlinsoni, indicating the survey number (first or second site visit) that the species was first detected on.
Table S6.The 10 potentially suitable (as predicted by MaxEnt modelling) wetlands in south-west Victoria at which we performed surveys during December 2021 to detect new populations of Pseudemoia rawlinsoni.

Figure 2
Figure 2 Number of records of Pseudemoia rawlinsoni obtained from online biodiversity databases (existing records), and personal communications (PCs) (new records obtained during this study).PCs added considerably (19.66%) to our final occurrence dataset for P. rawlinsoni, resulting in 103 new records and discovering 35 new sites.

Figure 3
Figure3Map of all occurrence records of Pseudemoia rawlinsoni obtained from two sources: personal communications (PCs, as part of the present study) and online biodiversity databases.The extent of occurrence (EOO) convex hull is denoted by the dashed orange line.This map represents the most complete picture of the species' distribution to date, given that we combine historical records with previously inaccessible data (i.e., occurrences obtained via months of conducting PCs).

Figure 4
Figure 4 MaxEnt model predictions of environmentally suitable areas for the occurrence of Pseudemoia rawlinsoni in southeast Australia.Maps show (a) continuous probability of habitat suitability, and (b) a binary prediction of habitat suitability based on a probability threshold that optimises ROC; the red areas (probability of habitat suitability >0.55%) represent an approximation of the species' realized environmental niche, given that the model included variables that reflect the species ecological requirements (e.g., temperature, rainfall, vegetation).

Figure S4 .
Figure S4.Mantle correlogram showing the spatial autocorrelation of cleaned (a) and thinned (b) species occurrence records at 0-10 000 m.Figure S5.Complexity and generality of MaxEnt models tuned with different combinations of feature classes and regularisation multipliers.FigureS6.Binary maps of distribution of Pseudemoia rawlinsoni predicted using surface range envelopes under a (a) 90% (b) 95%, and (c) 100% threshold.FigureS7.Estimates of relative (%) contributions of environmental predictor variables to the optimal MaxEnt model.FigureS8.Response curves characterizing how each environmental predictor variable affected the MaxEnt predictions for the best performing model.FigureS9.Visualisation of the relevant differences between Pseudemoia rawlinsoni and Acritoscincus duperreyi.

Figure S5 .
Figure S4.Mantle correlogram showing the spatial autocorrelation of cleaned (a) and thinned (b) species occurrence records at 0-10 000 m.Figure S5.Complexity and generality of MaxEnt models tuned with different combinations of feature classes and regularisation multipliers.FigureS6.Binary maps of distribution of Pseudemoia rawlinsoni predicted using surface range envelopes under a (a) 90% (b) 95%, and (c) 100% threshold.FigureS7.Estimates of relative (%) contributions of environmental predictor variables to the optimal MaxEnt model.FigureS8.Response curves characterizing how each environmental predictor variable affected the MaxEnt predictions for the best performing model.FigureS9.Visualisation of the relevant differences between Pseudemoia rawlinsoni and Acritoscincus duperreyi.

Table 1
The relative contribution of occurrence data, obtained from two different collection methods, to our knowledge of the geographical distribution of Pseudemoia rawlinsoni