ConR: An R package to assist large‐scale multispecies preliminary conservation assessments using distribution data

Abstract The Red List Categories and the accompanying five criteria developed by the International Union for Conservation of Nature (IUCN) provide an authoritative and comprehensive methodology to assess the conservation status of organisms. Red List criterion B, which principally uses distribution data, is the most widely used to assess conservation status, particularly of plant species. No software package has previously been available to perform large‐scale multispecies calculations of the three main criterion B parameters [extent of occurrence (EOO), area of occupancy (AOO) and an estimate of the number of locations] and provide preliminary conservation assessments using an automated batch process. We developed ConR, a dedicated R package, as a rapid and efficient tool to conduct large numbers of preliminary assessments, thereby facilitating complete Red List assessment. ConR (1) calculates key geographic range parameters (AOO and EOO) and estimates the number of locations sensu IUCN needed for an assessment under criterion B; (2) uses this information in a batch process to generate preliminary assessments of multiple species; (3) summarizes the parameters and preliminary assessments in a spreadsheet; and (4) provides a visualization of the results by generating maps suitable for the submission of full assessments to the IUCN Red List. ConR can be used for any living organism for which reliable georeferenced distribution data are available. As distributional data for taxa become increasingly available via large open access datasets, ConR provides a novel, timely tool to guide and accelerate the work of the conservation and taxonomic communities by enabling practitioners to conduct preliminary assessments simultaneously for hundreds or even thousands of species in an efficient and time‐saving way.


| INTRODUCTION
As we attempt to address the modern biodiversity crisis, assessing the conservation status of species has become an invaluable tool for biodiversity conservation. Evaluating threat based on the Red List Categories and Criteria of the International Union for Conservation of Nature (IUCN, 2012) is an authoritative, comprehensive and widely used approach in conservation biology (Rodrigues, Pilgrim, Lamoreux, Hoffmann, & Brooks, 2006). Indeed, many decisions made by governments, natural resource managers, and conservation planners (Rodrigues et al., 2006) rely (often solely) on the "Red List" published by IUCN (http://www.iucnredlist.org/). For example, programs such as Important Bird Areas (IBA), Important Plant Areas (IPA, Anderson, 2002) or Tropical Important Plant Areas (TIPA, Darbyshire et al., 2017) all rely directly on threat assessments based on IUCN criteria. In parallel, there is an urgent need to list threatened species in the near future. This is, for example, the case of Target 2 of the Global Strategy for Plant Conservation (GSPC) of the United Nations Convention on Biological Diversity, which calls for assessing the conservation status of all known plant species by 2020 (https://www.cbd.int/gspc/targets.shtml).
However, as of 2016, the Red List included assessments of just 21,898 plant species (IUCN Standards and Petitions Subcommittee, 2016), ca. 6.2% of the estimated global total (~352,000 flowering plant species; Paton et al., 2008). The ThreatSearch database (www.bgci.org/threat_search.php) documents the conservation assessments of ca. 150,000 taxa, including assessments at the species and infraspecific levels based on both older and current IUCN criteria; preliminary, global or regional assessments; and assessments based on other non-IUCN criteria. Thus, ThreatSearch represents an uncritical, high-end estimate of the total number of plant taxa assessed to date. Over the last three decades, progress toward this target has been slow, largely because the process of performing and publishing full Red List assessments is time-consuming. Accelerating global conservation assessments is urgently needed (Krupnick, Kress, & Wagner, 2009; Miller et al., 2012). While alternative methods have been developed to streamline and simplify large-scale conservation assessments (e.g., Krupnick et al., 2009; Miller et al., 2012; Ocampo-Peñuela, Jenkins, Vijay, Li, & Pimm, 2016; Ter Steege et al., 2015), none are based on the theoretical framework provided by IUCN, and they thus have little immediate impact for concrete conservation actions.
The International Union for Conservation of Nature employs five complementary criteria (A, B, C, D and E) under which a species can be evaluated, and, when not already extinct, assessments assign species to one of three threatened categories [Critically Endangered (CR), Endangered (EN) or Vulnerable (VU)], or otherwise to Least Concern (LC), Near Threatened (NT) or Data Deficient (DD, when insufficient data are available). Among these five criteria, criterion B is the most widely used.
For example, in 2007, almost half of all organisms whose status was published on the IUCN Red List were assessed solely based on criterion B (Gaston & Fuller, 2009). Unlike the others, Criterion B is suitable for estimating conservation status even when the distribution of a taxon is only known from georeferenced herbarium or museum collections and with limited information on local threats and potential continuing decline (Schatz, 2002), and it plays a prominent role in describing global trends in extinction risk. Even though some have suggested that Criterion B is the most misapplied of the five (IUCN Standards and Petitions Subcommittee 2016, p. 62), it nevertheless has the significant advantage of allowing assessments to be undertaken using distribution data only (Schatz, 2002), which are in many cases the only information available (in contrast, for example, to abundance data).
Assessing the conservation status of taxa under IUCN Red List criterion B (IUCN, 2012; IUCN Standards and Petitions Subcommittee 2016) based on recorded primary occurrences (typically obtained by compiling herbarium/museum records) nevertheless presents particular challenges. Criterion B involves two subcriteria (B1 and B2), which reflect two different kinds of geographic range size estimates [subcriterion B1 is based on extent of occurrence (EOO) while B2 is based on area of occupancy (AOO)], and three additional conditions (a, b and c) that describe aspects of the biology and potential decline of the taxon as a result of the impact of threats. Threshold levels for at least one subcriterion and two conditions must be met for a taxon to be assigned a threatened conservation status (see Table 1).

KEYWORDS
area of occupancy, criterion B, distribution range, extent of occurrence, IUCN, location, preliminary status, threatened taxa

| Extent of occurrence
Extent of occurrence (EOO) is defined as "the area contained within the shortest continuous imaginary boundary that can be drawn to encompass all the known, inferred or projected sites of present occurrence of a taxon, excluding cases of vagrancy" (IUCN, 2012). EOO is generally measured by a minimum convex polygon, or convex hull.

| Area of occupancy
Area of occupancy (AOO) is defined as "the area within its 'extent of occurrence' that is occupied by a taxon, excluding cases of vagrancy" (IUCN, 2012). AOO differs from EOO (see above) in that it reflects the fact that a taxon will not usually occur throughout its EOO; that is, there will be areas where the taxon is absent, including unsuitable areas. The AOO is a function of the scale or grid cell size at which it is measured, which should reflect relevant biological aspects of the taxon. For example, the impact of a threat is not identical for tree and herb species.
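The dependence of AOO on grid cell size can be illustrated with a minimal Python sketch. It is not ConR's implementation: it assumes that occurrences have already been projected to planar coordinates in kilometres, and the function name `area_of_occupancy` and the sample records are hypothetical. The 2 km default reflects the 2 × 2 km cell size commonly recommended in the IUCN guidelines.

```python
import math


def area_of_occupancy(points_km, cell_km=2.0):
    """Return AOO in km² as (number of occupied grid cells) x (cell area).

    points_km: iterable of (x, y) planar coordinates in km (assumed input).
    cell_km:   side length of the square grid cells, in km.
    """
    # Each occurrence is assigned to the grid cell that contains it;
    # a set keeps every occupied cell only once.
    occupied = {
        (math.floor(x / cell_km), math.floor(y / cell_km))
        for x, y in points_km
    }
    return len(occupied) * cell_km ** 2


# Hypothetical occurrence records (km)
records = [(0.5, 0.5), (1.5, 1.0), (2.5, 0.5), (10.2, 7.9)]

print(area_of_occupancy(records, cell_km=2.0))   # 3 occupied cells -> 12.0
print(area_of_occupancy(records, cell_km=10.0))  # 2 occupied cells -> 200.0
```

The same four records yield an AOO of 12 km² with 2 km cells but 200 km² with 10 km cells, which is why the cell size has to be chosen and reported explicitly.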

| Location
A "location" is defined as "a geographically or ecologically distinct area in which a single threat can rapidly affect all individuals of the taxon present" (IUCN, 2012). Thus, the size of a location depends on the threat (mining, deforestation, poaching, etc.). EOO and AOO, the two main parameters of Criterion B, can be generated automatically (Table 1). However, assessing the number of locations requires contextual information about threats. This information, which is usually obtained from field observations, expert knowledge, and/or precise data on the size and nature of a taxon's range (e.g., continuous vs. severely fragmented), can thus only be applied properly using a "taxon-by-taxon" process to obtain a fully informed IUCN Red List assessment.

| Subpopulations
"Subpopulations" are defined as "geographically or otherwise distinct groups in the population between which there is little demographic or genetic exchange" (IUCN, 2012; Rivers, Bachman, Meagher, Lughadha, & Brummitt, 2010). Although the number of subpopulations is not directly taken into account for assessments based on criterion B, this information is requested during the submission process to the IUCN Red List.
Below, we describe ConR, an R package to generate batch preliminary assessments of conservation status following the IUCN guidelines using multiple species datasets based on Criterion B. ConR makes it possible to: (1) calculate or estimate the key parameters needed for an assessment under criterion B; (2) generate preliminary assessments of multiple species using a batch process; and (3) summarize the estimated parameters and preliminary assessments in a spreadsheet and spatially visualize the results on generated maps. ConR implements a novel method to approximate the number of "locations" sensu IUCN, one of the key Criterion B parameters (see below).

| THE ConR PACKAGE
ConR allows users to estimate the above parameters automatically for any list of taxa and then assigns each taxon to a preliminary IUCN threat category according to Criterion B. These preliminary assessments are [...] (Table 1). Calculation of the two key range parameters, EOO and AOO, can be easily automated either using a taxon-by-taxon approach, as provided for by the web service GeoCAT (Bachman, Moat, Hill, de la Torre, & Scott, 2011), or in batch mode, for example in other R packages such as speciesgeocodeR (Töpel et al., 2017) or RED (https://CRAN.R-project.org/package=red; see Table 2).
TABLE 1 Under Criterion B, the assessment of a taxon is based on the calculation of its EOO (B1) and/or AOO (B2). In addition, at least two of the following conditions must be taken into consideration: (1) [...]
However, none of these packages are designed to estimate the number of locations, a fact that hinders their utility in assigning taxa to a threat category under Criterion B. The notion of "location" remains a complex and sometimes confusing concept. It has been interpreted in many different ways depending on the type of organism studied, the general landscape in which a taxon occurs and the type of threat to its populations. In ConR we have, for the first time, attempted to estimate the number of locations automatically so that it can be calculated simultaneously for a large number of taxa. This automation comes with a number of assumptions detailed below.
The number of locations for each taxon can be approximated using two complementary approaches in ConR. First, a grid with cells of a chosen size is overlaid on taxon occurrences and the number of locations is estimated by the number of occupied cells. The grid cell size must be defined by the user and should represent the scale at which subpopulations are equally affected by a given threat. For example, a cell size of 10 km² may be considered a good estimate of the scale at which a particular serious threat event such as mining could equally affect individuals of a given taxon (Durán et al. 2013). The user can choose a fixed cell size across the whole multispecies dataset (e.g. 10 km²) or can use a species-specific sliding scale approach (Rivers et al., 2010). In the latter approach, cell size is defined as 1/x of the maximum interoccurrence distance, that is, the maximum distance between any two known occurrences of the taxon (e.g., with x = 20, 5% (0.05) of that maximum distance). In both cases, the cell grid is overlaid on the total distribution of the taxon in a way that results in the minimum number of estimated locations.
Finally, as cell size is user defined, alternative estimates of the scale at which a given threat operates can be compared.
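The grid-based estimate described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than ConR's code: coordinates are taken to be planar and in kilometres, the names `occupied_cells`, `sliding_cell_size` and `estimate_locations` are hypothetical, and the search for the grid position giving the fewest occupied cells is approximated by trying a small number of shifted grid origins.

```python
import math
from itertools import combinations


def occupied_cells(points, cell, ox=0.0, oy=0.0):
    """Number of grid cells of side `cell` occupied when the grid origin is (ox, oy)."""
    return len({
        (math.floor((x - ox) / cell), math.floor((y - oy) / cell))
        for x, y in points
    })


def sliding_cell_size(points, fraction=0.05):
    """Sliding scale: cell size as a fraction of the maximum inter-occurrence distance."""
    max_dist = max(math.dist(p, q) for p, q in combinations(points, 2))
    return fraction * max_dist


def estimate_locations(points, cell, shifts=4):
    """Try shifts x shifts grid origins and keep the smallest number of occupied cells."""
    step = cell / shifts
    return min(
        occupied_cells(points, cell, i * step, j * step)
        for i in range(shifts)
        for j in range(shifts)
    )


# Hypothetical occurrences (km): two close records, two isolated ones
points = [(9.0, 9.0), (11.0, 11.0), (50.0, 50.0), (100.0, 9.0)]

print(occupied_cells(points, cell=10.0))      # unshifted grid: 4 cells
print(estimate_locations(points, cell=10.0))  # best shifted grid: 3 cells
print(sliding_cell_size(points))              # 5% of 91 km, about 4.55 km
```

With an unshifted 10 km grid, the two nearby records fall on either side of a cell boundary and are counted as separate locations; shifting the grid places them in the same cell, which is the effect of choosing the grid position that minimizes the estimate.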
In the second approach, ConR integrates information about protected areas (PAs). The underlying rationale for this is that subpopulations within a PA will not be treated in the same way as those located [...]
TABLE 2 Features of various currently available programs that estimate parameters used for preliminary conservation status assessments following the IUCN guidelines. GeoCAT (Bachman et al., 2011); speciesgeocodeR (Töpel et al., 2017)
For each preliminary assessment, in addition to estimating the "number of locations" condition, at least one of the two remaining conditions relating to the future trend of a taxon's distribution or structure must be taken into consideration: continuing decline and/or extreme fluctuation. ConR assumes by default a continuing future decline in habitat quality [condition (b) (iii), Table 1]. While this assumption might appear to be an oversimplification, it would seem to be valid in most cases. The validity of this assumption is also intuitively acknowledged by the IUCN guidelines, which recognize a criterion for assessing threat status specifically on the basis of very small or restricted populations (Criterion D). This assumption is also reasonable when one considers that wilderness areas are in rapid decline throughout the world, especially in the tropics (Watson et al., 2016), suggesting that future decline may be anticipated for any given range-restricted species.
Finally, ConR also provides an estimate of the number of subpopulations of a taxon by implementing a circular buffer method (Resol_sub_pop in km²). This buffer is user defined and can be adapted to different groups of taxa depending on their dispersal characteristics and gene flow (if known).

| ConR FEATURES
ConR includes four functions, two sample occurrence datasets, and two sample shapefile datasets. All functions operate on a mandatory single data frame providing taxon occurrences and on optional user-provided shapefiles of land/sea and protected area limits. Occurrence data and shapefiles must be provided using the WGS84 reference coordinate system. The input data frame requires three mandatory fields: latitude and longitude (in decimal degrees), and taxon name. The collection year can also be added, thereby allowing graphic visualization of a taxon's collecting history (Figure 1). Additional information, such as higher taxonomic rank, can also be provided. By default, ConR saves all results in the user's R working directory. A step by step tutorial (R vignette) describing all options is provided as supplementary material and on the CRAN website.

| IUCN.eval
This is the main ConR function, which provides values for all parameters, including EOO, AOO and an estimate of number of locations, needed for assessing the preliminary conservation status of taxa based on selected conditions and subcriteria of criterion B (Table 1). All options are flexible and can be user defined. The number of locations can be estimated using a fixed or sliding grid approach (Rivers et al., 2010). In addition, PA information can also be taken into account if an appropriate PA shapefile is provided (see above).
The output is a [...] along with a distribution map. If PA information was included, the map also depicts the distribution of PAs as well as occurrences within (blue dots) or outside (black dots) them (see Figure 1). This map can be used for the submission of a formal assessment to the IUCN.
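How the three parameters might be combined into a preliminary category can be sketched as follows. This Python fragment is an assumption-laden illustration and not the IUCN.eval code: the function names are hypothetical, continuing decline (condition b) is taken for granted as described above, and the thresholds are those of the IUCN Red List Categories and Criteria for criterion B (EOO below 100, 5,000 and 20,000 km²; AOO below 10, 500 and 2,000 km²; 1, at most 5 and at most 10 locations for CR, EN and VU, respectively).

```python
RANKS = {1: "CR", 2: "EN", 3: "VU", 4: "LC or NT"}


def rank_from_thresholds(value, limits):
    """Return 1 (CR), 2 (EN) or 3 (VU) for the first limit the value falls below, else 4."""
    for rank, limit in enumerate(limits, start=1):
        if value < limit:
            return rank
    return 4


def criterion_b_category(eoo_km2, aoo_km2, n_locations):
    """Preliminary category under B1a and/or B2a, assuming continuing decline (b).

    eoo_km2 may be None when EOO cannot be computed (fewer than three
    unique occurrences); only subcriterion B2 is then evaluated.
    """
    if n_locations == 1:
        rank_loc = 1
    elif n_locations <= 5:
        rank_loc = 2
    elif n_locations <= 10:
        rank_loc = 3
    else:
        rank_loc = 4

    # A subcriterion only supports a category if the range threshold AND the
    # number-of-locations threshold for that category are both met, hence max().
    rank_b2a = max(rank_from_thresholds(aoo_km2, (10, 500, 2000)), rank_loc)
    if eoo_km2 is None:
        return RANKS[rank_b2a]
    rank_b1a = max(rank_from_thresholds(eoo_km2, (100, 5000, 20000)), rank_loc)

    # The taxon qualifies under B1 and/or B2: keep the more threatened outcome.
    return RANKS[min(rank_b1a, rank_b2a)]


print(criterion_b_category(50, 8, 1))           # CR
print(criterion_b_category(30000, 40, 8))       # VU (via B2a; EOO exceeds all thresholds)
print(criterion_b_category(None, 12, 3))        # EN (AOO only)
print(criterion_b_category(150000, 5000, 40))   # LC or NT
```

In the second call the AOO alone would suggest EN, but eight locations only satisfy the VU threshold, so the outcome is VU.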

| EOO.computing
The EOO.computing function calculates EOO. It operates with a minimum of three unique occurrences; otherwise, it returns "NA". In ConR, EOO can be estimated either using a "convex.hull" or an "alpha.hull" method, as recommended by IUCN Standards and Petitions Subcommittee (2016).
In the very infrequent case that occurrences form a straight segment, the EOO will be zero, representing an underestimate of its surface (IUCN Standards and Petitions Subcommittee 2016). In this specific case, ConR outputs a warning. The EOO is then estimated using a different method: A polygon is built by adding a buffer of a predefined size of 0.1° to the segment, which can be adjusted by the argument buff.alpha.
Also, the EOO cannot be computed when there are fewer than three unique occurrences; a warning is returned in such cases. Finally, it should be noted that the way in which ConR estimates the EOO may be biased for species with wide distributions and cannot be applied to species whose distribution spans the 180th meridian (see R documentation).
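The convex hull logic and its two special cases can be summarized in a short Python sketch. It is a simplified stand-in for EOO.computing, not a port of it: coordinates are assumed to be planar and in kilometres (so the result is not a geodesic area), the function names are hypothetical, and the buffer for the straight-segment case is expressed in kilometres through a hypothetical `buffer_km` argument instead of the 0.1° default described above.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    total = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(total) / 2.0


def extent_of_occurrence(points, buffer_km=10.0):
    """EOO in km², or None with fewer than three unique occurrences."""
    unique = set(points)
    if len(unique) < 3:
        return None  # mirrors the "NA" case described in the text
    area = polygon_area(convex_hull(unique))
    if area == 0:
        # Occurrences on a straight segment: buffer the segment instead
        # (area of a rectangle plus two half-discs of radius buffer_km).
        length = math.dist(min(unique), max(unique))
        return 2 * buffer_km * length + math.pi * buffer_km ** 2
    return area


print(extent_of_occurrence([(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]))  # 100.0
print(extent_of_occurrence([(0, 0), (0, 0), (1, 1)]))                      # None
```

The exact test `area == 0` is adequate for this illustration; with real floating-point coordinates, a tolerance would be a safer way to detect nearly collinear occurrences.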

| subpop.comp
This function estimates the number of subpopulations using the circular buffer method (Rivers et al., 2010). Each unique occurrence is buffered with a circle of a defined radius and overlapping circles are merged to form a single subpopulation, while nonoverlapping circles are considered to represent separate subpopulations. For batch processing of species, while the circular buffer method does not take into consideration the dispersal abilities of each taxon, it was recommended by Rivers et al. (2010) after testing various methods. The output must be considered as an approximation of the total number of subpopulations. Although the number of subpopulations is not directly taken into account for assessments based on criterion B, this information is requested for the submission of full assessments to the IUCN Red List.
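The circular buffer method amounts to grouping occurrences whose buffers overlap, directly or through a chain of intermediate occurrences. A minimal Python sketch, assuming planar coordinates in kilometres and a hypothetical function name `count_subpopulations` (this is not the subpop.comp code), could look like this:

```python
import math
from itertools import combinations


def count_subpopulations(points, radius_km):
    """Count groups of occurrences whose circular buffers overlap.

    Two circles of radius r overlap when their centres are closer than 2r;
    chains of overlapping circles are merged with a small union-find.
    """
    pts = list(set(points))          # unique occurrences only
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(pts)), 2):
        if math.dist(pts[i], pts[j]) < 2 * radius_km:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(pts))})


# Hypothetical occurrences (km): a chain of three records and one isolated record
occurrences = [(0, 0), (8, 0), (16, 0), (100, 100)]

print(count_subpopulations(occurrences, radius_km=5))  # 2
print(count_subpopulations(occurrences, radius_km=3))  # 4
```

The first and third records are 16 km apart, so their 5 km buffers do not touch, yet they end up in the same subpopulation because both overlap the buffer of the record between them.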

| map.res
The map.res function allows a graphical summary and geographical ex-

| CASE STUDY
In order to illustrate the usefulness and limits of ConR, we tested the package on a high-quality dataset of continental African palm distributions for 60 species (of the 68 currently known; Stauffer et al., 2017).
A large part of the data were extracted from the RAINBIO database (Dauby et al., 2016), which contains nearly all herbarium collections for African palms. Additional recent collections were added when available, resulting in a dataset of 4,234 unique occurrence records. The dataset was first used for the preparation and submission (as of April 2017) of full, species by species IUCN Red Listing assessments, mainly under Criterion B. Second, using ConR (with default parameters) and the same data for all 60 species, but excluding any "nonherbarium" occurrences (such as those based on satellite imagery or population censuses), we performed preliminary assessments as a batch operation. We also ran the dataset with and without PA information (downloaded and filtered from https://www.protectedplanet.net) using the "protect.areas" default option. ConR analyzed the dataset in less than 5 minutes using a standard laptop.
The results of the full IUCN assessments and those generated by ConR (with and without PA information), summarized in Table 3, are quite congruent. Factoring in PAs did not alter the outcomes [...]
In addition to this case study, we also undertook a preliminary conservation assessment of amphibians of Madagascar using a dataset that contained 7,657 georeferenced records representing 201 species, downloaded on February 9, 2016, from www.gbif.org (https://doi.org/10.15468/dl.2tkoae). This analysis was performed mainly to demonstrate the graphical outputs of ConR (Figures 1 and 2). This dataset is available within ConR as an example data frame (Malagasy_amphibian).

| DISCUSSION
ConR provides for the first time a dedicated, multispecies conservation assessment package based specifically on IUCN criterion B and using only species geographic distribution. It provides an efficient tool to help accelerate the work of the conservation community by enabling practitioners to conduct preliminary assessments that are both reliable and informative. We stress that ConR does not (and is not intended to) replace the full IUCN Red Listing process; it can, however, assist and facilitate this process. ConR uses a number of assumptions in order to automate category assignment, especially the estimation of the number of locations sensu IUCN. Notwithstanding these assumptions, detailed above, which must be understood and acknowledged by the user, ConR is flexible in their implementation, allowing the user to explore various approaches and methodologies and to customize values for each option. As shown in our case study on African palms (Table 3), the results of the full and ConR assessments are generally congruent.
The differences observed between them can be linked primarily to the way in which ConR estimates the number of locations.
For example, Eremospatha barendii is known from three collections made at localities more than 10 km apart. ConR thus infers (with a resolution of 10 km²) two locations, whereas in the full assessment, we estimated a single location (because both localities were considered to be subjected to the same threat). Another difference lies in whether locations were used or not for the full assessment.
For example, Eremospatha dransfieldii is inferred by ConR to have eight locations, and it is therefore assessed as VU. However, the subpopulations of this species are severely fragmented (Cosiaux et al., 2017), which also triggers subcriterion "a" (Table 1), which was used along with continuing decline (subcriterion "b"), and thus the number of locations was not used for the full assessment. In contrast, for some species, ConR indicated a status of CR, EN or VU, whereas the full assessment was LC (e.g., Raphia gentiliana and R. monbuttorum; Table 3). These mismatches occurred for species with broad geographic distributions but for which there were few collections (and thus fewer than 10 locations were inferred by [...]
Second, ConR will also be of value to taxonomists, who are increasingly expected to provide preliminary conservation assessments when describing new species or publishing revisions or monographs. By generating key parameters (EOO, AOO and an estimate of the number of locations), ConR will greatly facilitate this process.
Finally, rapid preliminary assessments of IUCN conservation status based on large, multitaxon sets will support studies on a wide range of subjects such as the evolution of extinction risk within and among clades (Forest, Crandall, Chase, & Faith, 2015; Jetz et al., 2014) and the phylogenetic component of extinction risk within regional floras (Leão, Fonseca, Peres, & Tabarelli, 2014) or faunas.
The ConR package has already been used by the authors to facilitate full assessments (e.g., of palms) and to prepare IUCN Red List workshops. Also, ConR has been successfully used as part of an IUCN "Green Listing" of Protected and Conserved Areas (IUCN, 2012) for private sector players in order to identify potentially threatened species occurring in their concessions that, after verification, will be the subject of specific conservation management plans (unpublished results).

DATA ACCESSIBILITY
The ConR package is written in R (R Development Core Team 2016) and is available on the Comprehensive R Archive Network (https://cran.r-project.org/package=ConR) and in a GitHub repository (https://github.com/gdauby/ConR).