Predicting potential distribution of poorly known species with small database: the case of four‐horned antelope Tetracerus quadricornis on the Indian subcontinent

Abstract Information gaps on the distribution of data deficient and rare species such as four‐horned antelope (FHA) in Nepal may impair their conservation. We aimed to empirically predict the distribution of FHA in Nepal with the help of data from the Indian subcontinent. Additionally, we wanted to identify core areas and gaps within the reported range limits and to assess the degree of isolation of known Nepalese populations from the main distribution areas in India. The tropical part of the Indian subcontinent (65°–90° eastern longitude, 5°–30° northern latitude), that is, the areas south of the Himalayan Mountains. Using MaxEnt and accounting for sampling bias, we developed predictive distribution models from environmental and topographical variables, and known presence locations of the study species in India and Nepal. We address and discuss the use of target group vs. random background. The prediction map reveals a disjunct distribution of FHA with core areas in the tropical parts of central to southern–western India. At the scale of the Indian subcontinent, suitable FHA habitat area in Nepal was small. The Indo‐Gangetic Plain isolates Nepalese from the Indian FHA populations, but the distribution area extends further south than proposed by the current IUCN map. A low to intermediate temperature seasonality as well as low precipitation during the dry and warm season contributed most to the prediction of FHA distribution. The predicted distribution maps confirm other FHA range maps but also indicate that suitable areas exist south of the known range. Results further highlight that small populations in the Nepalese Terai Arc are isolated from the Indian core distribution and therefore might be under high extinction risk.


Introduction
An important challenge in conservation biology is to understand the factors that determine the spatial distribution and abundance of species (Johnson 1980). With increasingly intensive human exploitation of land, habitats of wildlife species are being fragmented, degraded, and lost (Ellis et al. 2010;Haddad et al. 2015). Therefore, it becomes more and more important to understand the distribution of species over large spatial scale extents to guide targeted conservation management (Porwal et al. 1996;Mathys et al. 2006;Ara ujo et al. 2011). Human pressure on wildlife habitats is particularly high in developing parts of the world with large human populations, such as the Indian subcontinent (Mishra 1997;Sekhar 1998;Goswami et al. 2014). Predictive habitat modeling has been widely used as a tool to assess the impact of climate, land use, and environmental change on the distribution of organisms (Guisan and Zimmermann 2000;Drew et al. 2011). Such modeling approaches are also important for predicting potential species distribution ranges from environmental factors and for setting conservation priorities (Guisan and Thuiller 2005). A concise species distribution model (SDM) can inform conservation planning and management in a cost-effective way to approach the conservation challenge (Sanderson et al. 2002).
Four-horned antelope (Tetracerus quadricornis de Blainville, 1816), hereafter "FHA" (Fig. 1), is endemic to the Indian subcontinent, that is, Nepal and India. It is a solitary herbivore with small body size (shoulder height 55-65 cm and weigh 18-21 kg at adult) (Karanth and Sunquist 1992;Leslie and Sharma 2009) and low density less than 1 individual/km 2 (Baskaran et al. 2009). It is classified as Vulnerable in the IUCN Red List of Threatened Species (Mallon 2008) and as data deficient in the national red list of Nepal (Jnawali et al. 2011). As the deforestation rate of the Indian subcontinent is still in the range of 1.5% to 2.7% per year (Puyravaud et al. 2010;Southworth et al. 2012), habitat destruction because of conversion to agricultural land is the major threat to this species. FHA is believed to be widely distributed with fragmented populations particularly in dry deciduous forest in lowland of Nepal and India (Krishna et al. 2008;Mallon 2008;Leslie and Sharma 2009;Sharma et al. 2013). Studies at local scale extents suggest that FHA occurrence is determined by tree species richness (Sharma et al. 2013) and their nutrient levels (Ahrestani et al. 2011). However, beside local explanations of the specieshabitat relationships (Krishna et al. 2008;Baskaran et al. 2009;Sharma et al. 2013;Pokharel et al. 2015), a comprehensive and empirical assessment of the FHA's distribution range is still lacking (Krishna et al. 2009;Leslie and Sharma 2009).
According to Leslie and Sharma (2009), the species occurs in east central Nepal, whereas the IUCN Red List predicts FHA to occur only in the western part of Nepal ( Fig. 2). Own observations, however, confirm FHA to be present in both Chitwan and Bardia (28°23 0 0″N, 81°30 0 0″ E)/ Banke (28°11 0 28″N 81°54 0 46″E) national parks west of it. Figure 2 reveals some more discrepancies between range maps and occurrence locations, especially in the southernmost part of India. Missing information about the ecology and distribution of this endemic species may impair its conservation, particularly in Nepal where the species is listed as data deficient. The main objective of this study therefore was to obtain a better impression about the potential distribution of FHA in Nepal and its isolation from the Indian core distribution. We also aimed to develop an empirically based map of the entire possible distribution range. We hypothesized that the Indo-Gangetic Plain which is known as "bread basket" of South Asia (Aggarwal et al. 2004) isolates the suitable FHA habitats in Nepal from those in India. Furthermore, as FHA is restricted to dry deciduous forest in tropical regions (Krishna et al. 2008;Baskaran et al. 2009;Sharma et al. 2013;Pokharel et al. 2015), we expected that bioclimatic variables, which are related to a savannah-like vegetation, can explain the distribution of FHA. We believe that the findings of this study will fill the knowledge gap regarding the distribution and status of the species; in particular, our findings will be helpful to ground truth potential occurrences, and to assess habitat conditions. Ultimately, conservation management can take actions for   the conservation of FHA throughout their distribution range, particularly in the Nepalese lowland.

Study area
We focus on the tropical part of the Indian subcontinent (65°-90°eastern longitude, 5°-30°northern latitude) (Fig. 2), that is, the areas south of the Himalayan Mountains, particularly India (3 287 260 km 2 ) and Nepal (147 180 km 2 ) (data source: www.data.un.org accessed on 27 November 2014). Siwalik Hills (or Churia range) demarcate the northern limit of these tropical areas whereas the Indo-Gangetic Plains (Ganges, Indus and Brahmaputra river valleys), Deccan plateau, and Western and Eastern Ghats are the main topographical features. Eastern coast and Western Ghats of India are dominated by humid climate and tropical evergreen or moist deciduous forests whereas in the western and northwestern part of India, dryness increases with longer (5-9 months) dry periods. Those areas, except Rajasthan, where thorny thickets dominate, are mainly dominated by dry deciduous forest (Blasco et al. 1996). Nepalese tropical forest and northern, northeastern, and central Indian forests are dominated by sal (Shroea robusta) (Blasco et al. 1996;Barnekow Lillesø et al. 2005;Carpenter 2005). More than two-third of the tropical forest in this region is occupied by moist and dry deciduous forests, which frequently face wild fires (FAO, 2007;Joseph et al. 2009) and are the main potential habitat for FHA.
Most (~80%) of the precipitation in the subcontinent occurs due to monsoon from May to September are higher in the southeast of the subcontinent whereas westerly circulation derive some precipitation from November to March, and is more active in northwest (Mooley and Parthasarathy 1983;Shrestha et al. 2000;Duan and Yao 2004). Mean annual rainfall is about 1100 mm (1090.4 mm AE 103.91 mm) for India (Parthasarathy et al. 1994) and about 1800 mm for Nepal (Shrestha 2000). Mean temperature of the coldest months is generally >15°C in the tropical areas of the subcontinent.

Species and environmental data preparation
We used a dataset of ungulate occurrence in 76 large (>200 km²) protected areas within India as provided in Ahrestani et al. (2011) and cross-checked its coordinates in a GIS environment (ArcGIS 10.1, Redlands California) with two datasets of protected areas (WCU-UNEP, 2007;IUCN & UNEP-WCMC, 2015). We corrected one location from Ahrestani et al. (2011), which showed obvious latitudinal shift of 135 kilometers. All other coordinate pairs were inside the respective shapefiles from the protected areas databases. Thus, we extracted 53 locations with FHA presence. We added one additional location from Anwar et al. (2011) and three from Nepalese protected areas: "Bardia," "Banke," and "Chitwan," which are also known to harbor FHA (Pokharel et al. 2015; http:// www.dnpwc.gov.np assessed on 2 December 2015). Thus, we obtained 57 FHA records.
Single coordinate pairs as representatives of large protected areas may introduce uncertainty into analysis. However, we considered location error to be low, relative to study area extent (3000 9 3000 km). For example, protected area minimum and maximum sizes were 259 km² and 7506 km², respectively, with a mean of 828 km². These area extents correspond to quadrats of 16, 87, and 29 km edge length. Assuming these squares on average to contain correct locations, minimum, maximum, and mean location errors relative to edge length of the total extent were 0.53%, 2.90%, and 0.97%, respectively. Given further the inherent spatial autocorrelation in environmental (bioclimatic) variables, we considered the given coordinate pairs to well reflect the underlying bioclimatic and topographic values.
We used 19 bioclimatic variables (Hijmans et al. 2005) to capture environmental variation within the potential distribution range of FHA. The set of 19 bioclimatic rasters was accessed from the WORLDCLIM website (http:// www.worldclim.org/) at a resolution of 0.5 min (30 arc seconds, 0.083 degree, ca. 1 by 1 km) for tile twentyeight. As elevation was a good predictor of FHA occurrence at smaller scale extents (Pokharel et al. 2015), we complemented our set of predictors with elevation, based on a Shuttle Radar Topography Mission (SRTM) tile (USGS 2006). All raster layers were downloaded at a resolution of 0.083 degree (~1 km), georeferenced to geographic coordinate system (GCS WGS84), and cropped to 65°-90°eastern longitude and 5°-30°northern latitude for further use in a raster stack in RStudio (RStudio Team, 2015).

Modeling procedure
We used maximum entropy (MaxEnt) modeling (Phillips et al. 2006), a machine learning approach for data analysis and prediction of FHA distribution. This method uses background points to describe the location of species presences in environmental space. It estimates an optimal probability distribution of maximum entropy (Phillips et al. 2006) and can be regarded as a niche-based machine learning technique (Elith et al. 2011), which characterizes an approximation of a species' ecological niche and projects it into geographic space. MaxEnt has been found to perform best among many different species distribution modeling methods particularly if available information is incomplete and sample size is small (Elith et al. 2006;Pearson et al. 2007). Similar to linear combinations in statistical models like GLMs and GAMs, Max-Ent uses an expanded set of transformations of the covariates, called features in machine learning techniques (Elith et al. 2011) and thus allows for flexibility with the predictors. However, it has recently been demonstrated that more complex features are not the main determinants of model performance (Syfert et al. 2013). We therefore used quadratic and hinge features and did not change the regularization parameter also because we were using a set of preselected variables as suggested by Merow et al. (2013), thus providing enough control for over-fitting. We made use of MaxEnt's jackknife test, which delivers the effect of removal of each predictor variable on the increase of explanatory ability (gain) of the model (Elith et al. 2006), and to reduce the number of predictors until we reached the maximum possible gain. This procedure resulted in eight predictor variables for the final FHA distribution model ( Table 1).
One of the central issues in the presence-only modeling is the choice of background values (Merow et al. 2013), because a random selection can lead to models that predict sampling effort rather than a species' distribution. A way to account for sampling bias is to use a target group background, which has the same spatial bias as the occurrence data (Phillips et al. 2009). We selected Indian ungulates presented in Ahrestani et al. (2011) as a target group. Our background locations thus consisted of all 76 coordinates therein plus a set of 182 locations for 14 ungulates beside FHA, which we downloaded from the Global Biodiversity Information Facility (GBIF Backbone Taxonomy, 1 July 2013. accessed via http://www.gbif.org on 16 November 2014). After removal of duplicate records and addition of background locations for Nepal (Jnawali et al. 2011), we obtained 248 target group background locations.
We performed threshold-independent evaluation with the receiver operating area under curve (AUC). We used MaxEnts inbuilt bootstrapping option to obtain a standard deviation for the area under the receiver operating curve (AUC) and to visualize uncertainty in species response curves. Bootstrapping is a nonparametric approach to statistical inference that can provide accurate inferences when sample size is small (Freedman 1981). Its implementation in MaxEnt selects training data by sampling with replacement from the presence points, with the number of samples equaling the total number of presence points. With this procedure, about one-third of the presences are left out while all others are included between one and four times. Therefore, the training datasets contain duplicate records. We bootstrapped 100 times for calculation of AUC standard deviations and response curves. We prepared a final prediction map and compared our model predictions with FHA distribution ranges published in Leslie and Sharma (2009) as well as in the IUCN Red List of Threatened Species (Mallon 2008).
For a further assessment of model calibration, we derived presence-only calibration (POC) plots (Phillips and Elith 2010) from model output. While AUC gives relative measures of discrimination ability, model calibration plots return how well predicted values match probabilities at occurrence and background locations. All analyses and raster processing were accomplished in R version 3.0 (R Core Team, 2013) with the dismo package, version 1.0-12 (Hijmans et al. 2015) and its dependencies.

Model performance
The area under receiver operating curve (AUC) obtained with 100 bootstraps was 0.86 AE 0.18 (AUC AE standard deviation) for the model with target group background and 0.92 AE 0.013 for a model with random background. POC plots revealed a better fit between predictions and relative probability of presence in the model that used target group background. Specifically, predicted probabilities of occurrence were allocated along the diagonals of calibration plot, whereas the model with random background was badly calibrated at prediction values >0.5 (Fig. 3).

FHA distribution range
Large parts of India were found to be suitable for FHA (Fig. 4). The probability of occurrence clearly diminished toward the Indian northwest (Thar Desert) and was also lower in the Eastern Ghats, and very low in the Ganges river valley between India and Nepal. The Gangetic Plains had the lowest predicted suitability also without inclusion of the human land use predictor (results not shown). Furthermore, suitable areas along the Terai Arc in Nepal stretch along the narrow strip of Churia range that is fragmented within (Fig. 5). In particular, areas suitable for FHA in Nepal, that is, Chure hills in Sindhuli, Udayapur, and Dhanusha districts in east, Chitwan in center, and Dang and Banke in west Nepal, are poorly connected. In addition, areas predicted to be suitable for FHA lie in Gujarat state of India and the southern parts of Sindh Province in Pakistan (Fig. 4).

Predictors of the FHA distribution
Maximum temperature of warmest month, precipitation of driest quarter, elevation, and temperature seasonality explained almost half of the models variability (Table 1). As expected, FHA occurrence was positively linked to high temperatures of the warmest month (bio5), corresponding to savannah-like vegetation. Response curves also show that suitability for the FHA decreased with higher precipitation of driest quarter (bio17), and temperature seasonality (bio4). Moreover, FHA presence probability was highest at about 400-1000 m above sea level and decreased toward lower and higher elevations (Fig. 6).

Model performance and uncertainty
In our study, we modeled the potential distribution of four-horned antelope (FHA) throughout India and Nepal   with climate, topographic, and land-use information.
Although the sample size of known occurrences was low, results of bootstrapping indicate an acceptable level of model performance and the MaxEnt model predicted a potential FHA distribution range that mainly coincided with the range limits of previous assessments. This is in accordance with other studies, which showed that small sample sizes can provide enough information to predict species distribution ranges over large spatial scale extents (Hernandez et al. 2006;Pearson et al. 2007). In particular, rare species with limited global distribution like FHA can be modeled successfully (Hernandez et al. 2006) due to narrow fundamental niches and thus well-defined locations along environmental gradients. MaxEnt by default uses randomly generated background locations and derives information from those locations for model development (Phillips et al. 2006;Mateo et al. 2010). In this case, however, sampling bias can strongly influence model performance and predictions of species distributions (Phillips et al. 2009;Bystriakova et al. 2012;Kramer-Schadt et al. 2013;Syfert et al. 2013). We accounted for sampling bias and obtained a prediction map that was different from one, which was modeled with random background (Fig. 4). Explicitly, predicted FHA distribution area adhered much closer to observed FHA occurrences in the latter case, while the target group background model prediction was less centered on such occurrences and was able to predict to areas without observations (Fig. 4). As we controlled for sampling bias and restricted the background to a target group, performance of our model was good, whereas allocation of random background would have further increased the AUC.
The maximum entropy approach is one of the best performing methods in predictive species distribution modeling (Elith et al. 2006). However, it may tend to overpredict a species' range when predictions are made in unknown environmental space. As both, occurrence and target group background records spanned large longitudinal and latitudinal gradients, the predictions of our model were mainly into known environmental space, which increases its credibility.
Presence-only calibration plots show the fit between predictions and observed occupancy in classes (bins) of similar prediction values. Both models thus displayed reasonable refinement, as demonstrated by the wide range of possible prediction values between zero and one (Fig. 3). However, good calibration was only achieved when we used target group background. The random background model tended to under-predict FHA habitat, that is, the relative probability of presence was higher than the The curves show how the logistic prediction changes when only the corresponding variable is used. These plots reflect the dependence of predicted suitability both on the selected variable and on dependencies induced by correlations between the selected variable and other variables. For variable description, see Table 1. predicted presence of the species (Fig. 3). This result further supports other results that showed that MaxEnt models performed better when sampling bias had been corrected for (Bystriakova et al. 2012;Kramer-Schadt et al. 2013;Syfert et al. 2013).

FHA distribution range
We chose the logistic output for our MaxEnt prediction map, the values of which should be interpreted as relative areas of high and low suitability rather than absolute values of presence probability (c.f. Pearson et al. 2006;Merow et al. 2013). FHA core areas are in the Western Ghats, the central to southern parts of Deccan peninsula (Fig. 4) and the central and eastern parts of the Nepalese Churia range (Fig. 5). It is evident from our models that FHA habitat lies in tropical areas extending from the slopes of the Indian coast up to the Churia range (below ca. 2000 m) as the northernmost limit of its distribution.
Another key finding is that areas suitable as habitats of FHA are patchily distributed all over the tropical Indian subcontinent. FHA is a sedentary species (Mallon and Kingswood 2001); therefore, the probability is low that it crosses large matrix areas between fragmented habitat patches. Particularly, the FHA populations in the Churia range might have been isolated from the rest of the population in Indian peninsula because of the large agricultural landscape along the Ganges river basin in-between, which is predicted to be unsuitable for FHA. Thus, the Gangetic Plain which is known as "bread basket" of South Asia (Aggarwal et al. 2004) represents a geographic barrier for the dispersion of FHA between the Terai Arc and the range in India. As humans dominate the landscape (Persha et al. 2010;Kumar and Yashiro 2014), FHA populations throughout the subcontinent may be exposed to intensive human land use. In particular, the isolated Nepalese FHA population might be at high risk of extinction because of its small distribution range and population size (Johnson 1998;Krishna et al. 2009). Furthermore, eastern parts of Churia range predicted to be suitable as FHA habitat are outside the protected areas and thus might have high poaching pressure. Therefore, in practical, abundance of FHA might be lower in the eastern parts of Churia range than in the western parts. However, more research is needed on dispersal ability of FHA and genetic exchange to understand the viability of its subpopulations, which is estimated to be less than 1,000 individuals (Mallon 2008).
Our prediction map will support ground truthing of FHA occurrence as well as assessment of habitat conditions and protection status of predicted distribution ranges. In particular, our work may assist species management planning including a revision of the conservation status of FHA in Nepal. Moreover, based on our prediction maps and the presence locations from Ahrestani et al. (2011), the distribution range of FHA, as delineated by Leslie and Sharma (2009) and in the IUCN Red List of Threatened Species list (Mallon 2008), may need to be extended further south. Areas in the Gangetic Plain, which were predicted by our model to be least suitable, should be included in ground assessments to identify potential corridors between the Indian and Nepalese populations. Until a revised map has been ground-checked, our predicted distribution range may provide a valuable basis for conservation management in favor of the FHA and its habitats but also for targeted monitoring. Another interesting feature of our model was its prediction of a small part of Sindh Province in Pakistan as suitable FHA habitat because the species used to be found in those areas (Roberts 1997;Krishna et al. 2009). Reintroduction of FHA in the Sindh Province may prove successful to restore the FHA population.

Predictors of the FHA distribution
We present the suitable areas for FHA on the Indian subcontinent as explained by topographical and climatic features. We did not directly include land cover information as predictors of the FHA range because there is considerable redundancy between climatic conditions and land cover (Woodward 1987;Woodward et al. 2004). Climate throughout the Indian subcontinent differs considerably among biogeographic zones and results in different land cover and vegetation types (Udvardy 1975;Blasco et al. 1996). Also, inclusion of a multilevel factor variable would have reduced degrees of freedom, which is critical with a small dataset such as the one we used. Rather than land cover, we therefore used climatic variables, which in turn proved to be important as predictors of FHA distribution.
Four-horned antelope core areas are very dry (7-9 months) and are characterized by undulating terrain with moist and dry deciduous forest, and woodlands (Blasco et al. 1996;Kodandapani et al. 2004;Prasad et al. 2008). Those forest types are subjected to annual wildfires which regulate vegetation structures (Blasco et al. 1996;FAO, 2007). Outputs of our model support these research findings on small-scale habitat features of FHA, which are mainly determined by tropical dry deciduous forests (Krishna et al. 2008(Krishna et al. , 2009Baskaran et al. 2009;Leslie and Sharma 2009;Sharma et al. 2013;Pokharel et al. 2015). The representation of the FHA's distribution in environmental space by higher temperature of warmest months (bio5) and lower precipitation of driest month (bio17) suggests that the species can geographically be found in areas with hot and long dry season. Those areas receive high amounts of rainfall during monsoon. In addition, the curvilinear response to elevation indicates that FHA can be found at intermediate elevations and habitat suitability decreases toward higher elevations above 1,000 m. This relationship of FHA habitat suitability with elevation is evident in our model, which predicted the high Himalayan areas as unsuitable for FHA. Species distribution, however, is indirectly linked to elevation. It is directly connected to the more functional ecological variables such as temperature and precipitation  which in turn determine vegetation structure and habitat quality. Elevation is also linked to topographical features such as slope and ruggedness, which may affect predation risk. Disregarding the indirect character of elevation, we decided to report FHA's response to this variable as bootstrapped curves showed little variation and the predictor considerably contributed to the model. In addition, an optimum elevation range can be easily measured in the field than averaged climate variables, which facilitates communication to conservation managers.
Temperature affects physiologic processes and hence the behavior of animals (Root 1988;Briffa et al. 2013). Temperature seasonality (bio4) with its negative impact on climatic suitability confirms that fluctuation in temperature throughout the year should not be too high in the FHA's range meaning that climatic habitat suitability for FHA is highest at equatorial region and decreases toward the north and higher elevation. The Churia hills in the Terai Arc, which are characterized by a subtropical climate with mean annual temperature of 20°C and longer dry season (Barnekow Lillesø et al. 2005;MoFSC, 2008), were predicted as the northernmost limit of FHA distribution.
Major river systems of the subcontinent, particularly the Ganges River, are diverted to irrigate almost onefourth of the total Ganges river basin. In the agricultural landscape, moisture levels in the air remain high due to evapotranspiration (Thenkabail et al. 2005;Dheeravath et al. 2010). Singh et al. (2008) noticed an increase in summer temperature but a decrease in winter temperature during the last century. Premonsoon rain was also increased leading to a shorter dry season with high seasonality. Thus, human actions have altered the climate of the Ganges basin which might have made the basin climatically unsuitable for FHA.
In conclusion, we presented the first empirical rangewide distribution model for four-horned antelope on the Indian subcontinent. Together with the underlying occurrence records, our predictions revealed some suitable areas that are missing from extant range maps but also areas that are obviously unsuitable such as the Gangetic Plains. Areas suitable as habitat for FHA are patchily distributed from the slopes of the Indian coast to the foothills of the Himalaya and are under pressure of human land use. Compared to the whole distribution range of the endemic FHA, its suitable areas in Nepal are very small, fragmented, and isolated from the Indian core distribution. We therefore suggest a revision of the current FHA distribution map as represented in the IUCN Red List of Threatened Species and further actions to protect the species from land-use changes. WCU-UNEP. 2007

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. Density curves that visualize the locations of occurrence (green), target group background (blue), and random background (red) values along environmental gradients.