Defining optimal sampling effort for large-scale monitoring of invasive alien plants: a Bayesian method for estimating abundance and distribution

Authors

  • Cang Hui,

    Corresponding author
    1. Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
  • Llewellyn C. Foxcroft,

    1. Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
    2. Scientific Services, South African National Parks, Private Bag X402, Skukuza 1350, South Africa
  • David M. Richardson,

    1. Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
  • Sandra MacFadyen

    1. Scientific Services, South African National Parks, Private Bag X402, Skukuza 1350, South Africa

Correspondence author. E-mail: chui@sun.ac.za

Summary

1. Monitoring the abundance and spatial structure of invasive alien plant populations is important for designing and measuring the efficacy of long-term management strategies. However, methods for monitoring over large areas with minimum sampling effort, but with sufficient accuracy, are lacking. Although sophisticated sampling techniques are available for increasing sampling efficiency, they are often difficult to implement for large-scale monitoring, thus necessitating a robust yet practical method.

2. We explored this problem over a large area (c.20 000 km2), using ad hoc presence–absence records routinely collected over 4 years in Kruger National Park (KNP), South Africa. Using a Bayesian method designed to solve the pseudo-absence (or false-negative) dilemma, we estimated the abundance and spatial structure of all invasive alien plants in KNP. Five sampling schemes, with different spatially weighted sampling efforts, were assessed and the optimal sampling effort estimated.

3. Although most taxa have very few records (50% of the species have only one record), the more abundant species showed a log-normal species-abundance distribution, with the 29 most abundant taxa being represented by an estimated total of 2·22 million individuals, with most exhibiting positive spatial autocorrelation.

4. Estimations from all sampling schemes approached the real situation with increasing sampling effort. An equal-weighted (uniform) sampling scheme performed best for abundance estimation (optimal efforts of 68 records per km2), but showed no advantage in detecting spatial autocorrelation (247 records per km2 required). With increasing sampling effort, the accuracy of abundance estimation followed an exponential form, whereas the accuracy of distribution estimation showed diverse forms. Overall, a power law relationship between taxon density (as well as the spatial autocorrelation) and the optimal sampling effort was determined.

5. Synthesis and applications. The use of Bayesian methods to estimate optimal sampling effort indicates that for large-scale monitoring, reliable and accurate schemes are feasible. These methods can be used to determine optimal schemes in areas of different sizes and situations. In a large area like KNP, the uniform equal-weighted sampling scheme performs optimally for monitoring abundance and distribution of invasive alien plants, and is recommended as a protocol for large-scale monitoring in other protected areas as well.

Introduction

Protected areas are an important component of global biodiversity conservation strategies (Gaston et al. 2008). Biological invasions are a major direct driver of biodiversity loss, changes in ecosystem services (Wilcove et al. 1998) and biotic homogenization (Brook, Sodhi & Bradshaw 2008), and threaten the ecological integrity of most protected areas. Managing those invasive species that cause negative impacts is one of the major tasks of managers in many protected areas. Effective management of invasive species demands multi-layer initiatives linked with research and monitoring. For example, the management of invasive alien plants (IAPs) in South Africa’s Kruger National Park (KNP) has been supported by research in the following areas: detailed studies on the determinants and dynamics of spread of key taxa (Foxcroft et al. 2004; Foxcroft & Rejmánek 2007; Foxcroft, Richardson & Wilson 2008), assessment of the risks posed by alien plants in watersheds upstream of the park (Foxcroft, Rouget & Richardson 2007) and the effectiveness of the park boundary as a filter for invasive species (Foxcroft et al. 2011), and examination of the importance of issues pertaining to spatial scale in designing management plans (Foxcroft et al. 2009). Considerable efforts have also been made to initiate objective evaluation of all management initiatives to adapt and improve interventions (Biggs & Rogers 2003). A major challenge is to develop a cost-efficient monitoring strategy for this very large protected area, i.e. one that uses available records for reliable inference of the abundance and distributional structure of IAPs. The same challenge exists for protected areas around the world. This paper deals with fundamental requirements for an effective monitoring programme for IAPs. 
Although drawing on empirical data from KNP, the methodology can be applied in general ecological research for estimating species abundance and distributions, and the results can inform IAP management and control at regional scales.

Large-scale monitoring programmes (LSMPs) have obvious advantages over local-scale studies. First, sources of invasions can be included and identified in LSMPs, which is often impossible for local-scale studies. Secondly, spatial stochasticity is often minimized to facilitate the recognition of trends and patterns. Finally, environmental complexity and heterogeneity are often included to allow robust inference and projection that are compatible with policy making (Johnson 1993; Urquhart, Paulsen & Larsen 1998). Nevertheless, the implementation of LSMPs is often constrained by cost (e.g. Bottrill et al. 2009). Consequently, designing cost-efficient methods of inference and sampling schemes (protocols), that are accurate enough to inform various management and planning activities, is crucial for improving the effectiveness of LSMP. An efficient sampling scheme that requires the smallest sampling effort but which derives adequate monitoring results is urgently needed.

For assessing the impact of IAPs, the two most important variables that should be effectively estimated from such monitoring programmes are spatial distribution and abundance (Parker et al. 1999). Abundance is one of the most important measures of species conservation status (World Conservation Union 2001) and is a surrogate for ecological functioning (Gaston & Blackburn 2000; McGill et al. 2007). The spatial distribution of species reflects the dispersal processes and pathways of biological invasions, and is also a strong predictor of extinction risk and range contraction (Gaston & Fuller 2009). For instance, prioritization of areas for management using multi-criteria decision models often requires a reliable density map (and also maps of topography, disturbances and clearing history) as important input data to allow robust inference (Roura-Pascual et al. 2010). Better understanding of species abundance and distribution patterns can improve the monitoring of changes in biodiversity (Strayer 1999; Yoccoz, Nichols & Boulinier 2001; Wilson et al. 2004), optimize conservation and control efficiency (Van Kleunen & Richardson 2007), and enable the assessment of impact and potential distribution of invasions (Parker et al. 1999).

The effectiveness of LSMPs can be improved by the careful design of sampling strategies (Carlson & Schmiegelow 2002; Thompson 2002; Fortin & Dale 2005). Issues that must be considered include the determination of an appropriate sampling effort, extent, unit size (grain) and sampling strategy (scheme, or spatial layout of the sampling unit); all these factors affect the potential of the initiative to detect different spatial patterns (Dungan et al. 2002; Hui, Veldtman & McGeoch 2010). For instance, adaptive cluster sampling enhances the efficiency compared to simple random sampling for estimating population densities of aggregated species (Thompson 1990). As a rule of thumb, sampling designs should be guided by the minimum requirement for adequate spatial analysis and inference (Fortin & Dale 2005), an issue which is likely to be difficult in a LSMP (Goodman 2003). As a commonly used and cost-efficient format for data recording (Brotons et al. 2004; Joseph et al. 2006), presence-absence data and ad hoc records have been adopted as the standard format in many monitoring programmes. In such cases, the choice of extent and grain is not an issue. Instead, we face the problem of false-negatives and pseudo-absences in the model inference (e.g. MacKenzie et al. 2002; Royle, Nichols & Kéry 2005). An efficient sampling design thus requires an adequate sampling effort for an efficient and practical sampling scheme that can provide enough information for accurate inference of abundance and the spatial distribution of IAPs.

The monitoring programme for IAPs in South Africa’s KNP was chosen as a case study. This monitoring programme uses a system called CyberTracker that allows field rangers to report the location of IAPs and other features of interest (which count as “absence” records for IAPs) encountered on routine patrols (Foxcroft et al. 2009). We use the data gathered in this programme to determine the minimum sampling effort for monitoring IAPs in KNP. First, we designed a Bayesian model and estimated the abundance and spatial autocorrelation of IAPs by subdividing the landscape into lattices at a resolution of 4 × 4 km grid cells. Secondly, by assuming that the estimates from all records reflect real abundance and its distribution, a re-sampling simulation was designed for a variety of sampling schemes to evaluate their performance in model inference. Finally, the relationship between abundance estimation (also spatial autocorrelation) and sampling effort was identified to ensure we achieved satisfactory accuracy. The optimal sampling effort was presented as a function of the density and spatial correlation of the focal IAP species.

Materials and methods

Study area and data

The KNP (30°53′–32°01′E, 22°19′–25°31′S), stretching across 19 485 km2, is one of the largest protected areas in the world that is actively managed primarily for biodiversity conservation. More than 370 alien plant species have been recorded in KNP (Foxcroft et al. 2003; Foxcroft, Richardson & Wilson 2008). The goals of IAP management are to detect new invasions at an early stage and to respond rapidly to these, to maintain current invasions at low abundances where eradication is not feasible, and to undertake various actions to prevent further invasions (Foxcroft & Richardson 2003; Foxcroft & Freitag-Ronaldson 2007). The CyberTracker system (http://www.cybertracker.org), a user-friendly interface for PalmOS computers linked to geographical positioning systems (GPS), was introduced to facilitate the achievement of management goals in 2003, by providing rangers with a tool for management, and also high-precision data for research (MacFadyen 2005). In the KNP programme, this includes capturing information on animals, plants, water holes, poaching activity, fence condition, fire scars and numerous other features, including invasive alien plants. During their patrols, rangers move through the park in a haphazard fashion, recording the features they observe. The CyberTracker system is also set to take GPS readings at pre-defined time intervals. When recording a specific observation, other features, if present, will also be captured. Thus, timed points and records of other features can be used as ‘absence’ point data when analyzing alien plant invasions (see discussion in Foxcroft et al. 2009). We used data from 2004 to 2007 which comprises 2 360 419 records (including 27 777 presences and 2 332 642 absences) (Fig. 1a,b). One hundred and sixty-seven IAP species were included in the presence records; most records were for Opuntia stricta (72·1%) and Lantana camara (8·4%), and 119 species have fewer than 10 records. 
The extremely low occurrence (1·2%) of IAPs does not indicate insufficient sampling intensity or sampling preference (given the vast amount of records and the sampling protocol for rangers), but can be, in part, ascribed to the current policy of preventing and managing alien plant invasions. However, invasions of many species are currently in an early phase, with the potential for rapid expansion.
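As a quick worked check of the rates quoted for these data (a rounded restatement of the figures given in the text, not an additional result), the occurrence of IAP presences, the share of Opuntia stricta among presence records and the overall sampling density follow directly from the record counts and the park area:

$$\frac{27\,777}{2\,360\,419} \approx 0.0118 \approx 1.2\%, \qquad \frac{20\,029}{27\,777} \approx 72.1\%, \qquad \frac{2\,360\,419\ \text{records}}{19\,485\ \mathrm{km}^2} \approx 121.14\ \text{records per km}^2.$$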

Figure 1.

 The number of records (a) and presence records (b) in Kruger National Park, as from the CyberTracker Programme, plotted at a resolution of 4 × 4 km grids. The spatial structure of the number of Opuntia stricta (c) and Lantana camara (d), as predicted from the Bayesian model. The spatial structures of other invasive species are presented in Fig. S2 in Supporting Information.

Bayesian estimation

After dividing the KNP into grids, we transferred the presence-absence records for each species into the detection-nondetection grids and faced the problem of pseudo-absence data: cells with only absence records are not necessarily true absence of the focal species (e.g. MacKenzie et al. 2002, 2003; Tyre et al. 2003; Royle, Nichols & Kéry 2005; Pearce & Boyce 2006). Although methods are available for abundance inference using true presence-absence binary data (e.g. Nachman 1981; Wright 1991; He & Gaston 2003; Hui, McGeoch & Warren 2006; Hui et al. 2009; Borregaard & Rahbek 2010), this pseudo-absence dilemma still requires serious consideration. Solutions to this dilemma can arise from two statistical philosophies (Anderson 2008): maximum likelihood and Bayesian methods. The maximum likelihood method combines two binomial processes in forming a joint likelihood for the probability of absence and detection for each cell. This can then be estimated by maximizing the logarithmic likelihood by assuming a constant point-detection rate (MacKenzie et al. 2002, 2003) or a pre-defined distribution of species abundance (Royle & Nichols 2003; Zhou & Griffiths 2007). This is an appropriate approach for small-scale studies in homogeneous landscapes, but not for LSMPs because of the inherent spatial heterogeneity and some technical drawbacks in the abundance estimation (Warren, McGeoch & Chown 2003; Conlisk, Conlisk & Harte 2007). In contrast, the Bayesian method has also been applied to estimating avian abundance and occurrence in repeated surveys with encounter histories for each cell (Royle et al. 2007; Royle & Dorazio 2008). We thus followed this statistical philosophy and presented a Bayesian method to the pseudo-absence dilemma and abundance estimation for each IAP species in the KNP.

For a cell of size a with n reported records (including x presences), if each record corresponds to a single non-repeated, rapid visit to an area δ with perfect detection, then M = a/δ records are needed to obtain the full information for this cell. This is a reasonable assumption given that rangers are pre-trained to identify specific IAPs and have to move rapidly across vast areas, and that the occurrence of any particular IAP species in the records is much lower than 1·2%. In the calculation, we set a = 4 × 4 km (giving a total of 1333 cells in the KNP) and, for practical reasons, set M to the maximum number of records within a cell (i.e. M = 72 203, indicating a 100% detection rate when rangers report a record within an 8·4 m radius; fewer than 1% of cells have more than 10 000 records). We assume that there would be N presences (i.e. the true underlying abundance) if we had the full information of the M records. The detection rate of one random sample (record) in this cell thus equals N/M (i.e. the detection rate reflects the true abundance of a species in the cell; MacKenzie et al. 2005), which often differs from the proportion of presences in the samples, x/n. Clearly, the probability of finding x presences in the n reported records, knowing that there are N presences in the full-information M records, follows a hypergeometric distribution: $\mathrm{prob}(x \mid N) = \binom{N}{x}\binom{M-N}{n-x} \big/ \binom{M}{n}$ (where $\binom{N}{x}$ is the binomial coefficient, N!/(x!(N−x)!), for ‘N choose x’; here we treated the in-cell number of sampling records n and M as two known parameters and thus omitted them from the left-side probability notation for brevity). In doing so, we neglect the in-cell heterogeneity (note that the between-cell spatial heterogeneity is retained through the cell-specific detection rate) and assume that M is the carrying capacity for all IAP species. These reasonable compromises make species-specific inference possible. 
Therefore, the probability distribution for in-cell abundance N given that x presences have been reported in the n samples can be estimated by the Bayesian rule:

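$$\mathrm{prob}(N \mid x) = \frac{\mathrm{prob}(x \mid N)\,\mathrm{prob}(\cdot \mid N)}{\sum_{N'=0}^{M} \mathrm{prob}(x \mid N')\,\mathrm{prob}(\cdot \mid N')} \qquad \text{(eqn 1)}$$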

where prob(· |N) is a prior probability of N presences in the cell regardless of any sampling information. In the calculation, we set this prior probability as a combination of the uninformative prior (Jaynes 1968) and the Poisson model (for alleviating the zero-inflation problem; Royle, Nichols & Kéry 2005): inline image for N ≥ 1, and prob(· |N) = exp (− d · M) for N = 0, where d stands for the occurrence of a certain species (that is, the proportion of presences records for a focal species in all the records). For the pseudo-absence dilemma, the probability of absence in the cell, given that all n records reported were absence, is (see Fig. S1 in Supporting Information for an illustration):

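$$\mathrm{prob}(N = 0 \mid x = 0) = \frac{\mathrm{prob}(x = 0 \mid N = 0)\,\mathrm{prob}(\cdot \mid 0)}{\sum_{N'=0}^{M} \mathrm{prob}(x = 0 \mid N')\,\mathrm{prob}(\cdot \mid N')} \qquad \text{(eqn 2)}$$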

The mean and variance of the abundance estimated in the cell are thus given by E(N) = ∑ N · prob(N|x) and V(N) = ∑ N² · prob(N|x) − E(N)², respectively. To estimate the abundance and its variation for a focal species across the entire KNP, we did not use the unbiased Horvitz–Thompson estimator, because all cells in KNP have been sampled (Thompson 2002) and because the in-cell estimation above already accounts for sampling intensity within each cell, which resembles stratified sampling. Consequently, the expected abundance of the focal species in the KNP is the sum of the in-cell mean abundances, E(N), across all cells. Furthermore, by assuming that the numbers of individuals in different cells are independent (i.e. setting the covariance to zero), we can estimate the minimum variance of total abundance as the sum of the in-cell variances across the landscape, from which the confidence interval of total abundance (and density) can be derived.
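The in-cell inference described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prior for N ≥ 1 is taken here to be proportional to 1/N and rescaled to carry the probability mass 1 − exp(−d · M) (an assumed form, because the exact expression is not reproduced in this text), the function names are hypothetical, and a small M is used so that the example runs quickly.

```python
from math import comb, exp


def hypergeom_likelihood(x, n, N, M):
    """prob(x | N): x presences among n records drawn from M, N of which are presences."""
    if x > N or n - x > M - N:
        return 0.0
    return comb(N, x) * comb(M - N, n - x) / comb(M, n)


def assumed_prior(M, d):
    """Prior over N = 0..M.

    ASSUMPTION: mass exp(-d*M) at N = 0 (as stated in the text) and a 1/N-shaped
    prior for N >= 1, rescaled so that the whole prior sums to one.
    """
    p0 = exp(-d * M)
    harmonic = sum(1.0 / k for k in range(1, M + 1))
    return [p0] + [(1.0 - p0) / (k * harmonic) for k in range(1, M + 1)]


def posterior(x, n, M, d):
    """prob(N | x) for N = 0..M, obtained with Bayes' rule (eqn 1)."""
    prior = assumed_prior(M, d)
    weights = [hypergeom_likelihood(x, n, N, M) * prior[N] for N in range(M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(post):
    """E(N) and V(N) of the in-cell abundance."""
    mean = sum(N * p for N, p in enumerate(post))
    second_moment = sum(N * N * p for N, p in enumerate(post))
    return mean, second_moment - mean ** 2


# Illustrative (made-up) inputs: a cell needing M = 200 records for full information,
# n = 50 records reported, none of them presences, species occurrence d = 0.001.
post = posterior(x=0, n=50, M=200, d=0.001)
prob_true_absence = post[0]              # chance that the cell is truly unoccupied
mean_N, var_N = mean_and_variance(post)  # in-cell E(N) and V(N)
```

Summing `mean_N` and `var_N` over all cells would then give the park-wide expected abundance and its minimum variance, in line with the independence assumption stated above.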

The above process yielded a spatially heterogeneous map of in-cell abundance for each species at a resolution of 4 × 4 km. This spatial heterogeneity of cell-specific local abundance and detection rate reflects spreading history, demographics and the spatial structure of suitable habitat for focal species (Guisan & Thuiller 2005). To describe the spatial heterogeneity of species distributions, we calculated the spatial autocorrelation of focal species using the first-distance class Moran’s (1950) I index that describes the degree to which points in space are correlated (Hui, Veldtman & McGeoch 2010). The mean and variance of Moran’s I can be estimated by permutation tests (e.g. Rodríquez & Delibes 2002). Specifically, the mean (i.e. equals the negative reciprocal of the number of cells minus one) and variance of spatial randomness are only dependent on the spatial configuration of the landscapes, not on the number of individuals within cells. The Moran’s I for each species was then calculated using the mean abundance within cells (E(N)).
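To make the autocorrelation measure concrete, the following Python sketch computes a first-distance-class Moran’s I on a small rectangular lattice. It is an illustration only: it assumes binary weights between the four rook-adjacent neighbours, whereas the neighbourhood definition used for the irregularly shaped KNP grid is not specified in this section, and the function names are hypothetical.

```python
def morans_i(grid):
    """First-distance-class Moran's I on a rectangular lattice.

    ASSUMPTION: neighbours are the four rook-adjacent cells, with binary weights.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]
    cross = 0.0      # sum of dev_i * dev_j over ordered neighbour pairs
    weight_sum = 0   # W, the number of ordered neighbour pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1
    denom = sum(d * d for row in dev for d in row)
    return (n / weight_sum) * cross / denom


def expected_morans_i(n_cells):
    """Expected Moran's I under spatial randomness: -1 / (n - 1)."""
    return -1.0 / (n_cells - 1)


# Made-up abundance maps: alternating cells versus a clustered left half.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
clustered = [[1, 1, 0, 0] for _ in range(4)]
i_checker = morans_i(checkerboard)    # strongly negative autocorrelation
i_cluster = morans_i(clustered)       # positive autocorrelation
null_mean = expected_morans_i(1333)   # close to zero for the 1333 KNP cells
```

In the analysis described above, the grid values would be the in-cell mean abundances E(N) rather than these toy patterns.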

Optimal sampling effort

Since incorporating the detection rate in abundance estimation for LSMPs is not common, choosing appropriate sampling schemes and optimal efforts for the above Bayesian estimation model, and other similar models, is necessary (e.g. Thompson & Seber 1996). The following re-sampling tests take the abundance estimated from the above Bayesian method as true and thus enable the accuracy of estimates under different sampling efforts to be evaluated (see Appendix S1 in Supporting Information for tests using simulated species). To determine the most-efficient sampling scheme, we first let s denote the total number of records reported (a sum of presences and absences), i.e. the sampling effort. For each unit of sampling effort, only one cell can be visited and the chance of reporting a presence record is equal to the detection rate, i.e. the proportion of expected true presences estimated for the focal species within the cell (E(N)/M). Five sampling schemes were initially examined, including weighted (the sampling effort is allocated according to observations from the CyberTracker data), uniform (i.e. equal-weighted cells), addictive (the ranger tends to visit the cells having more presence records), elusive (the ranger will try to avoid visiting the cells with presence records) and random-walk (the ranger will randomly choose a cell adjacent to the cell visited at the last time). Although other sophisticated sampling schemes exist with which to increase the sampling efficiency (e.g. adaptive cluster sampling; Dryver & Thompson 2005), they are difficult to implement in practice, especially given the species-specific spatial variation and the multifaceted targets of KNP’s monitoring programme. The abundance and spatial autocorrelation of IAPs were estimated for different sampling effort using the Bayesian method. 
The performance of such re-sampling was evaluated by the accuracy index, defined as the proportional deviation of the abundance and spatial autocorrelation estimates from those obtained using the full records: A = Abs[(Estimate_sample − Estimate_full)/Estimate_sample] (Hui, McGeoch & Warren 2006). The optimal sampling effort was thus defined as the minimum sampling effort needed to estimate abundance and distributional structure at a satisfactory level (we chose A < 0·05). More importantly, we examined how the estimations change as a function of sampling effort.
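The re-sampling step and the accuracy index can be sketched as follows. This is a simplified illustration rather than the simulation used in the study: only static cell weights are shown (uniform, or proportional to past effort for the weighted scheme), the detection rates and effort counts are made up, and the function names are hypothetical. The sampled (n, x) pairs would then be passed to the Bayesian estimation to obtain Estimate_sample.

```python
import random


def accuracy_index(estimate_sample, estimate_full):
    """A = |(Estimate_sample - Estimate_full) / Estimate_sample|, as defined in the text."""
    return abs((estimate_sample - estimate_full) / estimate_sample)


def simulate_records(detection_rates, weights, s, rng):
    """Allocate s records to cells according to `weights`.

    Each record visits one cell and is a presence with probability equal to that
    cell's detection rate, E(N)/M. Returns one (n, x) pair per cell.
    """
    cells = range(len(detection_rates))
    visits = rng.choices(cells, weights=weights, k=s)
    counts = [[0, 0] for _ in cells]
    for cell in visits:
        counts[cell][0] += 1                     # one more record in this cell
        if rng.random() < detection_rates[cell]:
            counts[cell][1] += 1                 # the record is a presence
    return [tuple(c) for c in counts]


# Made-up example: four cells with different detection rates.
rates = [0.0, 0.01, 0.05, 0.2]
past_effort = [500, 50, 20, 5]                   # hypothetical historical record counts
rng = random.Random(42)
uniform_sample = simulate_records(rates, [1, 1, 1, 1], s=1000, rng=rng)
weighted_sample = simulate_records(rates, past_effort, s=1000, rng=rng)

# An estimate of 105 against a full-record estimate of 100 meets the A < 0.05 criterion.
meets_target = accuracy_index(105.0, 100.0) < 0.05
```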

Results

The abundance and spatial autocorrelation of all 167 IAP species were calculated by the Bayesian method at a resolution of 4 × 4 km. The data were dominated by IAP species with only a single presence record and an estimated density of less than 0·1 km−2 (grey bar in Fig. 2). The abundance estimations of the rest of the IAP species followed a log-normal species-abundance distribution (white bars in Fig. 2; Kolmogorov–Smirnov test, D = 0·1, P > 0·05). We only present data for those species with at least 22 reported presence records, representing a relatively reliable estimation for the 29 most recorded species (Table 1). These 29 species represent 95% of the presence records, with a total of 2·22 million IAP individuals estimated. The mean density of these 29 species ranges from 1·29 km−2 for Cardiospermum grandiflorum to 28·2 km−2 for O. stricta. Six species were found to have a negative value of Moran’s I, although none of these was significant (Table 1; note that the expected value for randomness is −0·00075); the other species were significantly positively autocorrelated, including the two most abundant species: O. stricta and L. camara (Fig. 1c,d; see Fig. S2 for others).

Figure 2.

 Species-abundance distribution of the 167 invasive plant species in the Kruger National Park, as estimated from the Bayesian model. The grey bar shows an overwhelming number of rare invasive plant species; by excluding the rare species the white bars follow a log-normal shape.

Table 1.   The number of records, abundance (with the lower and upper bounds of the 95% confidence interval of density, km−2) and distribution estimations (Moran’s I; all positive values are significantly autocorrelated), as predicted from the Bayesian model, and the optimal sampling effort for abundance estimation (OSEA; records per km2) and distribution estimation (OSED; records per km2), as predicted from the relationship between sampling effort and these estimations (Figs 4, S3 and S4), for the top 29 recorded invasive species in Kruger National Park

Species                     Records  Abundance  95% CI lower  95% CI upper  Moran’s I     OSEA     OSED
Opuntia stricta               20029     549243        25·877        30·499     0·0638    21·59      3·0
Lantana camara                 2332     144885         5·569         9·303     0·0356    35·60     24·7
Opuntia spp.                   1483     114205         4·019         7·703     0·0284    41·39     23·3
Chromolaena odorata             327      80234         2·271         5·965    −0·0007    48·66     81·1
Pistia stratiotes               218      76485         2·089         5·762     0·0079    51·81     71·1
Parthenium hysterophorus        204      72736         1·903         5·563    −0·0006    52·63    114·4
Eichhornia crassipes            199      78291         2·181         5·855     0·0111    51·76     64·0
Xanthium strumarium             176      72806         1·907         5·566    −0·0008    51·36    109·8
Azolla filiculoides             160      70696         1·803         5·454     0·0053    53·41     91·9
Argemone spp.                   135      69316         1·740         5·375     0·0049    53·73    137·8
Senna spp.                      110      70842         1·830         5·442     0·0062    59·20    165·9
Xanthium spp.                    98      65051         1·594         5·083     0·0052    61·90    193·4
Cardiospermum halicacabum        96      69566         1·775         5·366     0·0070    62·43    136·5
Argemone ochroleuca              93      62947         1·456         5·006    −0·0007    60·15    116·4
Ageratum spp.                    88      62375         1·436         4·966     0·0066    58·26    135·1
Ricinus communis                 83      61069         1·380         4·888     0·0061    59·94    168·2
Argemone mexicana                71      55093         1·110         4·545    −0·0007    67·37    124·3
Catharanthus roseus              63      51872         0·977         4·347     0·0052    77·29    239·1
Arundo donax                     63      51160         0·941         4·310     0·0048    78·28    263·3
Datura inoxia                    59      50525         0·928         4·258     0·0060    79·10    262·1
Datura stramonium                46      44443         0·699         3·862     0·0072    82·24    411·7
Ageratum conyzoides              45      42221         0·597         3·737     0·0058    84·00    437·3
Melia azedarach                  37      36384         0·377         3·358     0·0048    88·67    321·1
Senna occidentalis               35      34991         0·329         3·263     0·0046    88·90    343·3
Datura ferox                     26      27555         0·022         2·806    −0·0008    91·79     51·3
Zinnia peruviana                 25      26713         0·054         2·688     0·0046    97·86    628·9
Nicotiana glauca                 25      26690         0·053         2·687     0·0035    97·79   1339·7
Psidium guajava                  24      25363         0·003         2·600     0·0044    98·84    909·2
Cardiospermum grandiflorum       23      25156         0·012         2·570     0·0041   100·00    184·3
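The mean optimal sampling efforts reported in the Results can be recomputed from the OSEA and OSED columns of Table 1. The short Python check below simply averages the 29 tabulated values (listed in table order); the variable names are illustrative.

```python
# OSEA and OSED values (records per km2) for the 29 species, in the order of Table 1.
osea = [21.59, 35.60, 41.39, 48.66, 51.81, 52.63, 51.76, 51.36, 53.41, 53.73,
        59.20, 61.90, 62.43, 60.15, 58.26, 59.94, 67.37, 77.29, 78.28, 79.10,
        82.24, 84.00, 88.67, 88.90, 91.79, 97.86, 97.79, 98.84, 100.00]
osed = [3.0, 24.7, 23.3, 81.1, 71.1, 114.4, 64.0, 109.8, 91.9, 137.8,
        165.9, 193.4, 136.5, 116.4, 135.1, 168.2, 124.3, 239.1, 263.3, 262.1,
        411.7, 437.3, 321.1, 343.3, 51.3, 628.9, 1339.7, 909.2, 184.3]

mean_osea = sum(osea) / len(osea)   # about 67.45 records per km2
mean_osed = sum(osed) / len(osed)   # about 246.6 records per km2
```

Both averages match the values quoted in the text (67·45 and 246·6 records per km2), which can be compared with the current sampling level of 121·14 records per km2.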

Using randomization tests, we showed that the Bayesian estimations for all five sampling schemes approached the known abundance and distribution for simulated species, with the uniform sampling scheme performing best (Appendix S1). Therefore, we only demonstrated the results from the uniform and weighted sampling schemes in detail; these schemes represent an ideal and a self-organized realistic sampling scheme respectively. Both schemes can also be easily implemented in future monitoring (i.e. equal weight or maintaining current patrol protocol). With the increase in sampling effort, estimations of these 29 species from both uniform and weighted sampling schemes, as depicted by the abundance–rank curves (Fig. 3a) and species–aggregation curves (Fig. 3b), converged to the original Bayesian estimations, yet with different converging rates and directions. Although both schemes tend to overestimate species abundance at low sampling effort, estimations from the weighted sampling scheme converged at a much slower rate than with uniform sampling, with 0·5 (2) million records of weighted sampling producing almost the same results as 0·1 (0·5) million records of uniform sampling (Fig. 3a). This suggests that equal-weight uniform sampling is the most-efficient sampling scheme for abundance estimation. Furthermore, estimations from both sampling schemes performed well in estimating the spatial autocorrelation, although the weighted sampling scheme approached the original Bayesian estimation from the positive autocorrelation (or rather randomness) side and uniform sampling from negative autocorrelation side (Fig. 3b). Low sampling effort of uniform sampling tended to identify the spatial structure as randomness, in contrast to the tendency of significantly positive autocorrelation identified by low-effort weighted sampling (Fig. 3b).

Figure 3.

 The abundance–rank curves (a) and the species–aggregation curves (b) for the top-recorded 29 invasive plant species in Kruger National Park, under different sampling schemes and sampling efforts. The solid and dashed straight lines in (b) indicate the mean and confidence interval of Moran’s I for randomly distributed species.

The optimal sampling effort was calculated for the most-efficient sampling scheme, uniform (equal-weight) sampling, although similar results can be expected for weighted sampling. The relationship between sampling effort and the accuracy in abundance estimation followed a clear exponential relationship: A = c1 Exp(−c2 × s), where c1 and c2 are constants (R² > 0·94 for all 29 species; Fig. S3). The optimal sampling effort for abundance estimation (OSEA) was then estimated for A = 0·05, i.e. OSEA = −Ln(0·05/c1)/c2 (Table 1). A strong interspecific power law relationship between the OSEA and the IAP density emerged (Fig. 4a). The relationship between sampling effort and the accuracy in distribution estimation was much more diverse, from the unimodal form for all six species with negative values of Moran’s I, to the power law and exponential forms for other IAP species (see Fig. S4 for details). The optimal sampling effort for distribution estimation (OSED) was then estimated accordingly for A = 0·05, which also led to a power law relationship between the OSED and IAP densities (Fig. 4b). A power law relationship between the OSEA and the OSED also emerged (Fig. 4c), suggesting an underlying positive relationship between species density and aggregation. Overall, a spatially random, rare species required a much larger sampling effort to achieve the satisfactory level of accuracy. An average of 67·45 records per km2 was required for abundance estimation of the top 29 species, and 246·6 records per km2 for distribution estimation (Table 1), compared with the current level of 121·14 records per km2.
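The step from the fitted exponential to the OSEA can be illustrated with a short Python sketch. The fitting procedure used in the study is not described in this section, so an ordinary least-squares fit on the log-transformed accuracy is assumed here; the data are synthetic and the function names are hypothetical.

```python
from math import exp, log


def fit_exponential(efforts, accuracies):
    """Fit A = c1 * exp(-c2 * s) by least squares on ln(A) = ln(c1) - c2 * s.

    ASSUMPTION: log-linear regression; returns (c1, c2).
    """
    n = len(efforts)
    ys = [log(a) for a in accuracies]
    mean_s = sum(efforts) / n
    mean_y = sum(ys) / n
    sxx = sum((s - mean_s) ** 2 for s in efforts)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(efforts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    return exp(intercept), -slope


def optimal_effort(c1, c2, target=0.05):
    """Effort s at which A equals the target: s = -ln(target / c1) / c2."""
    return -log(target / c1) / c2


# Synthetic accuracy curve generated with c1 = 0.8 and c2 = 0.05 (made-up values).
efforts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
accuracies = [0.8 * exp(-0.05 * s) for s in efforts]
c1, c2 = fit_exponential(efforts, accuracies)
osea = optimal_effort(c1, c2)   # effort at which A falls to 0.05
```

With noisy re-sampling output, a nonlinear fit on the untransformed accuracy may be preferable, since the log transform gives more weight to small values of A.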

Figure 4.

 The relationship between the density of invasive plants and the optimal sampling effort for abundance estimation (OSEA) (a), between the aggregation, as measured by Moran’s I, and the optimal sampling effort for distribution estimation (OSED) (b), as well as between the OSEA and OSED (c).

Discussion

Large-scale monitoring programmes need cost-efficient methods for estimating the abundance and distribution of target species. Traditional mensuration methods of estimating abundance, such as mark–recapture techniques, are only useful at local scales (e.g. 0·1–10 km2 for complete counts) due to the method of data collection and the associated costs. This has led to an increased interest in the use of binary (presence/absence) data for LSMPs (Brotons et al. 2004; Joseph et al. 2006). In this regard, two categories of abundance estimation models have been developed. The first is designed for true presence-absence binary data and includes the intraspecific occupancy–abundance relationship, which is grounded in the ubiquitous positive correlation between species abundance and range size (Nachman 1984; Wright 1991; Gaston & Blackburn 2000; He & Gaston 2003; Hui & McGeoch 2007), and the scaling pattern of occupancy, which describes how adjacent occupied cells merge with increasing grain (Hartley & Kunin 2003; Hui, McGeoch & Warren 2006; Lennon et al. 2007; Gaston & Fuller 2009; Hui 2009). A multi-criteria test suggests the superiority of the scaling-pattern-of-occupancy models over the occupancy–abundance–relationship models in estimating abundance and yielding macroecological patterns (Hui et al. 2009).

The other category is designed to tackle the pseudo-absence problem inherent in binary data and mainly uses the maximum likelihood method to deal with imperfect detection or presence-only data (e.g. MacKenzie et al. 2002; Royle, Nichols & Kéry 2005; Pearce & Boyce 2006; Zhou & Griffiths 2007; Nichols et al. 2008). Here, a compromise often has to be made to alleviate over-parameterization. The Bayesian model we present here contributes to the second category of models by dealing with abundance estimation and the pseudo-absence dilemma simultaneously, but follows the statistical philosophy of the Bayesian School. Such models need further investigation given their obvious practical value (Royle et al. 2007; Royle & Dorazio 2008).

Studies on biological invasions at the level of entire community assemblages are limited (Mason & French 2008) due to the lack of evidence that invasive species can form a separate community in the invaded areas and also the difficulty of quantifying such a community if it exists. The species-abundance distribution (Fig. 2) in this study provides clues for both problems. Invasive species in KNP can be categorized into two groups: the recorded but not established (grey bar) and the recorded and established species (white bars in Fig. 2). This two-group bimodal species-abundance distribution is clearly different from the canonical unimodal log-normal (or zero-sum multinomial) shape that has been reported in numerous cases (since Preston 1948) and which is explained by various models (e.g. Gaston & Blackburn 2000; Hubbell 2001; Volkov et al. 2005; McGill et al. 2007; Haegeman & Etienne 2010).

Strong support for our two-group species-abundance distribution is provided by Magurran & Henderson (2003), who describe a very similar form for an estuarine fish community (Fig. 1 in Magurran & Henderson 2003). They explain such an excess of rare species by dividing the fish assemblage into two components: persistent and occasional species. Southwood (1996) reported a similar pattern in a Heteroptera insect community and divided the species into transient and core species, which is similar to Hanski’s (1982) core-satellite hypothesis. Alien species move through a series of stages during the invasion process (Richardson et al. 2000) and often experience a time-lag before rapid expansion (Crooks 2005). This provides a natural break point in the species-abundance distribution to separate those established invaders from those that have been recorded but which are still in the time-lag. Magurran & Henderson (2003) further use the residence time (or persistence) to identify such a break point. Considering the importance of residence time in determining the potential range of an invading species (Wilson et al. 2007; Pyšek, Křivánek & Jarošík 2009), it is necessary to further investigate the relationship between residence time and IAP abundance, and verify whether this residence time can be used to categorize IAP species into core and satellite groups.

Ecological studies should always be carried out in an equal-weight fashion for further statistical inference (Quinn & Keough 2002). Our results confirmed this rule of thumb, even for LSMPs. Weighted and sequential sampling schemes, such as addictive and elusive, overestimated the abundance of IAPs (Appendix S1), and estimations from weighted sampling approached the real scenario at a much slower rate than the uniform sampling (Fig. 3). This suggests that the future patrol (or sampling) protocol in the CyberTracker monitoring programme (in terms of IAPs) should focus more on the under-sampled areas to counter the current highly skewed sampling weights (an average of 121·14 records per km2, with a 95% confidence interval of 2·6 to 510·2 records per km2). This means a minimum in-cell sampling effort of 68 and 247 records per km2 for monitoring abundance and distribution, respectively.
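
One intuition for why equal sampling effort performs well can be sketched as follows. This is a simplified illustration rather than the analysis behind Fig. 3: it assumes every cell is occupied, that each record detects the species independently with the same hypothetical probability, and that the total number of records is fixed. The cell counts and allocations are invented for the example. Because the chance of at least one detection in a cell shows diminishing returns with additional records, concentrating effort in a few cells leaves many occupied cells undetected.

```python
def expected_detected_fraction(records_per_cell, detection_prob):
    """Expected share of occupied cells with at least one detection.

    Assumes (illustratively) that every cell is occupied and that each
    record detects the species independently with detection_prob.
    """
    if not records_per_cell:
        raise ValueError("records_per_cell must not be empty")
    miss = 1.0 - detection_prob
    detected = [1.0 - miss ** n for n in records_per_cell]
    return sum(detected) / len(detected)


# Two hypothetical allocations of the same total effort (100 records, 10 cells)
uniform_effort = [10] * 10
skewed_effort = [82] + [2] * 9

if __name__ == "__main__":
    p = 0.1  # hypothetical per-record detection probability
    print("uniform:", round(expected_detected_fraction(uniform_effort, p), 3))
    print("skewed: ", round(expected_detected_fraction(skewed_effort, p), 3))
```

Under these assumptions the uniform allocation detects the species in a larger expected share of cells than the skewed allocation, which is consistent with directing future patrols towards under-sampled areas.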

For conservation agencies tasked with conserving biological diversity, translating advances in science into management action is a key requisite and challenge. Rapid advances in science are providing a wealth of insights on crucial facets of issues that are important to conservationists, yet few of these advances are operationalized for practical implementation in the field. One of the key challenges in conservation biology is finding appropriate ways of using information effectively (Richardson & Whittaker 2010). Data collected in the KNP for the purposes of both management and LSMPs provided the opportunity to explore options for improving the cost-efficiency of utilizing available data for monitoring and other purposes. The Bayesian method derived here provides management with an accurate and practical approach for monitoring invasive alien plants over a large area. This paves the way for improved prioritization of various interventions to address problems associated with biological invasions in protected areas.

Acknowledgements

We are grateful to G. Blanchet, G. Cruz-Piñón, F. He, K.J. Gaston, I. Kühn, W.E. Kunin and S. Hartley for comments and logistic help, and SANParks and the DST-NRF Centre of Excellence for Invasion Biology for financial support. C.H. acknowledges support from the NRF Blue Sky Programme. L.C.F. acknowledges support from the NRF Incentive Funds Programme. D.M.R. acknowledges support from the Hans Sigrist Foundation.
