## Introduction

Species richness, the number of unique species in a defined area, is the most commonly used measure of biological diversity (Gaston 1996; Moreno *et al*. 2006). Species richness (SR) can be used to delineate protected areas, monitor biological systems and investigate environmental relationships. Because surveys rarely encounter all of the species in an area, raw counts are negatively biased, and numerous estimators have been proposed to reduce this bias.

Three categories are regularly used to classify SR estimators (Colwell & Coddington 1994). The first category includes extrapolative methods applied to species accumulation curves or species–area curves. The Michaelis–Menten equation (Michaelis & Menten 1913), negative exponential model (Holdridge *et al*. 1971) and power model (Arrhenius 1921; Tjørve 2009) are commonly used to extrapolate to an estimate of SR at some large sample or large area.
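For orientation, the three curve forms named above can be written as short functions. This is a minimal sketch using commonly cited parameterizations; the parameter names (`s_max`, `b`, `k`, `c`, `z`) and the numerical values are illustrative assumptions, not taken from the cited works.

```python
import math

def michaelis_menten(n, s_max, b):
    """Expected richness after n samples; approaches the asymptote s_max."""
    return s_max * n / (b + n)

def negative_exponential(n, s_max, k):
    """Expected richness after n samples; approaches the asymptote s_max."""
    return s_max * (1.0 - math.exp(-k * n))

def power_model(area, c, z):
    """Species-area power law; increases without an upper asymptote."""
    return c * area ** z

# Illustrative (made-up) parameter values only.
for n in (10, 100, 1000):
    print(n,
          round(michaelis_menten(n, s_max=50, b=20), 1),
          round(negative_exponential(n, s_max=50, k=0.05), 1),
          round(power_model(n, c=5, z=0.25), 1))
```

In practice the parameters would be fitted to an observed accumulation curve. For the two asymptotic models, the fitted `s_max` then serves as the SR estimate, whereas the power model must be evaluated at a chosen large area.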

A second category includes parametric estimators that make assumptions about the underlying species-abundance distribution or species detection probabilities (*p*). One type of parametric estimator uses a fitted distribution, often either a log-normal or log-series. For this type of estimator, required steps such as estimating total abundance and selecting the discrete abundance classes to which a continuous distribution is fitted are often prohibitive (see Colwell & Coddington 1994; Magurran 2004). There are also parametric estimators based on the assumption that *p* is constant across species.

A third category includes nonparametric estimators, which make no assumptions about the probability distribution from which parameters are drawn. Many of the nonparametric SR estimators were originally derived from methods to estimate the number of individuals in a closed population (e.g. Burnham & Overton 1978; Chao 1984; Pledger 2000).
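As an illustration of this category, the sketch below implements two widely used nonparametric estimators: the bias-corrected form of the Chao1 estimator, which belongs to the family introduced by Chao (1984), and the first-order jackknife commonly associated with Burnham & Overton (1978). The function names and input layouts are assumptions made for this example.

```python
def chao1(abundances):
    """Bias-corrected Chao1 from per-species counts of individuals."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                      # species observed at least once
    f1 = sum(1 for a in counts if a == 1)    # singletons
    f2 = sum(1 for a in counts if a == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def jackknife1(incidence):
    """First-order jackknife from a species-by-occasion 0/1 matrix."""
    k = len(incidence[0])                    # number of survey occasions
    freqs = [sum(row) for row in incidence]  # occasions on which each species was found
    s_obs = sum(1 for f in freqs if f > 0)
    q1 = sum(1 for f in freqs if f == 1)     # species found on exactly one occasion
    return s_obs + q1 * (k - 1) / k

# Hypothetical data: five observed species, two of them singletons.
print(chao1([5, 3, 1, 1, 2, 0]))
print(jackknife1([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]))
```

Both estimators add a correction to the observed count that grows with the number of rarely encountered species, which reflects their origin in closed-population abundance estimation.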

The search for a single best estimator remains unresolved. However, general comparisons of the three estimator categories favour the nonparametric methods (see table 1 in Cao, Larsen & White 2004; table 3 in Walther & Moore 2005). Nonparametric estimators are therefore the focus of this project.

The performance of nonparametric SR estimators can be affected by species- and assemblage-level attributes as well as by survey design parameters, hereafter collectively referred to as factors (Keating & Quinn 1998; Brose, Martinez & Williams 2003). Several studies have indicated that bias decreases as species-abundance distributions become more even (Wagner & Wildi 2002; O'Dea, Whittaker & Ugland 2006). One assumption of the closed population estimators, translated for species data, holds that species are equally detectable across space. Spatial aggregation regularly challenges this assumption (Schmit, Murphy & Mueller 1999). Other factors found to affect SR estimator performance include the number of species (Keating & Quinn 1998; Poulin 1998), total abundance or density of individuals (Baltanás 1992; Walther & Morand 1998) and species detection probability, *p* (Boulinier *et al*. 1998).

Raw sample data and, consequently, SR estimates are also affected by survey design parameters such as effort (Burnham & Overton 1979; Brose, Martinez & Williams 2003). Additionally, survey configuration has proven important in other estimation problems (Reese *et al*. 2005). Selecting survey locations randomly is unbiased and therefore preferable; however, survey locations are often selected based on accessibility and previous results (Beck & Kitching 2007). The above factors can all affect sample coverage (*sc*), which is the proportion of a species pool represented in a sample and the single most important factor with respect to estimator performance (Baltanás 1992; Brose, Martinez & Williams 2003). Unfortunately, calculating *sc* requires knowing the true number of species and, if this information were available, estimation would be unnecessary. It is therefore important to understand how the aforementioned factors affect performance.
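The definition of *sc* given above translates directly into a calculation, shown here as a minimal sketch. The function name and the species labels are hypothetical, and the calculation is possible only when the full species pool is known, as in a simulation.

```python
def sample_coverage(observed_species, species_pool):
    """Proportion of the species pool that is represented in the sample."""
    pool = set(species_pool)
    return len(set(observed_species) & pool) / len(pool)

# Hypothetical pool of four species, two of which were encountered.
print(sample_coverage({"sp_a", "sp_b"}, {"sp_a", "sp_b", "sp_c", "sp_d"}))
```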

Evaluating SR estimators across a wide range of factors in the field is difficult because of temporal, financial and logistical constraints as well as uncertainty about species- and assemblage-level parameters. Despite their simplifications, simulations are advantageous because simulated assemblages can be systematically varied and randomly surveyed and, most importantly, the true number of species is known. Our objective therefore was to develop a program, SimAssem, in which specified parameters are used to simulate and survey species assemblages, thereby revealing the behaviour of SR estimators in a controlled setting. Most, if not all, of the programs currently available for estimating SR, for example, EstimateS (Colwell 2006), SPADE (Chao & Shen 2010) and ws2m (Turner, Leitner & Rosenzweig 2003), process existing encounter history data (information indicating whether a species was encountered during a particular survey occasion) but include little or no simulation capability. In addition, SimAssem includes a more comprehensive suite of SR and variance estimators than other programs.
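The general idea of simulating and surveying an assemblage can be sketched in a few lines. The example below is a toy illustration and not SimAssem's algorithm: the beta-distributed detection probabilities, the default parameter values and the function names are all assumptions chosen for brevity.

```python
import random

def simulate_survey(n_species=30, n_occasions=10, seed=1):
    """Return a species-by-occasion 0/1 encounter history for a simulated assemblage."""
    rng = random.Random(seed)
    # Assumed detection probabilities: most species are hard to detect.
    p = [rng.betavariate(1, 4) for _ in range(n_species)]
    return [[1 if rng.random() < p_i else 0 for _ in range(n_occasions)]
            for p_i in p]

def summarize(history):
    """Observed richness and first-order jackknife estimate for an encounter history."""
    k = len(history[0])
    freqs = [sum(row) for row in history]
    s_obs = sum(1 for f in freqs if f > 0)
    q1 = sum(1 for f in freqs if f == 1)
    return s_obs, s_obs + q1 * (k - 1) / k

history = simulate_survey()
s_true = len(history)            # known only because the assemblage is simulated
s_obs, jack1 = summarize(history)
print(f"true={s_true} observed={s_obs} jackknife1={jack1:.1f} sc={s_obs / s_true:.2f}")
```

Because the true richness is fixed by construction, the bias of the raw count and of the estimator can be read off directly, and repeating the run over many seeds or parameter settings would show how each factor affects performance.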