1. Plant ecologists have been rather slow to appreciate the existence and the effects of imperfect detection probability in plants. Sources of heterogeneous detectability include differences in morphology or life-form, patch size, observers and survey effort. Understanding the relationship between such factors and detectability is crucial for the efficient design of new plant distribution studies and for the interpretation of existing ones.
2. We have studied the factors affecting detectability in a large permanent plot (24 ha) in East China where the true distribution of six shrub and tree species was known from a detailed earlier inventory. Two observers independently resurveyed and recorded detection and non-detection of each species in each 20 × 20 m sampling quadrat. A total of 288 quadrats were resurveyed (218 by observer A, 211 by observer B and 141 by both). We used generalized linear mixed modelling to study the relationships between detection and species, observer, survey effort and patch size.
3. Detectability of an occupied quadrat was remarkably low, ranging on average from 0.09 to 0.34 among the six shrub and tree species. Differences in detection among species were mainly attributable to distinctive morphology rather than life-form. There was no significant difference in overall detection probability between the two observers. Detectability increased to 0.95 as the survey path approached 20% of the area of the sampling quadrat and as a plant patch covered c. 19% of the quadrat area.
4. Synthesis. Our results suggest that imperfect detection is much more widespread than most plant ecologists currently acknowledge. We identify several sources of heterogeneity in detectability (species, survey effort and patch size) that ought to be considered when studying and modelling the distribution of plant species. Detectability should be accounted for in plant distribution studies to avoid spurious inferences.
The study of the geographical distribution of species lies at the heart of ecology (Krebs 2000). Distribution is frequently investigated using the occupancy metric (MacKenzie et al. 2002, 2006), that is, the proportion of a study area that is occupied by a target species. However, some authors have long acknowledged that a species may go undetected in a survey even when it is actually present within a sampling unit (Kéry 2002, 2004; MacKenzie et al. 2002, 2006; Kéry et al. 2006). Thus, the observation of a ‘zero’ (an apparent absence) may represent either the true absence of a species or its non-detection despite its presence. That is, true occurrence and detection are confounded in our observations of occurrence.
Unfortunately, this situation does not appear to be sufficiently acknowledged at present in plant distribution studies. This can be seen, for example, in the improper use of the term ‘presence–absence data’ for distribution observations that are in fact ‘detection–non-detection data’. Observed ‘absences’ may in reality be ‘false absences’ (Royle & Nichols 2003; Tyre et al. 2003), i.e. non-detected presences.
Not accounting for the possibility of ‘false absences’ may have serious consequences in many respects. Geographical range size and extent of occurrence will be underestimated with imperfect detection (Kéry 2002; Anderson 2003). Inferences about habitat selection will be misleading, especially if detectability, not only true occurrence, depends on the habitat (Gu & Swihart 2004). Finally, local extinction rates will be overestimated (Williams, Nichols & Conroy 2002; Kéry 2004; Kéry et al. 2006), as will turnover rates.
The solution to the problem of confounded occurrence and detection lies in conducting replicate observations of a closed system, i.e. one in which occupancy does not change between surveys (Burnham & Overton 1978; MacKenzie et al. 2002, 2006; Royle & Nichols 2003; Tyre et al. 2003). The pattern of detection–non-detection of a species at occupied sites yields information about detection probability, which enables us to correct the observed distribution for imperfect detection and to recover unbiased functional relationships between habitat covariates and true occurrence.
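To make this concrete, here is a minimal Python sketch of the single-season site-occupancy likelihood of MacKenzie et al. (2002). It rests on simplifying assumptions: occupancy probability psi and per-survey detection probability p are constant across sites and surveys, and the model is fitted by a crude grid search rather than a numerical optimizer. All function names are hypothetical, and the data are simulated rather than taken from this study.

```python
import math
import random
from collections import Counter


def history_prob(k, d, psi, p):
    """Probability of one specific detection history with d detections in k surveys.

    psi: probability that the site is occupied; p: per-survey detection probability.
    """
    if_present = p ** d * (1 - p) ** (k - d)
    if d > 0:
        # At least one detection: the site must be occupied.
        return psi * if_present
    # All zeros: occupied but missed every time, or truly unoccupied.
    return psi * if_present + (1 - psi)


def log_likelihood(counts, psi, p):
    """counts maps (k, d) to the number of sites with that history length and detection total."""
    return sum(n * math.log(history_prob(k, d, psi, p)) for (k, d), n in counts.items())


def fit_grid(histories, step=0.02):
    """Crude maximum-likelihood fit of (psi, p) by grid search over (0, 1) x (0, 1)."""
    # Histories with the same (k, d) have the same likelihood, so collapse them first.
    counts = Counter((len(h), sum(h)) for h in histories)
    grid = [i * step for i in range(1, round(1 / step))]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pars: log_likelihood(counts, *pars))


def simulate(n_sites, k, psi, p, seed=1):
    """Simulate detection histories for n_sites sites, each surveyed k times."""
    rng = random.Random(seed)
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        histories.append([int(occupied and rng.random() < p) for _ in range(k)])
    return histories


if __name__ == "__main__":
    data = simulate(n_sites=500, k=4, psi=0.6, p=0.4)
    naive = sum(1 for h in data if any(h)) / len(data)
    psi_hat, p_hat = fit_grid(data)
    print(f"naive occupancy {naive:.2f}; psi_hat {psi_hat:.2f}; p_hat {p_hat:.2f}")
```

With replicate surveys, sites with at least one detection inform p, and that estimate is then used to apportion the all-zero histories between "occupied but missed" and "unoccupied". This is why the fitted psi exceeds the naive proportion of sites with detections.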
Detection probability may vary in time because of survey-specific conditions and in space owing to site-specific characteristics (Bailey, Simons & Pollock 2004). In addition, factors such as the size of a plant patch, plant architecture and growth form, and differences among observers have been hypothesized to affect plant detection probability (Kéry et al. 2006). However, much remains to be learnt about the effects of such factors on detection probability.
In this article, we examine the factors affecting detection probability, based on a resurvey of six shrub and tree species in a large (24 ha) permanent plot in East China in 2007. The true occupancy state of these species in each 20 × 20 m quadrat is known from an intensive original inventory conducted in 2005. In 2007, two observers independently resurveyed the plot once each, and we analysed the resulting detection–non-detection data to gain insight into the factors affecting the detection probability of these plant species.
In our study, we conditioned the analysis on the quadrats known to be occupied in the 2005 inventory. A zero observation was therefore assumed to represent the overlooking of a species rather than its absence; hence, we modelled detection events directly and did not need to account for possible non-occurrence by using a site-occupancy model (MacKenzie et al. 2002). For each species, the probability of detecting at least one individual in a quadrat was investigated in relation to observer, survey effort and patch size. In addition, species-specific detection probability was explored in relation to differences in morphology and life-form. Thus, we were able to study not only survey-specific factors but also site- and species-specific factors that cause heterogeneity in detection probability.
Materials and methods
Study site and species
We conducted our study at the Gutianshan (GTS) permanent plot (29°15′ N, 118°07′ E), located within the Gutianshan National Nature Reserve in Kaihua County, East China. According to records from 1958 to 1986, annual mean temperature in this region is 15.3 °C and annual mean precipitation is 1964 mm (Yu et al. 2001). The reserve was established to protect the old-growth evergreen broad-leaved forest of this region, and 1426 seed plant species have been recorded in it (Legendre et al. 2009). Our study plot is a 24-ha rectangle of montane upland (446–715 m a.s.l.), divided into a regular grid of 600 quadrats of 20 × 20 m. The vegetation in the plot is evergreen broad-leaved forest (Wu 1980). Dominant species are Castanopsis eyrei (Fagaceae), Schima superba (Theaceae) and Pinus massoniana (Pinaceae). According to the 2005 inventory, 159 seed plant species (belonging to 49 families) occur within the plot (see Legendre et al. 2009 for more information about the GTS plot). These 159 species comprise 63 canopy tree species, 70 understorey tree species and 26 shrub species (Lai et al. 2009). The vegetation is dense, with a c. 12-m high canopy layer, a rather closed, c. 5-m high understorey and a dense, c. 1.8-m high shrub layer. In total, 140 676 individuals with diameter at breast height (d.b.h.) ≥ 1 cm were recorded in the 2005 inventory. Shrub species accounted for 13% of all individuals, understorey species for 57% and canopy species for 30%.
We selected six focal species for the resurvey (Table 1). They were chosen to span a range of conspicuousness in terms of mean height, leaf size and leaf colour. To characterize the age structure of the six species in the plot, we took the d.b.h. data from the 2005 inventory and divided them, for each species, into 10 classes using the kmeans function in R 2.8.1 (R Development Core Team 2008). The d.b.h. distributions of all six species were skewed toward small, young stems (Fig. 1), indicating an uneven age structure.
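The classification step can be illustrated with one-dimensional k-means. This is a sketch, not the authors' code. R's kmeans uses the Hartigan–Wong algorithm by default, whereas the version below uses Lloyd's algorithm with centres initialised at evenly spaced quantiles, so its class boundaries may differ slightly. The function name and the simulated d.b.h. values are hypothetical.

```python
import random


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means for one-dimensional data; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced quantiles of the data.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members (kept if empty).
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical right-skewed d.b.h. values (cm), all >= 1 cm as in the inventory.
    dbh = [1.0 + rng.expovariate(0.2) for _ in range(1000)]
    centres, labels = kmeans_1d(dbh, 10)
    print([round(c, 1) for c in centres])
```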
In 2005, a detailed inventory was conducted within the 24-ha plot. Over 9 months, a team of 20 people identified, measured (d.b.h.), mapped and tagged the individual trees: the coordinates of all stems with d.b.h. ≥ 1 cm were recorded, and numbered tags were attached to the stems in the field.
In 2007, two observers (GC and JZ) resurveyed the six species in the plot. Neither of them had participated in the original inventory in 2005, so they had no information about the distribution of the six species in the plot. Because one of them was more experienced than the other, both took part in a 5-h training session on 4 December 2007. They first studied specimens of the six focal species and then walked through the forest adjacent to our study plot to familiarize themselves with the search image of these six species, i.e. with how the species look in their natural environment.
On 5, 6 and 8 December 2007, between 9 am and 3 pm, the two observers independently surveyed the plot along its main path, which had been formed during the 2005 inventory. Some segments of the path were indistinct, but most were clearly recognizable. The observers walked through the quadrats along the path. On entering a quadrat, they took a photograph and recorded the tag of the first individual of any of the six focal species they found. They also recorded any other individuals of the six focal species that they noticed in the same quadrat, so that each species found in a quadrat was recorded at least once there. They then continued along the path to the next quadrat. To document the survey path in detail, the observers photographed a numbered tag about every 2–3 m along the way.
Because the numbered tags recorded along the survey path can be linked to stem coordinates in the GTS plot database, a detailed survey path can be reconstructed for each observer. Each quadrat (20 × 20 m) was divided into a regular grid of 400 cells (1 × 1 m), so the number of grid cells covered by the survey path in each quadrat can be computed. We define this number as the survey effort in that quadrat; it indicates roughly how much of the quadrat's area was surveyed. In addition, the coordinates of all stems of the six species in each quadrat had been recorded in the 2005 inventory. We define the number of 1 × 1 m grid cells occupied by a species in a quadrat as its patch (‘target’) size; it indicates roughly how much of the quadrat's area was occupied by the species.
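Both covariates reduce to counting 1 × 1 m cells, as in the minimal sketch below. The text does not say how the path between successive tags was drawn. The sketch therefore assumes straight lines between tags, with coordinates already expressed in metres relative to the quadrat's corner. Because it samples each segment at fixed intervals, it can miss a cell that is only clipped at a corner; an exact grid-traversal algorithm would avoid this. All names are hypothetical.

```python
import math

SIDE = 20  # quadrat side in metres; 1 x 1 m cells give 400 cells per quadrat


def cell_of(x, y):
    """(column, row) of the 1 x 1 m cell containing point (x, y), or None if outside the quadrat."""
    if 0 <= x < SIDE and 0 <= y < SIDE:
        return (math.floor(x), math.floor(y))
    return None


def survey_effort(path, step=0.05):
    """Number of grid cells crossed by a survey path given as (x, y) tag positions.

    Consecutive positions are joined by straight lines sampled every `step` metres.
    """
    cells = {cell_of(x, y) for x, y in path}
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        n = max(1, int(math.hypot(x1 - x0, y1 - y0) / step))
        for i in range(1, n):
            t = i / n
            cells.add(cell_of(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    cells.discard(None)  # drop points that fall outside this quadrat
    return len(cells)


def patch_size(stems):
    """Number of grid cells containing at least one stem of a species."""
    cells = {cell_of(x, y) for x, y in stems}
    cells.discard(None)
    return len(cells)


if __name__ == "__main__":
    path = [(0.5, 0.5), (10.0, 4.0), (19.5, 12.0)]  # hypothetical tag positions
    effort = survey_effort(path)
    print(effort, f"{effort / SIDE ** 2:.1%} of the quadrat")
```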
Thus, for each quadrat surveyed, we had information on the observer who conducted the survey, the survey effort, and the patch size and detection–non-detection of each species. Assuming that the occurrence state of the six species in each quadrat was stable from 2005 to 2007, we tested the relationship between detection–non-detection and the following explanatory variables: observer, survey effort and patch size.
Because the original inventory in 2005 was close in time to our resurvey in 2007, we conditioned the analysis on those quadrats where each species was known to have been present 2 years earlier, and assumed that an observed ‘zero’ meant that the species had been overlooked.
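In data terms, conditioning on known occupancy means building the analysis data set only from (species, quadrat) pairs occupied in 2005, with one binary record for each observer who visited that quadrat. The minimal sketch below shows this step; the names and toy data are hypothetical. Survey effort and patch size would be joined to these records afterwards. Detections in quadrats not recorded as occupied in 2005 are simply ignored here.

```python
from typing import Dict, List, Set, Tuple


def build_detection_records(
    occupied_2005: Set[Tuple[str, int]],
    surveyed: Dict[str, Set[int]],
    detected: Set[Tuple[str, str, int]],
) -> List[dict]:
    """One binary record per (species, quadrat, observer) for quadrats known to be occupied.

    occupied_2005: (species, quadrat) pairs occupied in the original inventory.
    surveyed: observer -> quadrats that observer walked through in the resurvey.
    detected: (observer, species, quadrat) triples recorded in the resurvey.
    """
    records = []
    for species, quadrat in sorted(occupied_2005):
        for observer in sorted(surveyed):
            if quadrat not in surveyed[observer]:
                continue  # not visited, so no detection/non-detection record
            # Conditioning on known presence: a miss here counts as a non-detection.
            y = int((observer, species, quadrat) in detected)
            records.append({"species": species, "quadrat": quadrat,
                            "observer": observer, "y": y})
    return records


if __name__ == "__main__":
    # Toy data with hypothetical species and observer labels.
    occupied = {("sp1", 1), ("sp1", 2), ("sp2", 2)}
    surveyed = {"A": {1, 2}, "B": {2}}
    detected = {("A", "sp1", 1), ("B", "sp2", 2)}
    recs = build_detection_records(occupied, surveyed, detected)
    print(sum(r["y"] for r in recs) / len(recs))  # naive pooled detection rate; prints: 0.4
```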
Conditioning on species presence in a quadrat, we estimated detection probability and tested its relationships with the explanatory variables using a generalized linear mixed model (GLMM; Breslow & Clayton 1993; Kéry 2002). We fitted a random quadrat effect to account for the non-independence of detections in the same quadrat owing to unmeasured factors. Fixed effects in this model were species, observer, survey effort and patch size, as well as all pairwise interactions between these main explanatory factors. Under this model, mean detection probability $P_{ij}$ for species i in quadrat j can be written as:

$$\operatorname{logit}(P_{ij}) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \alpha_4 x_4 + \alpha_5 x_1 x_2 + \alpha_6 x_1 x_3 + \alpha_7 x_2 x_3 + \alpha_8 x_1 x_4 + \alpha_9 x_2 x_4 + \alpha_{10} x_3 x_4 + \delta_j + e_{ij}$$
In this model, $\alpha_0$ is the logit-linear mean for species 1 and observer 1; $\alpha_1$ is a vector of coefficients for species 2–6, with $x_1$ the corresponding indicator variables; $\alpha_2$ is the effect of observer 2, with $x_2$ the corresponding indicator variable; $\alpha_3$ is the effect of survey effort for species 1, with $x_3$ the corresponding covariate value; and $\alpha_4$ is the effect of patch size for species 1, with $x_4$ the corresponding covariate value. Next come the interaction effects: $\alpha_5$ is a vector of coefficients for the species × observer effects, $\alpha_6$ a vector of coefficients for the species × survey effort effects, $\alpha_7$ the coefficient for the observer × survey effort effect, $\alpha_8$ a vector of coefficients for the species × patch size effects, and $\alpha_9$ and $\alpha_{10}$ the coefficients of the observer × patch size and survey effort × patch size interactions, respectively. Finally, $\delta_j$ is the random quadrat effect, assumed to come from a zero-mean normal distribution with variance $\sigma_1^2$, and $e_{ij}$ is an overdispersion term, assumed to come from another zero-mean normal distribution with variance $\sigma_2^2$.
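The linear predictor can be written as a small function that evaluates logit(P_ij) for a given species, observer, survey effort and patch size, with coefficients a0 to a10 standing for α0 to α10. This is a sketch under assumptions. No fitted coefficient values are reported here, so the numbers in the example are purely hypothetical, as are the dictionary layout and names. Setting the random terms to zero gives the prediction for a typical quadrat, not a prediction averaged over the random-effect distribution.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: converts a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit_p(a, species, observer, effort, patch, delta=0.0, e=0.0):
    """logit(P_ij) under the pairwise-interaction GLMM described above.

    a: coefficients; 'a1', 'a5', 'a6' and 'a8' are dicts keyed by species 2-6.
    species: 1-6, with species 1 as reference; observer: 1 (reference) or 2.
    effort, patch: survey effort and patch size, in 1 x 1 m grid cells.
    delta, e: random quadrat effect and overdispersion term (0 for a typical quadrat).
    """
    x2 = 1 if observer == 2 else 0
    eta = a["a0"] + a["a2"] * x2 + a["a3"] * effort + a["a4"] * patch
    eta += a["a7"] * x2 * effort + a["a9"] * x2 * patch + a["a10"] * effort * patch
    if species != 1:  # the species indicators x1 are all zero for species 1
        eta += (a["a1"][species] + a["a5"][species] * x2
                + a["a6"][species] * effort + a["a8"][species] * patch)
    return eta + delta + e


if __name__ == "__main__":
    # Hypothetical coefficients chosen only to exercise the function; not estimates from this study.
    zeros = {s: 0.0 for s in range(2, 7)}
    coef = {"a0": -2.0, "a1": {**zeros, 3: 0.5}, "a2": 0.0, "a3": 0.05, "a4": 0.0,
            "a5": zeros, "a6": zeros, "a7": 0.0, "a8": zeros, "a9": 0.0, "a10": 0.0}
    print(inv_logit(logit_p(coef, species=3, observer=1, effort=40, patch=10)))
```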
For inference about the fixed effects, we used the Wald statistic (McCulloch & Searle 2001). To explore the functional form of the relationship between these factors and detection probability, we used our model to form predictions of detection probability for each statistically significant factor. We fitted the GLMM using the statistical package GenStat (Payne et al. 2006).
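For a term with a single degree of freedom (observer, survey effort, patch size and the interactions among them), the Wald statistic is the squared ratio of the estimate to its standard error, compared with a chi-square distribution with 1 d.f. A minimal sketch follows, with hypothetical names and example values. Terms involving species have 5 d.f. and require the full covariance matrix of the coefficients, which this sketch does not cover.

```python
import math


def wald_test_1df(estimate, se):
    """Wald chi-square statistic and P-value for a single (1 d.f.) coefficient.

    Under the null hypothesis, (estimate / se)**2 is approximately chi-square with 1 d.f.
    """
    w = (estimate / se) ** 2
    # For 1 d.f., P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    print(wald_test_1df(0.05, 0.02))
```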
A total of 288 quadrats were surveyed: 218 by JZ, 211 by GC and 141 by both. Averaged over all quadrats and calculated separately for each observer, the mean survey time per quadrat ranged from 4 to 7 min. The number of quadrats in which a species was detected by at least one observer ranged from 24 to 60 among species. Camellia fraterna was detected in the highest proportion (29%) of its known occupied quadrats (Table 2).
Table 2. Detection information for the six species in our study. For each species, the number of quadrats known to be occupied is based on the 2005 inventory; the number of quadrats in which it was detected and the associated proportion detected are based on the 2007 resurvey of the plot
Species | Number of quadrats known to be occupied | Number of quadrats detected in 2007 | Proportion of quadrats detected (%)
Neolitsea aurata var. chekiangensis | | |
The two observers did not differ significantly in their detection probability (Table 3) and were similar at detecting all six species (i.e. there is no significant species × observer interaction, Table 3).
Table 3. Relationships between plant detection probability and several explanatory variables under a generalized linear mixed model for all six species combined in the Gutianshan permanent plot. For all explanatory variables, both their main effects and their pairwise interactions are included. Estimated variance component for the random quadrat effect = 0.6013 (SE = 0.1619)
Source of variation
Species × observer
Species × survey effort
Observer × survey effort
Species × patch size
Observer × patch size
Survey effort × patch size
In contrast, detection probability differed significantly among the six species (Table 3; Fig. 2). Myrica rubra had the highest value (0.34) and Neolitsea aurata var. chekiangensis the lowest (0.09) (Fig. 2). Thus, detection probabilities for all six species were remarkably low in our study.
As expected, detection probability increased with survey effort (Table 3; Fig. 3). Among all visited quadrats, survey effort ranged from 5 to 44 m² and the associated estimates of detection probability from 0.11 to 0.58. This increase did not differ between the two observers, as there was no survey effort × observer interaction; likewise, it did not differ among species, as there was no species × survey effort interaction either (Table 3).
Detection probability also increased with plant patch size (Table 3; Fig. 4). The estimated detection probability ranged from 0.13 at a patch size of 5 m² to 0.98 at a patch size of 90 m², reaching 0.95 (i.e. ‘almost certain detection’) at a patch size of 75 m². The relationship between detection probability and patch size depended neither on the observer (no patch size × observer interaction, Table 3) nor on the species (no species × patch size interaction, Table 3). Finally, patch size and survey effort had a joint effect on detection probability (survey effort × patch size interaction; Table 3; Fig. 5).
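As a rough consistency check, one can fit a logistic curve through the two end points reported for each covariate and solve for the value at which it reaches 0.95. This assumes that the averaged prediction curves are logistic in the untransformed covariates, which the text does not state. The function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def curve_through(x_a, p_a, x_b, p_b):
    """Intercept and slope of the logistic curve passing through two (x, probability) points."""
    slope = (logit(p_b) - logit(p_a)) / (x_b - x_a)
    return logit(p_a) - slope * x_a, slope


def x_at(intercept, slope, target=0.95):
    """Covariate value at which the logistic curve reaches the target probability."""
    return (logit(target) - intercept) / slope


# End points of the averaged predictions reported in the Results (x in m2).
patch_curve = curve_through(5, 0.13, 90, 0.98)
effort_curve = curve_through(5, 0.11, 44, 0.58)
print(round(x_at(*patch_curve)), round(x_at(*effort_curve)))  # prints: 76 86
```

The patch-size curve reaches 0.95 at about 76 m², close to the c. 75 m² reported. The survey-effort curve gives about 86 m², somewhat above the 80 m² quoted in the Discussion. That gap would be expected if the published effort curve is not exactly logistic in raw effort, for instance because of averaging over the effort × patch size interaction.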
Differences in detection probability among species
We can think of two possible explanations for the differences in detection probability among the six species: life-form and distinctive morphology. In our study, life-form was related to plant height. According to the 2005 inventory, most individuals of C. fraterna and a large number of individuals of N. aurata var. chekiangensis were in the same shrub layer. Yet C. fraterna was detected most reliably, whereas N. aurata var. chekiangensis had the lowest detection probability. Likewise, detection rates of the other four species differed significantly, even though these species occupied a similar understorey layer. In other words, similar height did not mean similar detection probability. Life-form, therefore, did not affect detection to the extent we expected.
On the other hand, distinctive morphology appeared to have a strong effect on detection probability. Camellia fraterna was detected in the highest proportion of quadrats, most likely because a small number of its individuals bore white flowers at the time of the survey. White flowers form a distinctive search image against the surrounding vegetation and are thus easily detected. Similarly, Camellia chekiangoleosa has conspicuous flowers and M. rubra has distinctive fruits. For such species, field work conducted during the flowering (or fruiting) season should therefore yield higher detection probabilities and hence less biased estimates of their distribution. In contrast, Ternstroemia gymnanthera, Symplocos stellaris and N. aurata var. chekiangensis lack distinctive flowers or fruits that could make their search image conspicuous. Obtaining unbiased estimates of the distribution of such species therefore requires much higher survey effort and more experienced observers.
Differences in detection probability among observers
In our study, the detection probabilities of the two observers did not differ significantly. This result is consistent with our expectation, as the observers had been trained before the survey. Our results therefore suggest that training may be an efficient way to reduce observer-specific heterogeneity in detection probability. Similarly, Kéry & Gregg (2003) found no difference between two observers when the less experienced one had been trained before the survey. We therefore argue that training helps to ensure consistent detection probability, and hence consistent observed distributions, among observers. We believe that sufficient training of field personnel is important for large-scale monitoring programmes of plant diversity, in which amateur observers are plentiful but experts are few.
Differences in detection probability associated with survey effort and patch size
We predicted ‘almost certain’ detection (95%) at a survey effort of 80 m² per quadrat, i.e. 20% of its area (Fig. 3). This is plausible because the sampling unit (quadrat) was relatively small (20 × 20 m): by walking through one-fifth of its area, an observer would be able to view most of the quadrat. However, survey effort was relatively low for both observers in all quadrats, with a maximum of 44 m² (11% of the quadrat area) in our study. Low survey effort can therefore partly explain the low detection probabilities of the six species.
It seems reasonable to assume, however, that the observers would encounter and identify more individuals if they surveyed each quadrat more thoroughly, and would thereby achieve a higher detection probability. This is exactly what the predictions of detection probability in Fig. 3 indicate.
Similarly, averaging over species, detection probability increases to 0.95 at a target (i.e. patch) size of c. 75 m² per quadrat (Fig. 4), representing c. 19% of the quadrat area. Royle & Nichols (2003) pointed out the direct link between site-specific detection probability $P$ and local abundance $N$: $P = 1 - (1 - r)^N$, where $r$ is the detection probability of an individual. This relationship implies that detection probability must increase with the local abundance of a species. Our data show a similar trend between detection probability and the patch size of a plant species (Fig. 4); indeed, patch size in our study can be regarded as a rough measure of the local abundance of a species in each quadrat. Because detection depends on abundance in this way, it is even possible to estimate the distribution of local abundance over all sampling units within a region from detection–non-detection data alone (Royle & Nichols 2003; Dorazio 2007).
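The Royle & Nichols (2003) relation can be evaluated directly. The sketch below also finds the smallest number of individuals N that gives a target site-level detection probability. The per-individual detection probability r used in the example is hypothetical. Treating occupied grid cells as "individuals" would be only a loose analogy with patch size, not the model fitted in this study.

```python
import math


def site_detection(r, n):
    """Royle & Nichols (2003): P = 1 - (1 - r)**N for N individuals, each detected with probability r."""
    return 1.0 - (1.0 - r) ** n


def individuals_needed(r, target=0.95):
    """Smallest N with site_detection(r, N) >= target, for 0 < r < 1."""
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - r))
    # Guard against floating-point rounding at the boundary, in either direction.
    while site_detection(r, n) < target:
        n += 1
    while n > 0 and site_detection(r, n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    r = 0.1  # hypothetical per-individual detection probability
    for n in (1, 5, 10, 20):
        print(n, round(site_detection(r, n), 3))
    print(individuals_needed(r))  # prints: 29
```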
Interestingly, survey effort and patch size interacted, i.e. they had a joint effect on detection probability (Fig. 5). To some degree, the two factors can compensate for each other. For a small target, i.e. a small patch, more grid cells need to be surveyed to achieve 95% certainty of detection, whereas for a larger target a much smaller effort suffices for the same level of certainty (McArdle 1990).
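One idealized way to see why effort and patch size can substitute for each other is to make three assumptions. First, the surveyed cells and the occupied cells are both scattered at random among the 400 cells of a quadrat. Second, a species is seen when an occupied cell is surveyed. Third, it is never seen otherwise. The miss probability is then hypergeometric and symmetric in effort and patch size. This is an illustrative assumption, not the model fitted in this study. It ignores visibility beyond the surveyed cells, spatial clustering of stems and misidentification, and the function names are hypothetical.

```python
from math import comb

CELLS = 400  # 1 x 1 m cells in a 20 x 20 m quadrat


def p_hit(effort, patch, cells=CELLS):
    """Probability that `effort` surveyed cells include at least one of `patch` occupied cells.

    Idealization: both sets of cells are placed at random, and the species is seen
    if and only if one of its cells is surveyed.
    """
    # Hypergeometric probability that none of the surveyed cells is occupied.
    miss = comb(cells - patch, effort) / comb(cells, effort)
    return 1.0 - miss


def min_effort(patch, target=0.95, cells=CELLS):
    """Smallest number of surveyed cells giving p_hit >= target (None if unreachable)."""
    for effort in range(cells + 1):
        if p_hit(effort, patch, cells) >= target:
            return effort
    return None


if __name__ == "__main__":
    for patch in (5, 20, 50, 75):
        print(patch, min_effort(patch))
```

Swapping the two arguments of p_hit leaves the result unchanged, which is the exact form of exchangeability under this idealization. The fitted GLMM need not share this symmetry.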
Potential effects of community dynamics on detection probability
Our approach assumes that the system remained stable between the original inventory in 2005 and the resurveys in 2007, i.e. that no occupied quadrat became unoccupied in the meantime. On the one hand, however, any mortality might have reduced patch size, in which case our results might overestimate the patch size effect. Moreover, some local extinction may have occurred, which would bias our estimates of detection probability downwards, because a species that is truly absent is counted as missed. Arguably, extinction may have occurred more often in quadrats occupied by small patches than in those occupied by large ones, which would exaggerate our estimates of the effect of patch size on detection probability. On the other hand, some recruitment has probably taken place, increasing patch size, in which case our results may underestimate the patch size effect on detection probability. Furthermore, these effects may differ among species, which may explain part of the species differences in detection probability.
Nevertheless, some information about community dynamics is available from a 5-ha forest plot adjacent to, and slightly overlapping, our study plot. In 2002, a total of 19 183 individuals with d.b.h. ≥ 1 cm were recorded in the original inventory of this 5-ha plot. Five years later, in 2007, a total of 19 902 individuals with d.b.h. ≥ 1 cm were recorded in the second inventory (K.P. Ma, X.C. Mi, X.J. Du & M.J. Yu, unpublished data). The annual net change in population size in this area is thus about 0.75%, which is within the range of turnover rates (the average of mortality and recruitment rates) reviewed by Stephenson & van Mantgem (2005). A 0.75% change in population size is likely to cause an even smaller change in the proportion of quadrats occupied, which is very small compared with detection probabilities of 9–34%. We therefore believe that our estimates of detection probability were hardly affected by community turnover.
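The annual figure can be reproduced from the two inventory totals, either as the simple average of the 5-year net change or as a constant compound rate; both give roughly 0.74–0.75% per year. Note that this is a net change: mortality and recruitment could each be larger than their difference.

```python
# Stem counts (d.b.h. >= 1 cm) in the adjacent 5-ha plot, as given in the text.
n_2002, n_2007, years = 19183, 19902, 5

# Net change averaged over the interval (simple, non-compounded).
simple = (n_2007 - n_2002) / n_2002 / years
# Constant annual rate that compounds to the same 5-year change.
compound = (n_2007 / n_2002) ** (1 / years) - 1

print(f"{simple:.2%} {compound:.2%}")  # prints: 0.75% 0.74%
```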
Imperfect detection in plant distribution studies
There is a famous saying by Harper (1977) that plants are easier to study than animals because they do not run away. However, it is now clear that plants can be difficult to detect, and in our opinion this has not been sufficiently widely acknowledged in the plant ecology community. In our study, even for moderate-sized shrubs and small trees, detection probabilities of occupied 20 × 20 m quadrats were far below 1. The consequence of such imperfect detection for plant distribution studies is clear: a possibly serious underestimation of the true distribution of a species. Furthermore, because ‘target’ (i.e. patch) size in our survey was also related to detection, distribution studies that do not correct for imperfect detection may also yield a size-biased sample of occurrences (i.e. larger patches are more likely to be detected than smaller ones). Finally, observed distributions of different species will be biased to different degrees, and studies employing different field efforts will not be comparable.
Beyond causing the distribution of a plant to be underestimated, imperfect detection means that any relationships between detection and other factors, e.g. habitat covariates, will erroneously be ascribed to true occurrence unless a proper field design and suitable analytical methods are used (Kéry & Schmidt 2008). If comparable detection probability cannot be ensured across the desired dimensions of comparison (e.g. time, space and habitat), methods that explicitly account for imperfect detection must be employed. Site-occupancy models (MacKenzie et al. 2002), a form of hierarchical logistic regression, can be applied to plant distribution data (i.e. detection–non-detection data, also naively called ‘presence–absence’ data), provided that at least some sites are surveyed more than once so that detection probability can be formally estimated. From the pattern of detection and non-detection of the species at occupied sites, one can estimate the true distribution free from the distorting effects of imperfect detection. These useful methods deserve to be known and used much more widely in plant distribution studies.
Two anonymous referees provided valuable comments that contributed significantly to the improvement of our paper. We thank D.I. MacKenzie for comments on occupancy. Drs Mi Xiangcheng and Ren Haibao established the 24-ha permanent forest plot in Gutianshan, Zhejiang Province, China. This work was financially supported by a Key Innovation Project of Chinese Academy of Sciences (KZCX2-YW-430).