Methods for wildlife monitoring in tropical forests: Comparing human observations, camera traps, and passive acoustic sensors

Wildlife monitoring is essential for conservation science and data‐driven decision‐making. Tropical forests pose a particularly challenging environment for monitoring wildlife due to the dense vegetation, and diverse and cryptic species with relatively low abundances. The most commonly used monitoring methods in tropical forests are observations made by humans (visual or acoustic), camera traps, or passive acoustic sensors. These methods come with trade‐offs in terms of species coverage, accuracy and precision of population metrics, available technical expertise, and costs. Yet, there are no reviews that compare the characteristics of these methods in detail. Here, we comprehensively review the advantages and limitations of the three mentioned methods, by asking four key questions that are always important in relation to wildlife monitoring: (1) What are the target species?; (2) Which population metrics are desirable and attainable?; (3) What expertise, tools, and effort are required for species identification?; and (4) Which financial and human resources are required for data collection and processing? Given the diversity of monitoring objectives and circumstances, we do not aim to conclusively prescribe particular methods for all situations. Neither do we claim that any one method is superior to others. Rather, our review aims to support scientists and conservation practitioners in understanding the options and criteria that must be considered in choosing the appropriate method, given the objectives of their wildlife monitoring efforts and resources available. We focus on tropical forests because of their high conservation priority, although the information put forward is also relevant for other biomes.


KEYWORDS
automated classification, camera trapping, evidence-based conservation, passive acoustic monitoring, wildlife conservation, wildlife monitoring methods

1 | INTRODUCTION
Monitoring wildlife is an essential component of conservation (CMP, 2020; Nichols & Williams, 2006; Salafsky et al., 2001). Evidence-based conservation efforts, data-driven decision making for adaptive management, and sustainable use of natural resources are all based on the premise that population declines can be detected in a timely manner (e.g., Díaz et al., 2020; Grooten & Almond, 2018). Monitoring objectives can range from assessing species presence/absence to knowing the exact density of one or more species. Monitoring data are used across multiple scales, from local (site-level) to national, regional, and global scales (e.g., as indicators for global biodiversity goals, the IUCN Red List of Threatened Species, the CITES Appendix status of taxa, and to formulate species-specific IUCN Action Plans; Brooks et al., 2015; IUCN, 2020; Pereira et al., 2013; Stephenson, 2019).
Tropical forests harbor a large proportion of the world's terrestrial wildlife (Myers et al., 2000). At the same time, tropical forests are a particularly challenging environment for wildlife monitoring, due to limited visibility in often dense understory, and the diverse, cryptic nature and low densities of many animal species. The complex nature of tropical forests comes with low and variable detection probability (Sollmann et al., 2013), risk of bias related to the timing and location of observations (Cusack et al., 2015), the effort required for species identification, and the cost of data collection. As each monitoring method has its advantages and limitations, and resources are often limited, it can be complicated to select a suitable monitoring method (Stephenson, 2020; Stephenson et al., 2020).
Here, we review the advantages and limitations of three methods (observations by humans, camera traps, and passive acoustic sensors) for wildlife monitoring, with a focus on tropical forests given their high conservation priority, although the information we provide is also applicable in other biomes. The methods considered typically target terrestrial vertebrate wildlife, but we also consider application to invertebrates where relevant. Our aim is to objectively facilitate the correct uptake and use of these field methods for effective, goal-oriented monitoring by scientists and practitioners (e.g., the private sector, government agencies, and NGOs). We evaluate each method by asking four key questions (Figure 1) that we believe need to be addressed before any monitoring survey: (1) What are the target species (e.g., is the target a community or a particular species)?; (2) Which population metrics are desirable and attainable (e.g., encounter rates, occupancy, or density)?; (3) What expertise, tools, and effort are required for species identification?; and (4) Which financial and human resources are required for data collection and data processing? Following these four questions, all relevant characteristics of each method are summarized in Table 1.

2 | SPECIES COVERAGE
Every wildlife monitoring project foremost requires a clear objective with regard to its target species. Is the goal to monitor populations of particular species, or to monitor a community? Monitoring approaches differ strongly in species coverage (the number and types of species that can be detected) as well as in detection biases. For species-level monitoring, the major challenge is the acquisition of sufficient data for acceptable accuracy and precision, within a manageable time and budget. A community-wide assessment requires an approach with broad and unbiased species coverage, where differences in detection probability can be estimated and accounted for.

2.1 | Human observation
Observations by humans can be direct, for example, spotting animals, or indirect, for example, recording signs such as nests, tracks, or feces (Buckland et al., 2001; Laing et al., 2003). Direct observations are biased toward mammals and birds that are easy to detect because of vocalization, size, and diurnal habits, while rare, small, fossorial, nocturnal, and cryptic species are less likely to be observed (Richard-Hansen et al., 2015). The likelihood of detection may vary across the day and across seasons (Pearse et al., 2015), and with shyness and habituation, as animals may be repelled or attracted by observers (Marini et al., 2009; Thomas et al., 2010). Direct observation furthermore requires highly skilled observers. Observer bias may arise from differences in skills between observers and from fatigue, although these problems can be reduced by careful training, limiting the length of monitoring sessions, and limiting the number of tasks assigned to each observer (Emlen & DeJong, 1992; Kühl et al., 2008). Due to these biases, direct field observations are generally most suitable for highly detectable species, rather than for community assessments that require broad taxonomic coverage (Roberts, 2011).
FIGURE 1 This review is structured along four key questions that we believe need to be considered when choosing a monitoring method
TABLE 1 An overview of how observations by humans, camera traps, and passive acoustic sensors relate to the characteristics of interest for the four questions discussed in this article
Indirect observations have the advantage that signs are immobile and more abundant than the animals that produce them because they remain visible for extended periods (up to several months). Detectability is less influenced by the time of day of the survey than for direct animal observations. To estimate a population size from signs, the production and the decay rate of the signs need to be known (Hedges et al., 2012; Laing et al., 2003). These rates can differ across sites and seasons. For example, the decay rates of gorilla and chimpanzee nests depend on forest type, nest height, and structure, and above all, precipitation. The decay rates of signs should thus be estimated in the same survey area and season (Laing et al., 2003; Morgan et al., 2016), which may involve substantial effort and costs (Kuehl et al., 2007). Production rates of signs are less variable, hence estimates from similar or nearby sites can be used (e.g., Theuerkauf & Gula, 2010). For signs such as footprints or markings left on trees, the rates of production and decay cannot be estimated, so only presence and occupancy, but not density, can be estimated (Sections 3.2 and 3.3, respectively). Not all species produce signs that allow for species-specific identification (Furuichi et al., 1997; Miller et al., 2011). Genetic diagnostics, which are gaining in importance, can help in this case, even for identifying individuals, although this adds costs and complexity (Bowkett et al., 2009; Gray et al., 2013). Many species, such as most felids, do not leave sufficient species-specific signs with known production and decay rates for robust population estimates, and therefore require other monitoring methods (Borah et al., 2014).
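The conversion from sign counts to animal density described above can be sketched as follows. This is a minimal illustration, assuming the commonly used relationship in which animal density equals sign density divided by the product of the sign production rate and the mean time a sign remains visible. The function name and all input values are hypothetical and do not come from any of the cited studies.

```python
def animal_density_from_signs(sign_density, production_rate, decay_time_days):
    """Convert a sign density into an animal density.

    sign_density     -- signs per km^2 (e.g., nests), as estimated from transects
    production_rate  -- signs produced per animal per day
    decay_time_days  -- mean number of days a sign remains detectable
    """
    if production_rate <= 0 or decay_time_days <= 0:
        raise ValueError("production rate and decay time must be positive")
    # Each animal contributes (production_rate * decay_time_days) standing signs.
    return sign_density / (production_rate * decay_time_days)


# Hypothetical inputs: 150 nests/km^2, 1.1 nests per animal per day,
# nests remain visible for 90 days on average.
density = animal_density_from_signs(150.0, 1.1, 90.0)
print(f"Estimated density: {density:.2f} animals per km^2")
```

Because the decay time enters the denominator directly, an error in the local decay estimate propagates proportionally into the density estimate, which is why site- and season-specific decay rates matter.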
Observations can be made either on line, point or reconnaissance transects (recces; Hedges et al., 2012). Line or point transects are predefined randomly located straight lines or points from which observations are made, allowing for distance measurements to the observed objects required for density estimation with distance sampling (Section 3.3). Recces are transects that follow a path of least resistance, that is, the easiest path to follow, without the possibility of collecting additional parameters such as distance from the transect, and can therefore only be used for encounter rates or occupancy analyses (Sections 3.1 and 3.2, respectively).

2.2 | Camera trapping
The use of camera trapping has increased rapidly over the past two decades (Wearn & Glover-Kapfer, 2017). Triggered by passive infra-red sensors (Welbourne et al., 2016), camera traps record wildlife of a broad array of size classes and taxonomic groups, including mammals (Tobler et al., 2008), with minimal invasiveness. With time-lapse photography or specialized camera traps, even arthropods can be surveyed (e.g., Collett & Fisher, 2017; Hobbs & Brehme, 2017). Camera trapping is generally most effective for medium to large terrestrial animals, but can also be used to survey smaller, cryptic, and rare animals that typically go undetected by humans (Bessone et al., 2020; Glen et al., 2013; Khwaja et al., 2019). Because camera traps record continuously and automatically, they are not biased by the timing of activity of the target species or by observer skill or fatigue, making the collection process more standardized and transparent than with human observations. Also, every observation comes with a photograph that can be used for verification and validation. Additionally, as each observation is timestamped, camera trapping informs about activity patterns and human disturbance (Caravaggi et al., 2017; Gaynor et al., 2018; Ramirez et al., 2021). A camera trap covers only a small surface area (typically 10-20 m²). This, coupled with nonrandom use of space by wildlife, makes it particularly important to carefully consider study design and placement strategy (e.g., spacing and location in relation to trails or streams). When surveying target species, detections can be boosted by placing cameras at locations known to be frequented (Cusack et al., 2015; Harmsen et al., 2010; Kolowski & Forrester, 2017) or by using lures or baits (du Preez et al., 2014; Mills et al., 2019). While this strategy can work when coupled with appropriate analytical methods that control for variation in detection (covered in Section 3.1), it must be recognized that boosting detectability for one species may have unpredictable effects on the detectability of others (Kolowski & Forrester, 2017). For example, while dominant predators may preferentially travel along larger trails (Karanth, 1995), many prey species, as well as competitor species and even subdominant individuals of the same species, may avoid these landscape features as a result.
Note to Table 1: The field methods are rated high, medium, or low, indicating a relative approximation of their suitability for each of the characteristics.
When surveying a wildlife community, it is vital that cameras are installed at randomized locations with respect to local landscape features. Such a survey design may take the form of a systematic grid of points with a randomly allocated starting position (e.g., the TEAM protocol; Jansen et al., 2014), which is also essential when density estimation is planned for species whose individuals cannot be individually recognized (Section 3.3), and can be stratified by habitat type if desired. The mounting height also influences the community that is effectively sampled. This happens most strongly through the exclusion of fully arboreal species with terrestrial placements, but small terrestrial species are also excluded as the camera is mounted further from the ground. While canopy wildlife has effectively been studied with camera traps (Gregory et al., 2014; Moore et al., 2020; Whitworth et al., 2016), the difficulty and danger of placing camera traps in the canopy may preclude this approach for most monitoring projects.
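A systematic grid with a randomly allocated starting position, as described above, could be generated along the following lines. This is only a sketch under simplifying assumptions: the study area is a rectangle in projected coordinates (meters), and the function name, extent, and spacing are hypothetical. It is not an implementation of the TEAM protocol or of any other published design.

```python
import random


def systematic_grid(x_min, y_min, x_max, y_max, spacing, seed=None):
    """Return camera station coordinates on a regular grid with a random start.

    The grid origin is shifted by a random offset of up to one grid cell in
    each direction, so that station locations are independent of trails,
    streams, and other landscape features.
    """
    rng = random.Random(seed)
    x_start = x_min + rng.uniform(0, spacing)
    y_start = y_min + rng.uniform(0, spacing)

    stations = []
    y = y_start
    while y <= y_max:
        x = x_start
        while x <= x_max:
            stations.append((x, y))
            x += spacing
        y += spacing
    return stations


# Hypothetical 10 km x 10 km study area with 2 km spacing between stations.
stations = systematic_grid(0, 0, 10_000, 10_000, spacing=2_000, seed=42)
print(len(stations), "stations; first station at", stations[0])
```

Stratification by habitat type could be added by generating one such grid per stratum, which is left out here for brevity.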

2.3 | Passive acoustic monitoring
Passive acoustic monitoring (PAM) uses acoustic sensors, often referred to as Autonomous Recording Units (ARUs), to survey wildlife by recording vocalizations and other species-specific sounds. PAM is rapidly growing as a monitoring method for terrestrial wildlife (Darras et al., 2019), in addition to marine environments where it is commonly used for monitoring cetaceans. ARUs record, often continuously and for extended periods of time, the soundscape of a given area, that is, all sounds measured as frequency and intensity over time, consisting of biotic (e.g., animals), abiotic (e.g., rain, wind), and anthropogenic (e.g., vehicle traffic) sounds (Pijanowski et al., 2011). All species that produce identifiable calls (e.g., elephants trumpeting or rumbling; Wrege et al., 2017) or sounds (e.g., chimpanzees buttress drumming, gorillas chest beating; Heinicke et al., 2015) can be monitored with PAM, including many taxa that are poorly captured by other methods, such as insects (Ganchev & Potamitis, 2007) and amphibians (Aide et al., 2017; Troudet et al., 2017). Bats (Russo & Voigt, 2016) are the taxon most often monitored using PAM, followed by birds (Brandes, 2008) and anuran amphibians (Brauer et al., 2016; Sugai et al., 2019).
Acoustic recordings are typically used to monitor species presence and activity patterns, but in some cases the sex, behavior, identity, and even emotional state of individuals can also be deduced (Mielke & Zuberbühler, 2013; Soltis et al., 2005). For most species, the detection area of ARUs is much larger than that of camera traps (Diggins et al., 2016; Enari et al., 2019) and therefore the precise installation location introduces less bias in terms of the species that can be detected. Since the loudness of calls affects the effective survey area of ARUs (Hutto & Stutzman, 2009), comparing detection rates across species is only possible when the detection range for each species is known (Section 3.1). Detections of focal species can, as with camera traps, be maximized by deploying ARUs near landscape features frequented by wildlife such as mineral licks or nesting sites, or by recording during seasons with high calling activity by the target species (e.g., breeding season). Such recording protocols should, however, be standardized across sites, and potential variability in detection probability should be accounted for, if abundance trends over space or time are to be reliably inferred.
Like camera traps, ARUs can monitor continuously, enabling the study of temporal vocal activity patterns (Sugai et al., 2019), even in periods and areas where it is logistically challenging to do field observations. The accuracy of PAM estimates, however, varies widely with species, distance to recorders, and ambient noise levels, precluding absolute abundance estimates for most species (Brauer et al., 2016; Stowell et al., 2019). Moreover, estimating the number of individuals in group-living species is problematic, as counting simultaneously vocalizing individuals is difficult (Sedláček et al., 2015). Furthermore, the calls of quieter species, such as many mammals, may be swamped by more vocal species. This is especially the case during sound-rich moments such as dawn and dusk (Hutto & Stutzman, 2009). Comparative studies reported a large overlap in bird species richness estimates between PAM and human field observations, with each method also detecting unique species (Darras et al., 2019; Digby et al., 2013; Leach et al., 2016). Overall, PAM is well-suited for rapidly assessing the presence and habitat use of vocal species, as well as intraspecific changes in activity patterns and encounter rates over time, over large geographical areas. As such, PAM is a suitable method for detecting human-induced impacts and for assessing the success of conservation strategies (Astaras et al., 2020; Kalan et al., 2015).

3 | POPULATION METRICS
It is important to consider a priori which population metrics are desired and attainable (Stephenson, 2019). Is a one-off measure of population or community status sufficient, or is it necessary to monitor changes over space or time? Is it important to measure population density, or are encounter rates sufficient? Here we consider the costs and benefits of analytical methods for generating three types of data outputs (in ascending order of usefulness in terms of potential applications and information gain): (1) encounter rate, also referred to as relative abundance or trap rate; (2) occupancy, that is, the proportion of sampled sites occupied; and (3) population density, the number of animals per unit area.

3.1 | Encounter rate
The encounter rate, that is, the number of detections per unit of effort, is the most basic metric of biodiversity, as it does not require any additional parameters. However, comparing encounter rates across sites or time should be done with caution as variable detection may cause serious bias (Sollmann et al., 2013; Strindberg & O'Brien, 2012). Detectability of animals varies with the weather, vegetation, visibility due to the season, monitoring equipment, survey design, animal size and behavior, and numerous other factors (Bas et al., 2008; Buckland et al., 2001; Cusack et al., 2015; Kolowski & Forrester, 2017; Madsen et al., 2020; Moore & Kendall, 2004; Pollock et al., 2002). As a result, observed differences in encounter rates may simply reflect differences in detectability rather than differences in population sizes (Sollmann et al., 2013). Constant detection probability may be achieved within sites with strict monitoring protocols, but it is more problematic to achieve across sites. For this reason, metrics that account for variation in detection probability are necessary for comparisons across sites, seasons, and species.
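As a small worked example, the encounter rate can be computed as detections per unit of effort, here scaled to 100 units (e.g., 100 camera trap-days or 100 km of transect). The function name and the survey numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def encounter_rate(detections, effort, per=100.0):
    """Detections per `per` units of effort (e.g., per 100 trap-days)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return detections * per / effort


# Hypothetical surveys of the same species at two sites.
site_a = encounter_rate(detections=18, effort=600)  # 3.0 per 100 trap-days
site_b = encounter_rate(detections=9, effort=450)   # 2.0 per 100 trap-days
print(site_a, site_b)

# Caution: the higher rate at site A does not by itself show that the
# population is larger there; detectability may differ between the sites.
```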

3.2 | Occupancy
Occupancy refers to the proportion of sampled sites occupied by a species. Since MacKenzie's seminal article on ways of accounting for imperfect detection in wildlife surveys (MacKenzie et al., 2002), occupancy modeling, now a broad family of models, has become a widely used analytical method in wildlife monitoring, especially for elusive species for which estimates of absolute abundance (Section 3.3) are rarely possible due to low overall detections. Occupancy modeling improves naive estimates of occupancy (that is, the proportion of sites where the species was observed) by correcting for the probability of a missed detection when the species is in fact present. This probability is estimated based on the detection history in the sites where the species' presence was confirmed, and requires multiple survey periods (replicates either in space or time). Occupancy estimates based on occupancy modeling can be used to compare population trends across space and time, without the risk of patterns being confounded by variable detectability (Section 3.1). A key advantage of modeling occupancy based on camera trap or PAM data is that no additional parameters are required, as opposed to density estimation (Section 3.3). Bayesian approaches form another important class of occupancy models (Royle & Kéry, 2007); they allow for more complicated multispecies models, for deriving additional metrics, and for incorporating prior information. Leading software for occupancy modeling includes PRESENCE (Hines, 2006) and the R package "unmarked" (Fiske & Chandler, 2011).
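To make the correction for missed detections concrete, the following sketch fits the simplest single-season occupancy model, with one occupancy probability (psi) and one per-survey detection probability (p) shared by all sites. It assumes no covariates and uses a coarse grid search purely for transparency; the function names and the detection histories are hypothetical. Real analyses would use dedicated software such as PRESENCE or unmarked.

```python
import math


def neg_log_likelihood(psi, p, histories):
    """Negative log-likelihood for constant occupancy (psi) and detection (p)."""
    nll = 0.0
    for history in histories:
        k = len(history)   # number of survey periods at this site
        d = sum(history)   # number of periods with a detection
        # Site is occupied and produced exactly this detection history.
        likelihood = psi * p**d * (1.0 - p) ** (k - d)
        if d == 0:
            # A site without detections may also be truly unoccupied.
            likelihood += 1.0 - psi
        nll -= math.log(likelihood)
    return nll


def fit_occupancy(histories, steps=100):
    """Grid-search the (psi, p) pair that maximizes the likelihood."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    best_nll, best_psi, best_p = float("inf"), None, None
    for psi in grid:
        for p in grid:
            nll = neg_log_likelihood(psi, p, histories)
            if nll < best_nll:
                best_nll, best_psi, best_p = nll, psi, p
    return best_psi, best_p


# Hypothetical detection histories: 10 sites, 4 survey periods each.
histories = [
    (1, 0, 1, 0), (0, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 0), (0, 1, 1, 0),
    (1, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0),
]
naive = sum(1 for h in histories if any(h)) / len(histories)
psi_hat, p_hat = fit_occupancy(histories)
print(f"naive occupancy = {naive:.2f}, estimated psi = {psi_hat:.3f}, p = {p_hat:.3f}")
```

In this toy data set the modeled occupancy exceeds the naive proportion of sites with detections, because some of the sites without detections are likely to have been occupied but missed.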
The repeated observations of presence/absence that are necessary to estimate detectability in occupancy analysis can be achieved in various ways. Observations collected by humans ideally require multiple field visits to each site (Guillera-Arroita et al., 2010; Kendall & White, 2009), but these visits must be sufficiently close in time to ensure that animal distribution does not change between visits. This additional effort may add substantial costs, particularly in more remote areas.
Models exist that allow for obtaining spatial replicates with a single team and a single visit, for example, by treating fixed-length sections of a long transect as separate survey periods (Guillera-Arroita et al., 2011; Hines et al., 2010). For camera traps and PAM surveys, repeated survey periods can be obtained by dividing a single deployment period into fixed-duration sub-periods, for example, a month-long deployment split into six 5-day survey periods. The duration of these sub-periods is decided based on the characteristics of the species monitored, and should be sufficiently long to assume that detections in subsequent survey periods are independent of earlier detections. Generally, once a minimum-duration survey period has been decided on, longer survey periods can be considered to ensure that the detection probability per survey period is not too low, for example, <20% per survey period (Gálvez et al., 2016; MacKenzie & Royle, 2005). It is commonly recommended to let the size of the sampling unit, defined by grid cell area or recording unit spacing, be greater than the largest home range size of the target species, to avoid the need to correct for spatial correlation across sites. This, however, is usually infeasible for species with very large home ranges.
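Splitting a single deployment into fixed-duration survey periods, as in the month-long example above, can be sketched as follows. The function name, dates, and detection times are hypothetical; the sketch assumes that the recorder was operational for the whole deployment, which would need checking in practice.

```python
from datetime import datetime, timedelta


def detection_history(deploy_start, n_periods, period_days, detection_times):
    """Collapse timestamped detections into a 0/1 history per survey period."""
    history = [0] * n_periods
    period = timedelta(days=period_days)
    for t in detection_times:
        index = (t - deploy_start) // period  # integer index of the survey period
        if 0 <= index < n_periods:            # ignore detections outside the deployment
            history[index] = 1
    return history


# Hypothetical 30-day deployment split into six 5-day survey periods.
start = datetime(2021, 3, 1)
detections = [
    datetime(2021, 3, 2, 6, 15),    # day 1  -> period 0
    datetime(2021, 3, 3, 19, 40),   # day 2  -> period 0 (same period, still 1)
    datetime(2021, 3, 17, 10, 0),   # day 16 -> period 3
    datetime(2021, 3, 29, 23, 5),   # day 28 -> period 5
    datetime(2021, 4, 2, 8, 0),     # after the deployment -> ignored
]
print(detection_history(start, n_periods=6, period_days=5, detection_times=detections))
# [1, 0, 0, 1, 0, 1]
```

Multiple detections within one survey period count only once, so the choice of period length controls both the number of replicates and the per-period detection probability.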
Although occupancy can be a viable alternative to population density (e.g., Beaudrot et al., 2016; Devarajan et al., 2020), studies exploring whether occupancy has a linear relationship with density estimates have shown mixed results. While some studies show that the relationship approaches linearity (Linden et al., 2017; Tempel & Gutiérrez, 2013), other studies indicate that occupancy does not reflect density when species are rare (Gaston et al., 1998). Occupancy modeling does not work well for rare species because detection and occupancy become harder to separate. This problem can be partly alleviated by modeling the occupancy of multiple species in the same model (using Bayesian approaches). The relationship also tends to vary with spatial and temporal sampling scales (Latham et al., 2014; Steenweg et al., 2018), or when species exhibit altered patterns of space use due to disturbances (Parsons et al., 2017). Nonlinearity between occupancy and density implies that for the same animal species, in the same habitat, over the same period of time, occupancy can sometimes align with density, differ slightly from it, or show a completely opposite trend; occupancy-based inferences about density should therefore be treated with caution (Parsons et al., 2017).

3.3 | Population density
The most informative metric of wildlife monitoring is population density, that is, the number of animals per unit area, which, if extended over the species range, can be used to calculate population size (also referred to as absolute or true abundance). Accurate density estimates are important for effective management of wildlife, as they can provide the most robust picture of population trends over space or time (Plumptre & Cox, 2006). These trends can be used to quantify responses to, for example, disturbance, management, or invasive species, and to inform sustainable management of exploited species (van Vliet & Nasi, 2008). The international classification of species conservation status on the IUCN Red List of Threatened Species and subsequent conservation strategies often require not only an understanding of the direction and magnitude of population trends (which could theoretically be obtained using occupancy), but, at least for IUCN Criteria C and D, also information on the absolute size of a species' population (IUCN, 2020). This section discusses the three leading analytical methods for estimating population density: distance sampling, the random encounter model (REM), and capture recapture, although various other analytical methods exist (Gilbert et al., 2020).

3.3.1 | Distance sampling
Distance sampling by human observers along line or point transects in tropical forests is a well-established analytical method for density estimation (Buckland et al., 2001), for which free software (Distance) is available. To convert the number of observations (individual, group, or sign) to density estimates, distance sampling estimates the effectively surveyed area by modeling the decrease in a species' detection probability with distance from the observer. Distance sampling therefore requires accurate measurements of these distances. Camera trap data have also been successfully used as point transects (Bessone et al., 2020; Cappelle et al., 2019, 2020; Howe et al., 2017), which requires the recording of distances at which recorded animals pass in front of the camera. Numbers of replicates (points) and detections (distance measurements) required for robust estimation are comparable to those required on line and point transects by human observers (Bessone et al., 2020; Cappelle et al., 2020). Analytical advances in image recognition are expected to automate such measurements, which will greatly speed up the process of density estimation using camera traps. For PAM, the distance of a vocalization cannot be inferred from volume alone, as the volume is also influenced by the direction in which the vocalization is emitted, atmospheric conditions, and the intensity of the call (Alldredge et al., 2007). Sufficiently dense ARU arrays can triangulate sound locations, but this is at the cost of the overall spatial coverage achieved with a given budget (Marques et al., 2013; Mennill et al., 2012; Wrege et al., 2017). A key requirement of distance sampling is that sampling units (lines or points) capture the heterogeneity of the area surveyed, which is typically ensured by systematic sampling design (Buckland et al., 2001; Thomas et al., 2010). Sampling designs required for density estimation and broad-spectrum community application are the same (Section 2), making it possible to estimate densities for multiple species.
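The idea of an effectively surveyed area can be illustrated with the simplest line-transect case. The sketch below assumes a half-normal detection function without truncation, certain detection on the transect line, and detections of single animals rather than groups; under these assumptions the scale parameter has a closed-form maximum-likelihood estimate. The function name and the distances are hypothetical, and this is not a substitute for the model selection and variance estimation offered by the Distance software.

```python
import math


def halfnormal_line_transect_density(perp_distances_m, total_line_length_m):
    """Estimate density (animals per km^2) from perpendicular distances.

    Detection function: g(x) = exp(-x^2 / (2 * sigma^2)), with g(0) = 1.
    Returns (density_per_km2, effective_strip_half_width_m).
    """
    n = len(perp_distances_m)
    if n == 0 or total_line_length_m <= 0:
        raise ValueError("need at least one detection and positive line length")
    # Maximum-likelihood estimate of sigma^2 for an untruncated half-normal.
    sigma_sq = sum(x * x for x in perp_distances_m) / n
    # Effective strip half-width: the integral of g(x) from 0 to infinity.
    esw_m = math.sqrt(sigma_sq) * math.sqrt(math.pi / 2.0)
    # Animals detected divided by the effectively surveyed area (both sides of the line).
    density_per_m2 = n / (2.0 * total_line_length_m * esw_m)
    return density_per_m2 * 1_000_000, esw_m


# Hypothetical survey: 10 detections along 5 km of transects.
distances = [2, 5, 1, 8, 3, 12, 4, 6, 0, 9]
density, esw = halfnormal_line_transect_density(distances, total_line_length_m=5_000)
print(f"effective strip half-width = {esw:.1f} m, density = {density:.1f} per km^2")
```

Ten detections are far too few for a reliable estimate; the small sample is used here only to keep the arithmetic easy to follow.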

3.3.2 | Random encounter model
The REM estimates density from trap rates by correcting the latter for the daily distance traveled by animals and the area sampled by camera traps (Rowcliffe et al., 2008). Sampled area is estimated in the same way as in distance sampling (Rowcliffe et al., 2011), and sampling design requirements are also identical. REM can only be used for camera trap data because the size of the sampled area needs to be known. REM requires estimates of animal speed of movement and daily activity level, which in principle can be estimated from camera footage (Rowcliffe et al., 2016), but this adds complexity.
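The correction of trap rates described above can be written out using the REM equation as it is commonly stated following Rowcliffe et al. (2008): density equals the trap rate multiplied by pi and divided by the product of day range, detection radius, and two plus the detection angle. The sketch below assumes that day range (speed multiplied by the time spent active per day) has already been estimated; the function name and all input values are hypothetical.

```python
import math


def rem_density(detections, camera_days, day_range_km, radius_km, angle_rad):
    """Random encounter model density estimate (animals per km^2).

    detections    -- number of independent detections (y)
    camera_days   -- total sampling effort in camera-days (t)
    day_range_km  -- average distance traveled per animal per day (v)
    radius_km     -- radius of the camera detection zone (r)
    angle_rad     -- angle of the camera detection zone in radians (theta)
    """
    if min(camera_days, day_range_km, radius_km) <= 0 or angle_rad < 0:
        raise ValueError("effort, day range, and radius must be positive")
    trap_rate = detections / camera_days
    return trap_rate * math.pi / (day_range_km * radius_km * (2.0 + angle_rad))


# Hypothetical survey: 120 detections in 2,000 camera-days, animals moving
# 2.5 km per day, detection zone with a 6 m radius and a 0.7 rad angle.
density = rem_density(120, 2_000, day_range_km=2.5, radius_km=0.006, angle_rad=0.7)
print(f"REM density = {density:.2f} animals per km^2")
```

Because day range and detection zone dimensions sit in the denominator, uncertainty in these auxiliary estimates carries over directly into the density estimate.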

3.3.3 | Capture recapture approaches
Capture recapture analyses, including spatially explicit capture recapture, which is now the standard, are an effective analytical method for species that are individually recognizable (Amstrup et al., 2010; Borchers & Efford, 2008; Efford, 2004), and are supported by a variety of analysis software (e.g., Efford, 2009, 2020; Laake, 2013; McClintock, 2015). This analysis is based on detecting and identifying individuals from part of a population in one sample, and then redetecting a proportion of these individuals in subsequent population samples. This way, the chance for an individual to be redetected in multiple samples is calculated and population density can be derived (Amstrup et al., 2010). Individual recognition is generally not possible with direct observations of tropical forest wildlife. Capture recapture analysis is widely used in camera trapping of species in which individuals have unique visual characteristics such as fur patterns, for example, leopards and tigers, but elephants and great apes can also be recognized individually (Arandjelovic et al., 2010, 2011; Després-Einspenner et al., 2017; Head et al., 2013; Kane et al., 2015; Karanth, 1995; Rich et al., 2014). The approach can also be used with PAM for species with individually unique vocalizations (Dawson & Efford, 2009). Individual identification of large amounts of material can be facilitated by pattern recognition software such as HotSpotter and Wild-ID (Nipko et al., 2020).
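The logic of detecting individuals in one sample and redetecting a proportion of them in a later sample can be illustrated with the classical two-sample Lincoln-Petersen estimator in Chapman's bias-corrected form. This is a deliberately simplified, non-spatial stand-in: it yields abundance rather than density, assumes a closed population with equal detectability, and is not the spatially explicit approach that the text describes as the current standard. The function name and counts are hypothetical.

```python
def chapman_abundance(n_first, n_second, n_recaptured):
    """Chapman's bias-corrected Lincoln-Petersen abundance estimate.

    n_first       -- individuals identified in the first sample
    n_second      -- individuals identified in the second sample
    n_recaptured  -- individuals identified in both samples
    """
    if n_recaptured > min(n_first, n_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1


# Hypothetical camera trap survey of an individually recognizable species:
# 12 individuals photographed in the first period, 15 in the second,
# of which 6 were already known from the first period.
estimate = chapman_abundance(12, 15, 6)
print(f"Estimated population size: {estimate:.1f} individuals")
```

Converting such an abundance estimate into density requires an assumption about the area effectively sampled, which is the step that spatially explicit models handle formally.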

4 | SPECIES IDENTIFICATION
With the advent of autonomous recorders, an often-overlooked part of wildlife monitoring is the effort required for species identification.

4.1 | Human observation
For observations collected directly by humans, species identification is an integral part of the fieldwork: observers identify species or signs, and measure distances, on the spot. Data are then recorded in a standardized format and only minimal extra steps are required to prepare the data for analysis.

4.2 | Camera trapping
Camera trap surveys can produce thousands to millions of observations. Annotation and management of such volumes can be challenging for monitoring projects (Glover-Kapfer et al., 2019), despite the availability of various platforms for data management (Young et al., 2018). Image annotation by automated classification is developing rapidly (Whytock et al., 2021; Willi et al., 2019) and is increasingly being integrated in data management platforms (Ahumada et al., 2020) and desktop apps (Falzon et al., 2020), requiring gradually less technical expertise and improving access for mainstream use (Aodha et al., 2014). Algorithms can annotate images with increasing accuracy to species or genus level, or filter out empty images (Wei et al., 2020), which can drastically reduce the workload (Norouzzadeh et al., 2018; Tabak et al., 2019). The user can define the confidence thresholds that are deemed acceptable. Lowering these thresholds increases the number of annotated species, but also the margin of error. Confidence levels therefore directly affect the number of observations analyzed, and should be reported to enable comparison of the output of automated methods between studies.
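The trade-off between confidence thresholds and error can be explored on a manually verified subset of images, as in the following sketch. The classifier output, species labels, and function name are hypothetical and are not tied to any of the platforms cited above.

```python
def summarize_threshold(predictions, threshold):
    """Count accepted predictions and their error rate at a confidence threshold.

    Each prediction is a (predicted_label, confidence, verified_label) tuple,
    where verified_label comes from manual review of the same image.
    """
    accepted = [p for p in predictions if p[1] >= threshold]
    n_wrong = sum(1 for label, _, truth in accepted if label != truth)
    error_rate = n_wrong / len(accepted) if accepted else 0.0
    return {
        "threshold": threshold,
        "n_accepted": len(accepted),
        "n_for_manual_review": len(predictions) - len(accepted),
        "error_rate": error_rate,
    }


# Hypothetical classifier output for eight manually verified images.
predictions = [
    ("duiker", 0.98, "duiker"),
    ("duiker", 0.91, "duiker"),
    ("chimpanzee", 0.88, "chimpanzee"),
    ("empty", 0.97, "empty"),
    ("duiker", 0.62, "mangabey"),
    ("elephant", 0.55, "elephant"),
    ("chimpanzee", 0.51, "gorilla"),
    ("empty", 0.71, "empty"),
]

for threshold in (0.9, 0.7, 0.5):
    print(summarize_threshold(predictions, threshold))
```

In this toy example, lowering the threshold from 0.9 to 0.5 raises the number of automatically accepted images from three to eight but introduces two misidentifications, which is why reporting the threshold alongside the results matters.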
There are, however, limitations to the automated identification of less common species (Tabak et al., 2019), as building a robust classifier requires large amounts of annotated photos. The more species, the more annotated photos are needed to realize sufficient discriminative power of the algorithm. Additionally, the dense vegetation of tropical forests contains highly variable background colors, shapes, and light conditions, making it more difficult to distinguish species in photos as compared to open landscapes. Emerging methods are finding solutions to this problem (Beery et al., 2019, 2020). However, as some images are difficult to identify even for humans (Meek et al., 2013), it is unlikely that human effort can safely be removed for rare species identification altogether in the foreseeable future.

| Passive acoustic monitoring
For species detections with PAM, it is important to decide early on in a project how vocalizations will be detected in the recordings. This can be done manually by reviewing the spectrogram of the files both visually and acoustically (Aide et al., 2013; Bas et al., 2017; Knight et al., 2017; Ovaskainen et al., 2018). However, with multiple ARUs recording many hours of data each day, manual review quickly becomes impractical, making the use of automated classifiers desirable. These classifiers are not yet available for most species in tropical forests. Exceptions include elephants, some primates (Zwerts et al., 2021), and birds (Priyadarshani et al., 2018). Classifiers also exist for gunshots, which can be used for eco-surveillance purposes (Astaras et al., 2017). Regardless of their availability, the technical expertise required for using species-specific classifiers is generally moderately high.
Software facilitating the construction of new classifiers (Knight et al., 2017; Ovaskainen et al., 2018) includes a free web-based acoustic analysis platform (RFCx Arbimon; arbimon.rfcx.org). Robust classifier development often requires a large annotated dataset (e.g., Enari et al., 2019; Gibb et al., 2019), which can be acquired either by manual annotation, or by the use of unsupervised classification, which divides repeating patterns (vocalizations) into separate classes (Ovaskainen et al., 2018; Stowell & Plumbley, 2014). The output from this classification needs to be annotated. Existing databases (e.g., www.xeno-canto.org, www.macaulaylibrary.org) can be used to cross-reference vocalizations for most bird species (Araya-Salas & Smith-Vidaurre, 2017). For species that are not yet in these databases, expert knowledge is needed to annotate recordings. Unsupervised classification works well for regularly occurring vocalizations, but less so for rare species or rare vocalizations, as these have a lower chance of detection or a high risk of being masked by other sounds.
The annotations thus acquired can be used to train species-specific classifiers. These can be sensitive to intra-specific call variations (Enari et al., 2019) and background noise (Knight et al., 2017; Priyadarshani et al., 2018), and have therefore shown mixed results when compared to manual classifications, both in terms of efficiency and accuracy (Blumstein et al., 2011; Brauer et al., 2016; Joshi et al., 2017). Furthermore, outcomes vary across classification methods, type of ARU, and species. Performance evaluation through manual crosschecking (Stowell et al., 2019) and rigorous reporting of analytical methods are therefore essential to safeguard the reproducibility of the data and to avoid false inferences (Digby et al., 2013; Kalan et al., 2015), as discussed for camera traps.
In conclusion, most classifiers at the moment should be considered as semi-automated, as time-consuming human validation of the results is required.
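Manual crosschecking of classifier output is commonly summarized as precision (the share of detections that are correct) and recall (the share of true vocalizations that were detected). The following is a minimal sketch of that calculation; the clip identifiers and the `precision_recall` function are hypothetical, and it assumes that a reviewer has listened to all clips in the validation subset.

```python
def precision_recall(detections, verified_calls):
    """Compare classifier detections with manually verified calls.

    Both arguments are sets of clip identifiers (hypothetical here):
    `detections` holds clips the classifier flagged for the target species,
    `verified_calls` holds clips a human reviewer confirmed to contain it.
    """
    true_positives = len(detections & verified_calls)
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(verified_calls) if verified_calls else 0.0
    return precision, recall


# Hypothetical validation subset purely for illustration.
detections = {"clip01", "clip02", "clip05", "clip09"}
verified_calls = {"clip01", "clip02", "clip03", "clip05", "clip07"}

precision, recall = precision_recall(detections, verified_calls)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both values, together with the size of the validated subset, allows readers to judge how much human validation a given classifier still requires.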
Camera trap photos or acoustic data can also be annotated with the help of citizen science (Arandjelovic et al., 2016; Baker, 2016; Swanson et al., 2015). One example is Zooniverse, a citizen science platform driving identification of millions of camera trap images in many projects around the world (Simpson et al., 2014), which is also increasingly being applied in combination with automated methods (Willi et al., 2019). Although citizen science can provide valuable input and can have wider benefits in terms of education and involvement, it can be time-consuming to initiate and manage. Moreover, it may be of less use when species are not widely known or are difficult to identify.

| RESOURCES REQUIRED
Each method comes with costs and it is important to plan realistically according to the available budget and staff capacity. Because of international price differences, we do not discuss absolute costs here, but rather indicate the relative importance of cost components of materials, labor, and logistics specific to each method. For the sake of comparability, we focus on larger monitoring projects that cover extensive survey areas, requiring multi-day field missions. For absolute cost comparisons between the field methods, we refer to other literature (camera traps: Cappelle et al., 2019; Güthlin et al., 2014; PAM: Darras et al., 2019). Also not discussed here, but very important to consider, is how many transects, camera traps, and ARUs are necessary to provide acceptable confidence in estimates. Pilot studies may help in estimating how many sites should be surveyed and for how long, to get the best return on investment.

| Human observation
Field observations require small initial investments for the monitoring or data processing equipment. Specific equipment purchases include a thread-based distance measurer, measuring tapes, and binoculars. Standardized data recording is ideally done using a rugged device with the relevant software and recording structure installed (e.g., Spatial Monitoring and Recording Tool; smartconservationtools.org). The highest costs of human observations are related to salaries and fuel, due to an extensive training phase and continued time investment of field personnel. Thorough training is essential for multiple observers to standardize and develop the required skills base, including detailed taxonomic knowledge (Fitzpatrick et al., 2009). Typical courses for university-level field technicians last for about 6-9 weeks, and regular refresher courses must be run to ensure standardization of methods across time and space. Team sizes vary (but can be up to 14 people) depending on remoteness, on whether multiple specialized observers for various taxa are present, and on the monitoring protocol.
Recces are roughly four times less costly than line/point transects (Section 2.1; Walsh & White, 1999). If density is not required, occupancy models can be applied to recce data, although one needs to be sure that enough effort has been planned to allow replication. If density is required, systematically designed line or point transects must be used, although a recce-transect combination increases the chance to detect less frequently occurring signs of wildlife or poaching. The length of line transects that can be covered in a day in tropical forests (approximately 1-4 km) depends on forest type, wildlife density, and terrain characteristics. Teams sometimes spend weeks at a time in the forest, either to take repeated observations for occupancy estimations, or to cover extended areas (Cappelle et al., 2019; Diggins et al., 2016). Monitoring large areas can thus weigh heavily on costs of labor, rations, and field equipment.

| Camera trapping
The initial investment for camera traps is relatively high, ranging from 150 to 800 USD per camera trap for midrange to high-end models. Apart from the device itself, SD cards, batteries, locks, hard disks, and sometimes security boxes are required. Due to high humidity and termites in tropical forests, a percentage of camera traps can fail. In addition, cameras may get damaged or be stolen, so extra cameras should be purchased as backup (Meek et al., 2019). During camera trap installation and recovery missions, around 10-15 km per day can be covered. The field teams are generally made up of 2-5 persons, but may be larger depending on the survey area and the number of cameras. One to two persons per team require in-depth training in camera trap installation, as orienting the cameras and ensuring random or systematic placement require an understanding of the errors engendered by poor field practice (Roberts, 2011). Batteries may last for several months. Thus, installation, maintenance, and recovery missions do not have to be scheduled frequently, resulting in relatively low logistical costs. However, regularly relocating camera traps improves the precision of estimates more than monitoring at the same locations for longer (Fewster et al., 2009; Kays et al., 2020), lowering initial investments in materials but increasing salary costs. For camera traps, the workload shifts from fieldwork to image processing (Section 4.2), with associated costs for employees that have received at least moderate levels of training in the use of database software and species identification.

| Passive acoustic monitoring
Initial investment for PAM is generally high, as an ARU costs in the range of 250-600 USD (Darras et al., 2019), although low-cost (<200 USD) alternatives exist (Hill et al., 2018). Costs of batteries and SD cards and the size of field teams (2-5 persons) are comparable to those of camera traps. As sound recordings quickly result in sizable datasets, much larger than with camera trap images, data storage can be costly. Unlike camera traps, relatively little training is required to set up ARUs, as the installation location is less likely to introduce biases in data collection. While ARUs can record continuously for several days or weeks, they can also be programmed, depending on the target species, to record according to a predetermined schedule (e.g., only during the morning chorus) and for a limited frequency range (thus reducing the size of files generated per recording session), thereby extending the overall deployment duration with a set number of batteries and decreasing operational costs. As with camera traps, the limited spatial replication can be compensated for by regularly relocating the ARUs, which in turn inflates fieldwork and logistical costs. For PAM, the workload also shifts from fieldwork to data processing, and even more so than with camera traps, PAM requires highly trained technicians (Section 4.3). Data processing also involves fairly high computing power, requiring investment in either a modern multi-core computer or cloud computing services. Web-based platforms require access to a high-speed internet connection to upload the typically very large acoustic files.
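The effect of a recording schedule and a limited frequency range on data volume can be approximated with simple arithmetic: uncompressed audio takes up sample rate × bytes per sample × number of channels × recording time. The sketch below applies this under stated assumptions. The sample rates, the 3-hour schedule, and the 64 GB card are illustrative values that are not taken from the literature cited here, and battery life, which may limit deployments before storage does, is ignored.

```python
def wav_bytes_per_day(sample_rate_hz, bit_depth, channels, hours_per_day):
    """Approximate uncompressed audio volume per day, ignoring file headers."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * hours_per_day * 3600


def days_until_full(card_bytes, bytes_per_day):
    """Number of recording days that fit on a storage card of the given size."""
    return card_bytes / bytes_per_day


CARD_BYTES = 64 * 10**9  # assumed 64 GB card (decimal gigabytes)

# Assumed settings: continuous recording at 48 kHz versus a 3-hour
# morning schedule at 16 kHz (a lower sample rate limits the frequency range).
continuous = wav_bytes_per_day(48_000, 16, 1, 24)
scheduled = wav_bytes_per_day(16_000, 16, 1, 3)

for label, volume in (("continuous", continuous), ("scheduled", scheduled)):
    print(f"{label}: {volume / 10**9:.2f} GB/day, "
          f"{days_until_full(CARD_BYTES, volume):.0f} days per card")
```

Under these assumed settings, the scheduled configuration generates roughly one twenty-fourth of the daily volume of continuous recording, which illustrates how scheduling reduces both storage and upload requirements.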

| CONCLUDING REMARKS
Given the intricacies of each method and the widely varying objectives and circumstances of wildlife monitoring efforts, it is not possible to make universally relevant prescriptions for action stemming from this review. The relative advantages of each monitoring method are always context dependent and the result of a complex web of equally important details. Guidance as to which field method is most adequate in any particular situation can be found by answering the four key questions we posed in this review. The answers to questions 1 and 2 should match the monitoring objectives, as each method allows the detection of some species but not all, which should be taken into account when doing community studies. The answers to questions 3 and 4 depend on the available budget, time, and skills. Monitoring is most effective if the objectives are clearly defined (Stephenson, 2019; Yoccoz et al., 2001). Decision trees (e.g., Hedges et al., 2012; Kühl et al., 2008; Strindberg & O'Brien, 2012) can help to define these objectives. Central to any monitoring objective is whether a project targets either a specific species or the entire community, as well as which population metric is required. Aside from setting objectives, it is necessary to acknowledge the realities in the field with regard to the availability of financial and human resources for fieldwork and data processing, and select field methods accordingly. Not fully considering the trade-offs between achieving the objectives and the attainability of a survey in relation to a particular method may ultimately lead to ineffective monitoring and loss of conservation funds (Nichols & Williams, 2006; Sheil, 2001).
Despite current bottlenecks associated with camera trapping and PAM, the technological landscape is quickly evolving. Many people and organizations are working hard to improve efficiency both in data collection and processing through the development of new platforms and tools (e.g., RFCx Arbimon, Zooniverse, Wildlife Insights [Ahumada et al., 2020; Simpson et al., 2014]). Moreover, apart from the methods discussed here, exciting new genetic methods with much promise to monitor terrestrial and aquatic species also merit attention. They can provide information on species diversity within a community (using eDNA), animal density (using spatially explicit capture-recapture techniques), individually known animals (if one wants to assess the entire population in a small area), sex ratios, and taxonomy (Bohmann et al., 2014).
Despite their complementary strengths, multiple methods are rarely combined in integrated monitoring (Buxton et al., 2018; Garland et al., 2020), mainly due to the costs involved, but also due to a lack of cross-methodological knowledge exchange. Of course, any one method requires significant technical know-how and financial resources, which are not always readily available. Yet, we encourage combining field methods, as it has the potential to greatly broaden the diversity of species monitored. In addition, using multiple methods may facilitate synergies for more in-depth ecological or behavioral research (Garland et al., 2020; Moore et al., 2020), opening up new, interdisciplinary research paths that can ultimately help to answer pressing ecological questions and provide improved guidance for conservation policy.

ACKNOWLEDGMENTS
The authors thank WWF the Netherlands and the Prince Bernhard Chair for International Nature Conservation, Utrecht University, for their financial support for the symposium that inspired the writing of this review. This study was furthermore supported by funding from the graduate programme "Nature Conservation, Management and Restoration" of the Dutch Research Council (NWO). Marcus Rowcliffe is supported by Research England; Stephanie Brittain was supported by a U.K. Government NERC CASE studentship (NE/M010376/1).

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.

AUTHOR CONTRIBUTIONS
Joeri A. Zwerts and Marijke van Kuijk: leading role in conceptualization, writing, and funding acquisition. P. J. Stephenson: leading role in conceptualization. Pita Verweij: leading role in funding acquisition. All authors: conceptualization and writing.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.