Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus


  • Daniel T. Blumstein,

    Corresponding author
    1. Department of Ecology and Evolutionary Biology, University of California, 621 Young Drive South, Los Angeles, CA 90095-1606, USA
  • Daniel J. Mennill,

    1. Department of Biological Sciences, University of Windsor, 401 Sunset Avenue, Windsor, Ontario N9B3P4, Canada
  • Patrick Clemins,

    1. AAAS, 1200 New York Avenue NW, Washington, DC 20005, USA
  • Lewis Girod,

    1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
  • Kung Yao,

    1. Electrical Engineering Department, University of California, 420 Westwood Plaza, Los Angeles, CA 90095-1594, USA
  • Gail Patricelli,

    1. Department of Evolution and Ecology, University of California, One Shields Avenue, Davis, CA 95616, USA
  • Jill L. Deppe,

    1. Illinois Natural History Survey, Institute of Natural Resource Sustainability, University of Illinois at Urbana-Champaign, 1816 South Oak Street, Champaign, IL 61820, USA
  • Alan H. Krakauer,

    1. Department of Evolution and Ecology, University of California, One Shields Avenue, Davis, CA 95616, USA
  • Christopher Clark,

    1. Cornell Laboratory of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, NY 14850, USA
  • Kathryn A. Cortopassi,

    1. Cornell Laboratory of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, NY 14850, USA
  • Sean F. Hanser,

    1. Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, 1 Shields Avenue, Davis, CA 95616, USA
  • Brenda McCowan,

    1. Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, 1 Shields Avenue, Davis, CA 95616, USA
    2. California National Primate Research Center, University of California, 1 Shields Avenue, Davis, CA 95616, USA
  • Andreas M. Ali,

    1. Electrical Engineering Department, University of California, 420 Westwood Plaza, Los Angeles, CA 90095-1594, USA
  • Alexander N. G. Kirschel

    1. Department of Ecology and Evolutionary Biology, University of California, 621 Young Drive South, Los Angeles, CA 90095-1606, USA
    2. Department of Biological Sciences, University of Cyprus, Nicosia 1678, Cyprus
    3. Edward Grey Institute, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, U.K.

Correspondence author. E-mail:


1. Animals produce sounds for diverse biological functions such as defending territories, attracting mates, deterring predators, navigating, finding food and maintaining contact with members of their social group. Biologists can take advantage of these acoustic behaviours to gain valuable insights into the spatial and temporal scales over which individuals and populations interact. Advances in bioacoustic technology, including the development of autonomous cabled and wireless recording arrays, permit data collection at multiple locations over time. These systems are transforming the way we study individuals and populations of animals and are leading to significant advances in our understanding of the complex interactions between animals and their habitats.

2. Here, we review questions that can be addressed using bioacoustic approaches and provide a primer on the technologies and approaches that ecologists, behaviourists and conservation biologists use to study animals at multiple organizational levels.

3. Spatially dispersed groups of microphones (arrays) enable users to study signal directionality on a small scale or to locate animals and track their movements on a larger scale.

4. Advances in algorithm development can allow users to discriminate among species, sexes, age groups and individuals.

5. With such technology, users can remotely and non-invasively survey populations, describe the soundscape, quantify anthropogenic noise, study species interactions, gain new insights into the social dynamics of sound-producing animals and track the effects of factors such as climate change and habitat fragmentation on phenology and biodiversity.

6. There remain many challenges in the use of acoustic monitoring, including the difficulties in performing signal recognition across taxa. The bioacoustics community should focus on developing a common framework for signal recognition that allows for various species’ data to be analysed by any recognition system supporting a set of common standards.

7. Synthesis and applications. Microphone arrays are increasingly used to remotely monitor acoustically active animals. We provide examples from a variety of taxa where acoustic arrays have been used for ecological, behavioural and conservation studies. We discuss the technologies used, the methodologies for automating signal recognition and some of the remaining challenges. We also make recommendations for using this technology to aid in wildlife management.


Ecologists and behavioural biologists have collaborated with engineers, computer scientists and linguists to design and deploy instruments that remotely record sounds emitted by animals (Table S1, Supporting Information). These collaborative efforts have produced exciting new technologies that allow researchers to detect, recognize, localize and track acoustically active animals (see the glossary in Table 1). Results from these studies are leading to new insights into the adaptations and specialized features of animal acoustic signals (e.g. signal directionality, transmission properties); the processes of communication within complex social groups (e.g. eavesdropping, communication networks); the seasonal variability in animal acoustic behaviours and its relationship to ecological factors (phenology); and the spatio-temporal variability of the acoustic habitats in which animals live (e.g. ambient noise, anthropogenic sounds). New monitoring techniques allow researchers to survey at ecologically meaningful scales: over areas that contain populations of animals and over time periods that encompass biologically significant activities such as mating, migration and foraging. Here, we briefly review and introduce the literature on questions that have been addressed using bioacoustic monitoring technologies in terrestrial ecosystems.

Table 1. Glossary of terms
Anthropogenic sounds: Human-generated sounds produced either as a by-product of human activity (e.g. automobile, aircraft, jackhammer) or intentionally (e.g. siren, fog horn)
Classification: The process of determining the identity of a sound event (e.g. species, individual, sex, age)
Communication network: The set of signallers and receivers engaged in the exchange of information. Studies of animal communication traditionally focused on interactions between single signaller–receiver pairs, but now recognize that animal communication often involves networks of multiple signallers and receivers (McGregor 2005)
Directionality: The two- or three-dimensional spatial variation in signal amplitude as a signal radiates from a signaller
Nodes: Locations where one or more sensors are deployed and operating
Localize: The process of determining the two- or three-dimensional position of the source of an acoustic event
Microphone array: A multi-sensor system designed to sample the sound field spatially and synchronously
Phenology: The timing of seasonal events; for example, when an animal migration occurs, when animals start defending territories, or when a population of animals begins or ends its mating season
Soundscape: The complete acoustic environment, consisting of biotic and abiotic sounds; biotic sounds are those produced by living animals, including humans, while abiotic sounds encompass all natural physical sounds (e.g. water, wind, rain) and non-natural mechanical noise produced by humans
Track: The process of linking a time series of acoustic locations through space

We have written this article for applied ecologists, conservation biologists and behavioural ecologists who may benefit from using these technologies. One goal is to enable field ecologists to have meaningful conversations with bioacousticians and engineers who may be able to help them solve management problems. A second goal is to share the research needs of ecologists with bioacousticians. We also include a preview of the types of questions that could be asked in the future, and a primer on the technology and algorithms that have been and are being developed to study biodiversity and behaviour through bioacoustics.

Acoustic monitoring in the field

Species distributions, abundances and biodiversity

For animals that produce sound, acoustic recordings are an efficient way to sample populations and communities, to derive reliable estimates of species occurrence and, potentially, to estimate abundance. Single-microphone recording units have been used to measure species richness and composition of birds (e.g. Haselmayer & Quinn 2000), bats (e.g. MacSwiney et al. 2008; Obrist, Boesch & Flückiger 2008), anurans (e.g. Crouch & Paton 2002) and insects (e.g. Brandes 2005). Stereo microphones (e.g. Hobson et al. 2002) and quadraphonic microphone arrays (e.g. Celis-Murillo, Deppe & Allen 2009) have been used to estimate bird species abundances in addition to richness and community composition. Such systems capitalize on time delays and intensity differences between simultaneously recording microphones to provide information on the relative position of sound sources, enabling multiple individuals to be distinguished from one another. Celis-Murillo, Deppe & Allen (2009) demonstrated that acoustic recordings can sample bird communities even better than skilled observers conducting audio-visual point counts in the field. With an array of three or more spatially dispersed microphones (Fig. 1), localization algorithms can be used to determine the absolute geographic position of a sound source. Collier, Kirschel & Taylor (2010) used a 32-microphone wireless array arranged in eight nodes to localize Mexican antthrush Formicarius moniliger in the dense undergrowth of a Neotropical rainforest with mean errors <50 cm, and these localizations helped to develop territory maps to monitor the spatial and temporal dynamics of marked individuals (Kirschel et al. 2011). Thus, larger arrays have the potential to provide detailed data on territory dynamics, population densities and habitat use at the level of the population, group or known individual.
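Localization from time delays can be illustrated with a brute-force search. The sketch below is a minimal illustration under simplifying assumptions, not the algorithm used by any of the studies cited: it assumes four microphones at known positions on a flat plot, a constant speed of sound of 343 m/s, and noise-free time differences of arrival (TDOAs) measured relative to the first microphone. The function names `predicted_tdoas` and `localize_grid` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C; assumed constant and uniform


def predicted_tdoas(source, mics, c=SPEED_OF_SOUND):
    """Time differences of arrival (s) at each microphone relative to mics[0]."""
    dists = [math.dist(source, m) for m in mics]
    return [(d - dists[0]) / c for d in dists]


def localize_grid(observed_tdoas, mics, x_range, y_range, step=0.1, c=SPEED_OF_SOUND):
    """Return the grid point whose predicted TDOAs best match the observed ones
    in the least-squares sense. Brute force, for clarity rather than speed."""
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    best, best_err = None, float("inf")
    for i in range(nx):
        for j in range(ny):
            p = (x_range[0] + i * step, y_range[0] + j * step)
            pred = predicted_tdoas(p, mics, c)
            err = sum((a - b) ** 2 for a, b in zip(pred, observed_tdoas))
            if err < best_err:
                best, best_err = p, err
    return best


# Hypothetical 20 m x 20 m plot with one microphone at each corner.
mics = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
true_source = (7.3, 12.1)
observed = predicted_tdoas(true_source, mics)  # noise-free, for illustration only
estimate = localize_grid(observed, mics, (0.0, 20.0), (0.0, 20.0))
```

A grid search is easy to read but slow; field systems generally use faster solvers and must cope with TDOA errors arising from noise, reverberation and imperfect clock synchronization.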

Figure 1.

An example of a microphone array localizing a vocalizing bat. Here, a 20-channel array is arranged into 5 nodes of 4 microphones, with differently sized microphones on each node to allow for localization in 3 dimensions. Dashed lines represent the distance the sound needs to travel to reach each microphone; cross-correlating the recorded signals can help determine the location of the sound source.

Acoustic recordings are also valuable to conservation biologists. Researchers have deployed microphone systems to detect the presence of endangered or rare species. For instance, multiple research teams have used single-microphone recording units to investigate whether ivory-billed woodpeckers Campephilus principalis persist in the bottomland forests of Arkansas and the Florida panhandle (Fitzpatrick et al. 2005; Hill et al. 2006; Swiston & Mennill 2009). Although the species had been thought to be extinct since the 1940s, acoustic analyses of >40 000 h of audio recordings provided evidence that woodpeckers may persist in remote forests that are difficult to monitor with conventional techniques. In addition, passive acoustic arrays are being used to study the behaviour and population structure of threatened African elephant Loxodonta africana populations in habitats where visual survey methods are largely impractical (Payne, Thompson & Kramer 2003). Because they can provide reliable data on species richness and composition more rapidly than human-based survey techniques (Parker 1991), acoustic monitoring approaches offer an efficient and effective method for assessing biodiversity (Riede 1993, 1998) and for examining long-term changes related to seasonal variation, human activity, habitat modification and climate change.

Acoustic surveys also provide insight into habitat quality and ecosystem health. Fischer et al. (1997) used acoustic sampling to estimate the density and composition of orthopteran species in Bavarian dry grasslands. They demonstrated that the density ratio of two groups (euryoecious to xerophilic grasshoppers) was a suitable indicator of eutrophication and potential threat to dry grasslands. National parks such as Denali National Park and Preserve are using automated recordings to inventory and monitor park soundscapes, to identify sound sources and to develop management plans that mitigate the impacts of anthropogenic noise on ecological function and visitor experience (Hults & Burson 2006; Miller 2008).

Phenology and temporal responses to disturbance

Standardized acoustic monitoring provides a powerful tool for measuring shifts in phenology and quantifying temporal dynamics in individual behaviour, populations and community structure. Hüppop et al. (2006) and Farnsworth & Russell (2007) used acoustic sampling to identify migratory bird species and describe patterns in nocturnal migration activity; such data are nearly impossible to collect without acoustic sampling because visual survey methods are ineffective at night. Long-term single- and multi-channel recording technology has also been used to monitor seasonal changes in acoustic behaviour in a variety of animals, including pileated woodpeckers Dryocopus pileatus (Tremain, Swiston & Mennill 2008) and barred owls Strix varia (Odom & Mennill 2010). By describing the relationship between migration activity derived from acoustic data and weather conditions, Hüppop et al. (2006) were able to assess the potential effects of wind farms on European migratory bird populations. Márquez et al. (2008) used automated acoustic recordings to study the reproductive phenology of anurans across multiple sites in relation to temperature and humidity, to gain insight into the impacts of climate change on anuran populations in Spain. Acoustic monitoring enables the collection of important biological data over long time periods and across large areas, and can inform conservation management by identifying when and where acoustically active species use particular parts of a habitat.

Effects of anthropogenic noise on animals

Acoustic monitoring assists studies examining the impacts of anthropogenic noise on individual animals and populations. For instance, Delaney et al. (1999) employed acoustic recordings and video surveillance to show that noise from approaching helicopters during military training exercises and chainsaw noise caused Mexican spotted owls Strix occidentalis lucida to flush from their nests. Birds did not flush when noise stimuli were >105 m away from the nest, suggesting that a 105-m buffer zone around nests could minimize potential effects on nesting. Analyses of acoustic recordings in urban and rural environments throughout Europe have demonstrated that animals alter signal frequency to avoid masking by traffic noise (Slabbekoorn & den Boer-Visser 2006) but also suffer reduced reproductive success in noisy areas (Halfwerk et al. 2011). Recording technology thus allows researchers to quantify the impacts of anthropogenic sound on acoustic behaviour and, more importantly from a management perspective, on the resulting abundances and distributions of wild animals.

Signal structure, directionality and propagation patterns

The use of microphone arrays in the field has great potential for relating small- and large-scale variation in the spatial structure of signals to the social and environmental contexts in which the signals are used (e.g. Forrest 1991; Lammers & Au 2003). Dantzker, Deane & Bradbury (1999) and Patricelli, Dantzker & Bradbury (2007, 2008) used 8-microphone arrays surrounding vocalizing greater sage-grouse Centrocercus urophasianus and red-winged blackbirds Agelaius phoeniceus, respectively, to make detailed field measurements of the directionality and amplitude of vocalizations. Analysis revealed that sage-grouse signals are up to 24 dB louder to the sides of the bird than directly in front of it. Males direct these lateral beams towards females during courtship, suggesting that acoustic directionality may have influenced the evolution of behavioural displays, or vice versa (Dantzker & Bradbury 2006). Blackbirds use more directional vocalizations in social contexts where eavesdropping may be costly (Dabelsteen 2005), such as pre-copulatory calls, and less directional vocalizations when broadcasting alarm calls to neighbours and mates. A study using an 8-channel array to track the flight paths of whiskered bats Myotis mystacinus acoustically has shown that the bats adjust call structure in ways that optimize echolocation and reduce collision risk (Holderied, Jones & von Helversen 2006; Jones & Holderied 2007). Acoustic arrays can thus aid in understanding the directionality and adaptive structure of signals, and the contexts in which communicating animals use them.
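Directionality measurements of this kind compare the level of a single vocalization at microphones surrounding the signaller. The sketch below is a simplified illustration, not the published analysis: it assumes uncalibrated RMS amplitudes, known microphone distances, and correction to a common distance by spherical spreading alone (ignoring excess attenuation from vegetation and ground). The values are hypothetical, chosen so that the lateral microphones read 24 dB above the front one.

```python
import math


def level_db(rms):
    """Level in dB relative to an arbitrary (uncalibrated) reference of 1.0."""
    return 20 * math.log10(rms)


def directionality_pattern(rms_by_angle, distance_by_angle, front=0):
    """Relative level (dB) at each bearing around a signaller.

    Each microphone's level is corrected to a common distance assuming
    spherical spreading (+20*log10(distance)), then expressed relative to
    the microphone directly in front of the animal.
    """
    corrected = {ang: level_db(rms) + 20 * math.log10(distance_by_angle[ang])
                 for ang, rms in rms_by_angle.items()}
    return {ang: lvl - corrected[front] for ang, lvl in corrected.items()}


# Hypothetical ring of eight microphones (bearings in degrees from the bird's beak).
angles = range(0, 360, 45)
distances = {a: 5.0 for a in angles}
distances[180] = 10.0                        # one microphone placed farther away
rms = {a: 0.01 for a in angles}
rms[90] = rms[270] = 0.01 * 10 ** (24 / 20)  # lateral beams 24 dB stronger
rms[180] = 0.005                             # half the amplitude at twice the distance
pattern = directionality_pattern(rms, distances)
```

Correcting for distance matters because microphones in field arrays are rarely equidistant from a moving animal; without it, a nearer microphone would appear to sit in a louder part of the beam.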

In addition, Patricelli and colleagues have measured the transmission loss and source amplitude of vocal displays using calibrated microphones and playbacks at male calling sites, while using a 24-microphone array to monitor greater sage-grouse leks (Patricelli & Krakauer 2010). Understanding these characteristics of acoustic behaviour in sage-grouse can be important from a conservation perspective, because these signals have evolved for use in complex mating rituals (Gibson, Bradbury & Vehrencamp 1991) that may be disrupted by increasing disturbance; such measurements can help conservation managers determine the critical distance between lek sites and areas of disturbance.
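The link between source amplitude, transmission loss and background noise can be sketched numerically. Under the simplifying assumption of spherical spreading (a drop of about 6 dB per doubling of distance) plus an optional, hypothetical linear excess-attenuation term, the code below finds the largest distance at which a signal still exceeds the background noise level. It shows why raising the noise level, for example near a source of disturbance, shrinks the area over which a display can be heard. All levels and parameter names are illustrative assumptions, not measurements from sage-grouse leks, and both levels are assumed to be measured in the same frequency band.

```python
import math


def received_level_db(source_level_db, r, r0=1.0, excess_db_per_m=0.0):
    """Level at distance r (m), given the source level at reference distance r0,
    assuming spherical spreading plus optional linear excess attenuation."""
    return source_level_db - 20 * math.log10(r / r0) - excess_db_per_m * (r - r0)


def max_audible_distance(source_level_db, noise_level_db, r0=1.0,
                         excess_db_per_m=0.0, r_max=10_000.0, tol=0.01):
    """Largest distance (m) at which the received level still exceeds the noise
    level, found by bisection (the received level falls monotonically with r)."""
    def audible(r):
        return received_level_db(source_level_db, r, r0, excess_db_per_m) > noise_level_db

    if audible(r_max):
        return r_max
    lo, hi = r0, r_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if audible(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical levels: an 80 dB display (re. 1 m) against 40 dB vs 50 dB background noise.
quiet = max_audible_distance(80.0, 40.0)   # about 100 m
noisy = max_audible_distance(80.0, 50.0)   # about 32 m
```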

Communication and social networks

Larger arrays have been used to study signalling interactions between pairs of individuals as well as within neighbourhoods of individuals. Recording with an 8-microphone wired array in the humid forests of Costa Rica, Mennill and colleagues have explored the ecological and evolutionary significance of vocal duets in rufous-and-white wrens Thryothorus rufalbus (Mennill et al. 2006; Mennill & Vehrencamp 2008). In many tropical species, males and females sing coordinated duets, yet the function of these duets remains elusive because many duetting species live in dense vegetation where visual tracking is difficult. Acoustic localization using microphone arrays has provided the first rigorous data on the relative positions of males and females as they perform duets in dense tropical forests (Mennill & Vehrencamp 2008).

Using a 16-microphone array, researchers have simultaneously recorded entire breeding neighbourhoods of black-capped chickadees Poecile atricapillus in Ontario, Canada (Fitzsimmons et al. 2008a,b; Foote et al. 2008a,b; Lippold et al. 2008). In winter, this species lives in flocks with linear dominance hierarchies, and in summer, males defend individual breeding territories against former flock-mates and less-familiar males. Array recordings of these interactions provided empirical support for the communication network model (McGregor 2005), suggesting that territorial males broadcast signals that contain aggressive and repulsive information for rival males but attractive information for prospecting females.

African elephants Loxodonta africana have been studied with multi-microphone arrays (Payne, Thompson & Kramer 2003; Thompson, Payne & Schwager 2009). Autonomous recording units placed around an entire forest clearing allowed researchers to localize vocalizing animals and to precisely correlate observed behaviour with vocalizations. Researchers used collars fitted with microphones and radio transmitters, combined with video monitoring, to study African elephant vocalizations in relation to social and reproductive context in a captive population (Soltis, Leong & Savage 2005a,b; Leong et al. 2003), and found that females’ rumble vocalizations were produced in response to those made by other individuals, especially affiliated females, in a wide range of social contexts. Microphone arrays have also been used to determine the location of alarm-calling yellow-bellied marmots Marmota flaviventris, demonstrating that alarm calls are mostly given by individuals close to burrows, in relative safety from predators (Collier et al. 2010).

Acoustic arrays thus allow us to better understand social interactions and networks in animals. This knowledge can then be applied to wildlife management, because it highlights the importance of group size and social structure, evolved traits that have important fitness consequences (Blumstein & Fernández-Juricic 2010).

Recording equipment and technical considerations

When selecting recording equipment, researchers must clearly define the study objectives, the target organism(s) and the environmental conditions under which the study will be conducted. Because technology evolves rapidly, we do not recommend specific technologies. Rather, we discuss the factors that must be considered when selecting suitable hardware for each component of a recording system. Additional considerations are discussed in Appendix S1 (Supporting Information).

Microphones and recording units

The type of microphone or sensor required will be driven largely by the animals being studied, because recording equipment is sensitive to a limited range of frequencies. The organisms under study (e.g. infrasonic animals such as some mammals and birds, or ultrasonic animals such as some bats) or the parts of the soundscape to be monitored will determine the frequency response and type of microphone needed to capture all of the sounds of interest, because different types of sound in the environment (e.g. biotic sounds, natural physical noises and mechanical noises) occupy different frequency ranges.

Directional microphones may be preferred if single sensors are meant to capture acoustic information from specific orientations. However, most arrays use omnidirectional microphones that sample sounds with roughly equal efficiency in all directions. Because an array samples the sound field at several known positions, differences in arrival time and amplitude among its microphones allow estimation of the direction of, and distance to, the sound source.
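The arrival-time differences that arrays rely on are commonly estimated by cross-correlating the signals from pairs of microphones. The sketch below assumes two omnidirectional microphones a known distance apart, a distant (far-field) source so that wavefronts are approximately planar, and a simulated pulse rather than field data; `cross_correlation_lag` and `bearing_deg` are hypothetical names. A single pair yields only an angle relative to the line joining the microphones and cannot tell which side of that line the source is on; resolving position requires more microphones.

```python
import math


def cross_correlation_lag(a, b, max_lag):
    """Lag (samples) maximizing sum(a[n] * b[n + lag]); a positive lag
    means the sound reached microphone b after microphone a."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
        if total > best_val:
            best_lag, best_val = lag, total
    return best_lag


def bearing_deg(lag_samples, fs, spacing_m, c=343.0):
    """Far-field angle of arrival, in degrees from the pair's broadside."""
    ratio = c * (lag_samples / fs) / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding beyond +/-1
    return math.degrees(math.asin(ratio))


fs = 48_000     # hypothetical sampling rate (Hz)
spacing = 0.5   # hypothetical microphone spacing (m)
delay = 20      # simulated extra delay at microphone b (samples)
pulse = [math.exp(-((n - 100) / 15) ** 2) * math.sin(2 * math.pi * 3000 * n / fs)
         for n in range(400)]
mic_a = pulse
mic_b = [0.0] * delay + pulse[:-delay]
max_lag = math.ceil(spacing / 343.0 * fs)  # largest physically possible delay
lag = cross_correlation_lag(mic_a, mic_b, max_lag)
angle = bearing_deg(lag, fs, spacing)
```

Restricting the search to `max_lag` reflects the geometry: no real source can produce a delay longer than the time sound takes to travel directly between the two microphones.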

Recording quality is another important factor. Researchers must consider the sampling rate (samples per second, in Hz) and the number of bits per sample (bit depth) that are required, and whether these parameters need to be adjusted dynamically. Sampling rate determines the frequency range of the recordings (the highest recordable frequency is half the sampling rate), and bit depth determines their dynamic range; together with the number of channels, both also determine the size of acoustic files and, thus, storage requirements (Appendix S1, Supporting Information).
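The arithmetic behind these choices is simple enough to check before a deployment. The helpers below are a sketch for uncompressed (PCM) audio; they assume the Nyquist limit of half the sampling rate and the textbook quantization-noise approximation of about 6.02 dB per bit plus 1.76 dB for a full-scale sine wave, and they ignore file headers and compression. The deployment parameters are hypothetical.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_signal_hz):
    """Theoretical minimum; in practice a margin above this is used so that
    anti-aliasing filters can roll off."""
    return 2 * highest_signal_hz


def quantization_dynamic_range_db(bits):
    """Ideal signal-to-quantization-noise ratio for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def storage_bytes(sample_rate_hz, bits, channels, seconds):
    """Size of uncompressed PCM audio (no file-header overhead)."""
    return sample_rate_hz * bits * channels * seconds // 8


# Hypothetical deployment: four channels, 48 kHz, 16-bit, 24 h of continuous recording.
day = storage_bytes(48_000, 16, 4, 24 * 3600)
print(nyquist_hz(48_000), quantization_dynamic_range_db(16), day / 1e9)  # roughly 33 GB per day
```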

Environmental conditions have substantial impacts on the durability and reliability of acoustic sampling units. Both microphones and recording units may need heating, cooling, or protection from rain and humidity. Recorders with few moving parts (e.g. those with flash memory) may be more reliable in moist conditions. If animals (including humans) are likely to disturb equipment, then camouflage, animal-proof packaging or some form of animal deterrent should be incorporated into the system.

Location- and time-coding methods

Location and time annotations are a typical requirement of multi-unit recording systems. To achieve this, a system must maintain an estimate of the current location and time of each unit, and the accuracy required by a specific application will determine the appropriate method. For wired arrays, sensor location is typically determined by a global positioning system (GPS) using handheld devices (metre-level accuracy) or surveyor-grade equipment (centimetre-level accuracy). For cases where only relative location is needed, other methods are available, including radio interferometry, which measures the difference in phase of signals received by each detector at varying distances from the source signal location (Maroti et al. 2005). Similarly, local time synchronization can be achieved by recording local radio frequency (RF) transmissions simultaneously at each microphone and using a filtering and correlation method to compare them with the precise time at which each transmitting node sent the signal (Collier, Kirschel & Taylor 2010). A persistent concern with all of these techniques is the accuracy of time synchronization: slight differences between the clocks on different nodes lead to errors in triangulation. The simplest way to address this problem is to embed a time code in the recording itself that can be used later in analysis.
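Two quantitative points make the synchronization problem concrete. First, because sound travels at roughly 343 m/s in air, a clock error of 1 ms corresponds to a range error of about 0.34 m. Second, node clocks typically differ by both a fixed offset and a slow drift, which can be estimated by regressing reference times on the local timestamps of shared synchronization events. The sketch below assumes such shared events are available (for example, from RF transmissions or an embedded time code) and uses hypothetical names and values; it is not the method of any specific system cited above.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def fit_clock(local_times, reference_times):
    """Least-squares fit of reference_time = offset + rate * local_time,
    capturing both a fixed clock offset and a linear drift."""
    n = len(local_times)
    mean_l = sum(local_times) / n
    mean_r = sum(reference_times) / n
    sxx = sum((t - mean_l) ** 2 for t in local_times)
    sxy = sum((t - mean_l) * (r - mean_r)
              for t, r in zip(local_times, reference_times))
    rate = sxy / sxx
    return mean_r - rate * mean_l, rate


def to_reference(t_local, offset, rate):
    """Convert a node's local timestamp to the shared reference timescale."""
    return offset + rate * t_local


def range_error_m(timing_error_s, c=SPEED_OF_SOUND):
    """Distance error implied by a clock error when converting time to range."""
    return c * timing_error_s


# Hypothetical node whose clock is 0.5 s behind the reference and runs 20 ppm slow,
# observed at five shared synchronization events.
local = [0.0, 10.0, 20.0, 30.0, 40.0]
reference = [0.5 + 1.00002 * t for t in local]
offset, rate = fit_clock(local, reference)
```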

Communications and data transfer

Acoustic sensing systems may be assembled from readily available components, or they may be custom-made. For example, Patricelli and colleagues used rack-mountable studio hardware and omnidirectional microphones to develop their eight-channel directionality array (Patricelli, Dantzker & Bradbury 2007, 2008), which was synchronized with a three-channel video array that measured changes in the position and posture of singing birds. This was later expanded to a 24-channel wired array in which a commercial time-code generator was used to synchronize three separate eight-channel digitizing preamplifiers and record directly onto a computer (Krakauer et al. 2009; Patricelli & Krakauer 2010). The recent advent of portable four- and five-channel digital recorders, designed for filmmakers collecting surround-sound recordings, may provide inexpensive units for four- and five-channel recording systems. Rather than assembling available components, Girod and colleagues (Girod et al. 2006; Allen et al. 2008) developed first the Acoustic ENSBOX and then VoxNet, both of which are custom-built hardware solutions. The 32-channel VoxNet wireless array comprises an embedded computer on each 4-microphone node, which can be controlled wirelessly by laptop or smartphone and records directly onto compact flash cards on each node (Collier, Kirschel & Taylor 2010). Similarly, Calupca, Fristrup & Clark (2000) have developed terrestrial autonomous recording units (TARU), which can be GPS-synchronized and easily customized for a variety of applications through firmware programming.

Signal recognition

The methods for signal comparison and recognition used in bioacoustics range from trained humans listening to recordings and/or visually inspecting multi-channel spectrograms (e.g. McGregor et al. 1997; Hobson et al. 2002; Mennill et al. 2006; Patricelli, Dantzker & Bradbury 2008; Celis-Murillo, Deppe & Allen 2009), to complex machine-based detection, measurement and classification algorithms (e.g. Kogan & Margoliash 1998; Cortopassi & Bradbury 2000; Mellinger & Clark 2000; Urazghildiiev & Clark 2006; Kirschel et al. 2009). Each of these methods has advantages and limitations. Trained observers can cue on subtle pattern differences and reliably identify and discriminate relevant sounds in acoustic recordings. However, given the quantity of data frequently collected during acoustic studies, relying on human experts is a rate-limiting step (Swiston & Mennill 2009) and often is impractical. Automated computer-aided signal recognition systems provide a solution to this problem and are critical for the viability of long-term monitoring studies.

The signal recognition process is typically broken into two tasks: signal detection and signal classification. Signal detection involves the extraction of structured sounds of interest from random background noise. Signal classification involves the labelling of sounds into biologically relevant groups. The following sections discuss the major tasks involved in automated signal detection and classification and some of the most pressing research challenges.

Signal detection

Reliably detecting signals of interest is an essential first step for automated processing. The salient portions of continuously recorded audio are typically a small fraction of the total recording time. Even if human experts are relied on for the classification task, having automated detection in place can vastly reduce the amount of data to be reviewed (Swiston & Mennill 2009). In addition, reliable automatic detection can reduce the need for large data storage capacity in remote sensing equipment: rather than recording continuously, the equipment need only record the identified sounds of interest.

Any automated system, however, produces both false positives and false negatives. The cost of a false positive is that more data are stored; the cost of a false negative is that the sound of interest is not recorded and cannot be examined. Because missed sounds cannot be recovered, systems should be designed to err on the side of false positive detections.
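A very simple detector illustrates both the storage saving and the threshold trade-off. The sketch below is a generic energy detector, swapped in for illustration rather than taken from any cited system: it splits the recording into frames, estimates the noise floor as the median frame energy (which assumes sounds of interest occupy less than half the frames), and flags frames that exceed that floor by a chosen margin. Lowering the margin produces more false positives and fewer false negatives; raising it does the reverse. The synthetic signal and all names are hypothetical.

```python
import math


def frame_energies_db(samples, frame_len):
    """Mean power of consecutive non-overlapping frames, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))  # floor avoids log(0)
    return energies


def detect_events(samples, frame_len, margin_db):
    """Return (start, end) sample ranges of runs of frames whose energy exceeds
    the median frame energy (a crude noise-floor estimate) by margin_db."""
    energies = frame_energies_db(samples, frame_len)
    noise_floor = sorted(energies)[len(energies) // 2]
    events, run_start = [], None
    for i, e in enumerate(energies):
        if e > noise_floor + margin_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            events.append((run_start * frame_len, i * frame_len))
            run_start = None
    if run_start is not None:
        events.append((run_start * frame_len, len(energies) * frame_len))
    return events


# Hypothetical test signal: low-level deterministic "noise" plus a 2 kHz call.
fs = 48_000
signal = [0.01 * (((n * 7919) % 101) - 50) / 50 for n in range(10_000)]
for n in range(4000, 5000):
    signal[n] += 0.5 * math.sin(2 * math.pi * 2000 * n / fs)
events = detect_events(signal, frame_len=250, margin_db=10.0)
```

Only the flagged sample ranges would need to be stored or passed on to a classifier.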

The ability to reliably detect acoustic events depends largely on the structure of the recording environment, the signal-to-noise ratio (SNR) and the complexity and variability of the signals to be detected. For simple or complex sounds with low variability, template-based detection algorithms using spectrogram correlation (Mellinger & Clark 2000) or matched filters (Erbe et al. 1999) can perform quite well, even in higher noise environments.
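Template-based detection by spectrogram correlation can be sketched as sliding a template spectrogram along a recording's spectrogram and scoring the match at each time offset. The version below uses a plain Pearson (normalized) correlation, which is a simplification rather than the kernel design of Mellinger & Clark (2000); it assumes both spectrograms are already computed on the same frequency bins, and the tiny 4-bin example and function names are hypothetical.

```python
import math


def spectrogram_correlation(spec, template):
    """Pearson correlation between `template` and each same-sized window of
    `spec`, sliding along time (rows = frames, columns = frequency bins)."""
    t_flat = [v for frame in template for v in frame]
    t_mean = sum(t_flat) / len(t_flat)
    t_dev = [v - t_mean for v in t_flat]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for off in range(len(spec) - len(template) + 1):
        w_flat = [v for frame in spec[off:off + len(template)] for v in frame]
        w_mean = sum(w_flat) / len(w_flat)
        w_dev = [v - w_mean for v in w_flat]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if w_norm < 1e-12 or t_norm < 1e-12:  # flat window: no structure to match
            scores.append(0.0)
        else:
            scores.append(sum(a * b for a, b in zip(w_dev, t_dev)) / (w_norm * t_norm))
    return scores


def detect_template(spec, template, threshold=0.7):
    """Frame offsets where the correlation score reaches the threshold."""
    return [i for i, s in enumerate(spectrogram_correlation(spec, template))
            if s >= threshold]


# Hypothetical 4-bin spectrograms: the template is a rising sweep.
template = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
spec = [[0.1] * 4 for _ in range(10)]
for k, frame in enumerate(template):
    spec[5 + k] = [0.1 + v for v in frame]
hits = detect_template(spec, template)
```

Because the correlation is normalized, a uniform rise in background level does not change the score, which is one reason template methods tolerate moderate noise when the target sound varies little.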

Feature extraction

Once detected, a signal must be classified to determine what type of signal it is (what species, which individual, etc.). This may be done aurally or by visual inspection of a sound spectrogram, or, alternatively, automated classification methods can be used to assign labels to signals; such methods can be based on exemplar libraries or on pre-constructed models. Regardless of the classification method used, attribute generation or feature extraction is a critical first step (Deller, Proakis & Hansen 1993; Webb 2002). It is the acoustic features of the sound that are used to identify signal type.

Traditionally, bioacousticians have measured features associated with a signal’s frequency and time characteristics from waveforms, spectrograms and power spectra, selecting measurements such as duration, bandwidth and centre frequency by hand (e.g. Leong et al. 2002; Blumstein & Munos 2005), although several new software tools allow for automated parameter measurement. Machine-based feature extraction algorithms can provide rapid, repeatable generation of objective signal feature sets. Examples include mel-frequency cepstral coefficients and features based on signal energy distribution, on entropy or derived directly from sound spectrograms.
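Measurements such as duration, bandwidth and centre frequency can also be automated. The sketch below assumes a spectrogram in dB with a known frame step and bin spacing, and treats every cell within a chosen threshold (here 20 dB) of the loudest cell as part of the call; real tools use more careful boundary rules. "Centre frequency" is taken here as the midpoint between the lowest and highest frequencies, one of several definitions in use. The function name and values are hypothetical.

```python
def measure_call(spec_db, hop_s, bin_hz, drop_db=20.0):
    """Duration and frequency measurements from a spectrogram in dB
    (rows = time frames, columns = frequency bins, bin j centred at j * bin_hz).
    Cells within `drop_db` of the loudest cell are treated as part of the call."""
    cells = [(v, i, j) for i, frame in enumerate(spec_db) for j, v in enumerate(frame)]
    peak_db, _, peak_j = max(cells)
    active = [(i, j) for v, i, j in cells if v >= peak_db - drop_db]
    frames = [i for i, _ in active]
    bins = [j for _, j in active]
    low_hz, high_hz = min(bins) * bin_hz, max(bins) * bin_hz
    return {
        "duration_s": (max(frames) - min(frames) + 1) * hop_s,
        "low_hz": low_hz,
        "high_hz": high_hz,
        "bandwidth_hz": high_hz - low_hz,
        "centre_hz": (low_hz + high_hz) / 2,
        "peak_hz": peak_j * bin_hz,
    }


# Hypothetical spectrogram: -80 dB background, a call in frames 1-3 and bins 2-4.
spec = [[-80.0] * 8 for _ in range(6)]
for i in (1, 2, 3):
    for j in (2, 3, 4):
        spec[i][j] = -10.0
spec[2][3] = 0.0  # loudest cell
m = measure_call(spec, hop_s=0.01, bin_hz=500.0)
```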

Cepstral coefficients are commonly used in human speech processing and recognition (Davis & Mermelstein 1980; Hermansky 1990; Milner 2002) and have recently been used in some wildlife studies (Soltis, Leong & Savage 2005b; Mazhar, Ura & Bahl 2007), adapted to the hearing abilities of the species under study (Clemins & Johnson 2006). Cepstral coefficients work well for classifying human speech (e.g. in automated telephone systems) but may perform less well for animals whose signals are structured differently. Alternative approaches have used more generic feature sets that make no assumptions about the sound production model or the nature of the signal under study. One such approach characterizes patterns of signal energy distribution in time and frequency (Fristrup & Watkins 1992, 1993; Cortopassi 2006). The resulting feature sets have been used to classify marine mammal sounds from over 50 species (Fristrup & Watkins 1993), to explore inter- and intraspecific differences in the flight calls of 171 individuals from 14 species of wood warblers (Farnsworth 2007) and to examine age and sex differences in the vocalizations of African forest elephants (Thompson, Payne & Schwager 2009). Similarly, Kirschel et al. (2009) used a modified hill-climbing algorithm with a sliding median energy plot to detect and extract features from recordings of over 30 individuals in a population of Mexican antthrush. Sound spectrograms themselves can serve as feature sets, an approach particularly common in nearest-neighbour-based classification schemes (e.g. Pinkowski 1994; Baker & Logue 2003; Cortopassi & Bradbury 2006; Farnsworth 2007). The choice of feature extraction method has typically depended on the focus of the study. Although automated techniques reduce human effort and potential bias, most have been designed with the signals of specific target species in mind. More generic approaches are needed, and a few are beginning to appear, for example in XBAT (Figueroa 2006; Cornell Laboratory of Ornithology).
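Cepstral features of this kind are usually computed in four steps. A power spectrum is passed through a bank of triangular filters spaced on a perceptual frequency scale. The filter outputs are converted to logarithms, and a discrete cosine transform is applied. The sketch below follows that recipe with the common mel formula 2595·log10(1 + f/700), which encodes human pitch perception. Adapting it to another species, as in the wildlife studies cited, would mean replacing `hz_to_mel` and `mel_to_hz` with a scale suited to that species' hearing; we do not attempt that here. Frame length, hop, filter count and coefficient count are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Human-oriented mel scale; a species-specific warping could replace it
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the mel scale."""
    fmax = fs / 2 if fmax is None else fmax
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2))
    bin_freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    out = x @ np.cos(np.pi * k * (2 * j + 1) / (2 * n)).T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out

def mfcc(x, fs, nfft=512, hop=256, n_filters=26, n_coeffs=13):
    """Return an array of shape (frames, n_coeffs) of cepstral coefficients."""
    window = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / nfft
    filter_energies = power @ mel_filterbank(n_filters, nfft, fs).T
    return dct_ii(np.log(filter_energies + 1e-10), n_coeffs)  # offset avoids log(0)

rng = np.random.default_rng(0)
fs = 16000
coeffs = mfcc(rng.standard_normal(fs), fs)  # one second of white noise
fbank = mel_filterbank(26, 512, fs)
print(coeffs.shape)
```

Each row of the output is one frame's coefficient vector, which could then be passed to the classification methods discussed in the next section.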

Signal classification

Classification methods can be either supervised, in which expertly labelled data are used to train the system, or unsupervised, in which the structure of the data itself guides decisions about class membership. Supervised classification methods fall into two broad groups. In instance-based methods, a library of known exemplars is used to assign class labels to unknowns according to feature similarity. In model-based methods, abstract models are built from the features of known signals, and these models then assign labels to unknowns on the basis of their features.
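Two of the simplest possible classifiers illustrate the distinction between instance-based and model-based supervised classification. Both are sketched below on generic feature vectors (for example, measurements such as duration and centre frequency). k-nearest neighbours keeps every labelled exemplar and labels an unknown by majority vote among the k most similar ones. A nearest-centroid classifier instead reduces each class to a single model, its mean feature vector. These are our illustrative choices, not methods prescribed by the studies cited, and the toy data are invented.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Instance-based: majority vote among the k nearest labelled exemplars."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]  # ties resolve to the first label in sorted order

class NearestCentroid:
    """Model-based: each class is summarised by its mean feature vector."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, query):
        dists = np.linalg.norm(self.centroids_ - query, axis=1)
        return self.classes_[np.argmin(dists)]

# Toy, already-standardized features for two hypothetical classes
train_X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
                    [5.0, 5.0], [5.4, 4.8], [4.7, 5.3]])
train_y = np.array(["A", "A", "A", "B", "B", "B"])
query = np.array([4.9, 5.1])
knn_label = knn_predict(train_X, train_y, query, k=3)
centroid_label = NearestCentroid().fit(train_X, train_y).predict(query)
print(knn_label, centroid_label)
```

In practice, features measured on different scales (seconds versus hertz) would need to be standardized before Euclidean distances are meaningful; the sketch omits this step.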

A variety of these classification methods have been explored in the bioacoustics literature across diverse animal taxa, with varying feature sets and degrees of success (e.g. Fristrup & Watkins 1993; Kogan & Margoliash 1998; Baker 2004; Clemins et al. 2005). For example, Fristrup & Watkins (1993) correctly classified up to 85% of marine mammal sounds, Hayward (1997) 75% of marine mammal sounds, Kogan & Margoliash (1998) up to 97% of indigo bunting Passerina cyanea song units, Baker (2004) 96% of laughing kookaburra Dacelo novaeguineae breeding groups, Clemins et al. (2005) 82–94% of African elephant call types and Trifa, Kirschel & Taylor (2008) up to 99·5% of the songs of antbird species. Kirschel et al. (2009) identified up to 99·4% of Mexican antthrush individuals. All three supervised-learning techniques they used (discriminant function analysis, fuzzy logic and hidden Markov models) assigned over 97% of recorded songs to the correct individual despite high levels of rainforest background noise, whereas one unsupervised-learning method (self-organizing maps) performed less well. Automated classification methods can potentially identify the individual animal, population or species producing a sound with very high success rates, allowing reliable processing of data from long-term monitoring studies. However, they may not work equally well for all signal types. Some sounds, such as bird alarm calls and the navigation and foraging calls of bats, may be more difficult to identify to species or individual.


By localizing animals acoustically, we can gain information about the spatial dynamics of communication, count individual animals, map territories and assess population distribution patterns. We may also be able to study animal movement and responses to anthropogenic stimuli.

One factor determining the accuracy of an acoustic location system is the error in estimating the positions of the sensors in the array. Widely distributed arrays under forest canopy may lose some accuracy because the canopy impairs the GPS equipment used to measure microphone positions (Mennill et al. 2006). However, an acoustic self-survey under forest canopy has yielded sensor position estimates within 15 cm of ground-truth measurements (Collier, Kirschel & Taylor 2010). Topographic variation can also limit source estimation accuracy. Sensor elevation is typically known with less certainty than horizontal position, and some existing terrestrial algorithms ignore the z-axis entirely, using only the xy coordinates of microphone locations. Even with such limitations, localization accuracy is typically sufficient to determine which of several potentially vocalizing individuals produced a sound (McGregor et al. 1997). We include a simple illustration of a microphone array localizing a vocalizing animal (Fig. 1).
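A rough calculation shows why sensor position error matters. Assume a speed of sound c of about 343 m s⁻¹ (air at roughly 20 °C). An error Δd in a microphone's position can then shift the predicted arrival time at that microphone by up to Δd/c. The first line below applies this to the 15 cm self-survey error quoted above. The second line uses our own illustrative numbers: a hypothetical microphone 20 m from the source horizontally, with a 3 m elevation difference that is ignored.

```latex
\Delta t_{\max} = \frac{\Delta d}{c}
  = \frac{0.15\ \mathrm{m}}{343\ \mathrm{m\,s^{-1}}} \approx 0.44\ \mathrm{ms}

\Delta d_{z} = \sqrt{(20\ \mathrm{m})^{2} + (3\ \mathrm{m})^{2}} - 20\ \mathrm{m} \approx 0.22\ \mathrm{m}
  \quad\Rightarrow\quad \frac{\Delta d_{z}}{c} \approx 0.65\ \mathrm{ms}
```

At a sampling rate of 44.1 kHz, for instance, 0.44 ms corresponds to roughly 19 samples, which is large compared with the one-sample resolution of a cross-correlation time-delay estimate.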

Localization based on array processing

An array of microphones with time-synchronized data collection can be used in various ways for source localization. If there is only a single sound source, a straightforward approach is to cross-correlate the waveforms from all pairs of microphones to extract their relative time delays. The source location can then be estimated from these delays in a variety of ways (e.g. Spiesberger & Fristrup 1990; Clark & Ellison 2000; Bower & Clark 2005). One such method is the correlation sum approach, which computes cross-correlations between the waveforms from each pair of channels and uses the Hilbert amplitude envelopes of these cross-correlation functions to determine the most probable source location (Collier, Kirschel & Taylor 2010).
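The Python sketch below illustrates a correlation-sum search using NumPy. It is our own minimal reading of the approach, not the implementation of Collier, Kirschel & Taylor (2010). We assume a single source, synchronized equal-length channels, a constant speed of sound and two-dimensional (xy) microphone coordinates. We also assume the score of a candidate location is the sum, over all microphone pairs, of the Hilbert envelope of the pairwise cross-correlation at the lag that location predicts. The grid, the function names and the simulated data are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed constant (air at about 20 °C)

def hilbert_envelope(x):
    """Amplitude envelope |x + i*H(x)| via the FFT-based analytic signal."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def correlation_sum_locate(signals, mic_xy, fs, grid_x, grid_y):
    """Grid search for the candidate (x, y) with the largest correlation sum.

    signals: equal-length, time-synchronized recordings, one per microphone.
    mic_xy: NumPy array of shape (n_mics, 2); elevation is ignored.
    """
    n_mics, n = len(signals), len(signals[0])
    # Envelope of the full cross-correlation for every microphone pair;
    # index n - 1 of each envelope corresponds to zero lag.
    envs = {
        (i, j): hilbert_envelope(np.correlate(signals[i], signals[j], mode="full"))
        for i in range(n_mics) for j in range(i + 1, n_mics)
    }
    best_score, best_xy = -np.inf, None
    for x in grid_x:
        for y in grid_y:
            dist = np.hypot(mic_xy[:, 0] - x, mic_xy[:, 1] - y)
            score = 0.0
            for (i, j), env in envs.items():
                # Predicted delay of channel i relative to channel j, in samples
                lag = int(round((dist[i] - dist[j]) / SPEED_OF_SOUND * fs))
                idx = lag + n - 1
                if 0 <= idx < len(env):
                    score += env[idx]
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy

# Toy simulation: a noise burst from (6, 12) m reaching four corner microphones
fs = 8000
burst = np.random.default_rng(1).standard_normal(400)
mics = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0], [20.0, 20.0]])
source = np.array([6.0, 12.0])
signals = []
for m in mics:
    delay = int(round(np.hypot(*(m - source)) / SPEED_OF_SOUND * fs))
    s = np.zeros(4000)
    s[delay:delay + 400] = burst
    signals.append(s)
grid = np.arange(0.0, 20.5, 0.5)
est = correlation_sum_locate(signals, mics, fs, grid, grid)
print(est)
```

The envelope is used instead of the raw cross-correlation because it is non-negative and smooth near the peak. This matters especially for tonal calls, whose raw cross-correlation oscillates at the carrier frequency. Small errors in the predicted lag, from rounding or sensor-position error, therefore lower the score gradually instead of landing on a zero crossing.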

The future

The wide-scale application of acoustic recording and processing technology has the potential to transform the fields of ecology, behaviour and conservation biology by allowing us to study animals in a standardized fashion at spatial and temporal resolutions and extents previously not possible and in environments that are difficult to access or monitor using conventional methods. Using acoustics to remotely track changes in indicator species, biodiversity and soundscapes may serve as an efficient way to measure habitat quality and ecosystem health. Because acoustic sampling can be used to monitor many taxonomic groups and environmental processes simultaneously, it can provide an integrative look at ecosystem dynamics and functioning.

However, additional work is required to realize this potential. For instance, although different research groups have developed a variety of recording devices, it has proved difficult to share the progress made on each platform, primarily because each application requires a customized solution.

One of the most pressing challenges for bioacoustic monitoring is reliable signal recognition. Algorithms that provide a confidence or quality estimate for each detection or classification can help, because the decision threshold can then be adjusted after processing to trade false positives against false negatives. It would also be beneficial to develop interactive environments that allow experts to classify ambiguous detections and then dynamically update the recognition scheme.
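When each detection carries a confidence score, the threshold trade-off can be explored after processing. One sweeps the decision threshold and counts errors against an expertly labelled subset. The Python sketch below assumes that higher scores mean greater confidence and that each detection has a Boolean ground-truth label. The function names and toy values are hypothetical.

```python
import numpy as np

def error_counts(scores, is_target, threshold):
    """False positives and false negatives when accepting scores >= threshold."""
    accepted = scores >= threshold
    false_pos = int(np.sum(accepted & ~is_target))
    false_neg = int(np.sum(~accepted & is_target))
    return false_pos, false_neg

def sweep_thresholds(scores, is_target, thresholds):
    """List of (threshold, false positives, false negatives) tuples."""
    return [(t, *error_counts(scores, is_target, t)) for t in thresholds]

# Toy detections: confidence scores and expert labels (hypothetical values)
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
is_target = np.array([True, True, False, True, False, False])
results = sweep_thresholds(scores, is_target, [0.25, 0.5, 0.85])
for t, fp, fn in results:
    print(f"threshold={t:.2f}  false positives={fp}  false negatives={fn}")
```

Raising the threshold removes false positives at the cost of more false negatives. The acceptable balance depends on whether missed calls or spurious detections are more costly for a given monitoring question.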

Environmental noise, equipment failure and natural biological variation can create heterogeneous data sets. Detection and classification algorithms and feature sets that are robust to such data heterogeneities are desirable.

There is a growing need for a common framework in which to develop, run and fully evaluate new bioacoustic recognition systems. Such a framework would include standard performance metrics and visualization tools, techniques for parameter tuning, facilities for running detectors and classifiers and for generating feature sets, and tools to determine the extent to which a system under- or over-fits its training data. Also needed is a standard corpus of data sets that can be analysed by a variety of recognition systems to determine the relative strengths and weaknesses of each. Establishing a standard data format, standard problem classes and a metadata standard for bioacoustics, covering acoustic recording parameters and behavioural or species-specific information, would catalyze this effort, because data sets could then be analysed with any recognition system that supported these standards. A variety of acoustic data browsing systems with extensible signal detection and measurement capabilities already exist, including Raven (Charif, Waack & Strickman 2008), XBAT (Figueroa 2006; Cornell Laboratory of Ornithology), Syrinx-PC (J. Burt, Seattle, WA, USA) and Avisoft-SASLab Pro (R. Specht, Berlin, Germany). In addition, WEKA (Witten & Frank 2005) and R (Venables, Smith & the R Development Core Team 2008) are platforms suitable for machine learning and classification. Each system has unique capabilities and limitations, but none currently provides a fully integrated solution to the task of signal recognition.
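As one example of the standard performance metrics such a framework might report, the sketch below builds a confusion matrix and derives overall accuracy and per-class precision and recall. It also compares accuracy on training data with accuracy on held-out data, one simple indicator of over-fitting. These particular metrics, the function names and the toy labels are our assumptions, not a proposed standard.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[index[t], index[p]] += 1
    return m

def accuracy(m):
    return np.trace(m) / m.sum()

def precision_recall(m):
    tp = np.diag(m).astype(float)
    precision = tp / np.maximum(m.sum(axis=0), 1)  # guard against empty columns
    recall = tp / np.maximum(m.sum(axis=1), 1)     # guard against empty rows
    return precision, recall

def overfitting_gap(train_m, test_m):
    """Training minus held-out accuracy; a large positive gap suggests over-fitting."""
    return accuracy(train_m) - accuracy(test_m)

# Toy example with hypothetical call-type labels
classes = ["song", "call"]
test_m = confusion_matrix(["song", "song", "call", "call", "call"],
                          ["song", "call", "call", "call", "song"], classes)
train_m = confusion_matrix(["song", "call", "call", "song"],
                           ["song", "call", "call", "song"], classes)
prec, rec = precision_recall(test_m)
gap = overfitting_gap(train_m, test_m)
print(test_m, accuracy(test_m), prec, rec, gap)
```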

To facilitate the development of common standards, the community should set up a website or wiki to serve as a repository for collective experiences and knowledge. Alternatively, some existing web resources could accommodate such a repository (e.g. the Bioacoustics listserv). In future, these online resources could include information for specific popular platforms, such as ‘how-to’ documents, firmware images, instructions for integrating systems, and acoustic data sets for testing and comparing algorithms.

To enhance the development and adoption of new technology, researchers should publish papers that document their experiences, explaining the pitfalls encountered and the lessons learned from bioacoustic deployments and platforms. Researchers who discuss their experiences often find that they have independently discovered and solved the same challenges.

To stimulate hardware development, the bioacoustic community should develop and capitalize on industry partnerships. There may be opportunities to outsource the manufacture of certain types of equipment to existing hardware suppliers. If there is a sufficient market for the proposed system, this can be a low-risk way for these companies to increase their sales volume, and it may also reduce the cost of hardware components.

Ultimately, we believe that the future for bioacoustic monitoring in the terrestrial environment is bright. Fostering discussions and collaborations within the bioacoustic community and among disciplines will be the key to successfully meeting our challenges and ushering in a new era of research in ecology, animal behaviour and conservation biology.


Acknowledgements

This paper emerged from an NSF-funded workshop on bioacoustic monitoring in the terrestrial environment (NSF IDBR 0731674), held at the University of California’s James Reserve. Additional workshop support came from the James Reserve and the Center for Embedded Networked Sensing at UCLA. D.T.B., L.G. and K.Y. are supported by NSF-DBI-0754247, and A.N.G.K. was supported by NSF-0410438. We thank Becca Fenwick and Mihaela Tomuta for logistical support, Charles Taylor for being a co-PI on the grant, the conference participants for stimulating discussions, and the editor, Kamran Safi, and three anonymous referees for very constructive comments on previous versions of this manuscript that helped us hone our message.