Monitoring temporal change of bird communities with dissimilarity acoustic indices


  • Laurent Lellouch,

    1. Muséum national d'Histoire naturelle, Département Systématique et Évolution, UMR 7205 ISYEB MNHN-CNRS-UPMC-EPHE, Paris, France
    2. Université Sud Toulon Var, UMR CNRS 7296 LSIS, R229, BP20132, La Garde, France
  • Sandrine Pavoine,

    1. Muséum national d'Histoire naturelle, Département Écologie et Gestion de la Biodiversité, UMR 7204 CESCO MNHN-CNRS-UPMC, Paris, France
    2. Mathematical Ecology Research Group, Department of Zoology, University of Oxford, Oxford, UK
  • Frédéric Jiguet,

    1. Muséum national d'Histoire naturelle, Département Écologie et Gestion de la Biodiversité, UMR 7204 CESCO MNHN-CNRS-UPMC, Paris, France
  • Hervé Glotin,

    1. Université Sud Toulon Var, UMR CNRS 7296 LSIS, R229, BP20132, La Garde, France
  • Jérôme Sueur

    Corresponding author
    1. Muséum national d'Histoire naturelle, Département Systématique et Évolution, UMR 7205 ISYEB MNHN-CNRS-UPMC-EPHE, Paris, France


  1. Part of biodiversity assessment and monitoring consists of estimating and tracking changes in the species composition and abundance of animal communities. Such a task requires intensive sampling over a long time span, which is difficult to achieve with classical survey methods. Acoustics may offer an alternative to usual techniques by recording the sound produced by vocal animals. Animal species that use sound for communication (sing and/or call) establish an acoustic community when they sing at the same time and at a particular place. The estimation of the acoustic community dynamics could provide indirect cues on what drives changes in community composition and species abundance.

  2. Here, new methods were developed to estimate the changes in bird communities recorded at three woodland temperate sites in France. Both field recordings and simulated data were used to test whether acoustic dissimilarity indices can be used to estimate changes in the composition of the community. Four dissimilarity indices found in the literature, and a new one named Dcf were tested on auditory spectra after transformation to the Mel scale, rather than on classical Fourier frequency spectra. All indices were compared with each other and with compositional indices.

  3. The results show that bird communities occurring at the three sites were dynamic with changes of composition with time. Dissimilarities computed on simulated acoustic communities were correlated with compositional dissimilarity but those computed on field-recorded communities could not be considered as faithful estimators of community composition variations. However, the indices indicate important dates in community changes around mid-April that were also seen in the composition dynamics.

  4. Acoustic dissimilarity indices failed to track accurately the changes in species composition of the bird communities. However, these indices, which are easy to compute, still provide information on the acoustic dynamics of bird communities. Acoustics might not be considered as a proxy of compositional diversity but rather as another facet of animal diversity that needs to be studied and preserved in its own right.

Introduction


An ecological community can be defined as an assemblage or collection of species found in a particular prescribed area or habitat (Odum 1953; Morin 2011). Ecological communities are not static organizations: they are always on the move, changing with time and sometimes with space. This instability makes biodiversity dynamics difficult to infer, to monitor and to forecast (Magurran 2004; Dornelas et al. 2013). Community variations are mainly due to changes in species composition and species abundance. These variations are themselves related to intra- and inter-species interactions (Mutshinda, O'Hara & Woiwod 2009) and to exogenous disturbances of either natural or anthropogenic origin (Dornelas 2010). Biodiversity conservation requires a good quantification and understanding of diversity, in particular community dynamics, to take appropriate local or regional decisions (Groom et al. 2006). The estimation of temporal change of biodiversity, including communities, has therefore recently been identified as a major task in ecology that should be addressed with specific sampling and analytic tools (Magurran et al. 2010; Magurran 2011; Dornelas et al. 2013).

A community can also be considered in reference to a specific resource shared by the species assemblage, for instance a trophic resource. Some marine and terrestrial animal species produce sound for different activities, including territory defence, courtship display and social interactions. These species, when singing at the same place and at the same time, generate a peculiar assemblage and have to share, through competition or cooperation, the local acoustic space. The dynamics of these communities can potentially be analysed through the sound emanating from the species assemblage. This sound, which can be viewed as a community phenotype, can be monitored to retrieve information on community dynamics. Individuals, populations and communities of singing animal species have traditionally been assessed by aural observers (Digby et al. 2013). However, recent technical developments in unattended digital recording tend to replace human observers with automatic recorders that can drastically decrease the sampling effort and support a change of scale from population to community (Mennill et al. 2012; Skalak, Sherwin & Brigham 2012). This new generation of acoustic sensors opens the possibility of sampling communities over a long period of time and raises at the same time the question of how to treat large audio data sets and how to assess the temporal dynamics of acoustic communities.

The identification and quantification of community changes can be achieved by following the temporal variation in a static diversity index, such as species richness or species evenness, computed at a regular time interval. In this case, the core of data analysis is to be found in time-series analysis, involving, for instance, autocorrelation, change-point detection or auto-regressive models (Buckland et al. 2005). Another way is to use the so-called turnover index, developed in the context of island biogeography, that takes into account immigration and extinction rates (Brown & Kodric-Brown 1977; Diamond & May 1977). It is also possible to use similarity or dissimilarity biodiversity indices. These indices, which estimate between-group or beta diversity, have been developed in large numbers. They consist of a distance metric that estimates the (dis)similarity in composition of two units, essentially two different communities, more rarely a single community at two distinct times (Jost, Chao & Chazdon 2011).

In the particular case of sound, the assessment of community dynamics mainly relies on comparing pairs of audio samples. A first attempt to compare the acoustic output of acoustic communities was made by subtracting pairs of amplitude envelopes and average frequency spectra, respectively (Sueur et al. 2008b). This led to two subindices, the temporal and spectral dissimilarities (Dt and Df), that were multiplied and scaled between 0 and 1 to obtain an index named D. The D index was applied with success when comparing Tanzanian forest communities (Sueur et al. 2008b) and temperate woodland bird communities (Depraetere et al. 2012). However, it also appears that the subindex Dt requires a perfect temporal alignment between the amplitude envelopes to be compared. This strict homology may not be met even with synchronized recordings. Dt was therefore not used in other analyses, and only Df was kept. The subindex Df revealed clear temporal and geographical variations in distant New Caledonian sites (Gasc et al. 2013b) and highlighted temporal and spatial patterns within a tropical rain forest patch in French Guiana (Rodriguez et al. 2014). The Kolmogorov–Smirnov distance and the symmetric Kullback–Leibler distance were also tested as alternative indices to compare frequency spectra of temperate bird communities (Gasc et al. 2013a). However, a correlation analysis suggested that all indices estimate spectral dissimilarity in a similar way.

The results obtained so far with the acoustic dissimilarity indices mainly focus on differences between recording sites rather than between recording times (Depraetere et al. 2012) or could not be confronted with the species composition of the communities (Gasc et al. 2013b; Rodriguez et al. 2014). In addition, the indices were not tested to estimate whether they could detect time breaking points, that is, specific events of rapid changes in the community composition.

Here, previous and new acoustic dissimilarity indices were tested on simulated and real acoustic communities derived from bird dawn choruses recorded in three sites of a temperate woodland during more than 2 months (Depraetere et al. 2012). In particular, two main questions were addressed: 1) Do acoustic dissimilarity indices give a faithful description of the changes in the species composition of a community? 2) Are the acoustic dissimilarity indices able to detect rapid changes in the acoustic composition of a bird community?

Materials and methods

Data collection and pre-processing

Study area and recording

Recordings of bird dawn chorus were carried out from March 24 to June 5 2009 in the Parc Naturel Régional of Vallée de Chevreuse, a protected area located 40 km south-west of Paris, France. These recordings were used in a previous study (Depraetere et al. 2012).

Three sampling sites were chosen according to a gradient of tree density (Depraetere et al. 2012). They were spaced at least 300 m from each other to avoid acoustic overlapping between recording points. Sampling site A (48° 37·628N, 01° 55·294E, 174 m) was a mature forest composed of a regular grove of pedunculate oak (Quercus robur), sessile oak (Quercus petraea) and downy birch (Betula pubescens). Sampling site B (48° 38·519N, 01° 56·155E, 166 m) was a young forest mainly made up of hornbeam (Carpinus betulus). Sampling site C (48° 38·319N, 01° 57·527E, 160 m) was at the frontier between an open habitat (cornfield) and a forest composed by hornbeam (Carpinus betulus) and European ash (Fraxinus excelsior).

The recording equipment consisted of three Song Meter SM1 digital audio field recorders (Wildlife Acoustics, USA). These off-line, weatherproof recorders were equipped with an omnidirectional microphone (flat frequency response between 20 Hz and 20 kHz). The signals were sampled at 44·1 kHz with 16-bit digitization, and the recording level of all recorders was set to the same value (default factory value of 0·0 dB). Recorders were all positioned on trees at a height of 2 m, with microphones pointing horizontally. The distance from the closest road was 300 m, 900 m and 210 m for sites A, B and C, respectively. Site C was thus close to a road, but no important car traffic that could have impaired the recordings was noticed. Recordings were made 30 min before sunrise, when bird species sing at the same time and produce a dawn chorus (Staicer, Spector & Horn 1996) and when the acoustic richness reached a maximal value, as found by Depraetere et al. (2012). Each recording session lasted 150 s. The complete audio sampling resulted in a total of 222 audio files (74 days × 3 sites) for a total duration of 555 min. For the sake of clarity, days are hereafter numbered from day 1 (March 24) to day 74 (June 5). Two days with stormy weather (days 49 and 50, corresponding to May 11 and 12) had to be excluded from the data set, resulting in 72 days of sampling.

Aural identification

Species singing during the dawn chorus were identified by one of us (F. J.) who is highly trained in bird aural identification. The identification was achieved by listening to each audio file once with circumaural headphones. This procedure was chosen to ensure a comparison with aural identification made in the field by observers. The aural identification led to the identification of a total of 35 species (all sites, all days, Table 1). A presence/absence matrix was built to indicate for each day and each site the occurrence of each species. Each matrix described the local acoustic community at a specific day. There were 24 species at sampling site A, 21 at site B and 25 at site C (Fig. 1). The three sets of data corresponding to the three sites were processed independently in all following analyses.

Table 1. List of the 35 bird species identified (all sites, all days)

Family          Latin name                Popular name
Aegithalidae    Aegithalos caudatus       Long-tailed tit
Alaudidae       Alauda arvensis           Eurasian skylark
Anatidae        Branta canadensis         Canada goose
Certhiidae      Certhia brachydactyla     Short-toed treecreeper
Columbidae      Streptopelia decaocto     Eurasian collared dove
Columbidae      Columba palumbus          Common wood pigeon
Corvidae        Garrulus glandarius       Eurasian jay
Corvidae        Corvus corone             Carrion crow
Cuculidae       Cuculus canorus           Common cuckoo
Emberizidae     Emberiza citrinella       Yellowhammer
Fringillidae    Carduelis chloris         European greenfinch
Fringillidae    Fringilla coelebs         Common chaffinch
Motacillidae    Anthus trivialis          Tree pipit
Motacillidae    Motacilla alba            White wagtail
Muscicapidae    Phoenicurus phoenicurus   Common redstart
Muscicapidae    Luscinia megarhynchos     Common nightingale
Muscicapidae    Erithacus rubecula        European robin
Oriolidae       Oriolus oriolus           Eurasian golden oriole
Paridae         Parus major               Great tit
Paridae         Poecile palustris         Marsh tit
Paridae         Cyanistes caeruleus       Eurasian blue tit
Phasianidae     Pavo cristatus            Indian peafowl
Phasianidae     Phasianus colchicus       Common pheasant
Phylloscopidae  Phylloscopus collybita    Common chiffchaff
Picidae         Picus viridis             European green woodpecker
Picidae         Dendrocopos major         Great spotted woodpecker
Prunellidae     Prunella modularis        Dunnock
Sittidae        Sitta europaea            Eurasian nuthatch
Strigidae       Strix aluco               Tawny owl
Sturnidae       Sturnus vulgaris          Common starling
Sylviidae       Sylvia atricapilla        Eurasian blackcap
Troglodytidae   Troglodytes troglodytes   Eurasian wren
Turdidae        Turdus viscivorus         Mistle thrush
Turdidae        Turdus philomelos         Song thrush
Turdidae        Turdus merula             Common blackbird

Figure 1.

Venn diagram representing the distribution of the 35 species found at the recording sites.

Calculation of Mel spectrograms and Mel mean spectra

The frequency composition of sound was analysed using the Mel scale. This nonlinear scale allows a faithful reconstitution of the signal and was formerly used for the calculation of bird song spectrograms and mean spectra (Lee, Lee & Huang 2006a). The main advantage of the Mel scale, and therefore of Mel spectra, was to focus the spectral description on bird songs. For all field recordings, 100 Mel-frequency cepstral coefficients (MFCC) were calculated using the R package tuneR (Ligges 2011), with a frame shift of 10 ms, a window length of 25 ms and a liftering exponent of 0·6, leading to Mel spectrograms with 256 frequency bands (Hermansky & Morgan 1994). The Mel mean spectrum of each field recording was then calculated as the temporal mean of its Mel spectrogram and scaled by its maximum. Values lower than 125 Mel (= 185 Hz), corresponding to low-frequency background noise, were filtered out. An example of a Mel spectrogram is given in Fig. 2.
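
The last two steps of this procedure (temporal mean, then scaling by the maximum) can be illustrated with a minimal Python sketch. The function name `mel_mean_spectrum` and the three-band toy spectrogram are hypothetical and serve illustration only; the original analysis was carried out in R with tuneR, and the MFCC computation and the low-frequency filtering are not reproduced here.

```python
def mel_mean_spectrum(spectrogram):
    """Temporal mean of a spectrogram, scaled by its maximum.

    `spectrogram` is a list of time frames, each frame being a list of
    band amplitudes (256 bands in the study, 3 in the toy example).
    """
    n_frames = len(spectrogram)
    n_bands = len(spectrogram[0])
    # Average each frequency band over all time frames.
    mean = [sum(frame[b] for frame in spectrogram) / n_frames
            for b in range(n_bands)]
    # Scale so that the largest band amplitude equals 1.
    peak = max(mean)
    return [value / peak for value in mean]


# Toy spectrogram: 2 time frames x 3 frequency bands (hypothetical values).
toy_spectrogram = [[0.0, 2.0, 4.0],
                   [0.0, 4.0, 4.0]]
toy_mean_spectrum = mel_mean_spectrum(toy_spectrogram)
```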

Figure 2.

Example of a Mel spectrogram obtained on a recording selection achieved in the field (Site A, 4 April 2009, 06:08 am). The left axis refers to the nonlinear kMel scale and the right axis to the more usual linear kHz scale. This recording includes songs of Marsh Tit (Poecile palustris), Wren (Troglodytes troglodytes), Song Thrush (Turdus philomelos), European Nuthatch (Sitta europaea) and Great Tit (Parus major).

Simulation of communities

For each of the 35 species aurally identified, a species-specific song (not call) was selected from the MNHN sound library (Deroussen & Jiguet 2006; n = 23) and from commercial or online recordings (Roché 1990, n = 6; Deroussen 2001, n = 5; xeno-canto, n = 1). Each recording contained 30 s of song and met two acoustic conditions: (i) the signal had to be emitted by a single isolated individual and (ii) the signal-to-noise ratio (SNR) of the signal had to be higher than 100, where SNR = (A_signal/A_noise)² and A_signal (resp. A_noise) is the root mean square (RMS) amplitude of a random 0·5 s section of the signal (resp. noise).
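
The SNR criterion can be written as a short Python sketch. The function names and the sample values are hypothetical; in practice the two sections would be 0·5 s excerpts of the song and of the background noise.

```python
import math


def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr(signal_section, noise_section):
    """SNR defined as the squared ratio of RMS amplitudes."""
    return (rms(signal_section) / rms(noise_section)) ** 2


# Hypothetical excerpts: a loud song section and a quiet noise section.
signal_section = [0.5, -0.5, 0.5, -0.5]
noise_section = [0.01, -0.01, 0.01, -0.01]

ratio = snr(signal_section, noise_section)
meets_criterion = ratio > 100  # selection threshold stated in the text
```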

Mel spectrogram and Mel mean spectrum of each of the 35 species-specific songs were calculated with the same protocol applied to field recordings. Mel mean spectra were then used to build a simulated Mel spectrum for each local acoustic community. This was achieved by averaging the species-specific Mel mean spectra weighted by the absence/presence species matrix. Each simulated community Mel spectrum was then scaled by its maximum.
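
The simulation step can be sketched in Python as follows. The sketch assumes that weighting by the presence/absence matrix amounts to averaging the spectra of the species present on a given day; the function name, the three-band spectra and the amplitude values are hypothetical.

```python
def simulate_community_spectrum(species_spectra, presence):
    """Average the mean spectra of the species present, then scale by the maximum.

    `species_spectra` maps a species name to its Mel mean spectrum and
    `presence` maps a species name to 1 (present) or 0 (absent).
    """
    present = [name for name, flag in presence.items() if flag]
    if not present:
        raise ValueError("at least one species must be present")
    n_bands = len(species_spectra[present[0]])
    # Presence/absence weighting: absent species contribute nothing.
    mean = [sum(species_spectra[name][b] for name in present) / len(present)
            for b in range(n_bands)]
    peak = max(mean)
    return [value / peak for value in mean]


# Hypothetical three-band spectra for three of the recorded species.
species_spectra = {
    "Parus major": [0.0, 1.0, 0.5],
    "Turdus merula": [1.0, 0.5, 0.0],
    "Strix aluco": [1.0, 0.0, 0.0],
}
presence_today = {"Parus major": 1, "Turdus merula": 1, "Strix aluco": 0}

community_spectrum = simulate_community_spectrum(species_spectra, presence_today)
```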

Calculation of dissimilarity matrices

Acoustic dissimilarity

With x and y the two 256-point Mel mean spectra of interest, and X and Y the corresponding cumulative distribution functions, five indices were used to estimate the acoustic dissimilarity between the Mel mean spectra of two field recordings or of two simulated communities:

  1. Correlation-based dissimilarity
    • display math
  2. Symmetric Kullback–Leibler divergence (Kullback & Leibler 1951):
    • display math
  3. The integral of pointwise difference (Sueur et al. 2008b):
    • display math
  4. Kolmogorov–Smirnov distance (Rachev 1991):
    • display math
  5. Cumulative frequency dissimilarity, a new index here named Dcf:
    • display math

All five indices ranged from 0 to 1 and increased with spectral dissimilarity. Indices 1, 2 and 3 were pointwise indices, that is, indices that compare the relative amplitudes of homologous frequency bins. The values of these three indices should be interpreted with caution as they may indicate important dissimilarities when comparing two sharp Mel spectra with similar shape but shifted in frequency (Fig. 3). The index 4, which is not pointwise, also evaluated the dissimilarity between such two spectra as important.
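
The behaviour of indices 3 and 4 on two sharp, frequency-shifted spectra can be checked with a Python sketch. The exact normalizations used in the study are not recoverable from this text, so the forms below are assumptions: spectra are rescaled to unit sum, index 3 is taken as half the summed absolute pointwise difference, and index 4 as the largest absolute difference between cumulative spectra. The function names and the toy spectra are hypothetical.

```python
from itertools import accumulate


def to_unit_sum(spectrum):
    """Rescale a spectrum so that its amplitudes sum to 1."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def pointwise_difference(x, y):
    """Assumed form of index 3: half the summed absolute difference."""
    px, py = to_unit_sum(x), to_unit_sum(y)
    return 0.5 * sum(abs(a - b) for a, b in zip(px, py))


def kolmogorov_smirnov(x, y):
    """Assumed form of index 4: largest gap between cumulative spectra."""
    cx = list(accumulate(to_unit_sum(x)))
    cy = list(accumulate(to_unit_sum(y)))
    return max(abs(a - b) for a, b in zip(cx, cy))


def sharp_spectrum(peak_band, n_bands=256):
    """Toy spectrum made of a single narrow peak (hypothetical shape)."""
    spectrum = [0.0] * n_bands
    for offset, amplitude in ((-1, 0.5), (0, 1.0), (1, 0.5)):
        spectrum[peak_band + offset] = amplitude
    return spectrum


# Two identical peaks, 10 bands apart, hence non-overlapping.
x = sharp_spectrum(100)
y = sharp_spectrum(110)

d3 = pointwise_difference(x, y)
d4 = kolmogorov_smirnov(x, y)
```

Under these assumptions both indices reach their maximum for the two non-overlapping peaks, although the spectra have the same shape and lie only a few bands apart.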

Figure 3.

Two theoretical sharp Mel spectra of similar shape with a low-frequency shift. The dissimilarity between these two spectra is estimated as maximal (= 1) by indices 1 to 4, because the two spectra are non-overlapping, but as minimal (= 0·04) with the index 5 (Dcf).

The development of a reliable dissimilarity index, which is not affected by such a drawback, is a difficult issue (Karfunkel et al. 1993; de Gelder, Wehrens & Hageman 2001; Bodis, Ross & Pretsch 2007). Here, a new index, Dcf (index 5), which was not affected by the disadvantages of pointwise indices, was introduced. Dcf is sensitive not only to the spectral overlap between two Mel spectra but also to the mean frequential distance between the different peaks of the two spectra. In the example illustrated in Fig. 3, $D_{cf} = |\mu_x - \mu_y|$, where $\mu_x$ and $\mu_y$ are the means of the frequencies in the Mel spectra x and y. The formula $D_{cf} = |\mu_x - \mu_y|$ is true only if the sign of $X_i - Y_i$ is constant for i in [1, 256]. In general, Dcf can be proved to be equal to the sum of $|\mu_x - \mu_y|$ on each domain where the sign of $X_i - Y_i$ is constant.
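
The link between Dcf and the mean frequencies can be checked numerically with a Python sketch. The form of the index is an assumption here, namely the mean absolute difference between the two cumulative spectra over the 256 bands, with spectra rescaled to unit sum and mean frequencies expressed as a fraction of the band axis. The function names and the toy spectra are hypothetical.

```python
from itertools import accumulate


def to_unit_sum(spectrum):
    """Rescale a spectrum so that its amplitudes sum to 1."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def cumulative_frequency_dissimilarity(x, y):
    """Assumed form of Dcf: mean absolute gap between cumulative spectra."""
    cx = list(accumulate(to_unit_sum(x)))
    cy = list(accumulate(to_unit_sum(y)))
    return sum(abs(a - b) for a, b in zip(cx, cy)) / len(x)


def mean_frequency(spectrum):
    """Mean band position, expressed as a fraction of the band axis."""
    weights = to_unit_sum(spectrum)
    return sum(i * w for i, w in enumerate(weights)) / len(spectrum)


def sharp_spectrum(peak_band, n_bands=256):
    """Toy spectrum made of a single narrow peak (hypothetical shape)."""
    spectrum = [0.0] * n_bands
    for offset, amplitude in ((-1, 0.5), (0, 1.0), (1, 0.5)):
        spectrum[peak_band + offset] = amplitude
    return spectrum


# Two identical peaks shifted by 10 bands out of 256.
x = sharp_spectrum(100)
y = sharp_spectrum(110)

dcf = cumulative_frequency_dissimilarity(x, y)
mean_gap = abs(mean_frequency(x) - mean_frequency(y))
```

In this toy case the cumulative spectra never cross, so the assumed Dcf equals the gap between the two mean frequencies (10/256, about 0·04), which is the order of magnitude reported for the example of Fig. 3.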

For the three sites, the five dissimilarity indices were computed for each pair of field recording Mel mean spectra and for each pair of simulated community Mel spectra. Thus, each dissimilarity index provided, for each site, a 72-by-72 dissimilarity matrix corresponding to field recordings and another 72-by-72 dissimilarity matrix corresponding to simulated communities. Dissimilarity matrices corresponding to field recordings were numbered R1, R2, R3, R4 and R5, and those corresponding to simulated communities S1, S2, S3, S4 and S5, with the number corresponding to the index number (e.g. R1 = correlation-based dissimilarity matrix on field recordings).

Compositional dissimilarity

Numerous Euclidean distance indices have been previously developed and repeatedly used to estimate the compositional dissimilarity between two communities characterized by a presence/absence matrix (Gower & Legendre 1986). Among these indices, the Pearson's Phi index most appropriately estimates the tendency for two communities to share common species independently of the number of species composing each community (Jackson, Somers & Harvey 1989 and unpublished results). The Pearson's Phi index is defined as:

$\varphi = (1 - s)^{1/2}$, with $s = (ad - bc)/[(a + b)(a + c)(b + d)(c + d)]^{1/2}$, and coefficients a, b, c and d defined as the number of species observed: (a) in the two compared communities; (b) in the first community only; (c) in the second community only; (d) in neither of the two communities.

The compositional dissimilarity between two communities can also be evaluated with a Fisher's exact test, which defines the compositional dissimilarity between two communities as the probability that the number of common species in the two communities is smaller in the real case than if species had been randomly permuted (Fisher 1954). This probability is given by the hypergeometric distribution, following:

$P = [(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!]/[a!\,b!\,c!\,d!\,(a + b + c + d)!]$, with a, b, c and d as defined above.

For each site, compositional dissimilarity indices φ and P were computed for each pair of communities, giving two 72-by-72 dissimilarity matrices, respectively, named C1 and C2.
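
Both compositional indices can be computed from the four counts a, b, c and d, as in the following Python sketch. It assumes the standard 2 × 2 contingency forms of Pearson's phi and of the hypergeometric probability; the function names and the example counts are hypothetical.

```python
import math


def phi_dissimilarity(a, b, c, d):
    """Dissimilarity (1 - s)^(1/2), with s the Pearson's phi coefficient."""
    denominator = math.sqrt((a + b) * (a + c) * (b + d) * (c + d))
    s = (a * d - b * c) / denominator  # undefined if a margin is zero
    return math.sqrt(1 - s)


def hypergeometric_probability(a, b, c, d):
    """Hypergeometric probability of the observed 2 x 2 table."""
    f = math.factorial
    numerator = f(a + b) * f(c + d) * f(a + c) * f(b + d)
    denominator = f(a) * f(b) * f(c) * f(d) * f(a + b + c + d)
    return numerator / denominator


# Hypothetical pair of days among 35 species: 10 shared species,
# 5 found on the first day only, 5 on the second day only, 15 on neither.
a, b, c, d = 10, 5, 5, 15

phi = phi_dissimilarity(a, b, c, d)
p = hypergeometric_probability(a, b, c, d)
```

Exact integer factorials keep the computation free of overflow for 35 species.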

Temporal dissimilarity

Temporal dissimilarity was defined as the temporal difference between two dates, expressed in days. The corresponding 72-by-72 dissimilarity matrix was numbered T1.
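
A minimal Python sketch of the temporal dissimilarity matrix is given below. It assumes that the original day numbers (1 to 74) are kept after removing days 49 and 50, so that the gap left by the two stormy days is preserved; the variable names are hypothetical.

```python
# Days 1 (March 24) to 74 (June 5), without the two excluded stormy days.
EXCLUDED_DAYS = {49, 50}
days = [day for day in range(1, 75) if day not in EXCLUDED_DAYS]

# T1: absolute difference, in days, between each pair of recording dates.
t1 = [[abs(day_i - day_j) for day_j in days] for day_i in days]

# Number of distinct pairs of days compared in each dissimilarity matrix.
n_pairs = len(days) * (len(days) - 1) // 2
```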

The complete process, used to obtain the 13 dissimilarity matrices, is summarized in Fig. 4.

Figure 4.

Summary of the process used to obtain the 13 dissimilarity matrices: acoustic dissimilarity matrices calculated from field recordings (R1–R5), acoustic dissimilarity matrices calculated from simulated communities (S1–S5), compositional dissimilarity matrices (C1–C2) and temporal dissimilarity matrix (T1).

Statistical analyses

Correlation between pairs of dissimilarity matrices

Spearman's rank correlation between each pair of dissimilarity matrices was calculated. Nonparametric correlation was chosen due to nonlinearity of data. Mantel permutation test (Mantel 1967) was used with 1000 permutations to evaluate the significance of correlations between two dissimilarity matrices.
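
The principle of a Mantel test based on Spearman's rank correlation can be sketched in plain Python. This is an illustration under stated assumptions, not the ade4 routine used in the study: the test is one-sided, the P-value counts the observed arrangement among the permutations, and all function names and the toy matrices are hypothetical.

```python
import random


def ranks(values):
    """Ranks starting at 1, tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def lower_triangle(matrix, order):
    """Below-diagonal values after reordering rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def mantel_spearman(m1, m2, n_perm=1000, seed=0):
    """Spearman correlation of two distance matrices and its permutation P-value."""
    rng = random.Random(seed)
    n = len(m1)
    identity = list(range(n))
    r1 = ranks(lower_triangle(m1, identity))
    observed = pearson(r1, ranks(lower_triangle(m2, identity)))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute the objects (days) of the second matrix
        if pearson(r1, ranks(lower_triangle(m2, perm))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy matrices for 8 days: a temporal distance and a monotonic function of it.
n_days = 8
t = [[abs(i - j) for j in range(n_days)] for i in range(n_days)]
c = [[(i - j) ** 2 for j in range(n_days)] for i in range(n_days)]
rho, p_value = mantel_spearman(t, c, n_perm=200)
```

Rows and columns are permuted together because the pairwise values of a dissimilarity matrix are not independent observations, which is why an ordinary correlation test is not suitable.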

Scatter plots representing each pair of dissimilarity matrices were displayed. Loess regression curves (Cleveland, Grosse & Shyu 1992) were calculated by least-squares method, with neighbourhood parameter α = 0·75.

Permutation tests on species songs

A Monte Carlo permutation test was carried out to evaluate the importance of bird song species specificity on matrix correlations. As high correlations between matrices S1 to S5 on the one hand and between matrices C1 and C2 on the other hand were demonstrated as a first step, the test was carried out using only the matrices S1, C1 and T1.

The test consisted in choosing randomly a permutation of the set of species and simulating the communities corresponding to the permuted association of species and songs in accordance with the process of communities simulation described above. The matrix S1 was calculated from this new set of simulated communities, and then, Spearman's rank correlations were estimated between S1 and C1 on the one hand and between S1 and T1 on the other hand. These two correlations, evaluated in the case of a permutation of songs, were compared with the two same correlations in the real (non-permuted) case. The test was repeated with 1000 permutations.

Identification of pivot days

Pivot days were defined as days when the acoustic community was rapidly evolving, that is, day d was considered a pivot day if the dissimilarity between two communities anterior to day d was usually low, the dissimilarity between two communities posterior to day d + 1 was also usually low, but the dissimilarity between a community anterior to day d and a community posterior to day d + 1 was usually high. Days at which the acoustics (respectively composition) of the community were rapidly evolving were called acoustic (respectively compositional) pivot days. Pivot days could be identified visually on dissimilarity matrix plots or using an automatic divisive hierarchical clustering approach. In the latter case, in each dissimilarity matrix, all elements with the same temporal dissimilarity were first scaled by their average. After this scaling step, the mean of the coefficients of each dissimilarity matrix was equal to 1. Each set of n pivot days (d1, …, dn) defined n + 1 time periods ([1, d1], [d1 + 1, d2], …, [dn + 1, 74]), and pivot days were then identified iteratively as those that minimized the mean of intra-period dissimilarities. The process stopped when the decrease in the mean of intra-period dissimilarities resulting from the addition of a new pivot day was lower than 0·01, or lower than half of the corresponding decrease in the previous step.
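
The automatic procedure can be sketched in Python under the reading given above: elements are scaled by the mean of their temporal lag, pivot days are added one at a time, and the search stops according to the two thresholds. Days are treated here as consecutive matrix positions, and the function names and the toy matrix are hypothetical.

```python
def scale_by_lag(matrix):
    """Divide each element by the mean of all elements sharing its lag |i - j|."""
    n = len(matrix)
    scaled = [[0.0] * n for _ in range(n)]
    for lag in range(1, n):
        values = [matrix[i][i + lag] for i in range(n - lag)]
        mean = sum(values) / len(values)
        for i in range(n - lag):
            value = matrix[i][i + lag] / mean if mean else 0.0
            scaled[i][i + lag] = scaled[i + lag][i] = value
    return scaled


def mean_intra_period(matrix, pivots):
    """Mean dissimilarity over all pairs of days falling in the same period."""
    bounds = [0] + sorted(pivots) + [len(matrix)]
    total, count = 0.0, 0
    for start, end in zip(bounds, bounds[1:]):
        for i in range(start, end):
            for j in range(i + 1, end):
                total += matrix[i][j]
                count += 1
    return total / count if count else 0.0


def find_pivot_days(matrix, min_gain=0.01):
    """Add pivot days greedily; pivot d separates days 1..d from days d+1..n."""
    scaled = scale_by_lag(matrix)
    pivots = []
    current = mean_intra_period(scaled, pivots)
    previous_gain = None
    while True:
        candidates = [d for d in range(1, len(scaled)) if d not in pivots]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda d: mean_intra_period(scaled, pivots + [d]))
        score = mean_intra_period(scaled, pivots + [best])
        gain = current - score
        # Stop when the improvement is too small in absolute or relative terms.
        if gain < min_gain or (previous_gain is not None
                               and gain < previous_gain / 2):
            break
        pivots.append(best)
        current, previous_gain = score, gain
    return sorted(pivots)


# Toy example: 12 days, low dissimilarity (0.2) within days 1-5 and within
# days 6-12, high dissimilarity (0.9) between the two periods.
n_days = 12
toy = [[0.0 if i == j else (0.2 if (i < 5) == (j < 5) else 0.9)
        for j in range(n_days)] for i in range(n_days)]
pivot_days = find_pivot_days(toy)
```

On this toy matrix the search returns day 5 as the single pivot day, because any further split fails to lower the mean intra-period dissimilarity.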

All analyses were carried out with the R software (R Development Core Team 2013), using the tuneR (Ligges 2011) and seewave (Sueur, Aubin & Simonis 2008a) packages for acoustic analyses, and the ade4 (Chessel, Dufour & Thioulouse 2004) package for statistical tests.

Results


Community evolution at the season scale

Global correlations between dissimilarity indices

There was no correlation between the number of different species per recording and the day for sites A (Spearman's correlation: ρ = 0·042, P = 0·73) and C (ρ = −0·067, P = 0·57), but the number of species slightly decreased during the season for site B (ρ = −0·29, P = 0·014).

As shown in Figs 5, 6 and 7 for each of the three sites, correlations between temporal (T1) and compositional (C1, C2) dissimilarity matrices were significant, showing that the more temporally distant two days were, the more the corresponding communities differed. Similarly, all the correlations between temporal (T1) and acoustic (S1–S5) dissimilarity matrices calculated from simulated communities were significant. At site A (Fig. 5), three (R1–R3) of the five acoustic dissimilarity matrices calculated from field recordings (R1–R5) were significantly correlated with the temporal dissimilarity matrix (T1). At site B (Fig. 6), none of these correlations was significant, contrary to site C (Fig. 7), where all correlations were significant, showing that the more temporally distant two days were, the more dissimilar their acoustics were.

Figure 5.

Correlogram of the 13 dissimilarity matrices for site A. Lower left side of each panel: scatter plots representing each pair of dissimilarity matrices and loess regression curves; upper right side: values of corresponding Spearman's correlations, and significance of the associated Mantel tests indicated with asterisks: ***<0·001, **<0·01, *<0·05, . <0·1.

Figure 6.

Correlogram of the 13 dissimilarity matrices for site B. Lower left side of each panel: scatter plots representing each pair of dissimilarity matrices and loess regression curves; upper right side: values of corresponding Spearman's correlations, and significance of the associated Mantel tests indicated with asterisks: ***<0·001, **<0·01, *<0·05, . <0·1.

Figure 7.

Correlogram of the 13 dissimilarity matrices for site C. Lower left side of each panel: scatter plots representing each pair of dissimilarity matrices and loess regression curves; upper right side: values of corresponding Spearman's correlations, and significance of the associated Mantel tests indicated with asterisks: ***<0·001, **<0·01, *<0·05, . <0·1.

Acoustic dissimilarity matrices obtained from field recordings (R1–R5) were always highly correlated with each other. Similarly, acoustic dissimilarity matrices obtained from simulated communities (S1–S5) were all highly correlated with each other. Correlation between the two compositional dissimilarity matrices (C1 and C2) was nearly maximal, revealing an obvious link between the φ and P indices (Figs 5, 6 and 7).

All correlations between a compositional dissimilarity matrix (C1, C2) and an acoustic one, calculated from simulated communities (S1–S5), were high. Correlations between a compositional dissimilarity matrix (C1, C2) and an acoustic one, calculated from field recordings (R1–R5), were globally low. Yet, all of these correlations, except 4 of the 10 calculated at site A, were significant.

At site A (Fig. 5), 11 of the 25 correlations between an acoustic dissimilarity matrix obtained from field recordings (R1–R5) and an acoustic dissimilarity matrix obtained from simulated communities (S1–S5) were significant. At site B (Fig. 6), none of such 25 correlations were significant, contrary to site C (Fig. 7), where all of them were significant.

Important Spearman's correlations, supported by significant Mantel tests, highlighted global tendencies obtained from a great number of points (71*72/2 = 2556). However, scatter plots showing an important dispersion around the loess regression curve suggested that in spite of high correlations, no predictive model could estimate safely and precisely compositional dissimilarities from acoustic dissimilarities. Some pairs of communities were very dissimilar from an acoustic viewpoint, but very similar from a compositional one, and inversely. Similarly, acoustic or specific dissimilarities between two communities could not be accurately estimated with the temporal difference separating them.

Effect of the specificity of bird songs on correlations

The effect of the species specificity of bird songs on correlations between dissimilarity matrices was evaluated by the permutation test described above. For the three sites, Spearman's correlation between dissimilarity matrices S1 and C1 was not statistically different whether or not species songs were permuted in the community simulation process (P-values for the Monte Carlo test with H0: correlation_permuted ≠ correlation_real being respectively equal to 0·982, 0·740 and 0·536 for the three sites). Spearman's correlation between dissimilarity matrices S1 and T1 was not statistically different whether or not species songs were permuted for sites A and B (P-values for the Monte Carlo test with H0: correlation_permuted ≠ correlation_real being respectively equal to 0·668, 0·320 and 0·064 for the three sites). This revealed the lack of an effect directly linked with the specificity of each species song.

Pivot days

Compositional pivot days identified with dissimilarity matrices C1 and C2 were similar. In the same way, acoustic pivot days identified from dissimilarity matrices R1–R5 on the one hand, and S1–S5 on the other hand, were similar. Due to these similarities, only the results for R1, S1 and C1 were plotted as an example (Fig. 8). At site A, days 11 and 45 were identified as pivot days for matrix R1, whereas day 16 was identified as a pivot day for matrices S1 and C1. At site B, identified pivot days were days 17 and 27 for R1, day 33 for S1 and day 14 for C1. At site C, identified pivot days were day 20 for R1, and day 25 for S1 and C1. Hence, acoustic pivot days obtained from field recordings differed slightly from acoustic pivot days obtained from simulated communities and from compositional pivot days. Acoustic pivot days obtained from simulated communities and compositional pivot days exactly matched for sites A and C, and differed for site B.

Figure 8.

Graphical representation of dissimilarity matrices R1, S1 and C1 of sites A, B and C. The colour scale of each plot was defined from the deciles of its distribution (Q0, Q10, Q20, …, Q100), dark colours corresponding to high dissimilarities. Pivot days identified by the divisive hierarchical clustering algorithm are indicated by red points.

Discussion


The estimation of acoustic temporal changes offers a new way to decipher the dynamics of animal communities. However, this recent field of research is still waiting for efficient analytical tools to obtain indices that reliably estimate the differences between pairs of audio sampling. Here, a selection of indices available in the literature was tested on bird acoustic communities in reference to simulated and field-recorded audio data. A new index, named Dcf, was proposed, and auditory spectra (spectra after transformation to Mel scale) were used rather than classical linear frequency spectra to describe the frequency content of the community sound.

The analysis of composition changes revealed that the monitored communities are dynamic in space and time, with species turnover. Even where species number does not change significantly (sites A and C) or decreases slightly (site B), the assemblage of species differs with time. The composition changes are associated with acoustic variations in simulated communities but less so with acoustic variations in recordings made in the field. This indicates that field recordings alone might not provide information accurate enough to infer changes in the number and identity of the species composing the assemblage. Monitoring compositional changes with field recordings alone might therefore be challenging.

The five dissimilarity indices were highly correlated when applied to field recordings (Rij correlations) and simulated acoustic communities (Sij correlations), supporting previous results on simulated data (Gasc et al. 2013a). The new index Dcf, which is based on the cumulated spectrum, showed a slightly lower correlation with the other indices, indicating that it may provide another way to measure spectral dissimilarity. The main advantage of Dcf is that it returns low values when two Mel spectra differ by a slight frequency shift only, whereas other indices return maximal values. This behaviour potentially reduces a bias towards high values. It is preferable not to use a single index, at least in preliminary analyses, but rather to compute two indices, preferably Df and Dcf, which can potentially provide complementary information. The Df and Dcf indices also have the advantage of being very easy and quick to compute.
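The contrast between a bin-by-bin index and a cumulated-spectrum index can be illustrated with a minimal sketch. It assumes that Df is half the summed absolute difference between two spectra scaled to sum to 1, and it uses a hypothetical function, dcf_sketch, defined as the mean absolute difference between the cumulated spectra; this stands in for Dcf for illustration only and is not necessarily the exact formula used in this study. The toy spectra are invented.

```python
from itertools import accumulate


def normalise(spectrum):
    """Scale a non-negative spectrum so that its values sum to 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def df(s1, s2):
    """Bin-by-bin spectral difference: 0 = identical, 1 = no overlap."""
    p, q = normalise(s1), normalise(s2)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def dcf_sketch(s1, s2):
    """ASSUMED form of a cumulated-spectrum index (hypothetical helper):
    mean absolute difference between the two cumulated spectra."""
    c1 = list(accumulate(normalise(s1)))
    c2 = list(accumulate(normalise(s2)))
    return sum(abs(a - b) for a, b in zip(c1, c2)) / len(c1)


def peak_at(band, n_bands=40):
    """Toy spectrum with all energy in a single band (invented data)."""
    spectrum = [0.0] * n_bands
    spectrum[band] = 1.0
    return spectrum


reference = peak_at(5)
near = peak_at(6)    # same peak shifted by one band
far = peak_at(35)    # same peak shifted by thirty bands

df_near, df_far = df(reference, near), df(reference, far)
dcf_near, dcf_far = dcf_sketch(reference, near), dcf_sketch(reference, far)
print(df_near, df_far)    # both maximal: the peaks never overlap
print(dcf_near, dcf_far)  # small for the slight shift, large for the big one
```

In this toy case the bin-by-bin index cannot tell a one-band shift from a thirty-band shift, whereas the cumulated version scales with the size of the shift.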

Mel-frequency cepstral coefficients (MFCCs) are features used in automatic speech and speaker recognition (Rabiner & Juang 1993), music information retrieval (Tzanetakis & Cook 2002) and now in animal vocalization identification (Lee, Lee & Huang 2006a; Lee et al. 2006b), but were used here for the first time for a task that does not involve pattern recognition. The MFCCs as used in speech processing are suited to bird acoustic communities as they focus on the frequencies mostly used by birds, that is, between 0·3 and 8 kHz. This makes it possible to select appropriate frequencies and to give less weight to high frequencies that are not occupied by bird vocalizations. The same kind of MFCCs could also be used when monitoring amphibian communities, which also produce sound at intermediate frequencies, but should be parameterized to higher frequencies if applied to communities including insects, which cover a wider frequency band reaching high frequencies (~ 20 kHz). Nor might MFCCs be suited to very rich acoustic communities, such as those recorded in tropical habitats, where animal sound covers almost all frequency bands from a few hundred Hz to very high frequencies (>20 kHz) (Jain & Balakrishnan 2011).
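The effect of the Mel transformation on frequency weighting can be sketched as follows. The conversion uses the common formula mel = 2595 log10(1 + f/700), which is an assumption here since the exact parameterization used in this study is not restated in this section. The number of bands (8) is hypothetical; only the 0·3 to 8 kHz range comes from the text above.

```python
import math


def hz_to_mel(f_hz):
    """Common Hz-to-Mel conversion (assumed variant of the Mel scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Edges (in Hz) of n_bands bands equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Hypothetical setting: 8 bands over the 0.3-8 kHz range mentioned above
edges = mel_band_edges(300.0, 8000.0, 8)
widths = [b - a for a, b in zip(edges, edges[1:])]
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

Bands that are equally wide in Mel become progressively wider in Hz, so high frequencies are described with coarser resolution than low and intermediate ones.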

Whatever the index applied, acoustic dissimilarity matrices of simulated communities (S1–S5) were highly correlated with compositional dissimilarity matrices (C1, C2), but acoustic dissimilarity matrices calculated from field recordings (R1–R5) were only weakly or not at all correlated with either simulated (S1–S5) or compositional dissimilarity matrices (C1, C2). The likeness between simulated and compositional data can be explained by the fact that simulated communities were built directly from compositional data, by averaging the Mel mean spectra of one typical song of each species composing the community. The Monte Carlo test, which permuted the songs and their identity, that is, the species name attributed to each song, showed that the species specificity of songs had no impact on correlations between simulated and compositional/temporal data. Hence, the main cause of likeness between two simulated communities is the likeness between their respective compositions. This explains the high correlations between simulated and compositional data.

What could explain the discrepancies between recorded and simulated/compositional data? Differences in noise level can be ruled out, as the signal-to-noise ratio (SNR) of all recordings, including those made in the field, was high and homogeneous, and as low-frequency background sound due to anthropogenic noise was selectively removed with a high-pass filter. One reason for both the low correlations between recorded and simulated data and the high correlations between compositional and simulated data is that a single song was selected for each species in the community simulation process. This tends to make compositional and simulated data correlated, and simulated communities unfaithful to the acoustic reality. Several songs per species could have been selected to obtain a better representation of species-specific variation. This could have led to more realistic descriptions of simulated communities and might have reduced the distance between simulated and compositional matrices. The most striking differences between real and simulated data are probably to be found in the relative abundance of the species songs composing the acoustic community. In a field recording, the song repetition rate can differ significantly between species, leading to different species acoustic abundances. If this abundance information had been taken into account when describing the communities, the distance between recorded and compositional matrices could have been reduced. However, our objective was to compare acoustic methodology with more traditional indices of functional diversity where functional distances among species rely on simple differences in songs without any other type of acoustic information. The weight of a species will also be related to its distance to the microphone and its calling amplitude (sound pressure level). For instance, a species calling loudly and continuously close to the microphone will be dominant in the community sound and have a high impact on dissimilarity indices. Future dissimilarity metrics should be able to evaluate the relative importance of nearby and distant sounds. These important parameters may explain the differences between field recordings (R1–R5) and simulated communities (S1–S5), where species are equally weighted. A solution to reduce the bias due to the source-microphone distance would be to increase the field sampling effort by increasing the number of recorders. A loud species would be dominant for a single microphone or a small group of microphones but not for the others. This could partly attenuate the dominance effect and balance the weight of each species composing the community, but other solutions, including amplitude signal processing, should be found.
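The dominance effect described above can be illustrated with a toy sketch. Simulated communities are represented as the equally weighted mean of one spectrum per species, as in the simulation process; the weighted variant, the weight values and the three single-peak species spectra are hypothetical and only mimic a loud species close to the microphone. Df is assumed to be half the summed absolute difference between spectra scaled to sum to 1.

```python
def normalise(spectrum):
    """Scale a non-negative spectrum so that its values sum to 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def community_spectrum(species_spectra, weights=None):
    """Weighted mean of normalised species spectra (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(species_spectra)
    total_weight = sum(weights)
    n_bands = len(species_spectra[0])
    mean = [0.0] * n_bands
    for spectrum, weight in zip(species_spectra, weights):
        for band, value in enumerate(normalise(spectrum)):
            mean[band] += weight * value / total_weight
    return mean


def df(s1, s2):
    """Bin-by-bin spectral difference: 0 = identical, 1 = no overlap."""
    p, q = normalise(s1), normalise(s2)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Invented species spectra: each species sings in its own band
sp_a = [1.0, 0.0, 0.0, 0.0]
sp_b = [0.0, 1.0, 0.0, 0.0]
sp_c = [0.0, 0.0, 1.0, 0.0]

# Day 1 holds species A and B; day 2 holds A and C (one species replaced)
equal = df(community_spectrum([sp_a, sp_b]),
           community_spectrum([sp_a, sp_c]))

# Hypothetical weights: species A is ten times louder on both days
loud = [10.0, 1.0]
weighted = df(community_spectrum([sp_a, sp_b], loud),
              community_spectrum([sp_a, sp_c], loud))

print(equal, weighted)
```

With equal weights the replacement of one species out of two gives a dissimilarity of 0.5, whereas the dominant species pulls the weighted value towards zero, masking the same compositional turnover.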

The dynamics of communities can be punctuated by important events at which the composition of the assemblage changes significantly. Such events, occurring at pivot days, may be identified from field recording (R1–R5), simulated community (S1–S5) or compositional data (C1, C2). Pivot days identified from simulated and compositional data were close to, but slightly different from, those obtained from field recordings. All pivot days were around 10 April except one at site A obtained from field recording data (7 May) and one at site B obtained from simulated communities (25 April). Changes highlighted by pivot days could be linked to the arrival of long-distance migrants on their breeding ground, the start of their display activities and the silence of sedentary species raising their first clutch, and then the display activity of the same sedentary species preceding the second clutch (Moussus et al. 2009). However, it is worth noting that acoustic dissimilarities assess changes in the acoustics of the community, but not necessarily variations in its species composition, as suggested by the differences between R1–R5 and C1–C2 dissimilarity matrices.
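The idea of a pivot day can be sketched on a toy dissimilarity matrix. The function best_split below is a hypothetical, simplified stand-in and not the divisive hierarchical clustering algorithm used in this study: it scans the time-ordered days and keeps the split that maximises the mean between-period dissimilarity minus the mean within-period dissimilarity. The matrix values and the minimum period length are invented.

```python
def mean(values):
    """Arithmetic mean of an iterable (0.0 if empty)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def best_split(d, min_size=2):
    """Return (k, score): days 0..k-1 form the first period, days k.. the second.

    Simplified stand-in for a divisive clustering step on time-ordered
    data; d is a square, symmetric dissimilarity matrix (list of lists).
    """
    n = len(d)
    best_k, best_score = None, float("-inf")
    for k in range(min_size, n - min_size + 1):
        between = mean(d[i][j] for i in range(k) for j in range(k, n))
        within = mean(
            d[i][j]
            for i in range(n)
            for j in range(i + 1, n)
            if (i < k) == (j < k)  # both days fall in the same period
        )
        score = between - within
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Invented 6-day matrix: days 0-2 resemble each other, as do days 3-5
toy = [
    [0.0 if i == j else (0.1 if (i < 3) == (j < 3) else 0.9)
     for j in range(6)]
    for i in range(6)
]

pivot, score = best_split(toy)
print(pivot, round(score, 3))
```

On this toy matrix the split falls at the fourth day (index 3), where the community switches from one state to the other; real matrices are noisier, and the choice of criterion and of the minimum period length would affect the result.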

To conclude, it seems that acoustic dissimilarity indices do not yet provide detailed information on the global changes in the species composition of a bird community. The indices cannot be used as an accurate proxy of compositional diversity but rather as a proxy of the acoustic dynamics of the community, another facet of community complexity that needs to be studied in its own right. The indices can still reveal another side of biodiversity variation, as they can indicate important changes such as the pivot days identified here. Acoustic animal diversity should be considered as a particular facet of biodiversity that needs to be recorded, analysed and preserved.

Although acoustic methods for assessing dissimilarity are not yet ready to be used with high reliability, they still provide an interesting evaluation of community dynamics that had not been explored so far in an objective, non-invasive and repeatable way. Traditional survey methods based on listening are not free from biases. Errors can arise from differences in observer experience and hearing capacity, a limited observation time, an overestimation of highly detectable species and of rare species that might retain observers' attention, and an underestimation of flock species (Gibbons & Gregory 2006). As debated previously (Haselmayer & Quinn 2000; Hutto & Stutzman 2009; Digby et al. 2013), traditional human-based and new autonomous machine-based methods have their own advantages and biases that need to be carefully balanced before undertaking a diversity survey.

Finally, the community approach should not discredit the species-specific approach, and complementary analyses based on aural or possibly automatic computer-based identification should be undertaken to understand in detail the patterns and processes of bird acoustic communities.


This research was supported by BIOSOUND, a Fondation pour la Recherche sur la Biodiversité (FRB) grant and SABIOD MASTODONS Big Data CNRS MI project 2012-2017. We thank Marion Depraetere for conducting the field recordings and Amandine Gasc for her help on the selection of species recordings. We thank two anonymous referees for their helpful comments on the manuscript.

Conflict of interest

Other than having purchased their equipment for research purposes, the authors have no relationship with the company WildlifeAcoustics Ltd.

Data accessibility

The audio files are archived in the sound library of the Muséum national d'Histoire naturelle, Paris, France.