Animal social network inference and permutations for ecologists in R using asnipe

Summary

  1. The sampling of animals for the purpose of measuring associations and interactions between individuals has led to the development of several statistical methods to deal with biases inherent in these data. However, these methods are typically computationally intensive and complex to implement.
  2. Here, I provide a software package that supports a range of these analyses in the R statistical computing environment. This package includes a novel approach to estimating rates of re-association over time between frequently sampled individuals.
  3. I include an extended demonstration of the syntax and examples of how this software interfaces with existing network analysis packages in R.
  4. This bridges a gap in the tools that are available to biologists wishing to analyse animal social networks in R.

Introduction

In recent years, there has been a proliferation of software packages that provide functionality for analysis of social network data. These have largely been driven by the computational needs to analyse and interpret affiliation data in sociology, where data sets can be collected with high resolution and certainty. However, studying social behaviour in non-human animals entails much greater uncertainty in the probability that a dyad exists, and the measured strength of that connection. This has spawned extensive literature, in particular when testing for statistical significance and non-randomness (Whitehead 1997; Bejder et al. 1998; Croft et al. 2008; Whitehead 2008; Croft et al. 2011). Yet, there remains a general lack of simple to use tools in R (R Development Core Team 2012) that implement methods to perform the specialized analyses on sets of observed co-occurrences of individuals in animal groups.

The jump from analysing high-resolution networks, as typically achievable in human social networks, to networks comprising high levels of uncertainty is one of the largest barriers to robust application of social networks in animal behaviour. Croft et al. (2011) provide a comprehensive review outlining the reasons why standard methods, particularly those based on node-based permutations, are not suitable. The need for specialized methods for analyses in this subject was rapidly addressed by statisticians and biologists, culminating in the package SOCPROG (Whitehead 2009) that provides routines for many complex analyses. However, numerous studies are still published using packages such as UCINET (Borgatti et al. 2002) that provide out-of-the-box analyses but typically violate many of the underlying assumptions from data sampling when calculating significance in animal social networks (Croft et al. 2011). For example, social networks from human data generally assume that all individuals are equally likely to be observed at all times. Here, I describe a package that provides routines for several specialized tests based on data describing individual membership in groups. The asnipe package provides these routines in the statistical environment R that enables the results of these routines to be directly integrated with a wide range of social network packages for generating statistics on the inferred social network. By providing these routines in the R environment, I hope to bridge an existing gap in statistical tools and enable more robust use of social networks in animal behaviour research.

Overview of asnipe

asnipe primarily provides tools for analyses of social networks that are performed on either a group by individual matrix or a stack of association matrices representing sampling periods. The former is a matrix in which the columns correspond to all individuals in the population and each row describes the membership of a distinct group:

          Ind. ‘A’  Ind. ‘B’  Ind. ‘C’  Ind. ‘D’  Ind. ‘E’  Ind. ‘F’
Group 1       1         0         1         0         0         0
Group 2       0         0         1         1         0         1
Group 3       0         1         0         0         1         0
Group 4       0         0         1         0         0         0
Group 5       0         1         0         0         1         0
Group 6       1         0         0         1         0         1

The latter is a t × N × N array, where t is the number of sampling periods and N is the number of individuals. Each N × N slice in this stack contains 1s and 0s depending on whether individuals were associated during that sampling period. In the example below, all individuals were seen together during the first sampling period, whereas individuals ‘A’ and ‘B’ were not seen together in the second.

Period 1

          Ind. ‘A’  Ind. ‘B’  Ind. ‘C’
Ind. ‘A’      0         1         1
Ind. ‘B’      1         0         1
Ind. ‘C’      1         1         0

Period 2

          Ind. ‘A’  Ind. ‘B’  Ind. ‘C’
Ind. ‘A’      0         0         1
Ind. ‘B’      0         0         1
Ind. ‘C’      1         1         0

Period 3

          Ind. ‘A’  Ind. ‘B’  Ind. ‘C’
Ind. ‘A’      0         1         0
Ind. ‘B’      1         0         1
Ind. ‘C’      0         1         0

This is the approach used by the Matlab package SOCPROG. The asnipe package contains routines for turning most common forms of association data into either of these two formats.
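The two input structures described above can be written out directly in base R. The following is a minimal sketch that only reproduces the example tables; the object names (`gbi_example`, `sp_example`, `period_counts`) are hypothetical and no asnipe functions are used.

```r
# Group by individual matrix from the six-group example (rows are groups)
ids <- c("A", "B", "C", "D", "E", "F")
gbi_example <- matrix(
  c(1, 0, 1, 0, 0, 0,
    0, 0, 1, 1, 0, 1,
    0, 1, 0, 0, 1, 0,
    0, 0, 1, 0, 0, 0,
    0, 1, 0, 0, 1, 0,
    1, 0, 0, 1, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste("Group", 1:6), ids)
)

# Stack of association matrices (t x N x N) for the three-individual example
abc <- c("A", "B", "C")
sp_example <- array(0, dim = c(3, 3, 3), dimnames = list(NULL, abc, abc))
sp_example[1, , ] <- matrix(c(0, 1, 1,  1, 0, 1,  1, 1, 0), 3, byrow = TRUE)
sp_example[2, , ] <- matrix(c(0, 0, 1,  0, 0, 1,  1, 1, 0), 3, byrow = TRUE)
sp_example[3, , ] <- matrix(c(0, 1, 0,  1, 0, 1,  0, 1, 0), 3, byrow = TRUE)

# Number of sampling periods in which each dyad was associated
period_counts <- apply(sp_example, c(2, 3), sum)
```

Indexing the first dimension (for example `sp_example[2, , ]`) returns the N × N association matrix for a single sampling period.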

Functionality

At present, asnipe includes functions for four key analytical steps that are generally lacking in existing R packages (but available in Matlab via SOCPROG):

  1. Defining a group by individual matrix or generating sampling period co-occurrences from association data.
  2. Creating an association matrix from observations of individuals co-occurring in time and space.
  3. Performing permutation tests on the observation stream following the method originally proposed by Bejder et al. (1998) and since refined by other authors (Whitehead 2008; Sundaresan et al. 2009).
  4. Calculating lagged association rates between individuals or classes of individuals.

In addition to the above, the routines in asnipe provide built-in functionality that enables simple incorporation of time, space and classes of individuals. This is important for conducting biologically meaningful statistical tests and enables users to very easily create temporal or spatial networks from their data.

Package functions

Converting group data

Commonly, data on group membership will be in a sequential format based on the observation of the groups. asnipe provides functions for converting these into either a group by individual matrix or a set of sampling periods. Here, I provide two examples for doing this using the provided functions. First, I provide code to demonstrate how to generate a group by individual matrix from a data frame containing individuals and the groups they are observed in. Note that the input for this function must be in this two-column format.

  • ## first load the package

  • R> library(asnipe)

  • ## define group memberships (or read from file)

  • R> individuals <- data.frame(ID=c("C695905","H300253","H300253",

  • + "H300283","H839876","F464557","H300296","H300253",

  • + "F464557","H300296","C695905","H300283","H839876"),

  • + GROUP=c(1,1,2,2,2,3,3,4,5,5,6,6,6))

  • R> individuals

        ID GROUP
1  C695905     1
2  H300253     1
3  H300253     2
4  H300283     2
5  H839876     2
6  F464557     3
7  H300296     3
8  H300253     4
9  F464557     5
10 H300296     5
11 C695905     6
12 H300283     6
13 H839876     6

  • ## get group by individual matrix

  • R> gbi <- get_group_by_individual(individuals, data_format="individuals")

  • R> gbi

  C695905 H300253 H300283 H839876 F464557 H300296
1       1       1       0       0       0       0
2       0       1       1       1       0       0
3       0       0       0       0       1       1
4       0       1       0       0       0       0
5       0       0       0       0       1       1
6       1       0       1       1       0       0
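To show what this conversion does, the same matrix can be built by hand with a cross-tabulation. This is a base R sketch of the idea only and is not the code used inside get_group_by_individual; the names `id_levels`, `counts` and `gbi_manual` are hypothetical.

```r
individuals <- data.frame(
  ID = c("C695905", "H300253", "H300253", "H300283", "H839876",
         "F464557", "H300296", "H300253", "F464557", "H300296",
         "C695905", "H300283", "H839876"),
  GROUP = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
)

# Keep individuals in order of first appearance rather than alphabetical order
id_levels <- unique(as.character(individuals$ID))

# Cross-tabulate groups (rows) against individuals (columns)
counts <- table(individuals$GROUP,
                factor(as.character(individuals$ID), levels = id_levels))

# Convert the counts to a plain 0/1 matrix
gbi_manual <- matrix(as.integer(counts > 0), nrow = nrow(counts),
                     dimnames = list(rownames(counts), colnames(counts)))
```

Each row of `gbi_manual` sums to the size of the corresponding group, which is a quick check that no observation was lost in the conversion.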

An alternative approach is to record each group and all of the members it contains. This will need to be in the form of a list (and can be directly imported from file as shown in the previous section). Lists are collections of elements that can vary in size or form, making them suitable for groups with different numbers of members. These can then be converted into a group by individual matrix using the same function as above.

  • ## define group memberships (or read from file)

  • R> groups <- list(G1=c("C695905","H300253"),

  • + G2=c("H300253","H300283","H839876"),

  • + G3=c("F464557","H300296"),

  • + G4=c("H300253"),

  • + G5=c("F464557","H300296"),

  • + G6=c("C695905","H300283","H839876"))

  • R> groups

  • $G1

  • [1] "C695905" "H300253"

  • $G2

  • [1] "H300253" "H300283" "H839876"

  • $G3

  • [1] "F464557" "H300296"

  • $G4

  • [1] "H300253"

  • $G5

  • [1] "F464557" "H300296"

  • $G6

  • [1] "C695905" "H300283" "H839876"

  • ## get group by individual matrix

  • R> gbi <- get_group_by_individual(groups, data_format="groups")

Both of these input formats can also be used to generate sampling periods. The key difference is that sampling periods must explicitly contain the time when each group was observed, and the period over which data should be represented. Using the two input files from above, the sampling periods can be generated as follows:

  • ## individuals in groups format

  • ## include times for each individual

  • R> individuals <- cbind(individuals,

  • + DAY=c(1,1,1,1,1,2,2,2,3,3,3,3,3))

  • ## now get sampling periods

  • R> SPs <- get_sampling_periods(individuals[,c(1,2)], individuals[,3], 1,

  • + data_format="individuals")

  • ## sampling periods indexed over the first element

  • R> SPs[1,,]

        C695905 H300253 H300283 H839876 F464557 H300296
C695905       0       1       0       0       0       0
H300253       1       0       1       1       0       0
H300283       0       1       0       1       0       0
H839876       0       1       1       0       0       0
F464557       0       0       0       0       0       0
H300296       0       0       0       0       0       0

  • ## list of groups format

  • ## create a time variable

  • R> days <- c(1,1,2,2,3,3)

  • ## now get sampling periods

  • R> SPs <- get_sampling_periods(groups,

  • + days, 1, data_format="groups")

These sampling period matrices can then be used to generate association matrices and to perform network randomizations that control for individual gregariousness (see Whitehead 2008, p. 130). In some cases, randomization procedures may need to control for space, for example by swapping only individuals within the same location. In that case, the sampling period function also accepts location information and calculates independent sampling periods for each location in each time period. Although I use group by individual matrices to demonstrate functionality in the rest of this manuscript, the procedure for sampling period data is identical: when entering sampling period data, the data_format flag is changed from ‘GBI’ to ‘SP’.
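The structure of a sampling period array can be made concrete with a short base R sketch that derives it from a group by individual matrix and the day on which each group was observed. This illustrates the data structure only and is not the implementation of get_sampling_periods; `gbi_toy`, `sps` and the loop are assumptions made for the example, and location is ignored.

```r
ids <- c("C695905", "H300253", "H300283", "H839876", "F464557", "H300296")
gbi_toy <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 1, 1, 1, 0, 0,
    0, 0, 0, 0, 1, 1,
    0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1,
    1, 0, 1, 1, 0, 0),
  nrow = 6, byrow = TRUE, dimnames = list(as.character(1:6), ids)
)
days <- c(1, 1, 2, 2, 3, 3)  # day on which each group was observed

periods <- sort(unique(days))
n <- ncol(gbi_toy)
sps <- array(0, dim = c(length(periods), n, n),
             dimnames = list(as.character(periods), ids, ids))

for (p in seq_along(periods)) {
  # groups observed in this sampling period
  rows <- gbi_toy[days == periods[p], , drop = FALSE]
  # a dyad is associated if it shared at least one group in the period
  together <- crossprod(rows) > 0
  diag(together) <- FALSE
  sps[p, , ] <- together * 1
}
```

For these data, `sps[1, , ]` matches the first sampling period printed above: the two individuals seen only on later days have no associations on day 1.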

Generating networks

The main step in performing social network analysis is the creation of the social network. Whitehead (2008) provides extensive discussion of the methodology for observing associations, groups and measuring interactions. Yet with the exception of SOCPROG (Whitehead 2009), I am unaware of another package that will accept group data and generate a social network with a chosen measure (see Whitehead 2008, for information on index ratios). In asnipe, I provide a method that calculates the association matrix from either sampling periods or a group by individual matrix. Perhaps the most powerful aspect of this function is the ability to subset data within the function and therefore generate temporal or spatial networks using a single loop.
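As an illustration of what an association measure computes, the simple ratio index for a dyad can be taken as the number of groups containing both individuals divided by the number of groups containing at least one of them. The base R sketch below applies that definition to a small matrix; the function name `sri_from_gbi` is hypothetical, and this is an assumption-laden illustration rather than the code inside asnipe.

```r
sri_from_gbi <- function(gbi) {
  gbi <- (gbi > 0) * 1
  both <- crossprod(gbi)                    # groups containing both i and j
  seen <- colSums(gbi)                      # groups containing each individual
  either <- outer(seen, seen, "+") - both   # groups containing i or j (or both)
  sri <- both / either
  diag(sri) <- 0                            # no self-associations
  sri
}

ids <- c("C695905", "H300253", "H300283", "H839876", "F464557", "H300296")
gbi_toy <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 1, 1, 1, 0, 0,
    0, 0, 0, 0, 1, 1,
    0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1,
    1, 0, 1, 1, 0, 0),
  nrow = 6, byrow = TRUE, dimnames = list(as.character(1:6), ids)
)

sri <- sri_from_gbi(gbi_toy)
```

In this toy example, two individuals that are always observed together receive an index of 1, whereas a dyad seen together in one of the four groups that contain either member receives 0.25.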

In the following examples, I will be using the data from Farine et al. (2012). This data set is provided with the package and can be loaded using the data function as shown below. Once the group by individual matrix has been loaded, the association matrix can be calculated using the function get_network:

  • R> data("group_by_individual")

  • R> network <- get_network(gbi, data_format="GBI")

However, the get_network function has further functionality that can automatically subset data internally. In the example below, the network is calculated separately for the first and second half of the time. The results are stored in a three-dimensional cube where the first index is the network number, second is the association matrix rows, and the third is the association matrix columns:

  • R> data("times")

  • ## define the 2 × N × N array that will hold

  • ## the two N × N association matrices

  • R> networks <- array(0, c(2, ncol(gbi), ncol(gbi)))

  • ## calculate network for first half of the time

  • R> networks[1,,] <- get_network(gbi, data_format="GBI", times=times,

  • + start_time=0, end_time=max(times)/2)

  • Generating 151 × 151 matrix

  • ## calculate network for second half of the time

  • R> networks[2,,] <- get_network(gbi, data_format="GBI", times=times,

  • + start_time=max(times)/2, end_time=max(times))

  • Generating 151 × 151 matrix

These association matrices can then directly interface with other packages to calculate network statistics:

  • ## convert to igraph network and calculate

  • ## the weighted degree (strength) of the first network

  • R> library(igraph)

  • R> net <- graph.adjacency(networks[1,,], mode="undirected", diag=FALSE,

  • + weighted=TRUE)

  • R> deg_weighted <- graph.strength(net)

  • R> detach(package:igraph)

  • ## alternatively, package sna can use matrix stacks directly

  • R> library(sna)

  • R> deg_weighted <- degree(networks, gmode="graph",

  • + g=c(1,2), ignore.eval=FALSE)

  • R> detach(package:sna)

Network permutations of the data stream

An important finding in the animal social network literature is that the randomization methods used for creating null models in community ecology (for example Manly 1997) can easily lead to biases and overestimates of statistical significance (Bejder et al. 1998). It was proposed by Bejder et al. (1998) that to avoid biases in sampling, randomizations should be performed on the data stream rather than on the association matrix. This is where most software packages are incompatible with the requirements when analysing animal social networks as they typically rely on node-based permutations (with the exception of SOCPROG). Several authors have also suggested improvements to the data stream permutation method, such as randomizing while controlling for space, time or type of individual (Whitehead 1999; Whitehead et al. 2005; Sundaresan et al. 2009).

I incorporate all of these in the function network_permutation to test where the observed data fit on a distribution based on permutations (Fig. 1). This method swaps either individuals between groups (when using group by individual matrices) or associations (when using sampling periods). It then recalculates the network after each swap, creating a stack, or set, of p matrices, where p is the number of permutations and each slice in that stack is an N × N association matrix. These swaps hold the variance in individual gregariousness and the size of each group constant (Bejder et al. 1998). This function enables the swaps in the data stream to be limited to individuals that occur on the same day, in the same location, or are of the same class (such as sex or age class). These variables are not confined to particular data types and can therefore restrict permutations within any two types of group-level characteristics (days and locations, which can represent any time and/or space variable) and one type of individual characteristic (classes, which can be any variable describing a characteristic of individuals). In the case of group-level characteristics, these must be explicitly incorporated into the sampling periods if using that method.
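The swap at the heart of the data stream permutation can be sketched in base R. The example below performs 'checkerboard' swaps on a group by individual matrix: two groups and two individuals are drawn, and the memberships are exchanged only if each individual occurs in exactly one of the two groups and they occur in different groups. The function `swap_once`, its `strata` and `max_tries` arguments, and the toy data are hypothetical; this is a simplified illustration and not the implementation of network_permutation.

```r
swap_once <- function(gbi, strata = NULL, max_tries = 10000) {
  if (is.null(strata)) strata <- rep(1, nrow(gbi))
  for (attempt in seq_len(max_tries)) {
    g1 <- sample(nrow(gbi), 1)
    # candidate partner groups from the same stratum (e.g. the same day)
    same <- setdiff(which(strata == strata[g1]), g1)
    if (length(same) == 0) next
    g2 <- same[sample(length(same), 1)]
    i <- sample(ncol(gbi), 2)
    sub <- gbi[c(g1, g2), i]
    # valid only for the patterns 1 0 / 0 1 or 0 1 / 1 0
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
        sub[1, 1] != sub[1, 2]) {
      gbi[c(g1, g2), i] <- 1 - sub
      return(gbi)
    }
  }
  stop("no valid swap found")
}

ids <- c("C695905", "H300253", "H300283", "H839876", "F464557", "H300296")
gbi_toy <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 1, 1, 1, 0, 0,
    0, 0, 0, 0, 1, 1,
    0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1,
    1, 0, 1, 1, 0, 0),
  nrow = 6, byrow = TRUE, dimnames = list(as.character(1:6), ids)
)
days <- c(1, 1, 2, 2, 3, 3)

set.seed(1)
permuted <- gbi_toy
for (k in 1:50) permuted <- swap_once(permuted, strata = days)
```

Because every swap exchanges a 1 and a 0 within the same two rows and the same two columns, group sizes (row sums) and the number of observations of each individual (column sums) are unchanged, and restricting partner groups to the same stratum keeps each individual's observations within the same day.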

Figure 1.

Results from the example permutation show that the observed weighted degree, or strength, in the population was significantly higher than expected by chance in both the morning (A) and afternoon (B). The red line (observed weighted degree) is higher than 100% of the values from permutations in both cases. The plots also suggest a shift in behaviour in the afternoon that leads to an overall increase in associations.

  • ## calculate the weighted degree of the two networks

  • ## calculated previously. The degree function accepts stacked graphs as an input.

  • R> library(sna)

  • R> deg_weighted <- degree(networks, gmode="graph", g=c(1,2), ignore.eval=FALSE)

  • ## perform the permutations, restricting swaps to within each hour of observation using the days parameter

  • R> network1_perm <- network_permutation(gbi, data_format="GBI",

  • + association_matrix=networks[1,,], times=times, start_time=0, end_time=max(times)/2,

  • + days=floor(times/3600), within_day=TRUE)

  • R> network2_perm <- network_permutation(gbi, data_format="GBI",

  • + association_matrix=networks[2,,], times=times,

  • + start_time=max(times)/2, end_time=max(times),

  • + days=floor(times/3600), within_day=TRUE)

  • ## calculate the weighted degree for each permutation

  • R> deg_weighted_perm1 <- degree(network1_perm, gmode="graph", g=c(1:1000),

  • + ignore.eval=FALSE)

  • R> deg_weighted_perm2 <- degree(network2_perm, gmode="graph", g=c(1:1000),

  • + ignore.eval=FALSE)

  • R> detach(package:sna)

  • ## plot the distribution of permutations with the

  • ## original data overlaid

  • R> par(mfrow=c(1,2))

  • R> hist(colMeans(deg_weighted_perm1), breaks=100,

  • + main=paste("P = ", sum(mean(deg_weighted[,1]) <

  • + colMeans(deg_weighted_perm1))/ncol(deg_weighted_perm1)),

  • + xlab="Weighted degree", ylab="Probability")

  • R> abline(v=mean(deg_weighted[,1]), col="red")

  • R> hist(colMeans(deg_weighted_perm2), breaks=100, main=paste("P = ",

  • + sum(mean(deg_weighted[,2]) < colMeans(deg_weighted_perm2))

  • + /ncol(deg_weighted_perm2)),

  • + xlab="Weighted degree", ylab="Probability")

  • R> abline(v=mean(deg_weighted[,2]), col="red")

Using permutations with linear models

Linear models, and variants such as general linear models and generalized linear mixed models, are frequently used in ecology and animal behaviour to test the strength and significance of biological effects. The Bejder et al. (1998) permutation method is a useful way of estimating the significance of parameter estimates against biologically relevant null models, because permutations can control for spatial, temporal and individual variation. Here, I demonstrate how this can be used to show that there is a significant effect of time of day on the weighted degree as shown by the shift in histograms from Fig. 1. In this case, the coefficient estimate for the magnitude of the slope from the original data is compared with the coefficient estimate based on the weighted degrees of individuals from each permuted network. The one-tailed significance is then calculated based on the position of the observed slope estimate relative to the distribution of slopes calculated from the randomized data.

  • ## build dataset with all data in one column of a data frame

  • R> input <- rbind(data.frame(Degree=deg_weighted[,1], Time="MORNING"),

  • + data.frame(Degree=deg_weighted[,2], Time="AFTERNOON"))

  • ## build model of strength (weighted degree)

  • ## of each individual as a function of time of day

  • R> model <- lm(Degree~Time, data=input)

  • ## get parameter estimate of slope

  • R> e <- coef(summary(model))[2,1]

  • R> e

  • [1] 3.316351

  • ## get an estimate of the slope for each permutation

  • ## matrix

  • R> e_perm <- rep(NA,1000)

  • R> for (i in 1:1000) {

  • + input_perm <- rbind(data.frame(Degree=deg_weighted_perm1[,i],

  • + Time="MORNING"), data.frame(Degree=deg_weighted_perm2[,i],

  • + Time="AFTERNOON"))

  • + model_tmp <- lm(Degree~Time, data=input_perm)

  • + e_perm[i] <- coef(summary(model_tmp))[2,1]

  • + }

  • ## calculate P value from how many of the slopes estimated in the randomized data

  • ## are larger than the observed

  • R> P_value <- sum(e_perm > e)/1000

  • R> P_value

  • [1] 0.001

This result suggests a significant effect of the time of day on association patterns because the parameter estimate from the model based on the original data (in this case a slope of 3·316) was greater than the estimate from the randomized data in all but 1 of the 1000 permutations. Thus, the increase in the strength of associations from the morning to afternoon was significant when compared to a null model that randomized the pattern of associations within these same time periods (P < 0·01). Although here I demonstrate the use of this approach using 1000 permutations, this approach does often require some verification that the P value has stabilized, which may only occur after many more permutations. This approach could be extended to incorporate individual identities as random effects using generalized linear mixed models or species as random effects. Here, I demonstrated the approach using linear models for maximum clarity.
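One simple way to check whether a permutation P value has stabilized is to track its running value as permutations accumulate. The sketch below uses simulated slopes in place of the permuted estimates, so `e_obs` and `e_null` are hypothetical stand-ins for the observed slope and the vector of permuted slopes.

```r
set.seed(42)
e_obs <- 3.3                              # hypothetical observed slope
e_null <- rnorm(5000, mean = 0, sd = 1)   # hypothetical slopes from permuted data

# running one-tailed P value after 1, 2, ..., 5000 permutations
running_p <- cumsum(e_null > e_obs) / seq_along(e_null)

# change in the estimate over the last 1000 permutations
p_range_last <- range(running_p[4001:5000])
```

Plotting `running_p` against the number of permutations (for example with `plot(running_p, type = "l")`) shows whether the estimate has levelled off; if it is still drifting at the end of the run, more permutations are needed.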

Calculating lagged association rates

Lagged association rates are a measure of the probability of being observed re-associating after a given time lag (Whitehead 1995, 2008). This allows biologists to test for temporal persistence of associations between individuals. The original approach given by Whitehead (1995) measures the average probability of any re-association between individuals during the given time window (see Whitehead 2008, section 5.5.1).

  • R> data("group_by_individual")

  • R> data("times")

  • R> data("individuals")

  • ## calculate lagged association rate for great tits

  • R> lagged_assoc <- LAR(gbi, times, 3600, classes=inds$SPECIES,

  • + which_classes="GRETI")

  • R> lagged_assoc

       [,1]      [,2]
2 0.6931472 0.7210728
3 1.0986123 0.6745192
4 1.3862944 0.7021277
5 1.6094379 0.6911413
6 1.7917595 0.6666667
7 1.9459101 0.6539924
8 2.0794415 0.6181102
9 2.1972246 0.5121951

This function returns a matrix with one row per time lag and two columns, with log(time) in the first column and the lagged association rate for each time lag τ in the second, which can be used directly for plotting the results (Fig. 2). However, this approach generally requires an estimate of the error to be generated. Whitehead (1995) suggests that the jackknife technique is appropriate. This can easily be implemented by creating subsets of the data, removing one or more observations and calculating the lagged association rate for the new data. By repeating across all possible subsets, the standard error can be estimated and plotted onto the graph.

Figure 2.

Lagged association rate for individuals in the study by Farine et al. (2012) shows little decline over the course of one day.

  • ## create an empty variable to store results, and

  • ## store the result after each group has been removed

  • R> lagged_assoc_perm <- matrix(NA, nrow=nrow(lagged_assoc), ncol=nrow(gbi))

  • ## create a loop to run each simulation, and run on the

  • ## dataset having removed one row at a time. Here we are

  • ## only interested in the second column of the result.

  • R> for (i in c(1:nrow(gbi))) {

  • + lagged_assoc_perm[,i] <- LAR(gbi[-i,], times[-i], 3600, classes=inds$SPECIES, which_classes="GRETI")[,2]

  • + }

  • ## calculate the standard error

  • R> N <- nrow(gbi)

  • R> means <- rowMeans(lagged_assoc_perm)

  • R> se <- sqrt(((N-1)/N) * apply((means-lagged_assoc_perm)^2,1,sum))

  • ## plot the results

  • R> plot(lagged_assoc, type="l", axes=FALSE,

  • + xlab="Time (hours)", ylab="LAR", ylim=c(0,1))

  • R> arrows(lagged_assoc[,1], lagged_assoc[,2]-se, lagged_assoc[,1], lagged_assoc[,2]+se, angle=90, code=3, length=0.1)

  • R> axis(2)

  • R> axis(1, at=lagged_assoc[,1], labels=c(1:nrow(lagged_assoc)))
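The jackknife standard error used in the listing above follows the general leave-one-out formula, SE = sqrt(((N − 1)/N) Σ(θ_i − mean(θ))²), where θ_i is the statistic recalculated with observation i removed. The sketch below checks that formula on a simple statistic, the mean, for which the jackknife standard error is known to equal the usual standard error of the mean. The function `jackknife_se` and the simulated data are hypothetical and unrelated to the bird data.

```r
jackknife_se <- function(x, statistic) {
  n <- length(x)
  # recompute the statistic with each observation left out in turn
  loo <- vapply(seq_len(n), function(i) statistic(x[-i]), numeric(1))
  sqrt(((n - 1) / n) * sum((loo - mean(loo))^2))
}

set.seed(1)
x <- rnorm(30)

se_jack <- jackknife_se(x, mean)          # jackknife estimate
se_classic <- sd(x) / sqrt(length(x))     # analytical standard error of the mean
```

The two values agree to numerical precision, which gives some reassurance that the same formula, applied row by row to the lagged association rates, is estimating a standard error on the intended scale.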

The lagged rate of association

One significant advance in the study of animal social networks is the greater sampling rates that are achievable through tracking of individuals, giving greater temporal resolution to their behaviours and subsequent associations. However, repeated sampling of individuals within short time periods is likely to upweight random interactions that are driven by spatial overlap, and therefore confound long-term lagged association rates of populations studied at a landscape scale. For example, if individuals are sampled 100 times per day over 3 months and the minimum τ is set to one day, then a dyad needs to be observed together just once per day, or approximately 90 times in total, to obtain a constant lagged association rate of 1. Yet the association rate between these individuals could be as low as 0·01 (if associating just once per day).
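The arithmetic in this example can be reproduced with a few lines of base R. The sketch below is a deliberately simplified illustration for a single dyad and a lag of one day; it does not call LAR, and the object names are hypothetical.

```r
n_days <- 90
samples_per_day <- 100

# together[d, s] is TRUE if the dyad was associated in sample s of day d;
# here the dyad associates in exactly one sample per day
together <- matrix(FALSE, nrow = n_days, ncol = samples_per_day)
together[, 1] <- TRUE

# binary view: was the dyad seen together at least once on each day?
seen_daily <- rowSums(together) > 0

# proportion of days with an association that were followed, one day later,
# by another day with an association
binary_lagged_rate <- mean(seen_daily[-1][seen_daily[-n_days]])

# frequency view: proportion of all samples in which the dyad was together
association_rate <- sum(together) / length(together)
```

Here `binary_lagged_rate` is 1 while `association_rate` is 0.01, which is the discrepancy that motivates a measure incorporating the frequency of association.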

In this package, I present an alternative measure of the temporal rate of re-association that incorporates the frequency at which dyads are observed to associate for a given time lag τ. This measure is given for individuals X and Y by:

display math(1)

where a_j(X,Y) is the number of observations of the dyad X and Y in time period j, and a_k(X,X) is the number of observations of individual X in time period k. This results in a measure for τ that is more closely related to the association rate of individuals calculated by the simple ratio index. The LRA function calculates the lagged rate of association using the above formula. Alternatively, it can be used to return a lagged rate of association that is qualitatively similar to Whitehead (1995) but calculated independently for each dyad by setting the association_rate flag to FALSE. Setting association_rate to FALSE ignores the number of observations of individuals within sampling periods (both together and apart), setting their value to 1 if they were seen together at least once and 0 if they were never seen together. The mean of this dyadic lagged rate of association may differ from the regular lagged association rate from the function LAR, but can be used to estimate association rates within or between classes of individuals.

  • R> data("group_by_individual")

  • R> data("times")

  • R> data("individuals")

  • ## calculate lagged rate of association between great tits

  • R> lagged_rates <- LRA(gbi, times, 3600,

  • + classes=inds$SPECIES, which_classes="GRETI",

  • + association_rate=TRUE)

  • ## calculate the mean rate for individuals at each tau.

  • ## note the difference to the values generated from LAR above

  • R> apply(lagged_rates, 3, mean, na.rm=TRUE)

  • [1] 0.6100229 0.5473871 0.6131421 0.6426655 0.6293223 0.5905218 0.5775401 0.5121951

  • R> str(lagged_rates)

  • num [1:51, 1:51, 1:8] NA NaN NaN NaN 0.333 ...

  • - attr(*, "dimnames")=List of 3

  • ..$ : chr [1:51] "1" "2" "3" "4" ...

  • ..$ : chr [1:51] "1" "2" "3" "4" ...

  • ..$ : NULL

This function returns the lagged rate of association for each dyad in a stack of N × N matrices with each slice representing one increment in τ. Dyads not observed associating are returned the value NaN to distinguish this from a probability of zero. The function also provides an alternative output style that is a data frame consisting of each dyad, τ, and probability of re-association. This format is useful for fitting models from other packages or plotting data as a surface.
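The relationship between the two output styles can be illustrated by reshaping a small array by hand. The sketch below builds a toy N × N × T array and converts it to a long data frame; the column names mirror those described above, but the array values, the object names and the reshaping code are hypothetical and are not taken from LRA.

```r
ids <- c("1", "2", "3")

# toy stack of rates: 3 individuals, 2 time lags, NaN where no association
rates <- array(NaN, dim = c(3, 3, 2), dimnames = list(ids, ids, NULL))
rates["1", "2", ] <- rates["2", "1", ] <- c(0.5, 0.25)
rates["1", "3", ] <- rates["3", "1", ] <- c(0, NaN)

# expand.grid varies its first argument fastest, matching the
# column-major order in which as.vector() unrolls the array
long <- expand.grid(ID = ids, ASSOCIATE = ids,
                    TIME = seq_len(dim(rates)[3]),
                    stringsAsFactors = FALSE)
long$RATE <- as.vector(rates)

# drop self-pairs and dyads with no estimate, but keep true zeros
long <- long[long$ID != long$ASSOCIATE & !is.na(long$RATE), ]
```

Keeping rates of zero while dropping NaN entries preserves the distinction between a dyad that failed to re-associate and a dyad for which no estimate exists.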

  • R> lagged_rates <- LRA(gbi, times, 3600, classes=inds$SPECIES, which_classes="GRETI", output_style=2)

  • Timesteps = 9

  • R> str(lagged_rates)

  • 'data.frame': 3680 obs. of 4 variables:

  • $ ID : Factor w/ 45 levels "1","10","11",..: 1 1 1 1 1 1 1 ...

  • $ ASSOCIATE: Factor w/ 51 levels "1","10","11",..: 45 49 50 3 5 6 ...

  • $ TIME : num 1 1 1 1 1 1 1 1 1 1 ...

  • $ RATE : num 0.333 0.5 0.667 0.667 0 ...

In order to demonstrate the difference between the lagged association rate and the lagged rate of association, I calculated both values for a population of wild blue tits sampled repeatedly over 13 consecutive weeks in Wytham Woods, near Oxford, UK. Individuals fitted with passive integrated transponder (PIT) tags were detected on average 226 times (range 1–942) in a stratified grid of 65 feeders fitted with antennae to detect visits by individuals. These feeders were all opened and shut on the same two days per week using the same methods described by Farine et al. (2012) to provide 26 daily samples over a period of 86 days (3 December 2011 to 26 February 2012). Running the two methods demonstrates that the lagged association rate is significantly higher than the lagged rate of association in this population (Fig. 3). Importantly, these two methods have a very different estimate with respect to a null association rate. The lagged rate of association null was calculated using the mean group size experienced by individuals divided by the mean number of total associates for each individual (the mean binary degree). It differs from the null lagged association rate proposed by Whitehead (1995) that used N−1 in the denominator, which assumes equal probability of mixing between all individuals and may not be appropriate for large populations. The standard lagged association rate suggests a high rate of re-associations. This is largely because individuals were detected many times within each sampling period, and this measure requires only one re-association in order to give that time period a value of 1. Therefore, individuals that co-occur in space but are not associates might be repeatedly found co-occurring simply by chance arrivals at the same feeder at the same time. 
The new method proposed in this package will therefore be most appropriate for data-rich studies using automated sampling, while the traditional lagged association rate will be most suitable for studies in which individuals have fewer observations per sampling period.

Figure 3.

Lagged association rate (top) and lagged rate of association (bottom) will typically show little difference in shape. However, the lagged rate of association estimates a much lower rate of re-association that does not differ significantly from a null expectation (dashed line) in this population. This more strongly reflects the mean (non-zero) association rate of 0·04 for this period (standard deviation = 0·04, range 0·001–0·45). The lagged association rate in this case overestimates the social affinity over time for this population, suggesting it is significantly higher than the null expectation (dotted line). Standard errors were calculated via jackknife, in which each sampling location was removed one at a time.

Closing comments

A major goal in introducing asnipe is to provide the functionality for specialized analysis of animal social networks, while maintaining access to the wide range of tools for more general analysis of network structure that are available in R, such as sna and igraph. This article provides examples of all the steps required to perform network analysis on data capturing the group membership of individual animals. I hope that by providing a tool that is freely available in a widely used statistical computing platform, this will encourage greater uptake of R by biologists studying animal social networks and further development of tools tailored to the needs of this type of analysis.

Acknowledgements

I sincerely thank the members of the EGI social networks group, and in particular Josh Firth, Colin Garroway, and Reinder Radesma for their comments on early drafts of the manuscript, and Ioannis Psorakis for continued debate on the merits of statistical approaches in social networks. I also thank Hal Whitehead and one anonymous referee for valuable comments. This work was funded by a European Research Council grant (AdG 250164) awarded to Prof. Ben C. Sheldon.
