RInSp: an R package for the analysis of individual specialization in resource use


Summary

  1. In the last decade, an increasing number of papers has testified to a renewed interest in individual specialization in resource use and its implications at higher levels of ecological organization.
  2. We present the package R Individual Specialization (RInSp) for the free open-source statistical software R. RInSp provides a comprehensive set of classical and recently proposed indices for quantifying the degree of individual specialization using both categorical and continuous resource use data. The package also includes tools for ad hoc Monte Carlo and jackknife resampling procedures for significance testing, plotting and input/output data manipulation.
  3. The use of RInSp is demonstrated by two examples. In addition, the potential of the package to be implemented beyond its original scope for multi-level quantitative analyses of individual trait variance in natural communities is illustrated.

Theoretical–historical background

Hutchinson's (1957) conceptualization of a species' ecological niche and subsequent advancements of the theory (see, among others, Colwell & Rangel 2009 for a recent review and synthesis) have generally postulated ecological equivalence among conspecifics or irrelevance of any within-species variation. However, Van Valen (1965) first incorporated the idea of individual variation in resource use into niche theory. After this early effort, interest in intraspecific niche variation grew in parallel with the mainstream, species-centred theoretical developments, culminating in Roughgarden's contributions (1972, 1974), cast within a variance-partitioning framework originally developed by Wright (1949). In the following years, the importance of intraspecific variation in niche theory was discussed inconclusively and ultimately shared the fate of the niche concept, which faded as interest in the role of competition in structuring animal communities declined (e.g. Connor & Simberloff 1986).

In 2002 and 2003, however, Bolnick et al. made an unprecedented synthesis of the literature on the incidence of interindividual niche variation and on the indices developed to quantify it. Since these early reviews, the number of studies on the topic has more than doubled (Araújo, Bolnick & Layman 2011), testifying to a renewed interest in individual specialization and its implications at higher levels of ecological organization (Bolnick et al. 2011; Violle et al. 2012). The knowledge gap originally emphasized by Bolnick et al. (2003) has been greatly reduced, with a considerable proportion of recent investigations quantifying individual specialization through relatively novel approaches (e.g. stable isotope analysis: Araújo et al. 2007; Bolnick & Paull 2009) and using both classical (e.g. those listed in Bolnick et al. 2002) and recently proposed indices (e.g. Araújo et al. 2008).

Origin and scope of RInSp

The article by Bolnick et al. (2002) was complemented by IndSpec1.exe, a command-line Microsoft Windows executable that calculates any of the indices examined in the article and tests their statistical significance by Monte Carlo resampling procedures. IndSpec1 was written in Borland C++ 5·0, using libraries not available on other platforms, and has no graphical or export capabilities.

The ‘R Individual Specialization’ (RInSp hereafter) package has been developed to expand the original set of indices of individual specialization available in IndSpec1 with recently developed metrics. Developing the package for the statistical software R (R Development Core Team 2013) provided the opportunity to overcome the limitations of IndSpec1 in a cross-platform environment characterized by great flexibility in downstream analyses and high-level graphical capabilities.

In addition, a critical re-examination of the algorithms used to calculate Roughgarden's metrics revealed a potential inconsistency of results on continuous data, stimulating the implementation of a novel, more robust estimation procedure. At the same time, the in-depth scrutiny of the theoretical and statistical assumptions of the metric highlighted the potential of the package to be used for multi-level quantitative analyses of individual trait variance in natural communities.

Overview of RInSp features

The import.RInSp function is compatible with a variety of data formats and produces an R object of class ‘RInSp’ storing the original resource data, their type, a data frame for additional information, a matrix for resource use proportions, and further information concerning numbers and names of resources and prey.

R Individual Specialization calculates a variety of indices common to IndSpec1, whose theoretical background and corresponding functions are summarized in Table 1. The only original implementation regards Roughgarden's indices on continuous data. Specifically, when marked differences occur among individuals in terms of the number of items used, the relation total niche width (TNW) = within-individual component (WIC) + between-individual component (BIC; see Table 1 for notations) may not hold (Roughgarden 1979, p. 528). The biasing influence of such a discrepancy depends heavily on the data set, in terms of the number of individuals and prey items. A simulation using synthetic data clearly illustrates the effects on the index WIC/TNW (Fig. 1). RInSp addresses the issue by specifying how to weight the number of items per individual. Using an ‘equal’ weight strategy, the importance of each element in an individual diet is equated, normalizing the importance of each individual by the number of items in its diet, that is, those with more data contribute more to the estimation of the parameters. Alternatively, when huge among-individual differences in prey number occur, a ‘number of items’ weight strategy is available, where the contribution of each individual diet is proportional to the inverse of the dimension of its diet (see additional information for a practical example).
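The effect of unequal diet sizes on the identity TNW = WIC + BIC can be illustrated with a minimal sketch in plain Python. It is not the RInSp implementation: the function name `partition`, the two weighting labels (`"by_items"`, `"by_individual"`) and the toy diets are assumptions made for illustration only, and the labels do not correspond to the package's weighting options. When each individual is weighted by its share of the pooled items, the law of total variance makes the partition exact; when every individual receives the same weight, a discrepancy generally appears.

```python
def partition(diets, weighting="by_items"):
    """Return (TNW, WIC, BIC) for a list of per-individual prey-size lists.

    weighting (hypothetical labels, not RInSp arguments):
      "by_items"      -> individual i gets weight n_i / sum(n)
      "by_individual" -> every individual gets weight 1 / N
    """
    pooled = [x for diet in diets for x in diet]
    grand_mean = sum(pooled) / len(pooled)
    # TNW: population variance of all pooled prey sizes
    tnw = sum((x - grand_mean) ** 2 for x in pooled) / len(pooled)

    if weighting == "by_items":
        weights = [len(diet) / len(pooled) for diet in diets]
    else:
        weights = [1 / len(diets)] * len(diets)

    means = [sum(diet) / len(diet) for diet in diets]
    variances = [
        sum((x - m) ** 2 for x in diet) / len(diet)
        for diet, m in zip(diets, means)
    ]
    # WIC: weighted mean of the within-individual variances
    wic = sum(w * v for w, v in zip(weights, variances))
    # BIC: weighted variance of the individual means
    weighted_mean = sum(w * m for w, m in zip(weights, means))
    bic = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means))
    return tnw, wic, bic


# Invented toy data: three individuals with 8, 2 and 1 prey items
toy_diets = [[1, 2, 3, 4, 5, 6, 7, 8], [10, 12], [20]]
for scheme in ("by_items", "by_individual"):
    tnw, wic, bic = partition(toy_diets, scheme)
    print(scheme, "discrepancy:", tnw - (wic + bic))
```

The discrepancy printed for the equal-weight scheme depends entirely on how unevenly the items are spread among individuals, which is the dependence on the data set described above.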

Table 1. A summary of the indices of individual specialization originally present in IndSpec1 and part of RInSp. A short description of their theoretical background is provided along with relevant references. For further details, please refer to the package help pages
Index | Description | Implementation in RInSp
PS; IS

Czekanowski's proportional similarity index (PS) is: $PS_i = 1 - 0.5 \sum_j |p_{ij} - q_j| = \sum_j \min(p_{ij}, q_j)$ (Schoener 1968) where pij is the frequency of category j in the individual i's diet, and qj is the frequency of category j in the population as a whole.

By the average of individuals' PSi values, the prevalence of individual specialization in the population is measured as: $IS = \frac{1}{N} \sum_i PS_i$, where N is the number of individuals.

The function PSicalc calculates individual estimates of pairwise specialization, its variance, and IS values, along with other information regarding the population diet used, and the total number of samples.

Hypothesis testing by Monte Carlo resampling is available when all individuals are kept to obtain distributions of PSi and IS. In addition, the user has the option to exclude the focal individual from the calculation of the population diet.

λi; Wi

The likelihood ratio of the observed diet of individual i against the population diet is: $\lambda_i = \prod_j (q_j / p_{ij})^{n_{ij}}$ (Petraitis 1979) where qj is the population proportion of the resource j, pij is the proportion of the resource j in the diet of the individual i and nij is the number of items for the individual i and the resource j. λi is sensitive to the sample size of the diet items of each individual (Di), and a standardized measure is calculated as: $W_i = \lambda_i^{1/D_i}$. For a complete generalist individual, Wi = 1, and the value decreases with greater specialization.

RInSp provides the function like.Wi, which calculates, for categorical diet data, the mean Wi across the individuals in the diet matrix, along with the individual Wi, λi and associated probability values.
TNW, WIC, BIC, WIC/TNW

According to Roughgarden (1972), the total niche width of a population (TNW) can be broken down into two components: the variation in resource use within individuals (within-individual component, WIC), and the variance between individuals (between-individual component, BIC). These measures can be calculated as: $TNW = \mathrm{Var}(x_{ij})$, $WIC = \mathrm{E}[\mathrm{Var}(x_j \mid i)]$, $BIC = \mathrm{Var}(\mathrm{E}[x_j \mid i])$, assuming a matrix X of diet data, where each element xij is the size (or other measure) of the prey item j in individual i's diet.

With discrete, unordered resource categories, the Shannon information theory formula is used (Roughgarden 1979, p. 528): $TNW = -\sum_h u(X_h) \ln u(X_h)$, $WIC = -\sum_i p(Y_i) \sum_h P(X_h \mid Y_i) \ln P(X_h \mid Y_i)$, $BIC = -\sum_i p(Y_i) \ln p(Y_i) + \sum_h u(X_h) \sum_i P(Y_i \mid X_h) \ln P(Y_i \mid X_h)$, where Xh denotes a resource category, with h = 1,…,H, Yi denotes an individual, with i = 1,…,S, $P(X_h, Y_i)$ is a joint distribution function expressing the probability that a resource is of type Xh and is used by individual Yi, with $\sum_h \sum_i P(X_h, Y_i) = 1$, p(Yi) represents the population individual distribution, irrespective of their resource use, calculated as $p(Y_i) = \sum_h P(X_h, Y_i)$, u(Xh) represents the whole population resource use, irrespective of what each individual consumes, calculated as $u(X_h) = \sum_i P(X_h, Y_i)$, $P(X_h \mid Y_i) = P(X_h, Y_i) / p(Y_i)$ is the conditional distribution of resource use for each given individual, while $P(Y_i \mid X_h) = P(X_h, Y_i) / u(X_h)$ is the distribution of individuals using any given resource.

For both continuous and discrete data, the relative degree of individual specialization can be measured as the proportion of TNW explained by within-individual variation, or WIC/TNW.

The procedures WTcMC and WTdMC perform the analysis of Roughgarden indices in the continuous and discrete case, respectively.

A Monte Carlo resampling approach is used to evaluate the statistical relevance against a null hypothesis. The null model corresponds to a population composed of generalists that sample randomly from the population's diet and have diet sizes equal to those of the observed data set.
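As a worked illustration of the two categorical indices in Table 1, the sketch below computes PSi, IS and Petraitis' Wi from a small matrix of prey counts. It assumes the usual forms PSi = 1 − 0.5 Σj |pij − qj| and Wi = λi^(1/Di), and it takes the population diet qj from the pooled counts; the function names and the toy matrix are invented for illustration and are not the PSicalc or like.Wi code.

```python
import math


def proportions(counts):
    """Convert a row of counts into proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def population_diet(matrix):
    """Population proportions q_j from counts pooled over all individuals."""
    pooled = [sum(col) for col in zip(*matrix)]
    return proportions(pooled)


def ps_and_is(matrix):
    """Return the list of PS_i values and their mean (IS)."""
    q = population_diet(matrix)
    ps = [
        1 - 0.5 * sum(abs(p - qj) for p, qj in zip(proportions(row), q))
        for row in matrix
    ]
    return ps, sum(ps) / len(ps)


def petraitis_w(matrix):
    """Return W_i = lambda_i ** (1 / D_i) for every individual."""
    q = population_diet(matrix)
    w = []
    for row in matrix:
        d = sum(row)  # D_i: number of diet items of individual i
        # log(lambda_i) = sum_j n_ij * log(q_j / p_ij); terms with n_ij = 0 vanish
        log_lambda = sum(
            n * math.log(qj / (n / d)) for n, qj in zip(row, q) if n > 0
        )
        w.append(math.exp(log_lambda / d))
    return w


# Invented toy data: two specialists and one generalist, two prey categories
toy = [[10, 0], [0, 10], [5, 5]]
ps, is_value = ps_and_is(toy)
print("PS_i:", ps, "IS:", is_value)
print("W_i:", petraitis_w(toy))
```

In this toy case the generalist's diet matches the population diet, so both of its indices reach their maximum, while the two specialists score lower.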

Figure 1.

Discrepancy Δ values calculated as the difference between total niche width (TNW) and the sum of the within-individual component (WIC) and the between-individual component (BIC) for two simulated synthetic data sets constituted by resource matrices characterized by identical numbers of individuals (N = 50) with different numbers of prey items in their diet (i.e. the sparsity of the resource matrix). In the two data sets, the number of prey items varied among individuals according to a negative binomial distribution characterized by a dispersion parameter θ of 0·8 (empty circles) and 3·2 (grey circles). Further details on the simulation assumptions are provided in the supporting information.

In the following sections, the indices added ex novo are presented in detail.

Pairwise diet overlap, individual specialization E index and network modularity measure Cw

The overlap function calculates the pairwise diet overlap between all individuals in a sample (Bolnick et al. 2002). For N individuals, an N × N matrix is provided, where each cell oik represents the diet overlap between individual i and individual k. The overlap ranges from 0, when individuals have no common prey, to 1, when individuals consume the same prey in identical proportions. The individual overlap is:

$$o_{ik} = \sum_j \min(p_{ij}, p_{kj}) = 1 - 0.5 \sum_j |p_{ij} - p_{kj}|$$

where pij and pkj are the proportion of the resource j for the two individuals. The diagonal in the diet overlap matrix is all ones because an individual has a 100% overlap with itself. The matrix can be used for testing whether diet (dis)similarity is a function of various other metrics of between-individual difference or similarity (e.g. Bolnick & Paull 2009). The matrix of oik values is provided for subsequent analyses or to generate figures showing network connectivity among individuals as in Araújo et al. (2008).
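A minimal sketch of the overlap matrix follows, assuming that the overlap between two individuals is the sum over resources of the smaller of their two diet proportions (the form given by Bolnick et al. 2002). The function name `overlap_matrix` and the toy counts are illustrative assumptions, not the package's overlap function.

```python
def overlap_matrix(matrix):
    """Return the N x N matrix of pairwise diet overlaps o_ik.

    matrix: one row of resource counts per individual.
    """
    props = [[c / sum(row) for c in row] for row in matrix]
    n = len(props)
    return [
        [sum(min(a, b) for a, b in zip(props[i], props[k])) for k in range(n)]
        for i in range(n)
    ]


# Invented toy data: two specialists on different prey and one generalist
toy = [[10, 0], [0, 10], [5, 5]]
for row in overlap_matrix(toy):
    print(row)
```

The result is symmetric with ones on the diagonal, so only the upper (or lower) triangle carries information for downstream analyses.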

The average pairwise overlap similarity is calculated as:

$$\bar{O} = \frac{2}{N(N-1)} \sum_{i<k} o_{ik}$$

The procedure Eindex provides the E index of individual specialization, a measure of the average pairwise dissimilarity, calculated as E = 1−O (Araújo et al. 2008). The index ranges from 0, when all individuals use the same resources in the same proportions, to 1, when every individual relies entirely on a different resource. Estimates of E may be biased upwards when there are few resource observations per individual. To account for this bias we recommend calculating an adjusted E:

$$E_{adj} = \frac{E_{obs} - E_{null}}{1 - E_{null}}$$

This procedure standardizes the observed value Eobs between 0 (when the observed value is equal to the mean null value Enull) and 1 (when individual specialization is maximal) allowing the comparison of data sets with different mean null values. A jackknife estimation of the variance of E is derived based on the estimation procedure provided by Araújo et al. (2008). In addition, RInSp calculates the Cw measure of the relative degree of clustering in a network to test for modularity in the niche overlap network (Araújo et al. 2008). Two different weighted clustering coefficients can be calculated. The degree of unweighted clustering around a node i in a network is quantified by evaluating the number of triangles in which the node participates, normalized by the maximum possible number of such triangles. By extension, a weighted clustering coefficient should take into account how much weight is present in the neighbourhood of the node, compared with some reference case. Barrat et al. (2004) proposed a weighted version of the clustering coefficient of the form:

$$C_i^{w} = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} \, a_{ij} a_{ih} a_{jh}$$

where si is the sum of the weights of all the edges between node i and the nodes to which it is connected; ki is the number of edges between node i and its neighbours; wij is the weight of the edge between two nodes i and j; aij, aih and ajh are 1 if an edge is present between the corresponding pair of nodes (ij, ih and jh, respectively) and zero otherwise. The sum, therefore, quantifies the weights of all edges between node i and its neighbours that are also neighbours to each other. Saramäki et al. (2007) proposed an alternative clustering index of the form:

$$C_i^{w} = \frac{1}{k_i (k_i - 1)} \sum_{j,h} \left( w_{ij} \, w_{ih} \, w_{jh} \right)^{1/3}$$

where ki is the number of edges between individual i and its neighbours; wij is the weight of the edge between individual i and j obtained by dividing the actual weight by the maximum of all weights. The summation, therefore, quantifies the weights of all edges between individual i and its neighbours that are also neighbours to each other.
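The two weighted clustering coefficients can be sketched for a single node as follows. This is an illustration under stated assumptions rather than the RInSp code: the network is given as a symmetric weight matrix in which zero means "no edge", the diagonal is ignored, sums run over ordered pairs of distinct neighbours, and the second index is assumed to take the geometric-mean form with weights rescaled by the largest weight in the network. The function names are hypothetical.

```python
def _neighbours(w, i):
    """Nodes connected to node i by a positive-weight edge."""
    return [j for j in range(len(w)) if j != i and w[i][j] > 0]


def barrat_clustering(w, i):
    """Weighted clustering of node i in the form of Barrat et al. (2004)."""
    nbrs = _neighbours(w, i)
    k = len(nbrs)
    if k < 2:
        return 0.0  # no triangle can involve node i
    s = sum(w[i][j] for j in nbrs)  # strength of node i
    total = 0.0
    for j in nbrs:
        for h in nbrs:
            if j != h and w[j][h] > 0:  # j and h close a triangle with i
                total += (w[i][j] + w[i][h]) / 2
    return total / (s * (k - 1))


def geometric_clustering(w, i):
    """Geometric-mean clustering of node i (assumed Saramaki et al. form)."""
    n = len(w)
    w_max = max(w[a][b] for a in range(n) for b in range(n) if a != b)
    nbrs = _neighbours(w, i)
    k = len(nbrs)
    if k < 2 or w_max == 0:
        return 0.0
    total = 0.0
    for j in nbrs:
        for h in nbrs:
            if j != h:
                # Weights rescaled by the maximum weight before averaging
                total += (w[i][j] * w[i][h] * w[j][h] / w_max ** 3) ** (1 / 3)
    return total / (k * (k - 1))


# Invented toy network: one strong pair of edges and one weak closing edge
toy = [[0, 1, 1], [1, 0, 0.125], [1, 0.125, 0]]
print(barrat_clustering(toy, 0), geometric_clustering(toy, 0))
```

The toy network shows the practical difference between the two forms: the first only asks whether the closing edge exists, whereas the second is lowered when that edge is weak.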

The procedure Emc performs a Monte Carlo resampling to derive a statistical test for both indices. For counts of individual prey items representing approximately independent prey captures (i.e. integer data type), it is possible to run a Monte Carlo resampling simulation to test the null hypothesis that any observed diet variation arose from individuals sampling stochastically from a shared distribution (Araújo, Bolnick & Layman 2011). Both E and Cw values are recalculated for the resulting simulated population. A user-specified number of null data sets is generated, and the observed test statistic is compared with the distribution observed under the null hypothesis. Two options are offered for calculating the population's diet proportions (qj, the proportion of the resource j in the population's diet): ‘sum’ and ‘average’. When ‘sum’ is specified, all resource counts are summed within a category across all individuals to get the population's use, then the proportion of each resource category in the population's repertoire is determined as:

$$q_j = \frac{\sum_i n_{ij}}{\sum_i \sum_j n_{ij}}$$

The drawback of this approach is that individuals that eat large numbers of items, or a larger total mass of items, will bias the population diet to look more like their own (Bolnick et al. 2003). The ‘average’ method circumvents this problem by first converting individual diets into proportions pij, then averaging these proportions for each resource j, thereby giving all individuals equal influence over the estimated population diet.
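The resampling logic described above can be sketched in a few lines. The code below is an assumption-laden illustration, not the Emc procedure: the function names, the seed argument and the toy counts are invented, E is taken as one minus the mean pairwise overlap, each null individual draws as many items as it has in the observed data from the population diet, and the adjusted index is assumed to be (Eobs − Enull)/(1 − Enull).

```python
import random


def population_diet(matrix, method="sum"):
    """Population proportions q_j by the 'sum' or 'average' method."""
    if method == "sum":
        pooled = [sum(col) for col in zip(*matrix)]
        return [c / sum(pooled) for c in pooled]
    props = [[c / sum(row) for c in row] for row in matrix]
    return [sum(col) / len(props) for col in zip(*props)]


def e_index(matrix):
    """E = 1 - mean pairwise overlap over all pairs of individuals."""
    props = [[c / sum(row) for c in row] for row in matrix]
    n = len(props)
    overlaps = [
        sum(min(a, b) for a, b in zip(props[i], props[k]))
        for i in range(n)
        for k in range(i + 1, n)
    ]
    return 1 - sum(overlaps) / len(overlaps)


def null_e_values(matrix, replicates=200, method="sum", seed=1):
    """E for null populations of generalists sampling from the shared diet."""
    rng = random.Random(seed)
    q = population_diet(matrix, method)
    categories = list(range(len(q)))
    values = []
    for _ in range(replicates):
        simulated = []
        for row in matrix:
            # Each null individual keeps its observed number of items
            draws = rng.choices(categories, weights=q, k=sum(row))
            simulated.append([draws.count(j) for j in categories])
        values.append(e_index(simulated))
    return values


def e_adjusted(e_obs, e_null_mean):
    """Rescale E so that 0 is the mean null value and 1 the maximum."""
    return (e_obs - e_null_mean) / (1 - e_null_mean)


# Invented toy counts: four individuals, three prey categories
toy = [[12, 1, 0], [0, 9, 2], [1, 0, 15], [4, 5, 3]]
e_obs = e_index(toy)
null = null_e_values(toy, replicates=200, method="average")
p_value = sum(v >= e_obs for v in null) / len(null)
print(e_obs, e_adjusted(e_obs, sum(null) / len(null)), p_value)
```

Switching `method` between "sum" and "average" changes only the shared diet from which null individuals draw, which is where the two options differ.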

NODF – a nestedness measure

For studies of individual specialization, one form of diet variation arises when individuals differ in their niche breadth, such that some individuals' diets are subsets of other individuals' diets. This is revealed by a nestedness metric, which in RInSp can be estimated for a binary matrix using the function NODF. The procedure is based on two principles: decreasing fill (or DF) and paired overlap (or PO). In a matrix with m rows and n columns, let row i be located above row j, and column k to the left of column l. In addition, let MT be the marginal total (i.e. the sum of 1's) of any column or row. For any pair of rows/columns i and j, DFij will be equal to 100 if MTj is lower than MTi. Alternatively, DFij will be equal to 0 if MTj is greater than or equal to MTi. For columns/rows, POkl is simply the percentage of 1's in a given column/row l that are located at identical row/column positions to those in a column/row k. For any left-to-right column pair and, similarly, for any up-to-down row pair, the degree of paired nestedness (Npaired) is zero if DFpaired is zero, and equal to PO if DFpaired is 100. From the n(n−1)/2 and m(m−1)/2 paired degrees of nestedness for n columns and m rows, a measure of nestedness among all columns (Ncol) and among all rows (Nrow) can be calculated by simply averaging all paired values of columns and rows. Thus, a measure of nestedness for the whole matrix is given by (Almeida-Neto et al. 2008):

$$NODF = \frac{\sum N_{paired}}{\frac{n(n-1)}{2} + \frac{m(m-1)}{2}}$$
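The decreasing-fill and paired-overlap rules translate directly into code. The sketch below follows the description above for a binary matrix taken in the order given (no re-sorting of rows or columns); the function names are hypothetical, and treating a pair whose lower or right-hand member is empty as contributing zero is an assumption of this sketch, since paired overlap is undefined in that case.

```python
def paired_nestedness_sum(vectors):
    """Sum of N_paired over all ordered pairs (earlier vector, later vector)."""
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            mt_i, mt_j = sum(vectors[i]), sum(vectors[j])
            # Decreasing fill: the later vector must have a strictly lower total
            if 0 < mt_j < mt_i:
                shared = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                # Paired overlap: % of the later vector's 1's shared with the earlier
                total += 100.0 * shared / mt_j
    return total


def nodf(matrix):
    """NODF of a binary matrix (list of rows of 0/1 values)."""
    m, n = len(matrix), len(matrix[0])
    columns = [list(col) for col in zip(*matrix)]
    n_pairs = m * (m - 1) / 2 + n * (n - 1) / 2
    return (paired_nestedness_sum(matrix) + paired_nestedness_sum(columns)) / n_pairs


# Invented toy matrices: a perfectly nested one and a non-nested one
print(nodf([[1, 1, 1], [1, 1, 0], [1, 0, 0]]))
print(nodf([[1, 0], [0, 1]]))
```

Because the rules compare each vector only with those placed after it, the value depends on the ordering of rows and columns supplied by the user.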

RInSp utilities and example data sets

In the online supporting information, a description of the utilities included in the package for further analyses and data exchange is provided. In addition, RInSp includes two examples of freshwater fish diets, one for categorical and one for continuous raw data matrices. The Stickleback example reproduces data from Bolnick & Paull (2009) of counts of prey categories in stomach contents of individuals from a population of three-spine sticklebacks (Gasterosteus aculeatus), using 265 specimens sampled from five sites located in Roberts Lake on Vancouver Island, British Columbia. The Trout example reproduces data published in Kahilainen & Lehtonen (2001) on fish prey lengths of a sample of 59 individuals of stocked (S) and native (N) brown trout (Salmo trutta) from the subarctic Lake Muddusjärvi in Northern Finland. Detailed examples of application of RInSp indices using both data sets are provided in the supporting information.

Future implementations of RInSp

The first, stable release of RInSp is available on CRAN at http://cran.r-project.org/web/packages/RInSp/index.html and on GitHub at https://github.com/Nicola-Zaccarelli/RInSp. A limitation of current metrics of individual specialization is the effect of prey size in stomach content data. When prey are of vastly unequal sizes, Monte Carlo resampling methods may generate unrealistic null prey distributions. Future RInSp versions will implement size corrections that use diet data together with a matrix of prey sizes and a vector of forager sizes, to estimate maximum stomach volume and generate Monte Carlo diets that are constrained to remain within realistic limits of stomach volume.

Major efforts, however, will be made to generalize the functions implemented in RInSp for the analysis of continuous data. Specifically, the WIC/TNW ratio partitions the variance in individual resource use assuming a two-level, intra- and interindividual hierarchy. In principle, it is possible to implement a hierarchical variance partition at n levels of ecological organization, with n > 2. Recently, Violle et al. (2012) have emphasized the need for a multi-level quantitative scrutiny of trait variance in natural communities, proposing a procedure of hierarchical variance partition of traits across ecological levels of organization – from individuals to community – based on F-statistics (Wright 1949). The implementation of RInSp to calculate n-level hierarchical variance components on both discrete and continuous data is straightforward and will represent a major improvement in the power of the package to highlight changes across different levels of ecological organization in individual diet specialization and, in general, individual traits ranging from leaf area in plants to movement tortuosity in animals (e.g. Cornwell & Ackerly 2009; Mancinelli 2010; Potenza & Mancinelli 2010). The package includes the function Hier2L as a purely introductory illustration of these forthcoming features. The function decomposes the variance for continuous data based on a user-specified factor. For example, the analysis of the Finnish stocked and native trout data run separately provides a WIC/TNW ratio of 39·76% and 58·96%, respectively (no weighting procedure adopted). By considering both groups as belonging to a single population, it is possible to fractionate the total variance in prey lengths, that is, 963·41, into components estimating (i) the contribution of between-group (i.e. native vs. stocked: BGC hereafter) variability equal to 22·42, corresponding to a BGC/TNW of 2·33% and (ii) the within-individual contribution WIC = 475·22, corresponding to an overall WIC/TNW ratio of 49·33%. Upcoming implementations will integrate the features currently included in RInSp – bootstrapping, correction for unequal samples – in a generalized n-level hierarchical variance partition procedure, making the package prospectively useful for multi-scale trait analysis in community studies.
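The idea of adding a grouping level to the variance partition can be sketched as below. This is not the Hier2L implementation: the function name, the dictionary layout and the toy prey lengths are assumptions, and every prey item is given equal weight so that the nested law of total variance makes TNW equal to the sum of the between-group (BGC), between-individual within-group (BIC) and within-individual (WIC) components.

```python
def hierarchical_partition(groups):
    """Partition prey-size variance into BGC, BIC and WIC.

    groups: dict mapping a group label to a list of individual diets,
    each diet being a list of prey sizes. Items are weighted equally.
    """
    items = [x for diets in groups.values() for diet in diets for x in diet]
    n = len(items)
    grand_mean = sum(items) / n
    tnw = sum((x - grand_mean) ** 2 for x in items) / n

    bgc = bic = wic = 0.0
    for diets in groups.values():
        group_items = [x for diet in diets for x in diet]
        group_mean = sum(group_items) / len(group_items)
        # Between-group component: group means around the grand mean
        bgc += len(group_items) * (group_mean - grand_mean) ** 2 / n
        for diet in diets:
            ind_mean = sum(diet) / len(diet)
            # Between-individual component: individual means around the group mean
            bic += len(diet) * (ind_mean - group_mean) ** 2 / n
            # Within-individual component: items around the individual mean
            wic += sum((x - ind_mean) ** 2 for x in diet) / n
    return {"TNW": tnw, "BGC": bgc, "BIC": bic, "WIC": wic}


# Invented toy prey lengths for two hypothetical groups of foragers
toy = {"native": [[1, 2, 3], [4, 6]], "stocked": [[10, 12], [20, 22, 24]]}
parts = hierarchical_partition(toy)
print({name: round(value, 3) for name, value in parts.items()})
print("WIC/TNW:", parts["WIC"] / parts["TNW"], "BGC/TNW:", parts["BGC"] / parts["TNW"])
```

Further levels (for example, sites within groups) could be added in the same way, each contributing the item-weighted variance of its means around the mean of the level above.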

Acknowledgements

Funding by FUR 2010–2011 to G. M. is acknowledged; D. I. B. was supported by the David and Lucile Packard Foundation and the Howard Hughes Medical Institute. Thanks to Travis Ingram for providing the code and solution to the weighting scheme option in Roughgarden's indices calculation, and to Richard Svanbäck and an anonymous reviewer for constructive comments that greatly improved the manuscript. This study is dedicated to Lorenzo and Alice Zaccarelli, and to Sofia Mancinelli, thy eternal summer shall not fade.
