
Cheddar: analysis and visualisation of ecological communities in R


Correspondence author. E-mail: lawrence.hudson08@imperial.ac.uk


  1. There has been a lack of software available to ecologists for the management, visualisation and analysis of ecological community and food web data. Researchers have been forced to implement their own data formats and software, often from scratch, resulting in duplicated effort and bespoke solutions that are difficult to apply to future analyses and comparative studies.
  2. We introduce Cheddar – an R package that provides standard, transparent implementations of a wide range of food web and community-level analyses and plots, focussing on ecological network data that are augmented with estimates of body mass and/or numerical abundance.
  3. The package allows analysis of individual communities, as well as collections of communities, allowing examination of changes in structure through time, across environmental gradients, or due to experimental manipulations. Several commonly analysed food web data sets are included and used in worked examples.
  4. This is the first time these important features have been combined in a single package that helps improve research efficiency and serves as a unified framework for future development.


Community ecology has long suffered from a lack of standardised methods, especially in the study of ecological networks. This has slowed the advancement of the field in general and reduced possibilities for carrying out comparative studies and meta-analyses (Ings et al. 2009). One especially significant bottleneck has been the lack of software available to food web ecologists for managing, visualising and analysing complex empirical data sets. Existing packages provide some analysis and visualisations of food web networks (e.g. Yoon et al. 2004; Kones et al. 2009; Lin et al. 2011; Perdomo, Thompson & Sunnucks 2012), but these offer only subsets of the many standard plots and statistics that community ecologists typically use and do not assist in the management of empirical data, which are often highly heterogeneous in form and quality.

Ecologists are focussing increasingly on explaining the structure of communities by enriching traditional food web data with additional information, especially in relation to species' body sizes and abundances (Cohen, Jonsson & Carpenter 2003; Ings et al. 2009; Reuman et al. 2009; Woodward et al. 2010). Many community studies collect either the food web, species' body masses or abundance data, and an ever-increasing number of studies measure two or three of these data types and/or additional data (Cohen, Jonsson & Carpenter 2003). Various combinations of data allow different properties to be explored (Fig. 1) and different hypotheses to be tested (e.g. Reuman et al. 2009; Woodward et al. 2012). Unfortunately, researchers are forced to invent their own data formats to deal with heterogeneous data types and to use ad hoc methods to find errors such as typographical mistakes and duplicated trophic links, essentially re-inventing the wheel repeatedly and often imperfectly. Published descriptions of methods are not always precise enough to be unambiguously re-implemented, leading to subtly different interpretations. Data sets are frequently passed between researchers and are often modified, meaning the same data set named in different published articles can refer to different data. These factors lead to duplicated effort, bespoke solutions that are difficult to apply to future analyses and results that are hard, if not impossible, to reproduce exactly.

We present Cheddar – a package for R (R Development Core Team 2012) that solves these problems by providing (i) a flexible, well-defined representation of an ecological community together with functions that make data import easy and that detect most common errors; (ii) several high-quality, published data sets; (iii) functions that allow a range of properties to be plotted, making it easy both to ‘eyeball’ data and to produce figures for publication; (iv) functions that compute a range of food web and related community statistics; (v) functions that perform community manipulations such as trophic lumping; and (vi) functions that manage and analyse collections of communities, allowing investigation of changes in community structure through time, across environmental gradients or resulting from experimental manipulation. Cheddar follows an open-source model to ensure transparency of algorithms and data sets and to allow growth in concert with the research field. General purpose graphing and analysis features are complemented by a large number of functions that are focussed on food web data augmented with additional measures of body mass and/or numerical abundance (Fig. 1). We propose Cheddar as a useful unifying foundation for a growing body of multi-trophic community analyses and research.

Data format and quality checks

Cheddar's LoadCommunity and SaveCommunity functions provide a standard data format for community representation and perform import and export of data. A community is represented by three comma-separated value (CSV) files, which are editable using standard software. The properties.csv file contains data applicable to the community as a whole, such as treatment, latitude and longitude. The second file, nodes.csv, contains the list of species (or other groupings) together with any associated data such as body mass, numerical abundance and taxonomic classification. The optional trophic.links.csv file defines the food web as the names of resource–consumer pairs. Properties such as evidence for the presence of each link (e.g. empirically observed or inferred from literature) and link strength (e.g. diet fraction data) can be added to this file. Cheddar applies a range of data quality checks, and most commonly made errors are detected automatically at the import stage. Data can be added to any of these three aspects simply by adding the columns to the relevant file. Any information so added will be available to Cheddar's plotting and analysis functions.
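To make the three-file layout concrete, the following sketch writes a toy community to a temporary directory using base R only, re-reads it and applies two simple checks of the kind described above. It does not call Cheddar: the column names (title, treatment, node, M, N, resource, consumer), the toy species and the two checks are illustrative assumptions, not a description of Cheddar's own validation.

```r
# Toy community written as three CSV files (all names and values hypothetical).
community.dir <- file.path(tempdir(), "toy_community")
dir.create(community.dir, showWarnings = FALSE)

properties <- data.frame(title = "Toy pond", treatment = "control")
nodes <- data.frame(node = c("Algae", "Daphnia", "Perch"),
                    M = c(1e-12, 1e-7, 1e-2),   # body mass
                    N = c(1e9, 1e4, 1e-1))      # numerical abundance
# Deliberate errors: one duplicated link and one consumer missing from nodes.
links <- data.frame(resource = c("Algae", "Daphnia", "Daphnia", "Algae"),
                    consumer = c("Daphnia", "Perch", "Perch", "Trout"))

write.csv(properties, file.path(community.dir, "properties.csv"), row.names = FALSE)
write.csv(nodes, file.path(community.dir, "nodes.csv"), row.names = FALSE)
write.csv(links, file.path(community.dir, "trophic.links.csv"), row.names = FALSE)

# Re-read the files and look for two common mistakes.
nodes.in <- read.csv(file.path(community.dir, "nodes.csv"),
                     stringsAsFactors = FALSE)
links.in <- read.csv(file.path(community.dir, "trophic.links.csv"),
                     stringsAsFactors = FALSE)

duplicated.links <- links.in[duplicated(links.in), ]
unknown.nodes <- setdiff(c(links.in$resource, links.in$consumer), nodes.in$node)

print(duplicated.links)  # the repeated Daphnia -> Perch link
print(unknown.nodes)     # "Trout" is not listed in nodes.csv
```

Extra columns (for example, taxonomic classification in nodes.csv or link evidence in trophic.links.csv) would simply be added to the relevant data frame before writing.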

Included data sets

In the interests of reproducible results, it is just as important for data to be public, transparent and version-controlled as it is for software. Cheddar contains several published food web data sets: Tuesday Lake (Carpenter & Kitchell 1996; Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005), Broadstone Stream (Woodward, Speirs & Hildrew 2005), Ythan Estuary (Hall & Raffaelli 1991; Emmerson & Raffaelli 2004; Cohen et al. 2009), Skipwith Pond (Warren 1989), the Benguela coastal ecosystem (Yodzis 1998), control and treatment webs from a long-running stream mesocosm experiment investigating the effects of drought (Ledger et al. 2011, 2012; Woodward et al. 2012) and 10 of 20 stream communities sampled across a wide pH gradient (Layer et al. 2010). Additional data sets can be included and fully attributed in future versions as part of Cheddar's version control repository. Any data revisions will be tracked by the version control. Cheddar is intended primarily as a comprehensive solution for methods, with its function as a data repository taking a supporting role – a comprehensive version-controlled repository for ecological data sets is a complex problem that we have not undertaken with Cheddar. However, combining data and methods in the same version control repository (https://github.com/quicklizard99/cheddar/) and the ready availability of source code for previous releases in the CRAN source archive (http://cran.r-project.org/src/contrib/Archive/cheddar/) greatly assist in reproducibility of analyses.

Community visualisation, analysis and data manipulation

Cheddar provides an extremely flexible graphing system that is built upon two general purpose functions: PlotNPS (for Plot Node PropertieS), which plots a point for each node (i.e. species or trophic elements), and PlotTLPS (for Plot Trophic-Link PropertieS), which plots a point for each trophic link. Also included are a number of ‘wrapper’ functions that visualise different combinations of body mass, numerical abundance and trophic links (Fig. 1).

The ecological importance of body mass, M, has long been recognised (Elton 1927; Peters 1983), with rank (Fig. 1a) and distribution (Fig. 1b) of nodes frequently examined (Jonsson, Cohen & Carpenter 2005). Numerical abundance, N, of nodes is similarly visualised by distribution and rank plots (MacArthur 1957, 1960; Jonsson, Cohen & Carpenter 2005; Fig. 1c–d). Mass–abundance allometry considers the relationship between N and M at the taxon level (Fig. 1e) and has implications for community- (Reuman et al. 2008) and ecosystem-level (Brown et al. 2004) theories. In pelagic marine habitats, body size can be at least as important as taxonomy in determining trophic interactions (Jennings et al. 2001; Woodward et al. 2010), and mass–abundance allometry for these communities is typically examined using the abundance- or size spectrum (Fig. 1f), which ignores taxonomy and shows the log10-transformed total N in equally spaced log10(M) bins, plotted against log-scale bin centres (Kerr & Dickie 2001). Linear regressions fitted to both forms of mass–abundance allometry (Fig. 1e–f) provide simple but powerful descriptions of community patterns (Reuman et al. 2008; Reuman, Cohen & Mulder 2009; Reuman et al. 2009).
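The two forms of mass–abundance allometry can be sketched in a few lines of base R. The example below fits a taxon-level regression of log10(N) on log10(M) and then builds a size spectrum by summing N within equally spaced log10(M) bins. The eight data points and the choice of five bins are hypothetical, and the code illustrates the calculation only; it is not Cheddar's implementation.

```r
# Hypothetical taxa, given directly as log10 body mass and log10 abundance.
log10M <- c(-12, -11, -9, -8, -6, -5, -3, -2)
log10N <- c(9.2, 8.1, 6.9, 6.2, 4.4, 3.9, 2.1, 1.6)

# Taxon-level allometry: one point per taxon.
taxon.fit <- lm(log10N ~ log10M)

# Size spectrum: ignore taxonomy and sum N within equally spaced log10(M) bins.
n.bins <- 5                                   # arbitrary choice for this sketch
breaks <- seq(min(log10M), max(log10M), length.out = n.bins + 1)
bin <- cut(log10M, breaks, include.lowest = TRUE)
total.N <- as.numeric(tapply(10^log10N, bin, sum))
centres <- head(breaks, -1) + diff(breaks) / 2

# Regress log10 total abundance on the log-scale bin centres.
spectrum.fit <- lm(log10(total.N) ~ centres)

print(coef(taxon.fit))
print(coef(spectrum.fit))
```

With real data some bins may be empty; those would need to be dropped or otherwise handled before taking logarithms.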

Food webs describe the resource–consumer trophic interactions within a community (Elton 1927), often visualised as a predation matrix with columns as consumers and rows as resources (Stouffer, Camacho & Amaral 2006; Petchey et al. 2008; Fig. 1g). Plotting nodes vertically by trophic level shows ‘food chain distance’ from the primary producers in the community and reveals the typical triangular shape of many resource species, fewer intermediate species and very few top-level species (Cohen, Jonsson & Carpenter 2003; Woodward et al. 2012; Fig. 1h).
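As an illustration of the predation matrix layout (rows as resources, columns as consumers), the base R sketch below builds a binary matrix from a list of resource–consumer pairs. The four-species web, including its cannibalistic link, is hypothetical and the code does not use Cheddar.

```r
# Hypothetical four-node web; Perch is cannibalistic.
nodes <- c("Algae", "Daphnia", "Chaoborus", "Perch")
links <- data.frame(
  resource = c("Algae", "Daphnia", "Daphnia", "Chaoborus", "Perch"),
  consumer = c("Daphnia", "Chaoborus", "Perch", "Perch", "Perch"),
  stringsAsFactors = FALSE)

# Rows are resources, columns are consumers; 1 marks a trophic link.
pm <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
             dimnames = list(resource = nodes, consumer = nodes))
pm[cbind(links$resource, links$consumer)] <- 1L

# Nodes with an empty column consume nothing (basal);
# a 1 on the diagonal marks a cannibal.
basal <- names(which(colSums(pm) == 0))
cannibals <- nodes[diag(pm) == 1]

print(pm)
print(basal)      # "Algae"
print(cannibals)  # "Perch"
```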

Body size is an important determinant of trophic interactions and hence of community structure (Cohen et al. 1993, Cohen, Jonsson & Carpenter 2003; Petchey et al. 2008). The broad expectation that species at higher trophic levels are larger (Elton 1927) has been shown to be true for many habitats by examining the relationship between trophic level and body mass (Cohen, Jonsson & Carpenter 2003; Jacob et al. 2011; Fig. 1j). Similarly, species at higher trophic levels are also expected to be rarer (Fig. 1n). Allometric degree distributions (Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005; Jacob et al. 2011) describe how species' trophic vulnerability (number of links to predators, also referred to as ‘out degree’), trophic generality (number of links from resources, also referred to as ‘in degree’) or total trophic links (‘degree’) scale with their log-transformed body masses (Fig. 1k).
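The degree measures defined above can be computed directly from a table of trophic links. The base R sketch below counts, for each node of a hypothetical web, the links from resources (generality, ‘in degree’) and the links to predators (vulnerability, ‘out degree’), and tabulates them against log10-transformed body mass. The data are invented for illustration and the web has no cannibalistic links, so the question of how a cannibalistic link should count towards total degree does not arise.

```r
# Hypothetical nodes with log10 body mass, and their trophic links.
nodes <- data.frame(node = c("Algae", "Daphnia", "Chaoborus", "Perch"),
                    log10M = c(-12, -7, -5, -2),
                    stringsAsFactors = FALSE)
links <- data.frame(
  resource = c("Algae", "Daphnia", "Daphnia", "Chaoborus"),
  consumer = c("Daphnia", "Chaoborus", "Perch", "Perch"),
  stringsAsFactors = FALSE)

# Generality: number of resources; vulnerability: number of consumers.
nodes$generality <- sapply(nodes$node,
                           function(x) sum(links$consumer == x))
nodes$vulnerability <- sapply(nodes$node,
                              function(x) sum(links$resource == x))
nodes$degree <- nodes$generality + nodes$vulnerability

print(nodes)
```

An allometric degree distribution would then be examined by plotting or regressing any of the three degree columns against log10M.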

Predator–prey body mass ratios are central to many ecological theories (e.g. Yodzis & Innes 1992; Cohen, Jonsson & Carpenter 2003; Brose, Williams & Martinez 2006; Brose 2010) and are typically visualised as log-transformed M of the consumer plotted against log-transformed M of the resource for each trophic link in the food web (Cohen et al. 1993, Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005; Fig. 1l). Similar relationships have been examined for numerical abundance and biomass abundance (Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005; Fig. 1o,r). The classic ‘pyramid of numbers’ shows the total abundance at each trophic level (Elton 1927; Jonsson, Cohen & Carpenter 2005; Fig. 1m, p).
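A per-link view of body mass can be assembled by looking up the resource and consumer masses for every trophic link. The base R sketch below does this for a hypothetical web and derives the log10 consumer-to-resource mass ratio; the species, masses and column names are assumptions made for the example.

```r
# Hypothetical nodes (log10 body mass) and trophic links.
nodes <- data.frame(node = c("Algae", "Daphnia", "Chaoborus", "Perch"),
                    log10M = c(-12, -7, -5, -2),
                    stringsAsFactors = FALSE)
links <- data.frame(
  resource = c("Algae", "Daphnia", "Daphnia", "Chaoborus"),
  consumer = c("Daphnia", "Chaoborus", "Perch", "Perch"),
  stringsAsFactors = FALSE)

# One row per link: masses of the resource and of the consumer.
links$log10M.resource <- nodes$log10M[match(links$resource, nodes$node)]
links$log10M.consumer <- nodes$log10M[match(links$consumer, nodes$node)]

# log10 of the consumer:resource body mass ratio.
links$log10.ratio <- links$log10M.consumer - links$log10M.resource

print(links)
```

Plotting log10M.consumer against log10M.resource gives the kind of display described above; the same lookup applied to N or to biomass would give the corresponding abundance plots.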

Recent analyses have used food web, M and N data to examine patterns in all the trophic links, three-species chains and complete food chains from several food webs (Cohen et al. 2009; Woodward et al. 2012), revealing emergent size structure at each organisational level (Cohen et al. 2009). For example, Cohen et al. (2009) plotted the angles made by trophic links when visualised on a log10(N)-vs.-log10(M) plot, as well as the relationship between angles of ‘lower’ links in tri-trophic food chains and ‘upper’ links in chains (Fig. 1q); tools for graphing and analysis of these and other related approaches are also in Cheddar.
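The geometry of a link angle can be sketched as follows. Each trophic link is treated as a vector from the resource to the consumer on a plot of log10(N) against log10(M), and its angle is measured anticlockwise from the positive log10(M) axis. This convention is an assumption made for the example; Cohen et al. (2009) should be consulted for the exact definition used in the published analyses. The data are hypothetical and the code is base R only.

```r
# Hypothetical nodes with log10 body mass and log10 numerical abundance.
nodes <- data.frame(node = c("Algae", "Daphnia", "Chaoborus", "Perch"),
                    log10M = c(-12, -7, -5, -2),
                    log10N = c(9, 4, 3, -1),
                    stringsAsFactors = FALSE)
links <- data.frame(
  resource = c("Algae", "Daphnia", "Daphnia", "Chaoborus"),
  consumer = c("Daphnia", "Chaoborus", "Perch", "Perch"),
  stringsAsFactors = FALSE)

r <- match(links$resource, nodes$node)
k <- match(links$consumer, nodes$node)

# Displacement from resource to consumer along each axis.
dM <- nodes$log10M[k] - nodes$log10M[r]
dN <- nodes$log10N[k] - nodes$log10N[r]

# Angle in degrees (assumed convention: anticlockwise from the log10(M) axis).
links$angle <- atan2(dN, dM) * 180 / pi

print(links)
```

Under this convention a link along which abundance falls exactly as fast as body mass rises (equal biomass of resource and consumer) has an angle of -45 degrees.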

Figure 1.

Different views of the community of Tuesday Lake sampled in 1984 (Carpenter & Kitchell 1996; Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005). Here M represents species' average body masses and N represents species' population densities. Panels a, c, e, h, i, j, k and n show producers by a green circle, invertebrates by a blue square and vertebrates by a purple diamond. Panels h, i, j, k and n show cannibals with a lighter-coloured filled circle. Panels g, l, o and r show the category of the resource species of each trophic link using the same scheme. Functions marked with a star have a mirror image function that plots the x and y axes swapped, so PlotNvM (panel e) has a sister function PlotMvN. All panels can be produced with one or two lines of code using Cheddar.

Cheddar provides food web statistics such as LinkageDensity and DirectedConnectance (Martinez 1992) and ‘node-level’ information such as connectedness (e.g. InDegree, OutDegree and Degree), connectivity status (e.g. IsBasalNode, IsTopLevelNode, IsIntermediateNode and IsIsolatedNode) and trophic species number (TrophicSpecies). Williams & Martinez (2004) formalised and evaluated six different measures for computing trophic level from binary (presence–absence) food web data, many of which are commonly used (e.g. Jonsson, Cohen & Carpenter 2005; Jacob et al. 2011) and all of which are provided by Cheddar. Cheddar-provided and user-defined functions can be used together with the NPS (for Node PropertieS) function, which assembles tables of ‘first-class’ and computed properties (Table 1).
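For readers unfamiliar with these statistics, the base R sketch below computes linkage density (L/S), directed connectance (L/S^2) and a prey-averaged trophic level for a small hypothetical web. Prey-averaged trophic level is taken here as one plus the mean trophic level of a consumer's resources, with basal nodes at level one, and is obtained by solving the resulting linear system. The sketch illustrates the definitions and is not Cheddar's implementation.

```r
# Hypothetical four-node web without cannibalism.
nodes <- c("Algae", "Daphnia", "Chaoborus", "Perch")
links <- data.frame(
  resource = c("Algae", "Daphnia", "Daphnia", "Chaoborus"),
  consumer = c("Daphnia", "Chaoborus", "Perch", "Perch"),
  stringsAsFactors = FALSE)

S <- length(nodes)               # number of nodes
L <- nrow(links)                 # number of trophic links
linkage.density <- L / S
directed.connectance <- L / S^2

# Predation matrix: rows are resources, columns are consumers.
pm <- matrix(0, S, S, dimnames = list(nodes, nodes))
pm[cbind(links$resource, links$consumer)] <- 1

# Q[i, j] is the fraction of consumer i's resources made up by node j.
n.prey <- colSums(pm)
Q <- t(pm) / pmax(n.prey, 1)

# Solve TL = 1 + Q %*% TL, i.e. (I - Q) TL = 1.
patl <- solve(diag(S) - Q, rep(1, S))
names(patl) <- nodes

print(c(L.over.S = linkage.density, C = directed.connectance))
print(patl)
```

In this web Perch feeds on Daphnia (level 2) and Chaoborus (level 3), so its prey-averaged trophic level is 1 + 2.5 = 3.5.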

Table 1. Summary statistics for 6 of the 56 nodes in the Tuesday Lake 1984 data (Carpenter & Kitchell 1996; Cohen, Jonsson & Carpenter 2003; Jonsson, Cohen & Carpenter 2005), showing log10-transformed M (kg), N (individuals m⁻³) and biomass abundance (B, kg m⁻³), degree (Deg), top-level consumer status (Top), trophic species (TS), chain-averaged trophic level (CATL) and prey-averaged trophic level (PATL). The table was assembled using tail(NPS(TL84, c('Log10MNBiomass', Deg ='Degree', Top ='IsTopLevelNode', TS ='TrophicSpecies', CATL ='ChainAveragedTrophicLevel', PATL ='PreyAveragedTrophicLevel'))). R functions that reproduce this table, Table 2 and greatly expanded versions of both are in Appendices S1 and S2.
Node                     log10 M  log10 N  log10 B  Deg  Top    TS  CATL  PATL
Trichocerca cylindrica   −9·42    4·91     −4·51    10   FALSE  16  2·00  2·00
Tropocyclops prasinus    −8·16    4·69     −3·47    23   FALSE  13  3·33  3·14
Chaoborus punctipennis   −6·52    4·08     −2·44    26   FALSE  20  4·60  3·17
Phoxinus eos             −3·00    0·29     −2·70    10   FALSE  21  5·17  3·53
Phoxinus neogaeus        −2·93    −0·88    −3·81    10   FALSE  21  5·17  3·53
Umbra limi               −2·89    −0·88    −3·77    13   TRUE   22  5·84  3·80

Cheddar provides common data manipulations such as removing isolated species (RemoveIsolatedNodes), removing cannibalistic trophic links (RemoveCannibalisticLinks), lumping taxa into trophic species (LumpTrophicSpecies) and reordering species (OrderCommunity). The MinimiseSumDietGaps and MinimiseSumConsumerGaps functions use simulated annealing as described by Stouffer, Camacho & Amaral (2006) to investigate food web intervality, which provides insight into how trophic niches are partitioned (Stouffer, Camacho & Amaral 2006; Allesina, Alonso & Pascual 2008).
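The logic behind these manipulations can be sketched in base R. The example below removes cannibalistic links, drops isolated nodes and groups the remaining nodes into trophic species, taken here as nodes that share identical sets of resources and consumers. It then defines the quantity that an intervality search seeks to minimise: the number of gaps in consumers' diets for a given ordering of resources. The toy data, the order of the steps and the helper name sum_diet_gaps are assumptions made for illustration; the search over orderings (simulated annealing in Cheddar) is not reproduced.

```r
# Hypothetical web: Perch is cannibalistic and Snail has no links.
nodes <- c("Algae", "Diatom", "Daphnia", "Perch", "Snail")
links <- data.frame(resource = c("Algae", "Diatom", "Daphnia", "Perch"),
                    consumer = c("Daphnia", "Daphnia", "Perch", "Perch"),
                    stringsAsFactors = FALSE)

# 1. Remove cannibalistic links.
links <- links[links$resource != links$consumer, ]

# 2. Remove isolated nodes (nodes that appear in no link).
connected <- nodes[nodes %in% c(links$resource, links$consumer)]

# 3. Trophic species: nodes with identical resource and consumer sets.
signature <- sapply(connected, function(x) {
  prey <- sort(links$resource[links$consumer == x])
  predators <- sort(links$consumer[links$resource == x])
  paste(paste(prey, collapse = ","), paste(predators, collapse = ","),
        sep = "|")
})
trophic.species <- match(signature, unique(signature))
names(trophic.species) <- connected
print(trophic.species)   # Algae and Diatom share a trophic species

# 4. Sum of diet gaps for a predation matrix (rows are ordered resources).
sum_diet_gaps <- function(pm) {
  sum(apply(pm, 2, function(diet) {
    eaten <- which(diet == 1)
    if (length(eaten) < 2) 0 else (max(eaten) - min(eaten) + 1) - length(eaten)
  }))
}

diet <- matrix(c(1, 0, 1,    # consumer X eats A and C
                 1, 1, 0),   # consumer Y eats A and B
               nrow = 3, dimnames = list(c("A", "B", "C"), c("X", "Y")))
print(sum_diet_gaps(diet))                       # one gap (B) in X's diet
print(sum_diet_gaps(diet[c("B", "A", "C"), ]))   # reordering removes the gap
```

A web whose resources can be ordered so that every diet is contiguous, as in the reordered example, is interval.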

Comparisons among communities

Intercommunity comparisons can reveal how intrinsic and extrinsic factors affect community and food web structure. As the catalogue of high-quality food web data sets grows, such approaches are becoming increasingly common (Ings et al. 2009), and several studies have examined how community patterns are modified by a number of factors (O'Gorman & Emmerson 2009; Layer et al. 2010; Ledger et al. 2011). Layer et al. (2010) found that a number of food web properties varied over a wide pH gradient across 20 naturally occurring stream communities, with diversity, linkage density and complexity all increasing with pH; Cheddar's pHWebs data set contains 10 of these communities, and future versions of Cheddar may contain all of them. The CollectionCPS function (for Collection Community PropertieS) assembles a table of predictors and responses that can be passed to the linear modelling tools of R with minimal set-up effort (e.g. Table 2). Other high-level functions allow sorting, taking subsets, aggregating and graphing of collections of webs, providing a powerful toolkit for intercommunity analyses. For example, CollectionApply applies a transformation function to each community in a collection. This can be used together with the OrderCommunity and MinimiseSumDietGaps functions to investigate relationships between body mass and intervality (Stouffer, Camacho & Amaral 2006); see Appendix S2 for an example.
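Once such a table of community properties is available, an intercommunity analysis reduces to ordinary data-frame operations. The base R sketch below uses the pH, S and L values transcribed from the nine rows shown in Table 2, recomputes linkage density and directed connectance from them, and fits a simple linear regression of linkage density on pH. It illustrates the workflow only and does not reproduce the analyses of Layer et al. (2010); the object names are arbitrary.

```r
# pH, number of nodes (S) and number of trophic links (L), from Table 2.
webs <- data.frame(
  community = c("Mill Stream", "Bere Stream", "Hardknott Gill",
                "Allt a'Mharcaidh", "Duddon Pike Beck", "Mosedal Beck",
                "Dargall Lane", "Afon Hafren", "Old Lodge"),
  pH = c(8.4, 7.5, 7.0, 6.5, 6.1, 5.9, 5.8, 5.3, 5.0),
  S = c(87, 66, 44, 40, 35, 21, 21, 25, 23),
  L = c(1654, 943, 386, 334, 286, 108, 99, 135, 137),
  stringsAsFactors = FALSE)

# Derived food web statistics.
webs$linkage.density <- webs$L / webs$S
webs$directed.connectance <- webs$L / webs$S^2

# Does linkage density change along the pH gradient?
fit <- lm(linkage.density ~ pH, data = webs)

print(round(webs[, c("pH", "linkage.density", "directed.connectance")], 2))
print(coef(fit))
```

The recomputed values agree with the L/S and C columns of Table 2 to two decimal places, and the fitted slope is positive, consistent with linkage density increasing with pH.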

Table 2. Summary statistics of 10 of the communities sampled from streams over a wide pH gradient by Layer et al. (2010), showing diversity (S), number of trophic links (L), linkage density (L/S), directed connectance (C), the number of nodes in each category (<unnamed>, invertebrate, producer and vertebrate ectotherm) and the slope of an ordinary linear regression through log10(N) vs. log10(M). The table was assembled using CollectionCPS(OrderCollection(pHWebs, 'pH', decreasing=TRUE), c('pH', S='NumberOfNodes', C='DirectedConnectance', 'L/S' = 'LinkageDensity','#'='NumberOfNodesByClass', Slope='NvMSlope'))
Community          pH   S   L     L/S    C     <unnamed>  Invertebrate  Producer  Vertebrate ectotherm  Slope
Mill Stream        8·4  87  1654  19·01  0·22  2          45            31        9                     −0·92
Bere Stream        7·5  66  943   14·29  0·22  2          34            24        6                     −0·65
Hardknott Gill     7·0  44  386   8·77   0·20  2          28            13        1                     −0·75
Allt a'Mharcaidh   6·5  40  334   8·35   0·21  2          25            12        1                     −0·77
Duddon Pike Beck   6·1  35  286   8·17   0·23  2          21            11        1                     −0·57
Mosedal Beck       5·9  21  108   5·14   0·24  2          10            8         1                     −0·70
Dargall Lane       5·8  21  99    4·71   0·22  2          11            7         1                     −0·74
Afon Hafren        5·3  25  135   5·40   0·22  2          14            8         1                     −0·71
Old Lodge          5·0  23  137   5·96   0·26  2          10            10        1                     −0·66

The stable release of Cheddar is available on CRAN. The development version is at http://quicklizard99.github.com/cheddar/. Cheddar contains help pages and vignettes that provide a comprehensive introduction. The first of the vignettes can be accessed by installing the package and then entering vignette('CheddarQuickstart') into R. We envisage Cheddar will aid food web research by providing a user-friendly, standardised, transparent and expandable toolkit that can be used in a wide variety of ecological contexts with many different data types.


We thank Stephen Carpenter, David Raffaelli and Phil Warren for data and Karline Soetaert and three anonymous reviewers for helpful comments. Authors were supported by Microsoft Research through its PhD Scholarship Programme, UK Natural Environment Research Council (NERC) grants NE/J011193/1, NE/I528069/1, NE/F013124/1, NE/I009280/1, NE/B/S/2002/215, NE/J02256X/1, NE/H020705/1, NE/I010963/1 and NE/I011889/1, Nuffield Foundation grant NAL/01140/G, The John Spedan Lewis Foundation and a Queen Mary University of London postgraduate studentship. LNH wrote the package with contributions from DCR and RE. LNH and DCR wrote the first draft of the manuscript. All authors tested the package, contributed data sets, provided ideas or edited the manuscript.