mangal – making ecological network analysis simple
Abstract
The study of ecological networks is severely limited by 1) the difficulty of accessing data, 2) the lack of a standardized way to link meta‐data with interactions, and 3) the disparity of formats in which ecological networks themselves are stored and represented. To overcome these limitations, we have designed a data specification for ecological networks. We implemented a database respecting this standard, and released an R package (rmangal) allowing users to programmatically access, curate, and deposit data on ecological interactions. In this article, we show how these tools, in conjunction with other frameworks for the programmatic manipulation of open ecological data, streamline the analysis process and improve the replicability and reproducibility of ecological network studies.
Ecological networks are efficient representations of the complexity of natural communities, and help discover mechanisms contributing to their persistence, stability, resilience, and functioning. Most of the early studies of ecological networks were focused on understanding how the structure of interactions within one location affected the ecological properties of this local community. They revealed the contribution of average network properties, such as the buffering impact of modularity on species loss (Yodzis 1981, Pimm et al. 1991), the increase in robustness to extinctions along with increases in connectance (Dunne et al. 2002), and the fact that organization of interactions maximizes biodiversity (Bastolla et al. 2009). New studies introduced the idea that networks can vary from one locality to another. They can be meaningfully compared, either to understand the importance of environmental gradients on the presence of ecological interactions (Tylianakis et al. 2007), or to understand the mechanisms behind variation itself (Poisot et al. 2012, 2014). Yet, meta‐analyses of numerous ecological networks are still extremely rare, and most of the studies comparing several networks do so within the limit of particular systems (Schleuning et al. 2011, Dalsgaard et al. 2013, Poisot et al. 2013b, Chamberlain et al. 2014, Olito and Fox 2015). The severe shortage of publicly shared data in the field also restricts the scope of large‐scale analyses.
It is possible to predict the structure of ecological networks, either using latent variables (Rohr et al. 2010, Eklöf et al. 2013) or actual trait values (Gravel et al. 2013). The calibration of these approaches requires accessible data, not only about the interactions, but also about the traits of the species involved. Comparing the efficiency of different methods is also facilitated if there is a homogeneous way of representing ecological interactions and the associated metadata. In this paper, we 1) establish the need for a data specification serving as a common language among network ecologists, 2) describe this data specification, and 3) describe rmangal, an R package and companion database relying on this data specification. The rmangal package allows one to easily deposit and retrieve data about ecological interactions and networks in a publicly accessible database. We provide use‐cases showing how this new approach makes complex analyses simpler, and allows for the integration of new tools to manipulate biodiversity resources.
Networks need a data specification
Ecological networks are (often) stored as an adjacency matrix (or as the quantitative link matrix), that is, a series of 0s and 1s indicating, respectively, the absence or presence of an interaction. This format is extremely convenient (as most network analysis packages, e.g. bipartite, betalink, foodweb, require data to be presented this way), but is extremely inefficient at storing meta‐data. In most cases, an adjacency matrix provides information about the identity of species (in the cases where row and column headers are present) and the presence or absence of interactions. If other data about the environment (e.g. where the network was sampled) or the species (e.g. the population size, trait distribution, or other observations) are available, they are often either given in other files or as accompanying text. In both cases, making a programmatic link between interaction data and relevant meta‐data is difficult and, more importantly, error‐prone.
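To make the limitation concrete, the following minimal sketch (in Python, with invented species names, an invented site, and hypothetical field names that are not those of any particular package) shows an adjacency matrix and one possible way of turning it into records that carry their own meta‐data.

```python
# Hypothetical example: species names and site values are invented.
species = ["anemone_a", "fish_a", "fish_b"]

# adjacency[i][j] == 1 means that species i interacts with species j.
adjacency = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]

# Meta-data that the bare matrix cannot carry (assumed values).
site = {"name": "site_1", "latitude": -5.0, "longitude": 145.0}


def matrix_to_records(names, matrix, site_info):
    """Turn every non-zero cell into a self-describing interaction record."""
    records = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value:
                records.append({
                    "taxa_from": names[i],
                    "taxa_to": names[j],
                    "site": site_info["name"],
                    "latitude": site_info["latitude"],
                    "longitude": site_info["longitude"],
                })
    return records


records = matrix_to_records(species, adjacency, site)
for record in records:
    print(record)
```

Each record can now be filtered or joined on its meta‐data without consulting a separate file, which is the property the matrix alone lacks.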
By contrast, a data specification (i.e. a set of precise instructions detailing how each object should be represented) provides a common language for network ecologists to interact, and ensures that, regardless of their source, data can be used in a shared workflow. Most importantly, a data specification describes how data are exchanged. Each group retains the ability to store the data in the format that is most convenient for in‐house use, and only needs to provide export options (e.g. through an API, i.e. a programmatic interface running on a web server, returning data in response to queries in a pre‐determined language) respecting the data specification. This approach ensures that all data can be used in meta‐analyses, and increases the impact of data (Piwowar and Vision 2013). Data archival also offers additional advantages for ecology. The aggregation of local observations can reveal large‐scale phenomena (Reichman et al. 2011), which would be unattainable in the absence of a collaborative effort. Data archival in databases also prevents data rot and data loss (Vines et al. 2014), thus ensuring that data on interaction networks – which are typically hard and costly to produce – continue to be available and usable.
Elements of the data specification
The data specification introduced here (Fig. 1) is built around the idea that (ecological) networks are collections of relationships between ecological objects, and each element has particular meta‐data associated with it. In this section, we detail the way networks are represented in the mangal specification. An interactive webpage with the elements of the data specification can be found online at < http://mangal.io/doc/spec/ >. The data specification is available either at the API root (e.g. < http://mangal.io/api/v1/?format=json >), or can be viewed using the whatIs function from the rmangal package. Rather than giving an exhaustive list of the data specification (which is available online at the aforementioned URL), this section serves as an overview of each element, and how they interact.

Figure 1. An overview of the data specification, and the hierarchy between objects. Every box corresponds to a level of the data specification. Grey boxes are nodes, blue boxes are interactions and networks, and green boxes are metadata. The bold boxes (dataset, network, interaction, taxa) are the minimal elements needed to represent a network.
We propose JSON, a user‐friendly format equivalent to XML, as an efficient way to standardise data representation, for three main reasons. First, it has emerged as a de facto standard for web platforms serving data and accepting data from users. Second, it allows strict validation of the data: a JSON file can be matched against a schema, and one can verify that it is correctly formatted (this includes the possibility that not all fields are filled, as this will depend on available data). Finally, JSON objects are easily and cheaply (memory‐wise) parsed in the most commonly used programming languages, notably R (equivalent to list) and python (equivalent to dict). For most users, the format in which data are transmitted is unimportant, as the interaction happens within R; knowing how JSON objects are organized is only useful for those who want to interact with the API directly. The rmangal package takes care of converting the data into the correct JSON format before uploading them to the database.
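The validation step can be sketched as follows. This is only an illustration in Python using the standard library: the schema below is a simplified assumption (its field names, types, and required flags are hypothetical and are not taken from the actual mangal specification), and the taxon name and identifier are placeholders.

```python
import json

# Hypothetical, simplified schema (assumed fields, not the real specification).
TAXA_SCHEMA = {
    "name": {"type": str, "required": True},
    "vernacular": {"type": str, "required": False},
    "ncbi": {"type": int, "required": False},
    "gbif": {"type": int, "required": False},
}


def validate(obj, schema):
    """Return a list of problems; an empty list means the object is valid."""
    problems = []
    for field, rule in schema.items():
        if field not in obj or obj[field] is None:
            # Optional fields may be left empty, required ones may not.
            if rule["required"]:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(obj[field], rule["type"]):
            problems.append(f"wrong type for field: {field}")
    for field in obj:
        if field not in schema:
            problems.append(f"unknown field: {field}")
    return problems


# Placeholder objects: "Taxon A" and the identifier 1 are not real records.
good = json.loads('{"name": "Taxon A", "gbif": 1}')
bad = json.loads('{"vernacular": "some fish", "ncbi": "not-a-number"}')

print(validate(good, TAXA_SCHEMA))
print(validate(bad, TAXA_SCHEMA))
```

The first object passes even though several optional fields are absent, whereas the second is rejected for a missing name and a mistyped identifier.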
Functions in the rmangal package are named after elements of the data specification, in the following way: verb + Element. verb can be one of list, get, or patch; for example, the function to get a particular network is getNetwork. The function to modify (patch) a taxon is patchTaxa. All of these functions return a list object, which means that chaining them together using, e.g. the plyr package, is time‐efficient. There are examples of this in the use‐cases.
Node information
Taxa
A taxon is a taxonomic entity of any level, identified by its name, vernacular name, and its identifiers in a variety of taxonomic services. Associating these identifiers with each taxon allows using the new generation of open data tools, such as taxize (Chamberlain and Szöcs 2013), in addition to protecting the database against taxonomic revisions. The data specification currently has fields for ncbi (National Center for Biotechnology Information), gbif (Global Biodiversity Information Facility), tsn (Taxonomic Serial Number, used by the Integrated Taxonomic Information System), eol (Encyclopedia of Life) and bold (Barcode of Life) identifiers. We also provide the taxonomic status, i.e. whether the taxon is a true taxonomic entity, a ‘trophic species’, or a morphospecies. Taxonomic identifiers can either be added by the contributors, or will be automatically retrieved during the automated curation routine.
Item
An item is any measured instance of a taxon. Items have a level argument, which can be either individual or population; this allows representing both individual‐level networks (i.e. there are as many items of a given taxon as there were individuals of this taxon sampled), and population‐level networks. When an item represents a population, it is possible to give a measure of the size of this population. The notion of item is particularly useful for time‐replicated designs: each observation of a population at a time‐point is an item with associated trait values, and possibly population size.
Network information
All objects described in this sub‐section can have a spatial position, information on the date of sampling, and references to both papers and datasets.
Interaction
An interaction links two taxa objects (but can also link pairs of items). The most important attributes of interactions are the type of interaction (of which we provide a list of possible values), and its obs_type, i.e. how it was observed. This field helps differentiate direct observations, text mining, and inference. Note that the obs_type field can also take confirmed absence as a value; this is useful for, e.g. ‘cafeteria’ experiments in which there is high confidence that the interaction did not happen.
Network
A network is a series of interaction objects, along with 1) information on its spatial position (provided as latitude and longitude), 2) the date of sampling, and 3) references to measures of environmental conditions.
Dataset
A dataset is a collection of one or several network(s). Datasets also have a field for data and papers, both of which are references to bibliographic or web resources that describe, respectively, the source of the data and the papers in which these data have been studied. Datasets or networks are the preferred entry point into the resources, although in some cases it can be meaningful to get a list of interactions only.
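The nesting of the minimal elements (taxa, interaction, network, dataset) can be sketched with plain Python dataclasses. The class and attribute names below are assumptions chosen to mirror the text, not the exact field names of the specification, and all values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Taxa:
    name: str
    vernacular: Optional[str] = None


@dataclass
class Interaction:
    taxa_from: Taxa
    taxa_to: Taxa
    interaction_type: str            # hypothetical value, e.g. "mutualism"
    obs_type: str = "observation"    # how the interaction was observed


@dataclass
class Network:
    name: str
    interactions: List[Interaction] = field(default_factory=list)
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    date: Optional[str] = None


@dataclass
class Dataset:
    name: str
    networks: List[Network] = field(default_factory=list)
    papers: List[str] = field(default_factory=list)  # references to papers
    data: List[str] = field(default_factory=list)    # references to data sources


def taxa_in(network):
    """Sorted names of all taxa involved in at least one interaction."""
    names = set()
    for interaction in network.interactions:
        names.add(interaction.taxa_from.name)
        names.add(interaction.taxa_to.name)
    return sorted(names)


# Invented placeholder content.
host, guest = Taxa("Taxon A"), Taxa("Taxon B")
net = Network("site_1", [Interaction(host, guest, "mutualism")],
              latitude=-5.0, longitude=145.0)
dataset = Dataset("example dataset", networks=[net])

print(taxa_in(dataset.networks[0]))
```

Entering through the dataset and walking down to networks and interactions mirrors the preferred entry point described above.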
Meta‐data
Trait value
Objects of type item can have associated trait values. These consist of the description of the trait being measured, the value, and the units in which the measure was taken. As the measurement may have been taken at a different time and/or location than the interaction, trait values have fields for time, latitude and longitude, and references to the original publication and original datasets.
Environmental condition
Environmental conditions are associated with dataset, network, and interaction objects, to allow for both macro and micro environmental conditions. These are defined by the environmental property measured, its value, and the units. Like traits, they have fields for time, latitude and longitude, and references to the original publication and original datasets.
References
References are associated with datasets. They accommodate DOI, JSTOR or PubMed identifiers, or a URL. When possible, the DOI is preferred as it offers more potential to interact with other online tools, such as the CrossRef API.
Use cases
In this section, we present use‐cases using the rmangal package for R to interact with a database that implements this data specification and serves data through an API (< http://mangal.io/api/v1/ >). Users can deposit data into this database through the R package. Note that data are made available under a CC‐0 Waiver (Poisot et al. 2013a). Detailed information about how to upload data through R is given in the vignettes and manual of the rmangal package.


Create taxa and add an interaction





Link–species relationships


Relationship between the number of species and number of interactions in the anemonefish‐fish dataset. Constant connectance refers to the hypothesis that there is a quadratic relationship between these two quantities.
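The constant connectance hypothesis can be written as L = C × S², with S the number of species, L the number of interactions, and C the connectance. The sketch below, in Python, uses invented (S, L) pairs rather than the dataset analysed here, and assumes the unipartite definition of connectance (L / S²); for bipartite networks the denominator would instead be the product of the two group sizes.

```python
# Hypothetical (species, links) counts for four networks; not real data.
networks = [(4, 5), (6, 10), (8, 20), (10, 30)]


def connectance(n_species, n_links):
    """Connectance of a unipartite network: realised over possible links."""
    return n_links / n_species ** 2


def fit_constant_connectance(data):
    """Least-squares estimate of C in L = C * S**2 (regression through the origin)."""
    numerator = sum(links * species ** 2 for species, links in data)
    denominator = sum(species ** 4 for species, _ in data)
    return numerator / denominator


c_hat = fit_constant_connectance(networks)
for species, links in networks:
    expected = c_hat * species ** 2
    print(species, links, round(connectance(species, links), 3), round(expected, 1))
```

Comparing the observed number of links to the expected value for each network gives a quick check of how far the data depart from a quadratic relationship.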
Network beta‐diversity



Relationships between the geographic distance between two sites, and the species dissimilarity, network dissimilarity with all species, and network dissimilarity with only shared species.
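The three dissimilarities can be illustrated with a minimal Python sketch. It assumes Whittaker's measure, β = (a + b + c) / ((2a + b + c) / 2) − 1, where a is the number of shared items and b and c the numbers of items unique to each network; networks are represented as sets of directed (from, to) pairs with invented species names. This is a sketch under these assumptions, not the implementation used by the betalink package.

```python
def whittaker(set_1, set_2):
    """Whittaker's dissimilarity between two sets (0 = identical, 1 = disjoint)."""
    a = len(set_1 & set_2)   # shared items
    b = len(set_1 - set_2)   # items unique to the first set
    c = len(set_2 - set_1)   # items unique to the second set
    if a + b + c == 0:
        return 0.0
    return (a + b + c) / ((2 * a + b + c) / 2) - 1


def species_of(interactions):
    """All species that appear in at least one interaction."""
    return {sp for pair in interactions for sp in pair}


def dissimilarities(net_1, net_2):
    sp_1, sp_2 = species_of(net_1), species_of(net_2)
    shared = sp_1 & sp_2
    # Keep only the interactions whose two partners occur at both sites.
    sub_1 = {p for p in net_1 if p[0] in shared and p[1] in shared}
    sub_2 = {p for p in net_2 if p[0] in shared and p[1] in shared}
    return {
        "species": whittaker(sp_1, sp_2),
        "all_species": whittaker(net_1, net_2),
        "shared_species_only": whittaker(sub_1, sub_2),
    }


# Invented example networks.
site_1 = {("A", "B"), ("A", "C")}
site_2 = {("A", "B"), ("B", "C"), ("C", "D")}
print(dissimilarities(site_1, site_2))
```

Computing these values for every pair of sites and plotting them against geographic distance yields the kind of relationships described in the caption.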
Spatial visualization of networks



Spatial plot of a network, using the maps and rmangal packages. The circles in the inset map show the location of the sites. Each dot in the main map represents a species, with symbiotic mutualisms drawn between them. The land is in grey.
Conclusions
The mangal data format will allow researchers to put together datasets with species interactions and rich meta‐data that are needed to address emerging questions about the structure of ecological networks. We deployed an online database with an associated API relying on this data specification. Finally, we introduced rmangal, an R package designed to interact with APIs using the mangal format. We expect that the data specification will evolve based on the needs and feedback of the community. At the moment, users are welcome to propose such changes on the project issue page: < https://github.com/mangal-wg/mangal-schemes/issues >. A python wrapper for the API is also available at < http://github.com/mangal-wg/pymangal/ >. Additionally, there are plans to integrate this database with GLOBI, so that data can be accessed from multiple sources (Poelen et al. 2014).
To cite mangal.io, or acknowledge its use, please cite this software note. To cite the rmangal R package, or acknowledge its use, please cite:
Poisot, T. et al. rmangal (ver. 1.0.1). – doi: 10.5281/zenodo.16998
Acknowledgements
This paper was developed during a workshop hosted at the Santa Fe Institute. TP, DBS, and DG acknowledge funding from the Canadian Inst. of Ecology and Evolution. We thank Scott Chamberlain and one anonymous reviewer for comments on the manuscript. TP is funded by a start‐up grant from the Univ. de Montréal. We thank the rOpenSci team and developers for inspiration.
References