helminthR: an R interface to the London Natural History Museum's Host–Parasite Database
Abstract
Our understanding of the diversity and distribution of helminth parasites is currently constrained by the limited number of host–parasite interaction databases and by the difficulty of accessing existing data. The London Natural History Museum's Host–Parasite Database is one such underused database: it contains over a quarter of a million helminth parasite occurrence records and is accessible through a web interface. To enable users to search and manipulate data from this database programmatically, I developed an R package called helminthR. Here, I introduce the core functions of the package and detail how helminthR can be used to obtain host–parasite interaction records, citations for interactions, and host taxonomic data.
Helminth parasites are among the most common infectious agents of humans (Stoll 1947, De Silva et al. 2003, Hotez et al. 2008), wild animals (Poulin and Valtonen 2002, Jolles et al. 2008), and livestock (Over et al. 1992, Morgan et al. 2013). Limited data availability has hampered our understanding of the spatial distribution of helminth parasites and of their associations with both human and wildlife hosts. Further, there is a need for basic scientific research into the community ecology and macroecology of host–helminth associations (Rohde 2002). Such research could test principles from community ecology and examine macroecological patterns in parasites.
To address these research needs, data on host–helminth associations across broad spatial scales are required. Efforts to compile known host–parasite associations into large databases are fairly recent and represent valuable resources for researchers (Gibson et al. 2005, Nunn and Altizer 2005, Strona et al. 2013). However, some of these databases are not openly accessible, requiring users to contact database administrators or to copy data from web interfaces. These methods of access can introduce transcription errors, duplicate effort among labs, and create static copies of the data that are difficult to update as new data are added. Making host–parasite databases open and easy to access may promote open and reproducible science, and could help in the discovery of ‘general laws’ in parasite ecology (Poulin 2007).
To this end, I developed an R package that extracts information from a large global database of host–helminth parasite occurrence records maintained by the London Natural History Museum (NHM; Gibson et al. 2005). This curated database includes more than 250 000 host–helminth records from over 28 000 published peer‐reviewed articles. However, the database's web interface makes data analysis difficult, which limits the use of this resource by researchers (but see Strona and Fattorini (2014) and Wells et al. (2015)). The goal of the helminthR package is to make all the data contained in the NHM database accessible from R, a widely used open-source statistical programming environment (R Core Team).
Core package functionality

Querying the database
Host–parasite records in the NHM database contain information on host and parasite species, one or more citations for the host–parasite association, and the location of the interaction georeferenced to the country, state (for the United States), or water body (e.g. Lake Erie) level. Queries can be made to find all interactions of a known host species (findHost), all interactions of a known parasite species (findParasite), or all interactions at a specific geographic location (findLocation). Links to citations for a given helminth record can be obtained from any of the functions listed above by setting the citation argument to TRUE.
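The three query types amount to filtering one table of records by host, by parasite, or by location. The sketch below illustrates this locally without calling helminthR: it builds a small placeholder table whose column names (Host, Parasite, Location) are assumptions chosen for illustration, not the package's documented output. Against the live database, the corresponding calls are findHost, findParasite, and findLocation (optionally with citation = TRUE); these need network access, and their exact arguments should be checked in the package help pages.

```r
# Placeholder stand-in for query output. All values are hypothetical, and
# the column names are assumptions, not the package's actual schema.
records <- data.frame(
  Host     = c("Host_1", "Host_1", "Host_2", "Host_3"),
  Parasite = c("Parasite_A", "Parasite_B", "Parasite_A", "Parasite_C"),
  Location = c("Site_X", "Site_Y", "Site_X", "Site_X"),
  stringsAsFactors = FALSE
)

# Local analogues of the three query types (not the package functions)
by_host     <- subset(records, Host == "Host_1")          # cf. findHost
by_parasite <- subset(records, Parasite == "Parasite_A")  # cf. findParasite
by_location <- subset(records, Location == "Site_X")      # cf. findLocation

nrow(by_host)      # 2
nrow(by_parasite)  # 2
nrow(by_location)  # 3

# Parasites recorded from the focal host
unique(by_host$Parasite)
```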




Visualizing host–parasite networks
These functions support querying host–parasite interactions by host or parasite genus and/or species, and locating all host–parasite interactions in a given country or locality. Using the findLocation function, I queried the database for all host–parasite interactions recorded in Lake Erie, one of the North American Great Lakes, and visualized the resulting host–parasite interaction network (Fig. 1) using the igraph R package (Csardi and Nepusz 2006). Detailed code to create this type of visualization is provided in the Supplementary material Appendix 1.

Figure 1. The host–parasite association network for Lake Erie, one of the Great Lakes on the border between the northern United States and Canada. Grey lines between nodes represent interactions between hosts (larger blue dots) and helminth parasites (smaller black dots).
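A network like the one in Fig. 1 is bipartite: hosts and parasites form two node sets, and each record links one host to one parasite. A minimal base R sketch of that step is to cross-tabulate the records into a host × parasite incidence matrix. The records and column names below are hypothetical placeholders; this is not the code from the Supplementary material.

```r
# Hypothetical records; the repeated Host_3/Parasite_C pair mimics one
# association reported by more than one citation.
records <- data.frame(
  Host     = c("Host_1", "Host_1", "Host_2", "Host_3", "Host_3"),
  Parasite = c("Parasite_A", "Parasite_B", "Parasite_A", "Parasite_C", "Parasite_C"),
  stringsAsFactors = FALSE
)

# Hosts as rows, parasites as columns; cells count records per pair
counts <- table(records$Host, records$Parasite)

# Presence/absence: one edge per host-parasite pair, however many records
incidence <- (unclass(counts) > 0) * 1L

# Node degrees in the bipartite network
host_degree     <- rowSums(incidence)  # number of parasites per host
parasite_degree <- colSums(incidence)  # number of hosts per parasite
```

Collapsing counts to presence/absence keeps repeated citations of the same association from being drawn as several edges. A matrix in this form can be passed to igraph's graph_from_incidence_matrix() (renamed graph_from_biadjacency_matrix() in recent igraph releases) for plotting; check which name your installed igraph version provides.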
Data limitations
The data contained in the London Natural History Museum's Host–Parasite Database are a valuable resource, but they have limitations. First, the data come from studies published at any time after 1922, and the data owners themselves accept no responsibility for data accuracy. Second, in most cases the data are georeferenced only to the country level, which limits their use in finer-scale spatial analyses. However, citations are given for each host–parasite association, and approximate latitude and longitude values for country centroids are included with the package (loaded with data(locations)). Examining the original references, though time consuming, would help ensure data quality and provide finer georeferencing. Nevertheless, the data in their current form can still be used to address many macroecological questions. For example, records of aquatic and marine parasites are georeferenced to coastal areas (e.g. ‘Coast of New Guinea’) or larger bodies of water (e.g. ‘Aral sea’), offering a way to apply macroecological theory to largely unexplored questions about the diversity and distribution of marine parasites (Rohde 2002, 2010).
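Because most records are georeferenced only by place name, attaching coordinates amounts to joining the records to a table of centroids, such as the one loaded by data(locations). The sketch below uses base R's merge() on two small placeholder tables. The column names (Location, Latitude, Longitude) and all values are assumptions for illustration, not the documented structure of the package's locations data.

```r
# Placeholder records; names and locations are hypothetical.
records <- data.frame(
  Host     = c("Host_1", "Host_2", "Host_3"),
  Parasite = c("Parasite_A", "Parasite_A", "Parasite_B"),
  Location = c("Country_X", "Country_Y", "Water_body_Z"),
  stringsAsFactors = FALSE
)

# Placeholder centroid table standing in for data(locations)
centroids <- data.frame(
  Location  = c("Country_X", "Country_Y"),
  Latitude  = c(10.0, -5.0),
  Longitude = c(20.0, 30.0),
  stringsAsFactors = FALSE
)

# Left join: keep every record, attaching coordinates where a centroid exists
geo <- merge(records, centroids, by = "Location", all.x = TRUE)

# Records with no centroid (e.g. coastal areas or water bodies) still need
# georeferencing by hand, for instance from the original citation.
needs_review <- geo[is.na(geo$Latitude), ]
```

Using all.x = TRUE keeps unmatched records visible as NA rows rather than silently dropping them, which makes it easier to flag records that need checking against the original references.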
Conclusions
In this paper I have shown how the R package helminthR provides programmatic access to the Natural History Museum Host–Parasite Database, making it easy to generate host–parasite networks at geographical scales ranging from local to global. This database is one of the most complete aquatic host–parasite databases (but see Strona et al. 2013), and it provides data on parasite occurrences for both terrestrial and aquatic hosts. I hope that helminthR will promote the application of concepts from community ecology and macroecology to parasite communities at broader spatial scales. The project is hosted on GitHub and uses Travis CI for continuous integration testing of the package across R versions. Issues and suggested improvements can be submitted at https://github.com/ropensci/helminthR/issues.
To cite helminthR or acknowledge its use, cite this Software note as follows, substituting the version of the application that you used for ‘version 0’:
Dallas, T. 2016. helminthR: an R interface to the London Natural History Museum's Host–Parasite Database. – Ecography 39: 391–393 (ver. 0).
Acknowledgements
I thank the London Natural History Museum, and specifically D. A. Baylis, the original curator, and the current curation team (D. Gibson, R. Bray, and E. Harris). Colin Carlson, Kevin Burgio, Alexa McKay, Maxwell Joseph, and Giovanni Strona provided thoughtful comments on earlier drafts. I thank my advisor, John Drake, and the developers at rOpenSci for their guidance, support, and general views on open science.
References
Supplementary material (Appendix ECOG‐02131 at www.ecography.org/appendix/ecog‐02131). Appendix 1.