sFDvent: A global trait database for deep-sea hydrothermal-vent fauna

Motivation: Traits are increasingly being used to quantify global biodiversity patterns, with trait databases growing in size and number across diverse taxa. Despite growing interest in a trait-based approach to the biodiversity of the deep sea, where the impacts of human activities (including seabed mining) are accelerating, there is no single repository for species traits for deep-sea chemosynthesis-based ecosystems, including hydrothermal vents. Using an international, collaborative approach, we have compiled the first global-scale trait database for deep-sea hydrothermal-vent fauna: sFDvent (sDiv-funded trait database for the Functional Diversity of vents). We formed a funded working group to select traits appropriate to: (a) capture the performance of vent species and their influence on ecosystem processes, and (b) compare trait-based diversity in different ecosystems. Forty contributors, representing expertise across most known hydrothermal-vent systems and taxa, scored species traits using online collaborative tools and shared workspaces. Here, we characterise the sFDvent database, describe our approach, and evaluate its scope. Finally, we compare the sFDvent database to similar databases from shallow-marine and terrestrial ecosystems to highlight how the sFDvent database can inform cross-ecosystem comparisons. We also make the sFDvent database publicly available online by assigning a persistent, unique DOI.


| BACKGROUND
Traits provide a "common currency" that can be used across taxa and biogeographic regions to analyse global-scale biodiversity patterns and to evaluate links between species and ecosystem processes (Stuart-Smith et al., 2013; Violle, Reich, Pacala, Enquist, & Kattge, 2014). Taxonomic and phylogenetic information underpins traditional diversity metrics, such as species richness and phylogenetic diversity, whereas traits enable us to compare fish, mammal, bird and other biodiversity using a language common across phyla.
Given the increasing application of trait-based approaches in biodiversity research (Petchey & Gaston, 2006), trait databases are growing in size and number across diverse taxa. Some of the first and, now, largest trait databases focus on plants, where strong links exist between leaf traits (e.g., area, angle), plant growth, and primary production via photosynthesis (Kattge et al., 2011; Kühn, Durka, & Klotz, 2004). Comparable relationships between organisms, traits and energy sources are a relatively recent discovery for deep-sea hydrothermal-vent fauna, because life was first discovered in deep-sea vent environments only 40 years ago [Corliss et al., 1979; photosynthesis was first discovered 200 years before this (Ingen-Housz, 1779)]. Instead of exploiting photosynthetic pathways, vent animals depend strongly on energy from reduced compounds in hydrothermal fluid, obtained via chemosynthetic microorganisms (Jannasch, 1985). Deep-sea hydrothermal vents therefore offer a compelling system for applying trait-based approaches (e.g., see Chapman, Tunnicliffe, & Bates, 2018). Moreover, the distribution of hydrothermal-vent communities has been shaped through geological and evolutionary time by the movement of tectonic plate boundaries (Ramirez-Llodra, Shank, & German, 2007; Tunnicliffe, 1988). Vent fauna therefore group into distinct biogeographic provinces (Bachraty, Legendre, & Desbruyères, 2009; Moalic et al., 2012; Rogers et al., 2012), which offer a pertinent framework within which to compare taxon-based biodiversity patterns with those derived from biological trait data.
Trait-oriented analyses of global-scale biodiversity patterns can also inform conservation and management plans (Mouillot, Graham, Villeger, Mason, & Bellwood, 2013; Stuart-Smith et al., 2015). At vents, this is increasingly important, as commercial-scale mining (the first large-scale, direct human impact on these remote ecosystems) will begin before 2020 (Van Dover et al., 2017). Despite the potential for a trait-based approach to progress ecological understanding and to inform deep-sea mining policies and strategies for vent conservation, it was not possible to pursue this approach on large scales before now, due to a lack of suitable trait data for vent species. Here, we describe, and make publicly available, a global-scale trait database for deep-sea hydrothermal-vent species: sFDvent (sDiv-funded trait database for the Functional Diversity of vents). We: (a) characterize the database; (b) describe the international, collaborative compilation process, and highlight the importance of a working group and web-based document-sharing tools in our workflow; and (c) provide summary statistics and usage guidelines for the recommended first version of the database. Through sFDvent, we aim to promote the use of a trait-based approach in conjunction with taxonomic and phylogenetic methods when analysing deep-sea biodiversity patterns; to encourage international collaboration and knowledge sharing in the deep-sea chemosynthesis-based-ecosystem research community; and to facilitate macroecological analyses that include vent fauna.

| An international, collaborative approach to trait data collection
A working-group meeting at the German Centre for Integrative Biodiversity Research (iDiv) facilitated the design of the sFDvent database, which was then populated by an international group of expert collaborators (detailed in Supporting Information Figure S1 A.1 and Appendix S3).
We selected traits using a three-step process: (a) creating a "wishlist" of traits that could inform understanding of the performance of a species in its ecosystem, as well as its influence on ecosystem function (Figure 1); (b) reducing this trait list to those that could be scored for the majority of vent species across the globe; and (c) checking the traits selected in step (b) against similar traits in established trait databases (e.g., Faulwetter et al., 2017; Madin et al., 2016; Stuart-Smith et al., 2013) to ensure cross-ecosystem compatibility in terminology and definitions.
The working-group meeting was also a platform for data-collection design. We used data compendia such as the Ocean Biogeographic Information System (OBIS, 2017), the World Register of Marine Species (WoRMS) (Horton et al., 2017), ChEssBase (Baker, Ramirez-Llodra, & Perry, 2010), and Desbruyères, Segonzac, and Bright (2006) to populate species trait scores as a starting point for further contributions from the wider deep-sea research community. Data collection was carried out using the Google Sheets platform, given its in-built capacity for version control and collaboration on shared documents stored online. Each contributor initially received a personal data collection sheet, so entries could be tracked and credited appropriately. These sheets were designed to be as user-friendly as possible while also expediting processing. For example, fixed, dropdown scoring options were provided: (a) for ease of entry for contributors, and (b) to prevent inconsistencies in spelling, grammar, and other symbols from affecting compilation or processing for database end-users. A unique contributor ID (email) column was provided, to ensure each contribution could be tracked and credited after compilation and processing. Example data sheets were tested before distribution to collaborators.
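Fixed dropdown options mean that each trait cell can hold only one of a small set of modality strings, so the same constraint can be re-checked when contributor sheets are compiled. The sketch below assumes each sheet has been exported as rows of Python dictionaries; the column names, the `contributor_id` key and the `validate_row` helper are hypothetical, and the modality lists are a subset copied from Table 1. It is an illustration only, not the project's own processing code, which was written in R.

```python
from typing import Dict, List

# Hypothetical subset of fixed dropdown options, copied from Table 1.
# Keys are illustrative column names, not the actual sFDvent sheet headers.
ALLOWED_MODALITIES: Dict[str, List[str]] = {
    "Zonation from a Vent": ["High", "Medium", "Low (Periphery)"],
    "Estimated Maximum Body Size (mm)": ["0.01", "0.1", "1", "10", "100", "1,000"],
    "Habitat Complexity": [
        "Does not add",
        "Mat forming (< 10 cm)",
        "Bed forming (> 10 cm)",
        "Dense bush forming",
        "Open bush forming",
        "Burrow forming",
    ],
}


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return the problems found in one contributor row; an empty list means it is clean."""
    problems: List[str] = []
    # Every row must carry a contributor ID (email) so it can be tracked and credited.
    if not row.get("contributor_id", "").strip():
        problems.append("missing contributor ID")
    for trait, allowed in ALLOWED_MODALITIES.items():
        value = row.get(trait, "")
        # Blank cells are tolerated here; unscored traits are flagged via the certainty column.
        if value and value not in allowed:
            problems.append(f"{trait}: unexpected value {value!r}")
    return problems


if __name__ == "__main__":
    entry = {
        "contributor_id": "contributor@example.org",
        "Zonation from a Vent": "high",  # free-typed, not one of the dropdown options
        "Habitat Complexity": "Mat forming (< 10 cm)",
    }
    for problem in validate_row(entry):
        print(problem)
```

Because the check compares exact strings, a lower-case "high" is flagged even though a reader would understand it, which is precisely the kind of inconsistency the dropdown options were meant to prevent.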
The sFDvent project aimed to engage as many members of the deep-sea research community as possible. Thus, several calls for contributors were made following the working-group meeting, including direct emails, mailing lists (INDEEP, 2018), the Deep-Sea Life newsletter (Baker, Pattenden, & Ramirez-Llodra, 2017) and a poster presentation at an international conference (Chapman et al., 2017).
Forty contributors from 29 institutions in 13 countries contributed expert knowledge to the database.

| Data compilation, processing, quality control and analysis
Quality assurance measures were implemented to minimize errors in the database, including: an online video tutorial (Supporting Information Video S4.1, Appendix S4) demonstrating how to input data; a glossary (Supporting Information Table S4.1, Appendix S4), to ensure all contributors had a good understanding of each of the traits and scoring options (modalities); a certainty score column, per trait, ranging from 0 (used when unknown, to show a cell was empty due to lack of knowledge) to 3 (high certainty); and a reference column per trait (permitting "expert opinion" in place of a literary source where appropriate).
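The certainty and reference columns imply a few simple consistency rules for each scored cell. A minimal sketch in Python, assuming two rules that the text does not state explicitly: certainty 0 is used exactly when a cell is empty, and every scored cell carries a reference, with "expert opinion" accepted in place of a published source. The `TraitScore` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

EXPERT_OPINION = "expert opinion"  # accepted in place of a published source


@dataclass(frozen=True)
class TraitScore:
    taxon: str
    trait: str
    value: Optional[str]  # None when the trait could not be scored
    certainty: int  # 0 = unknown (cell left empty) up to 3 = high certainty
    reference: Optional[str] = None  # citation, or "expert opinion"

    def __post_init__(self) -> None:
        if not 0 <= self.certainty <= 3:
            raise ValueError(f"certainty must be between 0 and 3, got {self.certainty}")
        # Assumed rule: certainty 0 is reserved for empty cells, and empty cells always get 0.
        if (self.value is None) != (self.certainty == 0):
            raise ValueError("certainty 0 must be used exactly when the cell is empty")
        # Assumed rule: every scored cell needs a source, which may be expert opinion.
        if self.value is not None and not (self.reference or "").strip():
            raise ValueError("a scored cell needs a reference or 'expert opinion'")

    @property
    def is_expert_opinion(self) -> bool:
        """True when the score rests on expert opinion rather than a published source."""
        return (self.reference or "").strip().lower() == EXPERT_OPINION


if __name__ == "__main__":
    score = TraitScore("Taxon A", "Substratum", "Hard", 3, "expert opinion")
    print(score.is_expert_opinion)
```

Keeping the expert-opinion flag alongside the certainty score lets end-users weigh or exclude such scores later, rather than losing that distinction at data entry.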
Traits scored using expert opinion are often considered to be lower in certainty and/or quality than those scored using published sources. We included traits scored based on expert opinion in sFDvent because of the value of undocumented expert knowledge of deep-sea species and habitats. The current state of knowledge is not always captured in publications or cruise reports for vent species, as remotely operated vehicles can be used to make observations for many hours that do not form part of a formal study. During these hours, scientists gain insights into the behaviour, feeding ecology, size, mobility, and other traits of deep-sea fauna, which would not be captured if sFDvent required all trait scores to be supported in published resources. The decision to include expert-contributed scores in sFDvent makes the certainty data provided with the database particularly useful, as it acts as an indicator of the confidence an expert (or group of experts) has in a given score (e.g., according to the number of observations or laboratory measurements). Traits scored using available literature were also peer-reviewed by experts as part of the database review process. sFDvent contributions were compiled and processed according to strict, documented criteria, which are described in detail in Supporting Information Appendix S4 and files referenced therein.
A summary of the traits, modalities (or scoring options), and associated rationale for raw and recommended data files is provided in Table 1. Finally, summary statistics were computed and a coverage map created (Figure 2) using the recommended dataset (Supporting Information Table S4.2) to facilitate gap analysis and comparison with other well-known trait databases. sFDvent will be updated in future according to the processes outlined in Supporting Information Appendix S5 and Figure S1 A.2.
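Per-trait coverage and average certainty, of the kind reported in the data description, can be computed from a long-format table with one row per taxon and trait. The sketch below is a minimal illustration; the `Record` layout and `summarise` function are hypothetical, coverage is taken as the percentage of all listed taxa with a non-empty score, and certainty is assumed to be averaged over scored cells only, since the text does not say how empty cells were treated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One row per (taxon, trait): value is None when unscored; certainty runs from 0 to 3.
Record = Tuple[str, str, Optional[str], int]


def summarise(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Per-trait coverage (% of taxa scored) and mean certainty of the scored cells."""
    taxa: Set[str] = set()
    traits: Set[str] = set()
    scored_taxa: Dict[str, Set[str]] = defaultdict(set)
    certainties: Dict[str, List[int]] = defaultdict(list)
    for taxon, trait, value, certainty in records:
        taxa.add(taxon)
        traits.add(trait)
        if value is not None:
            scored_taxa[trait].add(taxon)
            certainties[trait].append(certainty)
    summary: Dict[str, Dict[str, float]] = {}
    for trait in sorted(traits):
        values = certainties[trait]
        summary[trait] = {
            "coverage_pct": 100.0 * len(scored_taxa[trait]) / len(taxa),
            # Assumption: empty cells (certainty 0) are excluded from the average.
            "mean_certainty": sum(values) / len(values) if values else float("nan"),
        }
    return summary


if __name__ == "__main__":
    demo = [
        ("Taxon A", "Depth Range (m)", "scored", 3),
        ("Taxon B", "Depth Range (m)", "scored", 2),
        ("Taxon A", "Gregariousness", None, 0),
        ("Taxon B", "Gregariousness", "scored", 2),
    ]
    for trait, stats in summarise(demo).items():
        print(f"{trait}: {stats['coverage_pct']:.0f}% of taxa scored, "
              f"mean certainty {stats['mean_certainty']:.1f}")
```

Whether empty cells count towards the average changes the result noticeably for poorly covered traits, so the choice should be stated alongside any reported figures.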

| Data description
The clean, "ready-to-use" sFDvent trait dataset (Supporting Information Table S4.2) includes the traits scored with the most coverage and certainty, comprising 646 records across 13 traits with 55 modalities (Table 1). Six of these traits are ordinal, three are binary, and four are qualitative, categorical traits (Table 1). The structure of the sFDvent database is outlined in Supporting Information Figure S1 A.3. The traits in sFDvent were scored at species level for adult fauna, rather than at individual level or for different life stages, given the variability in effort associated with measurements, observations, and descriptions of vent species (Tunnicliffe, 1990). In total, 646 taxa from 345 genera, 181 families, and 12 phyla have trait data with associated, expert-provided location information (Table 2, Supporting Information Figure S1 A.4). Arthropoda is the best-scored phylum, with 216 records, whilst Acanthocephala has the lowest number of records. The "Chemosynthesis-obligate", "Relative Adult Mobility", and "Estimated Maximum Body Size" traits are scored for more than 99% of taxa; the "Depth Range" and "Nutritional Source" traits have greater than 90% coverage (Supporting Information Figure S1 A.4). The remaining traits are scored for at least 69% of taxa. "Estimated Maximum Body Size" is one of the best-scored traits and also has the highest average certainty (2.8 of a possible score of 3). Average certainty exceeds 2.5 for all traits apart from "Gregariousness", "Nutritional Source" and "Trophic Mode" (averaging 2.4; Table 2). For a trait-by-trait summary of results, see Supporting Information Appendix S6.

FIGURE 1 Deep-sea hydrothermal-vent species traits included in the sFDvent database, adapted from the Litchman, Ohman, and Kiørboe (2008) framework (see also Brun et al., 2017). Here, ecological functions and processes potentially influenced by a trait are shown on the x axis, and trait categories are given on the y axis (see Supporting Information Table S4.1 for a glossary of trait definitions).

TABLE 1 Species traits included in the sFDvent database, with further detail on category, type, and modalities. The "Rationale" column is provided to outline the reasons for including each trait in the database (e.g., why it might be ecologically important for the performance of a vent species and/or its influence on ecosystem processes). The glossary in Supporting Information Table S4.1 defines each trait.

Relative Adult Mobility.
Rationale: The mobility of a species affects access to food, vent fluid (and the microbes within it), and also its ability to escape predation and/or relocate if, for example, vent fluid supplies shut down or competition becomes too strong.

Depth Range (m). Category: Geographic Distribution. Modalities: Maximum and minimum depth ranges, from a choice of 11 (from 0 m to > 5,000 m in 500 m increments). Type: Ordinal.
Rationale: Depth range captures information on relative geographic range size and also facilitates the assessment of trait-environment relationships in the vertical dimension of space. Thus, this trait can be included with the others, or used as an environmental variable, depending on the research question.

… Type: Categorical, ordinal.
Rationale: As highlighted in the category, this trait captures information on specialist/generalist adaptations that a species may have to thrive in given environments and is therefore also an important indicator of vulnerability to disturbance or environmental change. For instance, a species dependent on vent environments may be more prone to extinction given deep-sea mining impacts or the shutdown of vent fluid supply than a species that can also live in other chemosynthesis-based ecosystems.

Estimated Maximum Body Size (mm). Category: Life History. Modalities: 0.01, 0.1, 1, 10, 100, 1,000. Type: Ordinal.
Rationale: Body size is known to influence the contribution of a species to ecosystem functioning, as well as its own fitness within a system. This trait captures information on reproduction, life history, fitness, and resilience to change, as well as energy demand.

Zonation from a Vent. Category: Habitat Use. Modalities: High, Medium, Low (Periphery). Type: Categorical, ordinal.
Rationale: This trait is specific to vent species, but could be adapted for other environments (e.g., to capture the "halo" zonation at seeps and wood falls). It captures the dependence of a species on vent fluid and the microbes it contains, as well as the thermal tolerance of a species (which can be a physiological indicator and thus related to fitness and energy demand).

Substratum. Modalities: Hard, Soft. Type: Binary.
Rationale: This trait captures species-association information, assuming substratum preference can be indicative of shared niche space. The preferred substratum of a species may also be an indicator of resilience, as hard and soft substrata may be affected by different impact types and intensities during deep-sea mining, for example. This trait also facilitates prediction using trait information, as hard and soft substrata are often mapped during geological and geophysical surveys.

Habitat Complexity. Modalities: Does not add, Mat forming (< 10 cm), Bed forming (> 10 cm), Dense bush forming, Open bush forming, Burrow forming. Type: Categorical.
Rationale: This trait is a shape indicator, providing insight into the structures and habitat complexity added by a species, and, thus, whether a species might be considered an ecosystem engineer or a foundation species. In adding habitat complexity, a vent species can alter fluid dynamics and access to nutritional resources and therefore influence ecosystem function, energy available to other species, and its own fitness.

How often found in groups or clusters? Modalities: … Type: Categorical, ordinal.
Rationale: Gregariousness captures information on the potential of a species to influence other processes, as it might be assumed that gregarious species limit space available to other species and are likely to be more common than solitary species. Conversely, gregarious species may depend on others for nutritional and/or reproductive purposes and thus be more vulnerable than species that can thrive alone if population sizes are reduced by disturbance or environmental change.

(Continues)
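Table 1 scores "Depth Range (m)" as a minimum and a maximum class, each chosen from 11 ordinal options in 500 m increments (0-500 m up to > 5,000 m). A minimal sketch of converting measured depths into those classes follows; the helper names are hypothetical, and the boundary rule (a depth of exactly 500 m falls in the 0-500 m class, so only depths strictly greater than 5,000 m reach the last class) is an assumption rather than a documented sFDvent convention.

```python
import math
from typing import Tuple

BIN_WIDTH_M = 500
LAST_CLASS = 10  # classes 0-9 cover 0-5,000 m; class 10 is "> 5,000 m" (11 classes in total)


def depth_class(depth_m: float) -> int:
    """Ordinal depth class (0-10) for a depth in metres, using upper-inclusive bins."""
    if depth_m < 0:
        raise ValueError("depth must be given as a non-negative number of metres")
    if depth_m == 0:
        return 0
    # ceil(d / 500) - 1 puts 500 m in class 0 and 500.1 m in class 1.
    return min(math.ceil(depth_m / BIN_WIDTH_M) - 1, LAST_CLASS)


def class_label(index: int) -> str:
    """Human-readable label for an ordinal depth class."""
    if index == LAST_CLASS:
        return f"> {LAST_CLASS * BIN_WIDTH_M:,} m"
    lower = index * BIN_WIDTH_M
    return f"{lower:,}-{lower + BIN_WIDTH_M:,} m"


def score_depth_range(min_depth_m: float, max_depth_m: float) -> Tuple[str, str]:
    """Minimum and maximum depth classes for one species."""
    if min_depth_m > max_depth_m:
        raise ValueError("minimum depth exceeds maximum depth")
    return class_label(depth_class(min_depth_m)), class_label(depth_class(max_depth_m))


if __name__ == "__main__":
    print(score_depth_range(1500, 2600))
```

For example, `score_depth_range(1500, 2600)` returns `("1,000-1,500 m", "2,500-3,000 m")`. Storing the class index rather than the label keeps the trait ordinal for analysis.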

| Comparison with other datasets
The sFDvent dataset has fewer traits and records than many trait databases focusing on shallow-marine, freshwater, and terrestrial taxa (Table 3). Nonetheless, sFDvent has more traits than the carabids.org (Homburg, Homburg, Schäfer, Schuldt, & Assmann, 2013) and stream invertebrates (… similar biological parameters to all of the trait databases described in Table 3, differing in terminology (trait names and modalities) rather than conceptual basis (e.g., see Table 4). For example, feeding, survival, growth, reproduction, and community assembly processes can be assessed using the traits in this database (Figure 1) and in databases focussing on other ecosystems and/or fauna (Table 4).

TABLE 1 (Continued)

Trophic Mode.
Rationale: The trophic mode of a species affects its energy demand, as well as the amount of food it makes available to others during the feeding process. This trait is also an indicator of resilience, as more generalist feeders (such as detritivores and omnivores) are less likely to be affected by competition for food and/or changes to food supplies and quantities. Conversely, carnivores depend on the presence of prey to survive and are potentially more vulnerable to environmental change affecting prey populations.

Nutritional Source. Modalities: Sediment or rock surface, Water column, Fauna, Symbiont. Type: Categorical.
Rationale: This trait captures similar information to trophic mode, but also reflects the dependence of a species on a particular feature of the local environment. For example, a species dependent on nutritional sources in the water column might be more at risk if mining creates sediment plumes in the water column that clog the organism's feeding apparatus. On the other hand, if a species can supplement its chemosynthetic energy source with a water-column supply when vent fluid dynamics change, it may survive better in an area where food supply is greater (e.g., in the water column of an area of high primary productivity). Thus, the importance of and rationale behind use of this trait, as with all traits in this …

TABLE 3 A comparative review of animal trait databases
Note: Superscript numbers are used to identify trait database sources, as provided in Supporting Information Table S7.1, and "NA" is used to abbreviate "not applicable". The summary information for each of these databases (e.g., number of records, species, and traits) is accurate as of 20 November 2017.

TABLE 4 A proposed "common terminology" for faunal trait databases to ensure their comparability across ecosystems, based on a comparative review presented in Table 3 and Supporting Information …

… Potential scoring mechanism: Binned size classes to enable entry of rounded estimates.

Body shape (adult and offspring separately). sFDvent trait: Foundation Species (as body shape affects the ability of a species to provide a foundation). Similar trait(s) in other databases: Body shape^14, growth form(s)^9,13, shape factor^13.
Potential scoring mechanism: Fixed options from a range of trait databases, to capture shape more broadly than per taxonomic group.

Reproduction strategy. sFDvent trait: Reproductive Type*. Similar trait(s) in other databases: Reproduction/reproductive type^6,7,9,13, mode of reproduction^12, sexual system^1.
Potential scoring mechanism: Options covering how many times an animal reproduces per lifetime, whether it requires a partner for reproduction, and whether reproduction can take place more than once per year.

Development mechanism. sFDvent trait: Larval Development*. Similar trait(s) in other databases: Developmental mechanism^9,12, larval development^12.
Potential scoring mechanism: Simple scoring options to capture the extent to which offspring depend on parents or their resources for development.

Ecological process/function: Feeding

Primary diet (adult and offspring separately, and then also secondary diet). sFDvent trait: Nutritional Source. Similar trait(s) in other databases: Diet^2,8, food source^6, food^13, feeding diet^13.
Potential scoring mechanism: To enable cross-system comparisons, this would need to be broad. For example, "plant-based", "animal-based", "detritus-based" or "other" would capture major groups, including the importance of omnivory.

Primary feeding mode (adult and offspring separately, and then also secondary feeding mode). sFDvent trait: Trophic Mode. Similar trait(s) in other databases: Feeding mode^11, feed mode^14, characteristic feeding method^9, feeding habits^7, trophic level^5.
Potential scoring mechanism: This could be used to capture the source of food and the energy required to find food. For example, broad options could be: "scavenging", "hunting", and "dependent on other fauna".

Food active or passive. sFDvent trait: Nutritional Source (e.g., carnivorous species eating fauna would have "active" food and species depending on the water column would have "passive"). Similar trait(s) in other databases: Food active or passive^3, hunting abilities^5.
Potential scoring mechanism: This is a simplistic trait that could be used in place of "primary feeding mode". To be comparable across ecosystems, it would likely need converting to scores such as "rock-based", "plant-based", etc.

| DISCUSSION
sFDvent is a global-scale trait database for deep-sea hydrothermal-vent species, compiled using literary sources, existing taxonomic databases [ChEssBase (Baker et al., 2010), WoRMS (Horton et al., 2017) and Desbruyères, Segonzac, and Bright (2006)] … Body size, for example, has been identified as playing a fundamental role in ecosystem functioning, ecological processes, and shaping biodiversity (Mindel, Webb, Neat, & Blanchard, 2015); this trait ("Estimated Maximum Body Size") has been scored for all but three taxa in sFDvent. Also scored with high coverage is mobility, which has been identified in marine ecosystems as important for dispersal potential (Costello et al., 2015) and, thus, population dynamics, as well as for the ability to escape in the event of a disturbance. Scores for "Relative Adult Mobility" are provided for more than 99% of taxa in sFDvent and can now be used in diversity-oriented studies as well as in those investigating reproduction in vent fauna and its influence on vent biogeography (Mullineaux & France, 1995; Yahagi, Watanabe, Kojima, & Kano, 2017). Similarly, due to its complete coverage, "Chemosynthesis-obligate" can be used to ascertain endemism levels in taxonomic, geographic, and other groups, which may be particularly important when considering the impacts of mining on vent ecosystems, given the close relationships between endemism and resilience (Vasconcelos, Batista, & Henriques, 2017).
TABLE 4 (Continued)

Gregariousness. sFDvent trait: How often found in groups or clusters? (Gregariousness). Similar trait(s) in other databases: Sociability^9,12, coloniality^1, occurrence in large quantities^13.
Potential scoring mechanism: This can be simply broken down to: "always found with others", "sometimes found with others" and "never found with others".

Dependency. sFDvent trait: Chemosynthesis-obligate, Position of Symbiont. Similar trait(s) in other databases: Dependency^9.
Potential scoring mechanism: Symbiotic relationship types present across all ecosystems would need to be included as scoring options (e.g., mutualistic, parasitic).

Ecosystem engineer. sFDvent trait: Habitat Complexity. Similar trait(s) in other databases: Ecosystem engineering^12.
Potential scoring mechanism: This can be a "yes/no" score, depending on whether a species modifies the habitat around it or creates habitat for other fauna by being present.

Average associated depth / altitude (m). sFDvent trait: Depth Range (m). Similar trait(s) in other databases: Water depth^1, depth^2, depth preferences^7, altitudinal preference(s)^7,13.
Potential scoring mechanism: 500-1,000 m intervals can be established from the deepest ocean basin to the highest mountain, to capture depths and altitudes in a comparable way (e.g., with ranges below sea level expressed with a minus sign).

Note: Italicized items are either: (a) not ecological traits (e.g., location information), or (b) similar in what they capture but more context-dependent than other traits compared. Superscript numbers are used to identify trait database sources, as provided in Supporting Information Table S7.1. Traits with an asterisk were removed from the recommended sFDvent dataset (Supporting Information Table S4.2) but are present in the raw dataset (Supporting Information Table S4.3).

The sFDvent database also has an important role in highlighting knowledge gaps and research biases. For instance, missing and/or low-certainty scores in the "Gregariousness", "Trophic Mode" and "Nutritional Source" traits highlight a need for observational and behavioural studies. Better-resolved scores for these traits would improve our understanding of community structure and dynamics, as well as macroecological-scale variability in vent food webs. In addition, despite the literature's focus on vent annelids and molluscs (Supporting Information Appendix S2), arthropods are the best-scored fauna in the database. Meanwhile, as one might expect given publication and sampling bias (Supporting Information Appendix S2), the North Pacific has the highest number of scored taxa, emphasizing a need to score traits in less well-sampled regions. Furthermore, despite the fundamental importance of reproductive traits in ecology (Mullineaux et al., 2018), trait scoring for "Reproductive Type", "Larval Development" and "Dispersal Mechanism" did not have sufficient coverage to be included in sFDvent … (Brun et al., 2017; Madin et al., 2016; Parr et al., 2017). Traits with high coverage will facilitate cross-ecosystem analyses. Nevertheless, our traits were designed for highly specialized fauna in remote, deep-sea environments. Therefore, to conduct a comparative analysis across different trait databases, we would need to "translate" the trait terminology used (Table 4). Thus, we echo calls for a common terminology across systems (Costello et al., 2015) to advance trait-based approaches for macroecological biodiversity studies. While important goals for ecological understanding can be met using species- and ecosystem-specific traits (e.g., mapping global biodiversity patterns), a common language linking databases and systems would enable us to investigate truly global-scale patterns, as well as human impacts upon these systems [Convention on Biological Diversity (CBD), 1992].
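The "translation" step called for above can be sketched as a lookup from sFDvent trait names to the proposed common terms in Table 4. This is a hypothetical illustration: the dictionary keys stand in for sFDvent column names, only the yes/no conversion that Table 4 suggests for "Ecosystem engineer" is implemented, "Nutritional Source" is mapped only to "Primary diet" (Table 4 also relates it to "Food active or passive"), and traits without a proposed common term are simply dropped.

```python
from typing import Dict, List, Mapping

# Hypothetical mapping: sFDvent trait name -> proposed common term (after Table 4).
SFDVENT_TO_COMMON: Dict[str, str] = {
    "Gregariousness": "Gregariousness",
    "Chemosynthesis-obligate": "Dependency",
    "Position of Symbiont": "Dependency",
    "Habitat Complexity": "Ecosystem engineer",
    "Depth Range (m)": "Average associated depth / altitude (m)",
    "Nutritional Source": "Primary diet",
    "Trophic Mode": "Primary feeding mode",
    "Reproductive Type": "Reproduction strategy",
    "Larval Development": "Development mechanism",
}


def translate_value(trait: str, value: str) -> str:
    """Convert a value where Table 4 suggests a coarser common score."""
    # Table 4 proposes "Ecosystem engineer" as yes/no; here any habitat-complexity
    # modality other than "Does not add" is treated as "yes" (an assumption).
    if trait == "Habitat Complexity":
        return "no" if value == "Does not add" else "yes"
    return value  # all other values are passed through unchanged in this sketch


def translate_record(record: Mapping[str, str]) -> Dict[str, List[str]]:
    """Re-key one species record from sFDvent trait names to common terms."""
    translated: Dict[str, List[str]] = {}
    for trait, value in record.items():
        common = SFDVENT_TO_COMMON.get(trait)
        if common is None:
            continue  # no common term proposed in Table 4, so the trait is dropped
        # Several sFDvent traits can feed one common term (e.g., "Dependency").
        translated.setdefault(common, []).append(translate_value(trait, value))
    return translated


if __name__ == "__main__":
    print(translate_record({"Habitat Complexity": "Bed forming (> 10 cm)",
                            "Substratum": "Hard"}))
```

Values are collected into lists because the mapping is many-to-one; a real cross-database analysis would still need an explicit rule for combining them.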
Comparing sFDvent to other databases also highlights our unique approach to data collection. Other databases have tended to focus on literary sources of information [including other databases; e.g., MarLIN (2006)], whereas sFDvent was predominantly filled using expert knowledge, and sFDvent entries scored using the literature were peer-reviewed by experts. A major finding of the sFDvent project is that published information lags behind the current knowledge of experts. Furthermore, publications tend to focus on species in a given location and, when used to score species traits, might not represent the most common trait score for a species more generally. Many deep-sea species are observed using remotely operated vehicles but remain unsampled, with traits unmeasured and undocumented. Despite this, scientists participating in research cruises accumulate a wealth of knowledge through observations of these "unrecorded" species. This emphasizes the importance of including collective expert opinions, in combination with published information, in trait databases. We expect that terrestrial, freshwater, and marine ecologists, too, gain insights into the common traits of species that are undocumented in official publications but recorded in field notebooks, photographs, and recalled observations. A trait-based approach enables researchers to capture these "hidden" data sources, although we advise caution, recording relative certainty alongside expert-derived scores.
Moreover, pooling expert opinion on species-trait scores captured the current state of knowledge in a relatively short time (1 year, as opposed to 10 or more for other databases; Supporting Information Figure S1 A.1), and allowed knowledge from observations made during research cruises, and unpublished data, to be incorporated and credited using contributor ID metadata. Thus, we suggest that using a working-group approach and online collaboration tools to produce a shared data source, designed, tested and agreed upon by experts who have contributed to, and will benefit from, the data, is an effective means of producing a high-quality product. We expect that sFDvent will form a baseline single repository for expert knowledge on deep-sea hydrothermal-vent species, with ongoing community input. In addition to promoting international collaboration in its design and population, the database showcases the benefits of a working-group approach and knowledge sharing among members of the chemosynthesis-based-ecosystem research community. Experts across the globe can use sFDvent to reduce uncertainty when developing conservation and management plans for deep-sea hydrothermal vents, which were previously untouched but are now under threat from human exploitation.

ACKNOWLEDGMENTS
We would like to thank the following experts, who are not authors on this publication but made contributions to the sFDvent database: Anna Metaxas, Alexander Mironov, Jianwen Qiu (seep species contributions, to be added to a future version of the database) and Anders Warén. We would also like to thank Robert Cooke for his advice, time, and assistance in processing the raw data contributions to the sFDvent database using R. Thanks also to members of iDiv and its synthesis centre (sDiv) for much-valued advice, support, and assistance during working-group meetings: Doreen

AUTHOR CONTRIBUTION
Order of authorship is as follows: (a) first author (ASAC); (b) alphabetical for core sFDvent working group members involved in database design; (c) alphabetical for all other contributors; and (d) senior author (AEB). This database and manuscript were proposed to the Synthesis Centre of the German Centre for Integrative Biodiversity Research (sDiv, iDiv) by ASAC and AEB in response to a call for working-group proposals in 2016. An sFDvent working group (AEB, ASAC, SEB, AC, AVG, AH, ER, JZ, TCK and VT) met at iDiv to design and test the database, as described in this manuscript. All manuscript authors contributed data to the database, with AEB, ASAC, IJC, and SJS gathering data from existing literary and online sources as a starting point. AEB and ASAC organized contributions from international collaborators before compiling, cleaning and processing the data, conducting the analyses, and writing the first draft of the manuscript.
All authors checked and edited and/or approved the recommended dataset and manuscript.

DATA ACCESSIBILITY
Raw, processed, and recommended versions of the sFDvent database (Supporting Information Appendix S4) and usage notes (Supporting Information Appendix S5) are described, and digital object identifiers (DOIs) provided, in the Supporting Information accompanying this manuscript. We recommend that readers refer to Supporting Information Appendices S4 and S5 for guidance on using the datasets, as the recommended version is the only version that can be analysed without further processing. Nevertheless, we encourage users to consider further processing, to ensure it is tailored to, and appropriate for, the research question and/or analysis being planned. The recommended, "ready-to-use" database is accessible via https://doi.org/10.5061/dryad.cn2rv96. The reference list for all literary sources cited in the raw and recommended database files is provided in Supporting Information Appendix S1.
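As one example of tailoring the recommended dataset to a research question, scores can be filtered by certainty and, if desired, by whether they rest on expert opinion. The sketch assumes a hypothetical long-format CSV with `taxon`, `trait`, `value`, `certainty` and `reference` columns; the layout of the files deposited under the DOI above may differ, so column names should be checked against the usage notes in Supporting Information Appendix S5 before adapting it.

```python
import csv
import io
from typing import Dict, Iterable, List

EXPERT_OPINION = "expert opinion"


def filter_scores(
    rows: Iterable[Dict[str, str]],
    min_certainty: int = 2,
    exclude_expert_opinion: bool = False,
) -> List[Dict[str, str]]:
    """Keep scored rows that meet a certainty threshold (hypothetical column names)."""
    kept: List[Dict[str, str]] = []
    for row in rows:
        if not row.get("value"):
            continue  # unscored cell
        if int(row["certainty"]) < min_certainty:
            continue
        source = (row.get("reference") or "").strip().lower()
        if exclude_expert_opinion and source == EXPERT_OPINION:
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Stand-in for the downloaded file; in practice use open(path, newline="").
    demo = io.StringIO(
        "taxon,trait,value,certainty,reference\n"
        "Taxon A,Substratum,Hard,3,expert opinion\n"
        "Taxon B,Substratum,Soft,1,Published source\n"
        "Taxon C,Substratum,,0,\n"
    )
    for row in filter_scores(csv.DictReader(demo)):
        print(row["taxon"], row["trait"], row["value"])
```

The default threshold of 2 is arbitrary; a sensitivity analysis that repeats the main analysis at several thresholds is one way to check whether results depend on lower-certainty scores.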