Towards an ecological trait‐data standard

1. Trait‐based approaches are widespread throughout ecological research as they offer great potential to achieve a general understanding of a wide range of ecological and evolutionary mechanisms. Accordingly, a wealth of trait data is available for many organism groups, but this data is underexploited due to a lack of standardization and heterogeneity in data formats and definitions.
2. We review current initiatives and structures developed for standardizing trait data and discuss the importance of standardization for trait data hosted in distributed open‐access repositories.
3. In order to facilitate the standardization and harmonization of distributed trait datasets by data providers and data users, we propose a standardized vocabulary that can be used for storing and sharing ecological trait data. We discuss potential incentives and challenges for the wide adoption of such a standard by data providers.
4. The use of a standard vocabulary allows for trait datasets from heterogeneous sources to be aggregated more easily into compilations and facilitates the creation of interfaces between software tools for trait‐data handling and analysis. By aiding decentralized trait‐data standardization, our vocabulary may ease data integration and use of trait data for a broader ecological research community and enable global syntheses across a wide range of taxa and ecosystems.


| INTRODUCTION
Functional traits are phenotypic (i.e. morphological, physiological, behavioural) characteristics that are related to the fitness and performance of an organism (McGill, Enquist, Weiher, & Westoby, 2006; Violle et al., 2007). Recent years have seen a proliferation of trait-based research in a wide range of fields: trait data have been used to understand the evolutionary basis of individual-level properties (Salguero-Gómez et al., 2016), global patterns of biodiversity (Díaz et al., 2016), and the relationship between ecosystem functions and the functional composition of species assemblages (Bello et al., 2010; Mouillot, Graham, Villéger, Mason, & Bellwood, 2013). This research provides the mechanistic framework for linking climate change or anthropogenic land use to biodiversity and its related functions (Allan et al., 2015; Díaz et al., 2011; Lavorel & Grigulis, 2012). Species traits have also been suggested as indicator variables for monitoring ecosystem health at the individual level, for instance through changes in body size in a fish population (Kissling et al., 2018). Because functional traits allow us to infer the ecological role of organisms from their apparent features, regardless of their taxonomic identity (Grime, 2001; Moretti et al., 2017; Villéger, Brosse, Mouchet, Mouillot, & Vanni, 2017), their measurement is also a promising means of bypassing the taxonomic impediment, i.e. the fact that most species are still undescribed and little is known of their interactions with other organisms and their environment.
Despite the importance of trait-based approaches, fully exploiting their potential relies heavily on the broad availability and compatibility of trait data to achieve sufficient taxonomic and regional coverage, both for present-day taxa and in evolutionary deep time. However, the different research contexts in which trait data are collected render them extremely heterogeneous and make the task of data compilation time-consuming and error-prone. To date, trait data have been harmonized and compiled into centralized databases only for specific organism groups and regional scopes, often centred around particular research questions (e.g. PanTHERIA, Jones et al., 2009; TRY, Kattge, Díaz, et al., 2011; AmphiBio, Oliveira, São-Pedro, Santos-Barrera, Penone, & Costa, 2017). Less well-studied taxa and specialized research questions lack the resources for such an endeavour. Besides initiatives aiming at assembling data, tools to enable the compatibility of data across databases are being developed. These include software to access trait data from the Internet (e.g. Ankenbrand, Hohlfeld, Weber, Foerster, & Keller, 2018; Chamberlain, Foster, Bartomeus, LeBauer, & Harris, 2017), semantic web standards (Page, 2008; Wieczorek et al., 2012) and thesauri of consensus terms (Garnier et al., 2017; Walls et al., 2012).
Meanwhile, open and reproducible science has become mainstream: publication of research data without access restrictions, with structured metadata and in accordance with data standards to enable their reuse, has become the declared goal of open biodiversity knowledge management (http://www.bouchoutdeclaration.org/) and is increasingly demanded by journals and public research funding agencies (Alliance of German Science Organisations, 2010; Royal Society Science Policy Centre, 2012). As a result, an increasing number of individual research projects publish their primary data on general-purpose file hosting services, where no data standards are enforced upon the uploaded material (Wilkinson et al., 2016). It is thus likely that trait data will become increasingly available, but a lack of data and metadata standardization will hamper the efficient reuse and synthesis of published datasets.
In this paper, we review existing initiatives for trait-data collection and standardization from the pragmatic view of data providers, data curators and data users, as well as data managers. We discuss current efforts to make trait data visible, accessible, interoperable and reusable in downstream data analysis, as demanded by the FAIR guiding principles for scientific data (Wilkinson et al., 2016). Furthermore, we show how the current deficit in the standardization of primary data hampers the interoperability and reuse of trait data. Based on these considerations, we propose a versatile vocabulary for describing ecological trait datasets, which builds upon, and is compatible with, existing terminology standards for biodiversity data, in particular the Darwin Core Standard (DwC; Wieczorek et al., 2012). Since a standard vocabulary relies on adoption by a broad research community, we discuss incentives for its use and lay out mechanisms for future consensus-building and community development towards an accessible and easy-to-use ecological trait-data standard vocabulary.

| INITIATIVES FOR TRAIT-DATA STANDARDIZATION
The need for standardizing trait data arises from the prospective gain of compiling heterogeneous trait datasets for data synthesis.
Often, the scientific scope and focus differ between data providers, who measure and assess the trait data in the first place, and data users, who reuse published data for a broader synthesis application.
Furthermore, data curators take up the task of providing compiled and harmonized data and preparing them for future use and long-term preservation. Data managers are concerned with the development of complex digital infrastructures for handling and analysing large amounts of data. These are idealized roles of researchers who deal with trait-data standardization throughout the data life cycle. In this chapter, we review four types of initiatives that are of relevance for trait-data standardization (see Glossary in Table 1 for italicized terms):
1. Initiatives that provide trait datasets which have been assembled out of a particular research interest, either by measurement or collated from the literature.
We consider these initiatives separately although they are often developed in conjunction to serve a particular database project, such as the TRY plant database (Kattge, Díaz, et al., 2011; Kattge, Ogle, et al., 2011) and the Thesaurus of Plant characteristics (TOP; Garnier et al., 2017). We show how the degree of trait-data standardization in existing datasets is highly variable, and which tools and standards are currently applied to achieve harmonization of data from multiple, distributed sources. The objective of this review is to raise awareness of the generic structure of trait data and to help researchers share and publish their own datasets in an appropriate form.

| Trait datasets
In the field of comparative biology, morphological traits, such as traits related to flower shape, leaf and stem structures for plants or wing and beak measurements for birds, as well as life-history traits such as Ellenberg values for plants or physiological and reproductive traits for animals (e.g. feeding biology, dispersal, metabolic rate and body size) have been assessed for decades and have been published in regular journal articles or books. With the rise of ecological trait-based research, measurements and information available from species descriptions have been compiled into project-specific datasets that typically comprise a local set of taxa and a focal set of traits. A plethora of such static datasets has been published alongside scientific articles, or as standalone data publications (see Kleyer et al., 2008 for a review on plant data; for animal data, e.g. Gossner et al., 2015 and Appendix S1, Table A1).
Today, the online publication of such data is greatly facilitated by file hosting services (e.g. Figshare, Zenodo, Researchgate, Data Dryad), which warrant long-term accessibility, and citeability via

TABLE 1 Glossary of terms from the biodiversity data-management context as they are used in this paper; draws from Garnier et al. (2017)

(Michener, 2006). In the context of trait data, such additional information can move to the body of the primary data table when data are compiled from different sources

Occurrence
The observation context of a single individual, i.e. the existence of an organism at a particular place and time; sometimes used as a synonym of 'observation' in the data-management context

Ontology
A semantic model of the objects and their relationships in a domain of interest (Gruber, 1995); defines terms and concepts in a formal language that provides cross-references and semantic meaning; commonly published in OWL format for machine readability

Semantic web
An extension of the World Wide Web that aims for machine-readable meaning of information via well-defined data standards, ontologies and exchange protocols (Berners-Lee et al., 2001); the World Wide Web Consortium (W3C) defines standards, i.e. specifications of protocols and technologies for the semantic web (http://www.w3.org/standards/semanticweb/)

Term
A word that names or labels a particular concept as part of the specialized vocabulary of a field

Terminology
The body of terms and concepts used with a particular application in a subject of study, usually formalized in a thesaurus or ontology

Thesaurus
Controlled vocabulary that provides key terms with their associated concepts and relations for a specific field or domain of interest (Laporte et al., 2013)

depending on the research question and observation context. The column descriptions and terminology applied to taxa and traits are mostly project-specific and rarely chosen for compatibility with larger database initiatives. Variability in the number and meaning of columns in these data tables requires tedious manual adjustments when merging multiple datasets (Wickham, 2014). Furthermore, metadata provided along with the primary data vary in their level of detail, e.g. for documenting descriptions of variables, measurement procedures or sampling context (Kattge, Ogle, et al., 2011).
While in some datasets information like geolocation or sampling date and time might be dataset-level information, thus qualifying as metadata, in other datasets it might be collected at the level of individual observations (see section on data compilations below).
More importantly, clear statements on ownership and authorship, terms of use, or internationalization (e.g. separators and delimiters) are often still neglected in primary trait-data publications. The task of harmonizing trait data is taken up by data-curating initiatives, which compile heterogeneous data into comprehensive databases (see next section).

| Data compilation initiatives
In the past two decades, many distributed trait datasets have been aggregated and harmonized into larger collections with a particular taxonomic or regional focus (e.g. Kleyer et al., 2008; Oliveira et al., 2017, see Appendix S1, Table A1). While these initiatives successfully address issues of heterogeneity in units or categorical variables, or achieve high taxonomic or geographic coverage, few of these compilations apply a standardized terminology for taxa or trait definitions. Additionally, in the process of data aggregation, rich metadata content might be lost, as the detail in the original files dif-  Guralnick et al., 2016).
Specialized online portals have been created to attract data submissions from a defined research field and take care of data harmonization, thereby greatly facilitating data synthesis. For example, by aiming for a universal framework for plant traits, the TRY database (Kattge, Díaz, et al., 2011) attracted more data submissions and downloads than any other trait-data platform. The online portal enables selective data download and management of user permissions. For animal trait data, however, a single unified platform and harmonizing scheme is still lacking. Nonetheless, initiatives for particular groups of animals do exist. Examples are the BETSI database on soil invertebrate traits (http://betsi.cesab.org/; Pey et al., 2014), the Carabids.org web portal (http://www.carabids.org/), the Coral Trait Database (Madin et al., 2016), or the Global Ants Database (Parr et al., 2017, see Appendix S1, Table A1). The role of online portals and database initiatives in standardizing data and making them more accessible is paramount.
Trait-data portals incentivize data submissions by offering increased data visibility and usage, while providing data-use policies that secure author attribution and, potentially, co-authorship of associated articles. However, maintaining centralized database infrastructures is costly and requires long-term funding (Bach et al., 2012).

| Terminology standards for traits
A major challenge in trait-data standardization is the lack of widely accepted and unambiguous trait definitions (Kissling et al., 2018).
Previous standard definitions of trait concepts range from listings of selected definitions in vocabularies, through well-defined method handbooks and comprehensive thesauri, to formalized definitions of trait concepts in ontologies. The initiatives behind method handbooks, thesauri and ontologies are essential for building community consensus for trait definitions.
Very general classes of traits are defined within the list of GeoBON Essential Biodiversity Variables (Kissling et al., 2018) aiming for a list of functional indicators for ecosystem health.
Assigning a detailed and unambiguous methodological protocol for a trait, including the units to use or the ordinal or factor levels to be assigned, is essential for standardizing its measurement process.
Efforts to develop handbooks for measurement protocols provide such a methodological standardization for plants (Cornelissen et al., 2003; Perez-Harguindeguy et al., 2013) or invertebrates (Moretti et al., 2017), but are of limited use in harmonizing trait data that predate or ignore this standard (Kattge, Ogle, et al., 2011).
A thesaurus provides a 'controlled vocabulary designed to clarify the definition and structuring of key terms and associated concepts in a specific discipline' (Garnier et al., 2017; Laporte, Garnier, & Mougenot, 2013). To provide a logical structure for trait terms, Garnier et al. (2017) suggest the Entity-Quality model (EQ), where a trait is defined as 'an entity having a quality' (for instance, for the trait 'femur length', 'femur' is the entity and 'length' the quality). In thesauri, hierarchies of concepts can be formalized by linking each term to broader or narrower terms, or to synonyms. For example, the definition of 'femur length of first leg, left side' is narrower than 'femur length', which is narrower than 'leg trait', which is narrower than 'locomotion trait'. Because thesauri are publicly available, it is also possible to refer to these defined terms via globally unique Uniform Resource Identifiers (URIs). For example, a measurement of fruit mass could be linked to the definition of the term within the Thesaurus of Plant characteristics (TOP, Garnier et al., 2017) via its URI 'http://top-thesaurus.org/annotationInfo?viz=1&&trait=Fruit_mass'.
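The broader/narrower linkage described above can be illustrated with a small lookup structure. The following Python sketch is illustrative only: the dictionary layout and the function names `broader_chain` and `is_narrower_than` are assumptions introduced here and do not correspond to any existing thesaurus software; the term labels are taken from the example in the text.

```python
# Hypothetical mini-thesaurus: each term points to its next broader term.
# The labels follow the femur-length example given in the text.
BROADER = {
    "femur length of first leg, left side": "femur length",
    "femur length": "leg trait",
    "leg trait": "locomotion trait",
}


def broader_chain(term):
    """Return the term followed by all successively broader terms."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def is_narrower_than(term, candidate):
    """True if `candidate` is one of the broader terms of `term`."""
    return candidate in broader_chain(term)[1:]


chain = broader_chain("femur length")
```

With such a structure, a data curator could, for example, decide whether two differently labelled measurements may be pooled under a shared broader concept.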
In addition to defining terms for human interpretation, ontologies define terms by their relationship to other defined terms, thereby providing a semantic model of the concepts used within a domain of research, with the objective of enabling the computational interpretation of data (Kissling et al., 2018; Walls et al., 2012, 2014) (Mungall, Torniai, Gkoutos, Lewis, & Haendel, 2012).
To conclude, there is already a suite of globally available thesauri and ontologies for traits. However, definitions in some domains are better covered than others (Kissling et al., 2018)

| Trait-data structures
While trait thesauri and trait ontologies typically define concepts of measurements and observations for focal groups of organisms, they do not specify the format or structure in which trait data should be stored and labelled.
A trait dataset typically contains multiple data entries, where each entry describes a trait value observed on an instance of a scientific taxon. The item on which the value has been observed can be very variable, ranging from an occurrence of an individual at a specific place and time in its natural environment or a preserved specimen in a collection (Figure 1a), a group of individuals of a specific taxon (Figure 1b), or an entire population of a species (Figure 1c,d). The reported trait values may be quantitative measurements or qualitative facts. Quantitative measurements are values obtained either by direct morphological, physiological or behavioural observations on single specimens (Figure 1a), by aggregating replicated measurements on multiple entities (Figure 1b) or by estimating the means or ranges for the respective taxon as reported in the literature or other published sources (e.g. databases, Figure 1c). This encompasses a wide range of numeric data types, including continuous, binary, integer, intervals or ratios, as well as categorical (ordinal or nominal) values.
Qualitative facts are assignments of categorical information, often on entire taxa, e.g. of a behavioural or life-history trait (Figure 1d).
Beyond these core observations, further information might be available that specifies the taxon concept applied, provides detail on the measurement method, or places the reported measurement in a broader observation context (including geolocation as well as date and time of sampling). As such data may be useful for future analysis of the causes of trait variation or to explain noise in measurement data, they should always be published along with the core data. In most cases, information on place and time applies to the entire dataset, and thus would be included in the metadata accompanying a data publication (potentially applying Ecological Metadata Language, EML, KNB, 2011 as a formal structure). In the case of trait data and depending on the research scope, the information may also have been collected at the measurement, occurrence or taxon level. Geolocation or date and time would then not be provided as metadata, but as covariate data in additional columns of the primary dataset. When compiling datasets, it is a key task of data curators to deal with dataset-level information and maintain it for downstream analysis by incorporating it into the compiled data table.
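The curation step described above, in which dataset-level information is incorporated into the compiled table, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the column names `locality` and `samplingDate`, the function name `add_dataset_level_columns` and all values are hypothetical placeholders, not terms prescribed by any standard discussed here.

```python
def add_dataset_level_columns(rows, dataset_info):
    """Copy dataset-level fields into every row.

    Row-level values take precedence, so covariates that were recorded
    per observation are never overwritten by dataset-level metadata.
    """
    return [{**dataset_info, **row} for row in rows]


# Dataset A: place and time apply to the whole dataset (metadata).
meta_a = {"locality": "site A", "samplingDate": "2015-06-01"}
rows_a = [
    {"taxon": "species 1", "trait": "body length", "value": 4.2},
    {"taxon": "species 2", "trait": "body length", "value": 6.0},
]

# Dataset B: place and time were recorded per observation (covariates).
rows_b = [
    {"taxon": "species 1", "trait": "body length", "value": 4.5,
     "locality": "site B", "samplingDate": "2016-07-12"},
]

# After this step every row of the compilation carries its own context.
compiled = add_dataset_level_columns(rows_a, meta_a) + rows_b
```

Letting row-level values override dataset-level ones is a design choice made for this sketch; a real compilation workflow might instead flag such conflicts for manual review.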
Standard terms for the formal description of the common con- the scope of this review (Franz et al., 2016). Initiatives that aim at providing a stable reference while tracking the changing taxon concepts are, for instance, the Catalogue of Life (https://www.catalogueoflife.org/) or the EDIT Platform for Cybertaxonomy (https://cybertaxonomy.eu/). The GBIF Backbone Taxonomy (GBIF Secretariat, 2017) collects and bundles existing terminologies into a single reference framework.

| Closing gaps to improve trait-data reuse
In sum, we identify a gap between the trait-data structures developed for data curators and data managers and the data input produced by data providers. Hardly any of the aforementioned standalone or aggregated trait datasets for birds, amphibians, mammals or invertebrates employs the described standard terminologies, ontologies or data standards. As it stands, reusing these data in larger compilations or integrating them into structured database initiatives is error-prone and labour-intensive, and the potential for a broad synthesis is diminished.
One likely reason for this lack of standardization is the complexity of the task: the proposed data structures are designed for multilayered, relational databases rather than for standalone datasets for which a two-dimensional data
Another solution for data users to access trait data in a structured way is offered by decentralized tools and tool chains to facilitate the use and analysis of trait data. For instance, the R package traits (Chamberlain et al., 2017) contains functions to extract trait data directly from their source, including Birdlife, EOL TraitBank or BetyDB. The package tr8 provides similar access to plant traits from a list of databases (including LEDA, BiolFlor and Ellenberg values; Bocci, 2015) and aggregates them into a species × traits wide-table. FENNEC (Ankenbrand et al., 2018) is an online tool or self-hosted service capable of extracting trait information from multiple sources for a target species community.
A more widespread implementation of ontologies would advance the possibilities to integrate datasets and reduce noise and uncertainty when aggregating data. First, groups of trait researchers must take up the task of developing consensus definitions into semantically defined ontologies that are useful for their use case. Platforms like OBO Foundry can help structure this process. Second, the reference to ontologies and thesauri must be incentivized and facilitated for individual data providers by the development of tools for matching concepts from the available ontologies to their data. Third, frameworks for providing trait data in an unambiguous and machine-readable structure must be simplified to match the limited resources of small and intermediate research projects. This can be achieved by extending documentation or providing tools for the application of existing ontology frameworks and database structures (e.g. data validator services), and by defining easy-to-use standard vocabularies that enable the interoperability of data at minimal effort.
However, no unified and widely adopted terminology for primary trait-data publications has emerged across the multiple sub-disciplines of trait-based research. In the following chapter, we propose a unified vocabulary for trait data that can serve as a minimal consensus for describing and labelling trait data. The simplicity of this standard terminology will lower the threshold for adoption and offer a high pay-off in the visibility and reuse of published data. By establishing this as a 'best practice' in trait-based research, trait data will eventually fulfil the FAIR guiding principles for scientific data (Wilkinson et al., 2016).

| INTRODUCING THE ECOLOGICAL TRAIT-DATA STANDARD VOCABULARY
As a response to the challenges outlined above, we propose a versatile standard vocabulary for trait-based ecological research. The Ecological Trait-data Standard Vocabulary (ETS) is accessible at https://terminologies.gfbio.org/terms/ets/pages/ and combines terms of DwC with newly defined terms to cover the variety of trait-based approaches and their different needs to report measurement detail.
Rather than prescribing a data structure or exchange format, the vocabulary is intended as a more inclusive terminology that can be used in three major use cases:
1. by data providers: for publication of standardized primary data on open-access data repositories, or for labelling project-specific data for local use and exchange with collaborators, e.g. in two-dimensional data tables or project databases,
2. by data users and data curators: as a consensus vocabulary when compiling data from distributed sources into aggregate datasets, e.g. to map standardized columns and refer to taxa and trait definitions in a uniform way, and
3. by data managers: in developing data exchange formats between online resources, web services and software tools, e.g. when providing database queries via a web service or defining input and output formats of software packages.
All terms may be applied to describe columns of a data table (Figure 2; see Appendix S2 for best-practice principles and examples for publishing primary data). By applying these standard terms, data providers can ensure that the description of trait measurements uploaded into public data repositories will be unambiguous. This will facilitate interoperability of published data and enable their reuse for future data aggregation initiatives and data synthesis, while warranting long-term accessibility.
Our vocabulary offers three extensions to hold additional information on the context of the observation along with the core data, in analogy to DwC extensions ('Taxon', 'Measurement or Fact', and 'Occurrence'; see section on extensions below). Further terms are provided for dealing with typical dataset-level information on authorship and rights of reuse of the data (based on terms of the Dublin Core Metadata Initiative, DCMI), as well as for defining custom trait concepts (see section on metadata below). Aspects not covered by the vocabulary may draw from terms provided by other existing terminologies (in particular DCMI and DwC and its extensions), or be added as user-defined columns (which should then be clearly specified in the metadata accompanying the dataset).
FIGURE 2 Formats used for trait datasets: (a) taxon-level trait data compiled from literature or aggregated from measurements are often published as a compiled species × traits wide-table; (b) observation long-tables are a well-defined and tidy data format, reporting one single measurement per row and relating it to a standard trait definition and accepted taxon name; (c) additional columns may provide original names for maintaining author-side continuity, while identifiers refer to taxon and trait concepts via unambiguous URI pointers. Additional identifiers relate each row to other layers of information on (d) the taxon resolution, the individual organism (i.e. occurrence), or the origin of or confidence in the reported measurement or fact.
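The relationship between the wide-table in (a) and the long-table in (b) can be made concrete with a short reshaping routine. The Python sketch below is illustrative only: the column names `scientificName`, `traitName` and `traitValue` are assumed here by analogy with the verbatim terms named in this paper and should be checked against the ETS documentation; the function name `wide_to_long` and the data values are made-up placeholders.

```python
def wide_to_long(wide_rows, taxon_column="scientificName"):
    """Reshape a species x traits wide-table into one row per measurement."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            # Skip the taxon column itself and empty cells (missing values).
            if column == taxon_column or value is None:
                continue
            long_rows.append({
                "scientificName": row[taxon_column],
                "traitName": column,
                "traitValue": value,
            })
    return long_rows


# Made-up placeholder data, for illustration only.
wide = [
    {"scientificName": "Species a", "body_length": 4.2, "wing_length": None},
    {"scientificName": "Species b", "body_length": 6.0, "wing_length": 5.1},
]
long_table = wide_to_long(wide)
```

In the long format, missing values simply produce no row, and further columns (units, identifiers, measurement detail) can be attached to each measurement without widening the table.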

| Building community consensus
In designing this vocabulary, we drew on the combined expertise of empirical biodiversity researchers (data providers), biodiversity synthesis researchers (data users), and biodiversity informatics researchers (data managers). The aim was to develop a simple, easy-to-use template for standalone trait-data publications or data compilations, to facilitate their reuse for synthesis and integration into larger database structures. Earlier proposals for trait-data standards (e.g. Kattge, Ogle, et al., 2011; Parr et al., 2016) have been designed for relational database structures from a data manager perspective, which may be the reason why they have so far hardly been adopted for primary data publications. We paid particular attention to these existing data standards (e.g. Garnier et al., 2017; Kattge, Díaz, et al., 2011; Kattge, Ogle, et al., 2011; Madin et al., 2007; Parr et al., 2016)

| Specification of core terms
To qualify as trait data complying with the ETS, the following content is required at minimum (Figure 2b):
To ensure compatibility with project-specific databases or analytical code, it might be in the interest of the data author to keep user-specific identifiers for those terms; for this purpose, we suggest the use of verbatimScientificName and verbatimTraitName (Figure 2c).
By allowing user-side entries along with consensus terms, we acknowledge the fact that most authors have their own schemes for standardization which may refer to different scientific community standards (as also practiced in TRY, Kattge, Díaz, et al., 2011; Kattge, Ogle, et al., 2011). The redundancy of labelling allows for continuity for data providers while also enabling quality checks and comparability for data curators.
Similarly, standardization of units can be achieved by relying on SI base units or by relating units to unambiguous concepts via URIs provided by ontologies (Gkoutos et al., 2012; Keil & Schindler, 2018; Madin et al., 2007). For categorical or binary traits, the categories should conform to expected levels as defined in the trait concept or be unambiguously defined in the metadata of the dataset. The vocabulary offers terms for keeping the user-defined values in dataset-specific units and factor levels along with standardized entries (verbatimTraitValue and verbatimTraitUnit, Figure 2c).
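Keeping verbatim entries next to standardized ones can be illustrated with a simple unit conversion. The Python sketch below rests on several assumptions: `verbatimTraitValue` and `verbatimTraitUnit` are the terms named in the text, whereas `traitValue` and `traitUnit` are assumed names for the standardized counterparts, and the conversion table `TO_METRE` and the function `standardize_length` are hypothetical helpers introduced for illustration.

```python
# Hypothetical conversion factors from dataset-specific length units to the
# SI base unit metre.
TO_METRE = {"mm": 0.001, "cm": 0.01, "m": 1.0}


def standardize_length(verbatim_value, verbatim_unit):
    """Return a record holding both the original and the standardized entry."""
    factor = TO_METRE[verbatim_unit]  # a KeyError signals an unknown unit
    return {
        "verbatimTraitValue": verbatim_value,  # value as reported by the author
        "verbatimTraitUnit": verbatim_unit,    # unit as reported by the author
        "traitValue": verbatim_value * factor, # assumed standardized column
        "traitUnit": "m",                      # assumed standardized column
    }


record = standardize_length(2, "cm")
```

Because the original value and unit are retained, a data curator can later verify or redo the conversion, which is the redundancy the paragraph above argues for.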

| Extensions for additional data layers
Beyond measurement units or higher taxon information, further information related to the individual specimen, the reported fact, measurement or sampling event might complement the core data. We propose three extensions of the vocabulary that should be used to describe this information (Figure 2d), in line with the existing DwC extension structure:
1. The Taxon extension provides further terms for specifying the taxonomic resolution of the observation and to ensure the correct reference in case of synonyms and homonyms.
2. The MeasurementOrFact extension provides terms to describe information at the level of single measurements or reported facts, such as the original literature reference for the reported value, the method of measurement or statistical method of aggregation. It provides important information that allows for the tracking of potential sources of noise or bias in measured data (e.g. variation in measurement method) or aggregated values (e.g. statistical method), as well as the source of reported facts (e.g. literature source or expert reference).
3. The Occurrence extension contains vocabulary to describe information on the observation context of individual organisms, such as sex, life stage or age. This also includes the method of sampling and preservation, as well as the date and geographical location, which provide an important resource to analyse trait variation due to differences in space and time.
fier references the precise specimen at a particular place and time from which the measurement was taken (Groom, Hyam, & Güntsch, 2017; Güntsch et al., 2017). Also, as 'occurrence' is strictly defined by a date-time event, it may be identical to the common-sense concept of 'observation'. As such, data entries for location of sampling (provided in column locationID) and sampling campaigns (eventID), which are often recorded and published along with trait data, are tightly linked to the concept of 'occurrence'. As occurrence is the narrower term and the key concept for linking an individual organism to a location and sampling event in DwC, and since it is indeed relevant to distinguish between multiple 'occurrences' of the same organism in some trait-based research applications, the ETS sticks to this terminology.
Identifiers can also be used to provide a structure within the measurement data table, e.g. to link rows of measurements on the same individual (by having entries share the same ID in column occurrenceID). Similarly, the values of multivariate measurements can be linked by using the same measurementID for several rows.
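How shared identifiers structure a measurement table can be shown with a small grouping routine. In the Python sketch below, `occurrenceID` and `measurementID` are the column names used in the text; the function name `group_rows`, the `traitName` and `traitValue` columns and all values are assumptions made for illustration.

```python
from collections import defaultdict


def group_rows(rows, id_column):
    """Collect all rows that share the same identifier in `id_column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append(row)
    return dict(groups)


# Made-up placeholder rows: two individuals; 'm2' is a multivariate
# measurement whose two values are linked by a shared measurementID.
rows = [
    {"occurrenceID": "ind1", "measurementID": "m1",
     "traitName": "body length", "traitValue": 4.2},
    {"occurrenceID": "ind1", "measurementID": "m2",
     "traitName": "wing length", "traitValue": 3.0},
    {"occurrenceID": "ind1", "measurementID": "m2",
     "traitName": "wing width", "traitValue": 1.1},
    {"occurrenceID": "ind2", "measurementID": "m3",
     "traitName": "body length", "traitValue": 4.8},
]

by_individual = group_rows(rows, "occurrenceID")
by_measurement = group_rows(rows, "measurementID")
```

Grouping by `occurrenceID` recovers all traits measured on one individual, which is what analyses of trait covariation within individuals would require.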
The terms of the extensions draw from terms of the DwC extensions of particular relevance for trait data.See the documentation of the ETS for further detail on the use of extensions.

| Specification of metadata
Dataset-level information about structure, provenance of data, authorship and data ownership, as well as terms of use should be considered when sharing and working with trait datasets (Kissling et al., 2018; Michener, 2006). In the case of primary measurement data, this information usually applies to the entire trait dataset, and would be stored along with the published data as metadata entered in a template provided by the file hosting service. To facilitate interoperability and computational evaluation of metadata, specific standards for metadata may be provided, e.g. by applying Ecological Metadata Language (EML, KNB, 2011). Whenever data from different sources are compiled into a single dataset, metadata information would become part of the resulting data table, as each data entry would have to maintain reference to the original data provider and conditions of reuse of these data. This can be achieved by appending the metadata terms as columns to the core dataset, or by linking to a secondary data table via an unambiguous datasetID (e.

| DISCUSSION
To serve the demand for the standardization and harmonization of ecological trait data which has arisen from a growing number of distributed datasets of different research contexts, we propose a versatile vocabulary for the publication of new datasets, for the creation of data compilations, and for the exchange and handling of trait data in the context of the semantic web.
Consensus building on how traits are to be used and evaluated is currently under way in several fields of ecological research, each with its own taxonomic focus and project-specific questions (Garnier et al., 2017; Kissling et al., 2018; Moretti et al., 2017; Pey et al., 2014). Such community discussions on trait definitions and measurement practices naturally lead to better data quality. However, they still require a stronger linkage to the global biodiversity data initiatives.
With our proposal of an Ecological Trait-data Standard Vocabulary (ETS), we aim to capture the common core concept of trait data in a single resource terminology and provide a starting point for the development of a joint language and terminology around traits as a cross-sectoral topic of ecological and evolutionary research. To enable the ETS to capture the different approaches in trait-based research

2. Initiatives that aim to harmonize trait data from the literature or from direct measurements into data compilations or database infrastructures and make those data widely available on the Internet.
3. Initiatives that aim at the standardization and development of consensus measurement methods and definitions for traits and provide standard terminologies.
4. Initiatives that aim to combine data (1 & 2) and terminologies (3) into formalized structures for knowledge representation to link trait data to a wider set of biodiversity data.
cepts of biodiversity knowledge have been provided in the schema for biological collection records (Access to Biological Collection Data, ABCD; Holetschek, Dröge, Güntsch, & Berendsohn, 2012) or the Darwin Core Standard for biodiversity data (DwC; Wieczorek et al., 2012). Both DwC and ABCD are ratified standards of the Biodiversity Information Standards (TDWG, http://www.tdwg.org), which is a global network to support the development and wide adoption of exchange standards for biodiversity data. These terms may be used for defining columns in data tables that contain measurement values, units and categorical levels, taxon names, variables such as sex or life stage, information of time and date of observation and methodological details (Robertson, Döring, Wieczorek, DeGiovanni, & Vieglais, 2009). A suite of terminology extensions links to and expands the capacities of DwC (Wieczorek et al., 2012). Of particular importance for trait data is the 'MeasurementOrFact' extension, which typically would be used in database management and bioinformatics to structure trait observations (Parr et al., 2016). While the above-mentioned standards provide terms and concept definitions, and the logic relationships of those, they do not prescribe explicit structure for trait data.

Based on the terms of DwC, the Extensible Observation Ontology (OBOE, Madin et al., 2007; Schildhauer et al., 2016) formalizes observations and measurements into a machine-readable ontology, thus being easily integrated into larger database management systems. By applying this scheme for plant traits, Kattge, Ogle, et al. (2011) propose a generic database structure that covers most potential use cases of trait-based ecology. This data structure is built around a central data table that contains observations of individual plants linked to several measurements of traits via identifiers. The observations are also linked to a taxonomy and metadata descriptors of the observation context, like location or experimental treatment. Kissling et al. (2018) discuss different ontologies (including OBOE) that formalize the structure of observation data and attest that for the use cases of trait data these ontologies are still difficult to integrate.

The Encyclopedia of Life (EOL) has proposed TraitBank (Parr et al., 2016) as a standard structure for uploading data on physiological and life-history traits of all kingdoms of life. It is to date the most general approach of an integrated structure for trait data. The framework employs established terms provided by the DwC and the DwC MeasurementOrFact extension (Parr et al., 2016). Additional layers of information cover bibliographic references, multimedia archives and ecological interactions. TraitBank invites data submissions to the EOL database in a structured Darwin Core Archive (DwC-A, GBIF, 2017), which is a set of simple text files (csv), a file to specify relationships between these text files (called meta.xml), and a file for metadata descriptions using EML (called EML.xml; see GBIF, 2017 for specifications; archives can be validated before upload on https://tools.gbif.org/dwca-validator/).

All of these structures suggest the use of stable URIs to refer to taxon concepts. The difficulties with keeping taxonomic references intact along with continuous changes in taxonomy consensus are a central challenge of biodiversity data management and are beyond

FIGURE 1 Types of ecological trait data assume different entities or reported qualities: (a) morphometric or morphological measurements of individual body features (lengths, areas, volumes, weights) or other quantities related to life history (e.g. reproductive rates, life spans); (b) aggregated trait values are reported as means taken on multiple measures of organisms of a taxon; (c) quantitative traits may be extracted from literature or existing databases, referring to the entire taxon (or a subset, e.g. a sex) as the subject of description; (d) qualitative traits are categorical, ordinal or binary descriptors of the entire species or higher taxonomic level (also called 'facts').
Panel examples: (a) Measured quantitative data: an individual x of taxon y, femur length of 14.1 mm; (b) Aggregate quantitative data: adult individuals of taxon y, average femur length of 12.2 ± 2.3 mm; (c) Quantitative literature data: males of taxon y, average body length of 43 mm; (d) Qualitative literature or database data:

1. a value (column traitValue) and, for numeric values, a standard unit (traitUnit);
2. a descriptive trait name (traitName) that links the observation to a standardized definition (i.e. a concept);
3. the scientific taxon name (scientificName) for which the measurement or fact was obtained that links the observation to an accepted taxon concept.

The traitName and scientificName would use unambiguous terms assigning both to clearly defined concepts. Ultimately, disambiguation can be ensured by adding globally valid Uniform Resource Identifiers (URIs) for taxon (taxonID) and trait definitions (traitID). For example, referring to the GBIF Backbone Taxonomy, for Bellis perennis, the taxonID would be 'https://www.gbif.org/species/3117424'; the traitID for 'fruit mass' according to the Flora Phenotype Ontology would be 'http://purl.obolibrary.org/obo/FLOPO_0005265'. Wherever possible, the field traitID should point to an unambiguous trait definition in a published ontology. If no suitable reference exists, trait data should always be accompanied by a dataset-specific listing of trait concepts. Such a controlled vocabulary would, in its simplest form, assign each trait name an unambiguous definition of the trait and an expected format of measured values or reported facts (e.g. units or valid factor levels). Ideally, this definition refers to or refines terms from published trait ontologies. By providing a minimal vocabulary for trait lists within the ETS, we hope to facilitate the unambiguous definition of traits for trait datasets. This vocabulary might also prove useful for the future publication of trait ontologies.
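To illustrate how a dataset-specific trait list can be used to check entries of a trait table, the following Python sketch validates one record against such a list. The columns traitValue, traitUnit, traitName, scientificName, taxonID and traitID and the two URIs are taken from the text above. The keys `valueType`, `expectedUnit` and `factorLevels`, the trait 'growth_form', the unit 'mg', the measured value and the helper `check_record` are assumptions made for this example only.

```python
# Hypothetical dataset-specific trait list (a small controlled vocabulary).
# The keys valueType, expectedUnit and factorLevels are illustrative assumptions.
trait_list = {
    "fruit_mass": {
        "traitID": "http://purl.obolibrary.org/obo/FLOPO_0005265",
        "valueType": "numeric",
        "expectedUnit": "mg",  # assumed unit, not prescribed by the text
    },
    "growth_form": {
        "traitID": None,  # no published ontology term assumed here
        "valueType": "factor",
        "factorLevels": {"herb", "shrub", "tree"},
    },
}

# One row of a trait table; the measured value is invented for illustration.
record = {
    "scientificName": "Bellis perennis",
    "taxonID": "https://www.gbif.org/species/3117424",
    "traitName": "fruit_mass",
    "traitID": "http://purl.obolibrary.org/obo/FLOPO_0005265",
    "traitValue": 0.12,
    "traitUnit": "mg",
}


def check_record(record, trait_list):
    """Return a list of problems found when comparing a record to the trait list."""
    definition = trait_list.get(record["traitName"])
    if definition is None:
        return [f"unknown traitName: {record['traitName']}"]
    problems = []
    if definition["valueType"] == "numeric":
        if not isinstance(record["traitValue"], (int, float)):
            problems.append("traitValue is not numeric")
        if record.get("traitUnit") != definition["expectedUnit"]:
            problems.append(f"traitUnit should be {definition['expectedUnit']}")
    elif record["traitValue"] not in definition["factorLevels"]:
        problems.append("traitValue is not a listed factor level")
    return problems


print(check_record(record, trait_list))
```

A record with an unlisted trait name or an unexpected unit would return a non-empty list of problems, which a data provider could review before publication.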
g. a URI pointing to the source DOI) and a descriptive datasetName (e.g. a descriptive name for the source). The ETS metadata vocabulary provides terms for a minimal set of information that should be provided along with trait data. The suggested terms originate from Dublin Core Metadata Initiative (DCMI), and are widely compatible with terms provided by the DataCite Metadata Schema (DataCite Metadata Working Group, 2019). The terms can be extended and complemented by using terms from these resources. In order to ensure traceability, the metadata of any dataset that employs the ETS should refer to the specific online version that was used to build the dataset, e.g. by entering 'Schneider, F.D., Jochum, M., Le Provost, G., Penone, C., Ostrowski, A. and Simons, N.K., 2019, Ecological Trait-data Standard Vocabulary v0.10, https://doi.org/10.5281/zenodo.2605377, URL: https://terminologies.gfbio.org/terms/ets/pages/' in the metadata field conformsTo. Wherever referring to individual terms of the vocabulary in publications or metadata, this should be done via their individual URIs.
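The linking of compiled data entries to their source via datasetID can be sketched as follows in Python. This is only one possible arrangement, assuming a secondary metadata table keyed by datasetID. The terms datasetID, datasetName and conformsTo and the ETS DOI are taken from the text; the example DOIs, dataset names, the `license` entries and the helper `compile_datasets` are hypothetical.

```python
# DOI of the ETS version cited in the text, entered in the field conformsTo.
ETS_VERSION = "https://doi.org/10.5281/zenodo.2605377"

# Two hypothetical source datasets: (dataset-level metadata, measurement rows).
# DOIs, names, licenses and values are placeholders invented for illustration.
sources = [
    (
        {"datasetID": "https://doi.org/10.0000/example-a",
         "datasetName": "Example grassland survey",
         "license": "CC BY 4.0"},
        [{"scientificName": "Bellis perennis", "traitName": "fruit_mass",
          "traitValue": 0.12, "traitUnit": "mg"}],
    ),
    (
        {"datasetID": "https://doi.org/10.0000/example-b",
         "datasetName": "Example literature compilation",
         "license": "CC0 1.0"},
        [{"scientificName": "Bellis perennis", "traitName": "fruit_mass",
          "traitValue": 0.15, "traitUnit": "mg"},
         {"scientificName": "Bellis perennis", "traitName": "fruit_mass",
          "traitValue": 0.11, "traitUnit": "mg"}],
    ),
]


def compile_datasets(sources):
    """Merge rows from several sources, keeping a datasetID on every row.

    Returns the compiled rows and a secondary metadata table keyed by datasetID.
    """
    compiled, metadata = [], {}
    for meta, rows in sources:
        metadata[meta["datasetID"]] = {**meta, "conformsTo": ETS_VERSION}
        for row in rows:
            compiled.append({**row, "datasetID": meta["datasetID"]})
    return compiled, metadata


compiled, metadata = compile_datasets(sources)
first = compiled[0]
print(first["datasetID"], metadata[first["datasetID"]]["license"])
```

Appending the metadata terms as columns instead would amount to merging each metadata entry into every row, at the cost of repeating the same values many times.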
Concept: An idea, notion or object that is made explicit in an information context by a term definition, and referenced to a URI or other accessible reference

Controlled vocabulary: A list of terms that gives all valid consensus terms for a particular context, while no unlisted entries are accepted

fers, while the reference to the original dataset becomes obscured, as only aggregated values are reported (e.g. means or medians). Such trait-data compilations are often labelled 'database', although they do not formally provide data in a database structure in the strict data-management sense. Instead, the data are released as static data tables of raw measurements or aggregate trait values on jour-

tives (e.g. www.markmybird.org). For example, the VertNet database compiled and harmonized large quantities of vertebrate trait data from collections; the resulting data are published as versioned data tables which are updated as new data sources become available (http://vertnet.org, Park et al., 2013)05; Walls et al., 2012; the Flora Phenotype Ontology, Hoehndorf et al., 2016), and for specific animal taxa (e.g. the Hymenoptera Anatomy Ontology, Yoder, Miko, Seltmann, Bertone, & Deans, 2010; the Vertebrate Trait Ontology, Park et al., 2013). The UBERON ontology is an integrated cross-species anatomy ontology for all animals, which combines concepts from different existing ontologies, with wide application in biomedical or physiological research.

The Plant Trait Ontology (TO) definition of the concept 'seed size' contains references to other globally defined terms: 'A seed morphology trait (TO:0000184) which is the size of a seed (PO:0009010)'. Thus,

Comprehensive trait thesauri have been developed in TOP (which is employed in the TRY database, Garnier et al., 2017) and in the Thesaurus for Soil Invertebrate Trait-based Approaches (T-SITA, http://t-sita.cesab.org/, Pey et al., 2014). Ontologies of trait definitions have been developed for plants (e.g. the Plant Ontology,

, and different curation strategies and measures for peer-review and community building are employed. To this end, the OBO Foundry is providing a development platform for (biological) ontologies and offers review and quality control (Smith et al., 2007, http://www.obofoundry.org/).
to maximize compatibility.