Four key challenges in the open‐data revolution

Abstract In Focus: Culina, A., Adriaensen, F., Bailey, L. D., et al. (2021) Connecting the data landscape of long‐term ecological studies: The SPI‐Birds data hub. Journal of Animal Ecology, https://doi.org/10.1111/1365‐2656.13388. Long‐term, individual‐based datasets have been at the core of many key discoveries in ecology, and calls for the collection, curation and release of these kinds of ecological data are contributing to a flourishing open‐data revolution in ecology. Birds, in particular, have been the focus of international research for decades, resulting in a number of uniquely long‐term studies, but accessing these datasets has been historically challenging. Culina et al. (2021) introduce an online repository of individual‐level, long‐term bird records with ancillary data (e.g. genetics), which will enable key ecological questions to be answered on a global scale. As well as these opportunities, however, we argue that the ongoing open‐data revolution comes with four key challenges relating to the (1) harmonisation of, (2) biases in, (3) expertise in and (4) communication of, open ecological data. Here, we discuss these challenges and how key efforts such as those by Culina et al. are using FAIR (Findable, Accessible, Interoperable and Reusable) principles to overcome them. The open‐data revolution will undoubtedly reshape our understanding of ecology, but with it the ecological community has a responsibility to ensure this revolution is ethical and effective.

Calls to arms to ecologists for a more biogeographically representative, longer-term, open-access body of biodiversity data are not new. However, these calls have become more prominent in recent years (Mills et al., 2015; Wilson, 2017). Recognition of the importance of open-access data and reproducible research pipelines in ecology has led multiple funding agencies (e.g. NERC, NSF, ARC) and journal publishers, including the British Ecological Society (2016), first to 'strongly suggest', and later to require, that published research be FAIR (Wilkinson et al., 2016): Findable, Accessible, Interoperable (i.e. data can interact with other data and workflows) and Reusable. Precipitated by this new research model, but also by ecologists' ethos regarding open access (Gallagher et al., 2020), volumes of ecologically relevant data are being amassed and subsequently released; these titanic efforts continue despite the glaring lack of funding support in most countries (Farley et al., 2018; Hampton et al., 2013).
Despite the great progress made in the last decade in open data in ecology, one should not get too comfortable. The open, big data landscape that is starting to emerge in ecology brings new challenges that may test more traditional ecological mindsets (Hampton et al., 2013). Here, we discuss four of these challenges, namely regarding (1) harmonisation, (2) biases, (3) expertise and (4) communication.

CHALLENGE 1. HARMONISING OPEN DATA
Different datasets, even when collected strictly within the same sub-field of ecology (e.g. animal population ecology), can differ vastly. For instance, ornithologists use the term 'recruitment' to mean the age at which an individual first reproduces (J.D. Lebreton, pers. comm. 2015; B. Sheldon, pers. comm. 2021), whereas plant ecologists use it to mean the germination of a seedling (Harper, 1977).
Thus, it is strongly advised to harmonise (i.e. standardise and homogenise) data from various sources, or databases that house data from different researchers and sub-disciplines, before the proposed analyses are conducted (Nadrowski et al., 2013) so that they are Interoperable and Reusable. Culina et al. (2021) navigate this through an interoperability pipeline and develop standardised formats for data such as the breeding season in the SPI-Birds data hub (Figure 1). Furthermore, database curators invest significant effort and time in harmonising data and complementing them with metadata, as well as creating thesauri to help users navigate their rich platforms (e.g. Garnier et al., 2017; Pey et al., 2014). However, sometimes the information detailed in the original sources, such as MSc/PhD theses, grey reports and peer-reviewed publications in different languages, does not allow for this task to be performed satisfactorily. When the harmonisation of data is incomplete, users of databases may benefit from the warnings and errors identified by database curators. For instance, in SPI-Birds (Culina et al., 2021), there are standard quality checks: values that are uncommon or unusual are explicitly flagged as warnings, while seemingly impossible values are flagged as 'likely errors'. It is important to note that the ultimate responsibility to correctly conduct an analysis with open-access ecological databases remains with the user. Just because one can run an analysis with all the data at one's disposal, it does not mean one should do so.
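The distinction between warnings and likely errors can be illustrated with a minimal sketch. The field (clutch size), the two thresholds and the function name below are hypothetical and chosen only for illustration; they are not taken from the SPI-Birds quality checks (Culina et al., 2021).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds for clutch size, for illustration only;
# these are NOT the thresholds used by SPI-Birds.
UNUSUAL_ABOVE = 14      # larger than this is uncommon -> warning
IMPOSSIBLE_ABOVE = 30   # larger than this is implausible -> likely error


@dataclass
class Flag:
    level: str    # "ok", "warning" or "likely_error"
    message: str


def check_clutch_size(value: Optional[int]) -> Flag:
    """Classify one clutch-size record as ok, warning or likely error."""
    if value is None:
        return Flag("warning", "clutch size is missing")
    if value < 0 or value > IMPOSSIBLE_ABOVE:
        return Flag("likely_error", f"clutch size {value} is seemingly impossible")
    if value > UNUSUAL_ABOVE:
        return Flag("warning", f"clutch size {value} is unusually large")
    return Flag("ok", "")


# Toy records: typical, unusual, impossible, negative and missing values.
records = [8, 17, 45, -1, None]
flags = [check_clutch_size(r) for r in records]
for record, flag in zip(records, flags):
    print(record, flag.level, flag.message)
```

In a sketch like this, flagged records are annotated rather than deleted, which leaves the decision about whether to include them with the user of the data.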

CHALLENGE 2. BIASES IN OPEN DATA
Naturally, the search for broad global patterns in ecology will only be as robust as the data on which the analyses are based. Many global ecological datasets are taxonomically biased towards mammals and birds. In the case of long-term animal ecology datasets, a significant proportion of studies and databases come primarily from areas of the planet with low biodiversity (Titley et al., 2017), or from areas that are actually least vulnerable to climate change (Paniw et al., 2021). Likewise, most terrestrial biodiversity is found in countries with low GDP, for which fewer data exist relative to countries with higher GDP. Like many initiatives, the data hub of Culina et al. (2021) also displays these geographic biases. Rather than tackling these biases head-on, Culina et al. take the approach of creating a framework and standards for the 'well-defined' community (primarily in northern Europe) that acts as a platform for global efforts (Figure 1).
FIGURE 1 Four key challenges in the era of open data in ecology, and how the SPI-Birds database (Culina et al., 2021) has developed an effective platform to navigate these challenges
We propose several ways that ecologists may navigate this challenge. As a minimum, ecologists using open-access data in ecology to search for global patterns must be aware of (and account for, where possible) geographic and taxonomic biases, contextualising findings rather than making blanket statements about findings occurring 'worldwide'. Likewise, phylogenetic approaches offer numerous tools to impute missing data following patterns of phylogenetic inertia, but one needs to be aware of which tools fit the job better (Gallagher et al., 2020). Finally, cross-matching algorithms to improve the overlap of interoperable databases can drastically increase analytical power (Pennell et al., 2016). Ultimately, however, greater international efforts are needed to increase the coverage of global biodiversity data in under-represented countries. In this regard, the application of conservation prioritisation in data-poor countries to expedite ecological data collection is a promising avenue of progress (El-Gabbas et al., 2020; Kujala et al., 2018). Furthermore, the development of lasting partnerships between researchers in high-income and low-income countries to build capacity is required to even out biases in ecological data archiving (Donhauser & Shaw, 2019).

CHALLENGE 3. EXPERTISE IN OPEN DATA
There is also a need to acquire the necessary expertise in the field to harness the full potential of the data. The multitude of records made available by, in this case, the SPI-Birds data hub holds great potential. However, large volumes of data cannot substitute for the invaluable ornithological expertise of the researchers who collected the data, nor for the quantitative skills of the researchers who analyse them. Unfortunately, this kind of expertise also tends to be geographically clustered in countries with high GDP

CHALLENGE 4. IMPROVING COMMUNICATION OF THE OPEN-DATA COMMUNITY
The era of big data in ecology is being supported by a community composed of at least four different entities: data contributors, data curators, funding agencies and journals/societies. These entities risk failure of the whole enterprise if they do not adequately engage with one another. As such, communication and trust between them are critical. For instance, one of the main reasons that researchers may choose not to share data and contribute them to open-access databases is the risk of being scooped (Laine, 2017).
This reticence to share data can prevail even though research has shown that researchers who publish second still end up getting a substantial portion of the recognition (Callaway, 2019). One way that open-data curators can help data contributors overcome this initial concern is by offering an embargo period (something that we do in COMPADRE and COMADRE, but which <1% of contributors request), or the possibility of making their data accessible (not open access) on the condition that they be offered co-authorship. Culina et al. (2021) partly follow the latter model, but with a minimal percentage of their total data (Figure 1). As a minimal requirement, SPI-Birds users must explicitly acknowledge any data owners and funding sources of the raw data (stored in the metadata).

This not only improves communication in the community but also makes the raw data more findable in the future.
Database curators should ensure that credit is given where it is due. Requesting that the original paper introducing a given database be cited when the database is used seems logical. However, what is even more logical, as well as fair and F.A.I.R., is to request that the individual contributing authors be cited too. This action to ensure appropriate accreditation may be hard to implement due to (1) the lack of database infrastructure to replicate a subset of citations in the final analysis and/or (2) the lack of space in journal prints to accommodate potentially hundreds of citations. For the former, some databases have already developed the functionality to provide database users with a citation summary of the data they have downloaded. For the latter, the move by many journals and societies from print to online-only publication means that price-per-page is no longer a limitation to citation counts (Fox et al., 2016). In this way, data-contributing researchers can benefit from other users utilising their data.
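As an illustration of what a citation summary could look like, the following minimal sketch collects the unique source citations attached to a set of downloaded records. The record structure, the `source_citation` field and the function name are hypothetical; this is not the implementation of any particular database.

```python
from typing import Dict, Iterable, List


def citation_summary(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return the unique source citations of the records, in first-seen order."""
    seen: Dict[str, None] = {}
    for record in records:
        # A dict keeps insertion order, so duplicates are dropped
        # without reordering the citations.
        seen.setdefault(record["source_citation"], None)
    return list(seen)


# Toy download with placeholder citations (not real references).
downloaded = [
    {"record_id": "r1", "source_citation": "Contributor A (placeholder citation)"},
    {"record_id": "r2", "source_citation": "Contributor B (placeholder citation)"},
    {"record_id": "r3", "source_citation": "Contributor A (placeholder citation)"},
]

summary = citation_summary(downloaded)
print("\n".join(summary))
```

Because the summary is derived from the records actually downloaded, a user who analyses only a subset of a database would cite only the contributors of that subset.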

FINAL REMARKS
Noah's ecological data ark is beginning to get crowded. However, ecologists, data curators, funding agencies, journals and ecological societies need to adapt their mindsets, infrastructures and approaches to fill this ark faster, with fewer biases, and more efficiently. A more coordinated effort between data contributors, curators, users, journals and societies will result in much-needed interoperability. Culina et al. (2021) is a testament to a new way of interaction, one that promotes FAIR principles to overcome these challenges and actively promotes international collaboration.
Ultimately, the inherent value of SPI-Birds (Culina et al., 2021) will grow exponentially when considered in conjunction with, for instance, the long-term trends of insects on which birds depend (via InsectChange; Van Klink et al., 2021), human influence (via the Human Footprint Database; Venter et al., 2016) and climatic patterns (via CHELSA; Karger et al., 2017). The promise of big, open-access data in ecology is huge. We must endeavour, as a community, to deliver it.

ACKNOWLEDGEMENTS
We are grateful to the thousands of animal ecologists who routinely

CONFLICT OF INTEREST
The authors declare no conflicts of interest.