“Be sustainable”: EOSC‐Life recommendations for implementation of FAIR principles in life science data handling

The main goals and challenges for the life science communities in the Open Science framework are to increase reuse and sustainability of data resources, software tools, and workflows, especially in large‐scale data‐driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources based on the collaborative, cross‐disciplinary work done within the EOSC‐Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, it has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC‐Life provides a model for sustainable data management according to FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive‐ and industry‐related resources, by means of cross‐disciplinary training and best practices sharing. Finally, we illustrate how data harmonisation and collaborative work facilitate interoperability of tools, data and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.


Life Science (LS) communities cover multiple scientific domains and carry out a diversity of research, from basic biological studies to applied epidemiological and environmental investigations. This breadth is evidenced by the key role played by LS communities during the COVID-19 pandemic, ranging from fundamental studies of the SARS-CoV-2 virus, the discovery of new therapies and the development of novel vaccines, to establishing and validating methods for contact tracing and wastewater surveillance. The COVID-19 pandemic demonstrated that LS communities can provide high-quality, reliable data for reuse by the wider scientific community (see recommendations from Research Data Alliance (RDA) COVID-19 Working Group, 2020). LS communities around the world have, however, voiced the need to improve the sharing of, access to and, ultimately, reuse of data resources in a FAIR (Findable, Accessible, Interoperable and Reusable; Wilkinson et al, 2016) manner. Increased data sharing and accessibility would: (i) enhance the value of results generated within a specific domain and expand their utility by integrating data from related communities; (ii) provide a vital data substrate that supports non-hypothesis-driven scientific discovery and advances through the application of machine-based methods and (iii) inform decision making by policymakers (e.g. funders, politicians) and industrial partners.
To facilitate research, in addition to FAIR data, there is a need for FAIR research software (Lamprecht et al, 2020; Barker et al, 2022; Chue Hong et al, 2022). Software is used in almost all areas of science and must be sustainable to guarantee the reproducibility and reusability of the results, data and analyses it generates. This requires long-lived, executable software, which in turn necessitates funds, typically only available through fixed-term project calls that value novelty above utility. This problem has, however, been recognised by funders (Strasser et al, 2022) and has led to the Amsterdam Declaration on Funding Research Software Sustainability, which aims to change comprehensively the way funders deal with research software (https://zenodo.org/records/8325436).
EOSC-Life is a European project funded under Horizon 2020. It brings together 13 LS Research Infrastructures (RIs) from the European Strategy Forum on Research Infrastructures (ESFRI) to create an open, digital and collaborative space for biological and medical research. The project publishes FAIR data and a catalogue of services provided by participating RIs that enable the management, storage and reuse of data in the European Open Science Cloud (EOSC; preprint: Appleton et al, 2020). The project is framed by a data management plan describing the minimum requirements necessary to initiate research data sharing (Blomberg et al, 2020 in Appendix 1). The project participants, however, have recognised that merely using this tool will not ensure the long-term and large-scale use of scientific data. EOSC-Life has identified organisational, technical, financial and legal/ethical challenges that represent the main barriers to effective sustainability (definitions in Box 1).
The organisational challenges are associated with the non-aligned impact and reward mechanisms operating within academic organisations. Here, pursuing novelty and following new trends are more likely to result in positive funding or tenure decisions than investments in community-oriented collaborative initiatives addressing long-term structural issues. The technical challenges are related to the need to promote the effective, widespread awareness and implementation of FAIR components, including the need to address sustainability dependencies arising from operating systems, versioning, identifiers, metadata schemata, vocabularies and provenance. These components are often inconsistently applied, hindering interoperability and reducing opportunities for reuse of both data and software. The financial challenges arise from the proliferation of services and LS data repositories (between 2020 and 2023, for example, the Nucleic Acids Research Database issue (https://www.oxfordjournals.org/nar/database/c/) reported 267 new ones) and the increasing size and complexity of datasets and services. Consequently, the resources available to perform basic operations, including curation, storage, computing and access, may be insufficient. The ethical and legal challenges include the need to respect intellectual property rights and comply with the GDPR (General Data Protection Regulation; https://gdpr-info.eu/) in a fragmented international legal landscape. How these challenges are to be overcome when a project ends is rarely addressed, leading to a reluctance to reuse resources once projects' primary objectives have been met.
In this paper, we describe resources and services, as well as the associated training and knowledge exchange, which have been supported and/or created within the EOSC-Life project. We emphasise sustainability strategies formulated to ensure long-term access to, and reuse of, these resources and services in a world where open-science policies are being developed and where data and software are both shared and widely accessible. As EOSC-Life is part of the wider EOSC ecosystem, we highlight special considerations and twelve key recommendations ([R1-R12]) for resource sustainability. These reflect the scientific diversity of EOSC-Life's constituent communities, as well as external factors, such as the need to align with regulatory requirements related to personal health data.
The radical collaboration framework (McGovern, 2018a; Pickering et al, 2021) was used to assess the sustainability of EOSC-Life outputs. We subsequently operationalised the methodology while developing this manuscript (Appendix Supplementary Information S1).
Box 1. Definitions for sustainability types addressed in this paper.
i Organisational sustainability: (i) organisations create and support data, as well as services to maintain that support and (ii) organisations continue to support their members who are making the efforts to produce FAIR data, services, software, etc.
ii Technical sustainability: metadata, data, software, workflows, AAI (Authentication and Authorization Infrastructure), etc., need to be future proof and flexible enough to continue to be used even as technologies evolve.
iii Financial sustainability: the financial resources to provide for the human resources and the storage and operational space (cloud infrastructure, supercomputing, computer storage with IT support).
iv Sustainability through continued re-use: data, tools and software continue to be used because they can be found, accessed, operated on and re-used, and so do not grow stale. Also, building on existing solutions, i.e. established user-bases, as opposed to redundant innovation.
v Sustainability through training: tools and software can continue to be used because they are actively disseminated and explained through training, and where the training serves also to identify the new needs of old and new users in emerging communities and emerging scientific domains.

Sustainability essentials, tools and challenges
PART-1: community components: "humans and data"

Sustainability and governance
Governance concerns the responsibilities for managing research data and tools over the long term, in compliance with applicable regulations, as well as the allocation of suitable resources for data availability. Governance structures that engage research communities to promote best practices consistently are essential to ensure long-term accessibility to research outputs [R8]. Here, research-funding agencies play essential roles in catalysing best practices (Jahn et al, 2023 in Appendix 1). Many of them now require all data to be shared or deposited in specific repositories that apply FAIR and TRUST (Transparency, Responsibility, User-Focus, Sustainability, Technology; Lin et al, 2020) principles. Best practices resulting from consultation with relevant user communities in governing data access, and resource sharing, should be an explicit part of operational management and risk mitigation (including cybersecurity) of research organisations, and of guidelines for infrastructures managing long-term sustainable access to data. These aspects have been highlighted in several high-level reports by, e.g. the Organisation for Economic Co-operation and Development (https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0463), G7, the Group of Senior Officials on global RIs (http://www.gsogri.org/) and the EC via ESFRI (https://www.esfri.eu/sites/default/files/u4/ESFRI_SCRIPTA_VOL2_web.pdf).
In this context, the core goal of EOSC-Life is to facilitate data interoperability: to allow the interoperation and analysis of data across disciplines, whilst conforming to necessary governance and consent procedures. Data and metadata resources gain added value when they are curated by experts [R1], well annotated, processed in sophisticated ways, and integrated into or linked to other datasets. For example, the value of sequencing data is enhanced when they are linked to the corresponding organism-level functional data, allowing the connection to be made between genotype and phenotype, e.g. for crop performance or human diseases. Such an approach is illustrated by the National Human Genome Research Institute-European Bioinformatics Institute catalogue of human genome-wide association studies (https://www.ebi.ac.uk/gwas/; Sollis et al, 2023). To facilitate such data integration over time, and therefore promote sustainability through re-use, however, interoperability between more disparate data types needs to be developed. Adequate data management is a key component of this interoperability. Minimising information loss and creating an auditable trail ensures data can be trusted. As interoperability also relies on the granting of permission to access and re-use available data, it is essential that research outputs are accompanied by clear licensing information, and that the licences used are as permissive as possible [R11]. In addition, the procedures and standards used to achieve added value must be clearly documented and made transparent to stakeholders, as was reported, for example, by the plant science community for the "MIAPPE"

Box 2. EOSC and connected communities.
The ambition of the EOSC is to develop an "open multi-disciplinary environment where researchers can publish, find and re-use data, tools and services, thus enabling them to conduct their work better". It builds on existing infrastructure and services in a federated "system of systems" approach. In order to link EOSC with efforts of the European Strategy Forum on Research Infrastructures (ESFRI) and other key European RIs, five Science Cluster projects were launched in 2019 and are structuring a large part of the EOSC research landscape (Lamanna et al, 2021 in Appendix 1):
i ENVRI-FAIR (https://envri.eu/home-envri-fair/; environmental sciences)
ii EOSC-Life (https://www.eosc-life.eu/; life sciences)
iii ESCAPE (https://projectescape.eu/; astronomy, astroparticle and particle physics)
iv PaNOSC (https://www.panosc.eu/; photon and neutron sciences)
v SSHOC (https://sshopencloud.eu/; social sciences and humanities)
The science cluster projects have built on long-standing interactions between the RIs in the different scientific domains and aim to improve researchers' access to data, tools and resources, as well as FAIR data management practices. Each science cluster project addresses domain-specific requirements for linking their data resources to EOSC, but all of them also consider intra-domain interoperability and alignment.

Sustainability at multiple scales
Smaller teams and projects from specialist communities, such as those integrated through EOSC-Life Open Calls, often lack access to expertise in FAIR data management, in setting up sustainable hosting resources, or in implementing interoperability using available standards. This means newly established resources risk being abandoned after projects are completed. To address this, the EOSC-Life open call process was structured to demonstrate first the utility of its support model for sustainable FAIR data management [R3] and then to explore more complex needs, including those of sensitive- and industry-related data resources. Project applicants were asked, as part of the project design and implementation plan, to identify outcomes that would need to be sustained, and relevant domain experts, and were then provided with support and guidance [R1]. The sustainability of outcomes was analysed at the end of the project, in an iterative lessons-learnt process, using the "World-Cafe" methodology (https://theworldcafe.com/key-concepts-resources/world-cafe-method/). Supported by EOSC-Life publications [R2] and training materials, many user projects contributed to improving the quality, functionality and scope of existing resources and created new resources, such as data repositories and standards. Significant impacts are described in Appendix Supplementary Information S2. Overall, this process highlighted the benefit of increasing awareness of existing sustainability solutions, and of training as a path to promote sustainability in small projects. In a fragmented environment of highly distributed data sources, both sustainability and data interoperability can be greatly enhanced by data integration into data warehouses. EOSC-Life supported the initiation of large, centralised platforms, showing how these could be quickly built to meet urgent community and societal needs [R6], e.g. the COVID-19 Data Portal
interoperability services portable to increase their sustainability.

Sustainability of sharing and shared (meta)data
Data and software must be archived in online portals along with rich and semantically annotated metadata to be findable and usable over the long term. Sustainability through re-use requires that the data contain, or be intimately linked to, their own metadata to ensure that the data are useful and understandable, even after they have been downloaded from the portal.

Curation of FAIR data resources and collections
The constant curation of data resources ensures the quality, accuracy and depth of data, improves their reuse and builds trust in those data resources [R7]. This requires funds to sustain the expert interpretation and classification of vast amounts of information (Bourne et al, 2015; Karp, 2016; Chen et al, 2020). Expert curation is essential for the provision and dissemination of high-quality, accurate data and the associated metadata, which can be productively used for specific research applications (Parkinson et al, 2021, 2022 in Appendix 1). For instance, the PDB-REDO resource (EOSC-Life Open Call project) exploits collections of curated crystallographic datasets to refine, rebuild and validate structural models of biomolecules automatically (Joosten et al, 2014). An extensive collection of open standards and data portals is now available to LS communities. To facilitate discovery and use, these resources are registered in FAIRsharing (https://fairsharing.org/).

FAIRsharing collection and EOSC
Embedded in EOSC, and recommended by funders and publishers, FAIRsharing is a curated, informative and educational resource for data and metadata standards that is interrelated to databases and data policies and extends across all disciplines (Sansone et al, 2019). FAIRsharing encourages users to discover, select and exploit these resources with confidence. It also encourages producers to make their resources more sustainable and discoverable, so that they will be more widely adopted and cited. In the context of the sustainable findability of EOSC-Life aligned outputs, a dedicated collection of its 133 data resources is richly described. The descriptors used are also served by FAIRsharing in a machine-readable form to feed the information into EOSC portals and other tools, such as the OpenAIRE Graph (Appendix Supplementary Information S9). Many of the descriptors themselves (e.g. life cycle status, links to sustainability documentation and relationships to the organisations that fund and maintain them) are also key to helping both EOSC and the wider research community assess the sustainability of the resources curated within FAIRsharing.
Box 4. Key aspects for ensuring sustainable provenance (Traceability of Data).
Provenance information is always provided via metadata (who, what, where, when and how) and often consists of heterogeneous data, such as software logs, workflow input files and Standard Operating Procedures. As such, the steps that need to be taken to ensure the sustainability of the provision and the existence of the (meta)data apply equally to the provenance (meta)data. To achieve this sustainability, the following are needed and, in turn, should themselves be sustainable resources:
i Standards, e.g. for metadata and data formats, vocabularies, provenance and workflow management and security.
ii Tools for accessing provenance information, such as the LS AAI, OLS and FAIRsharing.
iii Suitable and continually relevant access technologies and methodologies for human and machine consumers.
iv Accessibility via online archives/portals; the adoption of policies concerning how long provenance (meta)data are stored and shared and how long the necessary AAI (especially for sensitive information) is provided, updated and versioned.
v Appropriate cryptographic techniques, e.g. hashes and digital signatures, used in combination with security policies.
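Item v of Box 4 can be illustrated with a minimal Python sketch, using only the standard library. It is not an EOSC-Life tool or the method of any cited standard. The record fields, the function names and the shared key are hypothetical, and an HMAC (a keyed hash) is used here as a simple stand-in for a public-key digital signature, which would require an additional cryptography library.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialise a provenance record deterministically (sorted keys, no extra spaces)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(record: dict) -> str:
    """SHA-256 hash of the record: changes if any field is altered."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def seal(record: dict, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of the record; a stand-in for a digital signature."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record: dict, key: bytes, tag: str) -> bool:
    """Check that the record still matches the tag produced by seal()."""
    return hmac.compare_digest(seal(record, key), tag)


# Hypothetical provenance record following the who/what/where/when/how pattern.
record = {
    "who": "lab-technician-01",
    "what": "sample aliquoting",
    "where": "biobank-A",
    "when": "2023-01-15T10:30:00Z",
    "how": "SOP-042 v3",
}
key = b"example-shared-secret"  # assumption: real deployments would manage keys via policy
tag = seal(record, key)
tampered = dict(record, how="SOP-042 v4")
```

Storing the digest or tag alongside the provenance (meta)data lets a later consumer detect silent modification: `verify(record, key, tag)` holds for the original record and fails for `tampered`.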
(https://cordis.europa.eu/project/id/101094287), is planned. The model also serves as a conceptual foundation beyond open science communities as it is being used for the proprietary ISO 23494 (https://www.iso.org/standard/80715.html) provenance standard series, currently being developed to provide a standard sustainability framework for the biotechnology industry.
The same principles apply to computational work, where the use of modern workflow management systems provides a "retrospective provenance": the detailed record of the implementation of a computational workflow with information related to every executed process and the execution environment used (Khan et al, 2019). In this context, the FAIRification of software, which is often initially conceived as a stand-alone tool, would benefit from the software also being designed for inclusion in workflows (Brack et al, 2022). The goal is to develop computational workflows that can be easily used with modern workflow management systems, as well as be retrieved and deployed seamlessly from registries such as the WorkflowHub (Goble et al, 2021, 2022 in Appendix 1), while remaining, evolving and being curated in their home repositories [R7]. Workflows become more sustainable thanks to their portability across hosts and the reduced deployment overhead for new users that such workflow management systems provide. Two main EOSC-Life contributions to the management of provenance are the systematic use of RO-Crate (Soiland-Reyes et al, 2022) and LifeMonitor [R9]. The latter service complements the WorkflowHub by ensuring the correct operation of the workflow over time, through monitoring and then triggering automated workflow tests.
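To give a flavour of what packaging files together with their own metadata can look like, the following Python sketch assembles a small JSON-LD document loosely modelled on the RO-Crate 1.1 layout (a metadata descriptor, a root dataset and one entry per file). It is a simplified illustration under stated assumptions, not a validated or complete RO-Crate: the function name `build_crate`, the file names and the example values are hypothetical, and the RO-Crate specification should be consulted for the authoritative requirements.

```python
import json


def build_crate(name, description, date_published, licence_url, files):
    """Return a dict shaped like a minimal ro-crate-metadata.json document."""
    graph = [
        # Metadata descriptor: points at the root dataset it describes.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: human-readable description, licence and list of parts.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": licence_url},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One entity per packaged file.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example: a workflow definition packaged with one input file.
crate = build_crate(
    name="Example variant-calling workflow",
    description="Illustrative package of a workflow and its test input.",
    date_published="2023-01-15",
    licence_url="https://www.apache.org/licenses/LICENSE-2.0",
    files=["workflow.cwl", "test-input.fastq"],
)
serialised = json.dumps(crate, indent=2)
```

Writing `serialised` to a file next to the listed files keeps the description and licence attached to the data and workflow wherever the package is copied.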
Traceable legal requirements from the data
Data that were collected from physical sources, e.g. human subjects, lab animals or field samples, for which legal and policy steps of any type were required, are not by definition usable, unless these steps are documented. In different cases, sustainable access to the original or derived data must be ensured. Proof of compliance with legal requirements is part of the provenance of data [R5]. Clear governance approaches for data use are needed to ensure that the investment in data generation is matched by sustained usage and to ensure that relevant laws are adhered to [R8]. For controlled access data, such as human genetic data, GA4GH standards can be referenced to annotate each data use case and provide a machine-readable terminology (Lawson et al, 2021) that can be implemented by multiple resources. EOSC-Life has contributed to the development of these standards, as well as operationalisation of the standards, e.g. through development of the LS AAI system to support machine-readable access protocols ("GA4GH passport and visas", Cabili et al, 2021).
Sustained usage and FAIR implementation require machine-readable (meta)data and software licences. We have found that the more open the licence, the more likely that the data or software will be reused. EOSC-Life has promoted technical sustainability by using permissive licences (https://fossa.com/blog/all-about-permissive-licenses/) and open-source codebases (https://pncnmnp.github.io/blogs/oss-guide.html), which can be reused and receive contributions from a wide community.
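A machine-readable licence makes this kind of check automatable. The sketch below, in Python, classifies a resource by the licence identifier found in its metadata. The `license` field name, the function name and the short list of permissive SPDX identifiers are assumptions chosen for illustration; they are not an EOSC-Life policy or an exhaustive list.

```python
# Illustrative subset of SPDX licence identifiers commonly regarded as permissive.
PERMISSIVE_SPDX = frozenset(
    {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "CC0-1.0", "CC-BY-4.0"}
)


def licence_status(metadata: dict) -> str:
    """Classify a metadata record as 'missing', 'permissive' or 'other'."""
    licence = metadata.get("license")  # assumed field name
    if not licence:
        return "missing"
    return "permissive" if licence in PERMISSIVE_SPDX else "other"


# Hypothetical metadata records for three resources.
resources = [
    {"name": "tool-a", "license": "Apache-2.0"},
    {"name": "dataset-b", "license": "CC-BY-NC-4.0"},
    {"name": "script-c"},
]
report = {r["name"]: licence_status(r) for r in resources}
```

A report of this kind flags resources with no stated licence, which cannot safely be reused at all, separately from those whose licence restricts reuse.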

Role of cloud providers
In the past, widely accessible services were designed and deployed from a central location. As cloud deployment costs have decreased, and because some datasets/data resources are deployed in industrial and secure settings, the interoperability services must also be portable. This offers software scientists considerable benefits, as portable resources can be developed more flexibly by multi-site teams [R6], and their deployment is often simpler, resulting in a reduction in the overall effort to sustain a service.
To enable the deployment of LS workflows in the EOSC, EOSC-Life relies on a technology stack that standardises the software installation process to ensure the reproducibility of computational workflows and automates software deployment in cloud environments. The key to community adoption has been the availability of free and publicly accessible services and technologies [R11]. Integrating such free-to-use services and technologies has made the research process more efficient, reproducible and collaborative, but has also resulted in some sustainability and reproducibility risks. Currently, components of the computational ecosystem rely on services from commercial entities. For example, we estimate that GitHub (https://github.com/) provided services to the bioinformatics community in 2021 with a value of over $1 million. Similarly, quay.io (https://quay.io/) provides services with an estimated $500,000/year in value for BioContainers (https://biocontainers.pro/). Communities work on the assumption that the conditions under which these technologies and services are provided will remain compatible with the research requirements and the companies' abilities, e.g. that they will remain free of charge and freely accessible, and have adequate performance characteristics. This may not always be true (Box 5).

Sensitive data challenges
Across research domains, sensitive data present challenges related to sustainability, cross-domain categorisation and discovery. Sharing sensitive data within EOSC-Life is particularly challenging when they need to be made available to third parties not contractually bound to the original data controller. The sensitivity of the data may arise not only from their personal nature but could also originate from intellectual property considerations, biohazard concerns or compliance with the Nagoya Protocol (https://www.cbd.int/abs/

Box 5. Code and software management.
Already, over the project's lifespan, we have seen changes in and restrictions to some widely used services. To highlight a few:
i TravisCI (https://www.travis-ci.com/): a popular continuous integration system, initially used by the LifeMonitor, which switched to a paid model.
ii DockerHub (https://hub.docker.com/): a well-established repository of container images, which introduced limits to the number of containers that can be pulled without subscription.
iii Conda (https://docs.conda.io/en/latest/): a popular open source package management and environment management system, which changed the licence terms of its default channel usage.

To speed up the evolution of the RIs, and the integration of user projects, we employed consultative orientation meetings and hackathons, including the cross-RI FAIR Hackathon for Training on Demonstrators, and Open and Internal Calls project teams. This consortium-wide training was provided to assess and improve the FAIRness of the projects, orient the teams based on available examples, share knowledge on solutions and resources provided by EOSC-Life experts [R1] and RI communities. Overall, the initiatives provided fruitful opportunities and platforms for goal-oriented, hands-on teaching, introducing EOSC-Life topics to consortium teams. Such interactive platforms for consortium-wide exchange facilitated further networking and collaboration among experts from the EOSC-Life community and strengthened capability building in new projects (Example in Appendix Supplementary Information S10).
This consortium effort in building up knowledge, skills and trust made possible rapid responses to large-scale scientific, societal, environmental or other challenges of concern, e.g. pandemics similar to the COVID-19 pandemic [R6]. A compelling EOSC-Life example for implementation of interdisciplinary training, to address challenges that emerged during the coronavirus pandemic, was the creation of an epidemiology mathematical modelling training course (https://www.eosc-life.eu/news/training-modelling-covid-19-epidemics/), containing content relevant to future pandemic situations.
EOSC-Life's investment in building a community of experts, and in providing training to these experts, has already increased organisational and technical sustainability. It is hoped that experts trained in the EOSC-Life consortium will continue working in RIs, taking on the role of master trainers, disseminating their expertise to others. To promote and strengthen the evolution of the cross-RI expert network and support cross-community activities, EOSC-Life also organised the "EOSC-Life translator training series" [R9], which enables the experts to understand the drivers and challenges in the different RIs and data professions and break down silos [R1].

Technical and operational sustainability
The long-term preservation of data and tools for re-use currently relies on multiple repositories. In such a fragmented landscape, sustainability depends on broadly distributed funding, but also on adequate linkage and/or cross-referencing between repositories. This is becoming a pressing issue for multimodal datasets, where individual components are split into data-type specific repositories. Examples of this are single-cell omics datasets combining imaging and sequencing, where the sequencing data are submitted to sequence archives, and images are submitted to imaging archives, without necessarily any formal linkage. The wider adoption and implementation of linked data principles (https://www.w3.org/DesignIssues/LinkedData; Bizer et al, 2009) could mitigate this. Fragmentation of a project's data also prevents users from accessing them as a whole and fully understanding them. One way forward could be to build dedicated web applications and API access allowing the visualisation and browsing of project-specific data on top of distributed data archives. While some research groups already implement this in their projects, such web applications have tended to have a limited lifetime. The use of cloud resources in EOSC-Life has been limited to well-established academic clouds, mainly due to associated costs. A sustainability threat due to the use of a commercial cloud arises from, on the one hand, the expertise gap created by outsourcing and, on the other hand, a lack of cost control due to vendor lock-in, which is seen as incompatible with the time-limited, fixed budget of most research grants. In the case of sensitive data, the complex legal framework and country derogations mean that most institutions are risk averse and researchers refrain from using cloud environments that are not directly under their control.

Financial sustainability
Open-Science allows collaborations to take place without the red tape of negotiating access, intellectual property transfer and detailed assignment of ownership. Thus, Open-Science practices are not only effective in bringing people together around common solutions but are also unparalleled tools for long-term sustainability in a complex landscape of national and international funding sources [R11].
The network of experts across the RIs and LS domains established as a result of EOSC-Life provides a solid foundation for ensuring the sustainability of knowledge, the reusability of EOSC-Life tools and resources and the creation and promotion of EOSC services, increasing adaptability for emerging user needs. These experts, e.g. data stewards who have domain-specific skills and a firm understanding of Open-Science Cloud solutions, are key to technical and operational sustainability within a FAIR and Open-Science framework (Fig 4). Setting aside the question of the acute lack of qualified personnel, the activities and availability of these professionals, with their unique skill sets, can only be sustained by ensuring adequate financial resources. This would allow stable employment and keep cloud infrastructure solutions operational.

Sustainability essentials are prerequisites including availability of financial resources, competent experts, technical infrastructures, aligned policies and recognition. These components allow a set of sustainability tools operating towards sustainability to be defined, e.g. via integration of user projects (open calls); community training; dissemination addressing specific needs; creating and expanding the expert networks. The main linked sustainability challenges are securing long-term financing to retain expertise and maintain solutions; engaging institutions and communities while harmonising alignment in common strategies and priorities; and enhancing a reward system that supports re-use of available resources.

EOSC-Life does, however, face a general sustainability challenge in that it relies almost completely on EC project funding. It is thus dependent on the integration of its outputs into new EC proposals and initiatives. These are constrained by their own limited lifespans and the need to focus on constant innovation and new developments, rather than on the operation of existing assets. This in turn can have a negative impact on research, as it can be easier to obtain funding to reinvent services and resources, rather than to support the long-term operation of existing valuable resources, and their experts (EOSC TF FinSus, 2022).
While, in the case of EOSC-Life, some resources have already been taken up by other projects, for many of its resources, there is still no immediate follow-up funding available.This means the project partners need to find alternative financial means to sustain the resources, as well as the respective expertise.In essence, the EOSC-LIFE project partners have two options: i A partner organisation takes it on as a core resource or ii the resource becomes the responsibility of a community, i.e. the means to operate, maintain and develop the resource is given through future grants and/or "in kind" contributions from intrinsically motivated research parties/individuals (in an equivalent model to various opensource communities).
It is important to note that, in either model, full responsibility not only includes the continued technical and operational support for the resource but also the curation of the resource, which ensures that the content is correct and up to date.
There is a risk, however, that if no project partner takes on the responsibility, and no immediate effort is made by the community, awareness of an existing tool can be lost. This leads to resource-intensive reinvention of something similar; it may take months, sometimes years, before a tool is taken up again. For instance, the project Biotracks (https://github.com/CellMigStandOrg/biotracks) provides a standard format for cell migration files and a series of converters that can be used to convert the output of popular tracking software packages to the Biotracks format. The project was supported by CORBEL (https://www.corbel-project.eu; 2016-2020). Due to the lack of funding, the project was put on hold. It is only now that the project has been revived, as the need for re-use has emerged in the context of Open Microscopy Environment Next-Generation File Formats (https://ngff.openmicroscopy.org; Moore et al, 2021).
To address this issue before it became urgent, the EOSC-Life open calls included a consultation process to advise the many small teams (typically two to five members per participating institution) who were applying. As part of the project submission process, experts from EOSC-Life analysed the needs of each team and provided a roadmap with recommendations to ensure successful project implementation, even if the funding application was unsuccessful. The applicants acknowledged the utility of the consultation process, and it helped several teams improve FAIR data management strategies that could be adopted even in the absence of funding (Rybina et al, 2023 in Appendix 1). This process is, however, resource intensive and relies on the availability of experts to provide consultations and the relevant training materials. Clearly, reinventing the wheel or developing new solutions inferior to existing ones may not be the best use of funding resources. There are four critical phases of resource development requiring specific financial support: conception, proof of concept development, stabilisation and community adoption with long-term sustainability, dissemination and maintenance [R6]. It is challenging to explain to funders and policy makers that the last two are crucially important [R12]. But until there is recognition that maintaining existing resources is critical and should be a priority for investments in RIs, maintenance will continue to rely on volunteer efforts and in-kind contributions, solutions that are clearly not sustainable.
Therefore, to guarantee sustainability for successful project outcomes, individual RIs need to engage early on in any project with national and European funders, as well as through existing stakeholder networks such as the LS RI Strategy Board and ERIC Forum.

Recommendations
Not all data infrastructures can or should be supported indefinitely. Financial sustainability is achieved when funders (public or private) are willing to prioritise necessary long-term investments, based on the expectation that impact, in terms of science and innovation, can be achieved, but also that societal threats, such as pandemics and other disasters, can be tackled. While the COVID-19 pandemic has proven the value of LS data and models for practical assessments and policy decisions, their wider potential remains largely under-exploited. More efforts are needed to broadcast the many ways in which LS data can be used and to help government representatives, journalists and members of the general public understand LS data. Therefore, new activities may be necessary that ensure not only that these data are sustainable but also that they are impactful and considered indispensable. All the work and experiences collected during EOSC-Life allow us to propose the following list of recommendations (Fig 5).
BE RECOGNISED: Focus on strong credibility and recognition for the research done:
Although many of these sustainability recommendations may seem obvious, understanding and implementing them in practice can be challenging, especially when their importance and overall impact in the short and long term are poorly understood from the start of collective community work. The recommendations provided can, however, be explicitly and bidirectionally linked to the FAIR principles and concrete processes related to data and tools handling, sharing and reproducibility. This approach was also taken by the IMI FAIRplus project (https://fairplus-project.

Conclusion
The EOSC has formed "a federated and open multi-disciplinary environment where users can publish, find and re-use data, tools and services for research innovation and educational purposes" (https://www.eosc.eu/sriamar). Through the EOSC-Life project, we set out to create an "EOSC for the Life Sciences", connecting and, where necessary, further developing data resources, analysis tools and services that allow research communities to collaborate across national and thematic borders. A core part of the project and its sustainability strategy was to work in close partnership with the broader life-science community, via open calls for partnerships. This strategy has been successful. As EOSC-Life comes to its end, the services that emerged are being carried forward by a range of applied projects: BY-COVID is consolidating an Open-Science platform for pandemic preparedness; EOSC4Cancer (https://eosc4cancer.eu/) will adapt several EOSC-Life solutions for use by the Cancer Mission; and EuroScienceGateway (https://galaxyproject.org/projects/esg/) will continue to develop the tools and software ecosystem initially developed in EOSC-Life together with experts from the earth, environmental and physical sciences.
EOSC-Life has thus helped the life sciences to take significant steps towards turning the EOSC vision into a reality, but much work remains to be done. How can our experiences shape the future development of EOSC and the data environment for Europe's LS RIs? EOSC-Life underlines the importance of close partnerships. Although open calls are, in our experience, becoming even more challenging to implement than pre-defined use cases, these calls are critical for developing and advancing practical applications driven by user needs and for promoting scientific discoveries. Future EOSC developments should build on the capacity of whole-domain projects such as EOSC-Life (and the four other European research clusters), which serve as a nucleus for managing data and connecting interdisciplinary communities. In addition to the INFRASERV projects mentioned above, canSERV (https://www.canserv.eu/) connects experimental facilities that support the analysis of biomedical data in cancer. Other projects are promoting the agroecological transition (AgroServ, https://emphasis.plant-phenotyping.eu/european-infrastructures/cluster-projects/agroserv) and supporting AI-powered image analysis methods (AI4Life, https://ai4life.eurobioimaging.eu). All of these projects build on the tools and experiences supported by EOSC-Life, including RO-Crate, ISA tools (https://isa-tools.org/), FAIRsharing and the FAIR Cookbook. These will continue to populate EOSC with data, workflows and other tools and to sustainably connect different disciplines.
Above all, EOSC-Life has left a legacy in the form of the professional, intra- and interdisciplinary development of competences. The project has helped establish new data management capabilities in several RIs and, by providing training and support, helped build skills in user communities. To be successful, EOSC needs to further develop networks of data managers and skilled data analysts throughout the European research community: Open data and Open Science can only generate value when people are able to make use of the available opportunities.

Box 8. How our recommendations support (and are supported by) sustainable FAIR principles compliance.
Most of the FAIR principles (F1, F2, A2, I1, I2, R1.1, R1.2, R1.3) strongly depend on an initial "agreement", at a minimum at the domain level (Jacobsen et al, 2020). Expertise support [R1] is a pillar for improving the implementation of each principle and for the planning, support and design of training as well as for FAIRness assessment. Additional pillars important for recommendations in the sustainable implementation of the FAIR principles are: the organisation of sustainable community building [R9, R11] and the governance and acknowledgement of community agreement processes [R8, R10].
"Metadata makes FAIR" [R5] is the key component for the application of FAIR principles. Sustainable data sharing and reuse supported by appropriate and sufficient metadata must be, on the one hand, explained and driven by community experts and data stewards [R1] and, on the other hand, illustrated through real case studies [R3], implemented iteratively through hands-on training [R4] and adapted to take into account the specific and arising needs of communities [R9]. (Meta)data annotation and curation [R7] are also essential to ensure (i) informative and rich metadata (F2, R1), especially when specific to a particular research area (R1.3), (ii) relevant, interoperable and updated vocabularies (I1, I2) and (iii) cross-domain qualified references to other (meta)data (through interoperable metadata; F3, I3), enhancing semantic harmonisation and convergence and helping build the Web of Linked Data. Communication through publications [R2] helps promote and multiply the ways, diversity and opportunities of linking and reuse, and naturally accelerates the understanding of (meta)data. The publication of several types of metadata should be prioritised to allow effective and fruitful dissemination (re: [R2]). This is a central step in the implementation of the FAIR principles (F2, F3, and finally F4). The "metadata indexation" principle (F4) is, in our experience, often omitted or its importance misunderstood. Our recommendations also promote the improvement of (current) practices around reuse of data and tools: the Openness recommendation [R11] advocates the usage of open licences, preferably CC-0 or CC-BY ones, for ease of reuse (R1.1). Providing provenance metadata and provenance tracking methodologies (R1.2), especially in complex information systems, requires early preparation and curated recording [R7], adequate metadata (F2, R1) and pre-validated (FAIR) interoperable, unambiguous vocabularies (I2). Our recommendation [R6], Be prepared, agile and act timely, also underlines the importance of applying FAIR principles as early as possible. Even if some of the recommendations may not be directly linked to the FAIR principles (e.g. A1, A1.1, A1.2: open free access protocol, or A2: metadata are accessible even if data are no longer available), systematic FAIR implementation will be encouraged by dissemination of efficient tools and good practices [R2], and well-governed reward mechanisms [R8, R10]. The recommendation [R12] could be interpreted as "Treat innovation and FAIR principles independently" and as a call to consider FAIR implementation an evident part of sustainability. The FAIRification process, even more so through "data sharing", will most likely create new collaborations, especially across disciplines, therefore facilitating and enhancing innovation. Finally, we would like to stress that the implementation of all the FAIR principles is necessary for, and will catalyse action on, the 12 Be SUstainable REcommendations "Be SURE".

Disclosure and competing interests statement
The authors declare that they have no conflict of interest.
(https://www.miappe.org/; Minimum Information About a Plant Phenotyping Experiment) standard (Papoutsoglou et al, 2020) [R2]. As regulatory and ethical requirements have to be met, harmonising and simplifying data-related legal frameworks across EOSC would also reduce frictions in data access and re-use [R9].

Figure 1. Position and organisation of EOSC-Life in the EOSC and LS RIs ecosystems. EOSC-Life is one of five ESFRI Science Cluster projects (the other four clusters are PaNOSC, ENVRI-FAIR, ESCAPE, SSHOC). EOSC-Life brings together 13 LS RIs and domain experts to populate EOSC with LS FAIR data, tools, resources, harmonised solutions, policies and guidelines and concrete user projects, altogether facilitating interoperability in the EOSC (respective interactions and impact are illustrated by arrows).

Figure 2. EOSC-Life operational framework, extending from the end user to the EOSC ecosystem.Dashed lines indicate use of and contributions to the framework, while thick arrows indicate process flow within the framework.

Box 6. BioImage data analysis training supported by EOSC-Life open calls.
The NEUBIAS training school (May 2023; https://www.eosc-life.eu/news/neubias-defragmentationtraining-school/) provides one example of RIs benefiting from the EOSC-Life Training Open Call. This course was the continuation of an initiative that started in 2017, aiming to bring BioImage Data Analysts closer to recent solutions for computing and workflow-based image analysis in the cloud. For the NEUBIAS (https://eubias.org/NEUBIAS/venue/) project, EOSC-Life support has been essential for sustaining, further developing and running this valuable training initiative. Image-based data are ubiquitous across multiple LS-RI domains. Because of the multimodality and the large volumes of data, linkage of image data to publication analysis workflows and the sustainability of access and hosting are key ongoing challenges. Training in this field has been provided regularly through EOSC-Life partners and project leads, including FAIR training from European Molecular Biology Laboratory (EMBL)-EBI (Image Analysis and Machine Learning; https://www.ebi.ac.uk/training/events/microscopy-data-analysis-0/) or FAIR Data training run by national and international consortia, e.g. Euro-BioImaging (https://www.eurobioimaging.eu/news/euro-bioimagings-guide-to-fair-bioimage-data/; Kemmer et al, 2023).

Box 7. Sustainability of knowledge through EOSC-Life training assets.
By supporting cross-disciplinary sharing of best practices and methodologies, the consortium contributed to sustainability of knowledge across LS communities through:
(i) Training on EOSC-Life and RI cloud-based open access solutions and resources, services and expertise in the dissemination, practical uptake and targeted adoption of tools.
(ii) Training related to Open Science practices and FAIR guidelines, harmonising the requirements across a spectrum of LS disciplines whilst addressing needs in individual branches.
(iii) Establishment of expert groups to provide guidance and consultation to broader communities within the framework of Open Science and FAIR Science, resulting in a network that can continue operating after the project.
(iv) Development of collaborative cross-disciplinary training approaches, modules and materials, as well as formats for effective training seminars and workshops, including the joint creation of best practice solutions for efficient remote training.
(v) Integration of 27 Demonstrators, namely Open and Internal Calls user projects from different disciplines, into EOSC-Life and across the RI landscape by providing consultation and guidance for the teams in the form of advice from EOSC-Life experts and consortium partners.
(vi) Designation and training of the so-called EOSC-Life consortium translators, to create a cohort of professionals who understand the jargon, needs, work culture and drivers of professionals working in other areas of expertise or RIs.
(vii) Creation of a training community with a practical understanding of EOSC but also with conceptual and technical knowledge in different LS domains.
Validating the adaptability of sustainability components developed in EOSC-Life for use in other projects is key to ensuring the project's long-term impact. The EC's recent INFRASERV (https://rea.ec.europa.eu/funding-and-grants/horizon-europe-researchinfrastructures/research-infrastructure-services-support-health-research-accelerate-greenand-digital-transformation_en) programmes represent both a significant challenge to models of sustainability for FAIR LS resources and an opportunity to create more impactful cross-RI initiatives. By focusing on ISIDORe (https://isidore-project.eu/), which is centred on pandemic preparedness (Richard et al, 2022; David et al, 2023b), to provide an INFRASERV perspective on the topics raised in this paper, we showed the critical importance of improving FAIRness Literacy, setting up incentives, training researchers and providing FAIR expert support for sustainable data management [R1], [R4], [R5], [R10] (Appendix Supplementary Information S11).

Figure 4. Key components driving, facilitating and challenging the sustainability process: lessons from the EOSC-Life project.
[R11], to present clearly and repeatedly the outcomes and messages outlined in this paper [R2].

Figure 5. EOSC-Life Recommendations for ensuring and facilitating sustainability. The recommendations are based on 5 pillars shown here, clockwise from the top: to base the network on experts and disseminate on a broad scale; to demonstrate, to act and plan training and improve metadata; to be prepared to act now, but remain adaptable, and curate data as soon as possible; to strengthen the community with federated governance, harmonised and integrated RIs, and sustainability actions rewarded; to be inclusive and open-minded, and to treat innovation and sustainability independently.
Through open calls, and working with existing RI-managed data resources, EOSC-Life created practical solutions that met the real needs of life scientists.Examples of outcomes, developed or enhanced during EOSC-Life, are listed in the EOSC Association and the RDA Key Exploitable Results report (Nardello et al, 2022 in Appendix 1; Table 1) [R2].
(ii) Work with scientists who develop computational methods and packageable analysis tools and connect the latter to data.
(iii) Develop guidelines and tools to be able to exploit sensitive data from human research participants in a secure manner.

Table 1. Key exploitable results from EOSC-Life used by communities (Life Sciences Data Resources & Tools, harmonisation of Access & Policies in EOSC).
ontologies by RIs and the need to develop Application Programming Interfaces (APIs) for data exchange. As a pilot for interoperable resources, graph databases and knowledge visualisations built around aligned metadata standards were developed for COVID-19 and Monkeypox (Karki et al, 2023) [R5]. Technical sustainability was addressed by including outcomes in enduring resources such as the FAIR Cookbook (https://faircookbook.elixir-europe.org; Appendix Supplementary Information S4) and RDMKit (https://rdmkit.elixir-europe.org/; ELIXIR, 2021 in Appendix 1), which integrate findings from multiple academic and industrial projects and provide recipes for FAIRification and tools for Research Data Management (RDM), respectively. These community projects support Open-Science and ensure that access to materials is sustained beyond a single project [R2]. EOSC-Life also reached out across communities to promote an inclusive framework [R11] (Fig 2) supporting the interoperability and portability of software tools and computational workflows from different domains, through the use of a common set of technologies, such as software containerisation. Gaps in the FAIRification of computational workflows were addressed by creating the WorkflowHub (https://workflowhub.eu), a registry for workflows in a highly federated ecosystem of different workflow managers and community-owned repositories (Goble et al,

PART-2: technical components developed to support sustainability

EOSC-Life sustainable Open-Science toolkit for FAIR RDM services
FAIR interoperability services ensure that (meta)data use a formal, accessible, shared and broadly applicable language to represent knowledge, typically relying on vocabularies or ontologies that follow FAIR principles and make qualified references to other (meta)data. They are needed for the re-use of data and tools, but also to address the complexity of the biomedical domain, ranging from the diversity of samples and data-generating technologies to variable granularity of datasets. Ontologies should be accessible, e.g. via the Ontology Lookup Service (https://www.ebi.ac.uk/ols4; OLS; see Appendix Supplementary Information S5), and the annotations must have web-resolvable Persistent and unique IDentifiers (PIDs), as recommended by the EOSC Association Task Force on PID Policy and Implementation (https://www.eosc.eu/advisory-groups/pidpolicy-implementation) [R5]. Mappings between ontologies, vocabulary schema or coding standards are important to ensure the quality and sustainability of the metadata. EOSC-Life implemented interoperability services such as the Ontology Cross Reference Service (https://www.ebi.ac.uk/spot/oxo) in the form of a suite of semantic services, deployed using cloud principles and technologies (Appendix Supplementary Information S5) [R9].
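As an illustration of the requirement that annotations carry web-resolvable PIDs, a curation step might screen metadata records before submission. The following is a minimal sketch only, not an EOSC-Life tool: the record layout, the field names `label` and `term_iri`, and the example identifiers are hypothetical, and the check is purely syntactic (it does not contact any resolver or ontology service).

```python
from urllib.parse import urlparse

# Hypothetical annotation records; field names and identifiers are
# illustrative assumptions, not taken from any EOSC-Life resource.
annotations = [
    {"label": "example term A",
     "term_iri": "http://purl.obolibrary.org/obo/EXAMPLE_0000001"},
    {"label": "example term B",
     "term_iri": "EXAMPLE:0000002"},  # bare prefix:id, not resolvable as written
    {"label": "example term C",
     "term_iri": ""},                 # annotation missing altogether
]


def looks_web_resolvable(iri: str) -> bool:
    """Syntactic check: an http(s) IRI with a host and a non-empty path."""
    parts = urlparse(iri)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and len(parts.path) > 1
    )


def flag_unresolvable(records):
    """Return the labels of records whose identifier fails the check."""
    return [r["label"] for r in records
            if not looks_web_resolvable(r["term_iri"])]


if __name__ == "__main__":
    for label in flag_unresolvable(annotations):
        print(f"needs a web-resolvable PID: {label}")
```

A check of this kind only catches malformed identifiers early; whether an identifier actually resolves, and to the intended term, would still need to be confirmed against the relevant ontology service during curation.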
Box 3. Collection and resource management.
Some of the collections described by open standards are provided in support of EOSC-Life Demonstrators and Open Calls projects
Delivering early and efficient curation processes that reduce human effort through
Weise et al, 2021) was developed as a blueprint that can be deployed, e.g. for educational purposes [R11]. OSSDIP clarifies the rather complex underlying concepts and reduces the initial burden of deploying such an infrastructure, giving researchers access to sensitive data in return.