A practical guide to bioimaging research data management in core facilities

Bioimage data are generated in diverse research fields throughout the life and biomedical sciences. Their potential for advancing scientific progress via modern, data-driven discovery approaches reaches beyond disciplinary borders. To fully exploit this potential, it is necessary to make bioimaging data (in general, multidimensional microscopy images and image series) FAIR, that is, findable, accessible, interoperable and reusable. These FAIR principles for research data management are now widely accepted in the scientific community and have been adopted by funding agencies, policymakers and publishers. To remain competitive and at the forefront of research, implementing the FAIR principles into daily routines is an essential but challenging task for researchers and research infrastructures. Imaging core facilities, well-established providers of access to imaging equipment and expertise, are in an excellent position to lead this transformation in bioimaging research data management. They are positioned at the intersection of research groups, IT infrastructure providers, the institution's administration, and microscope vendors. Within the framework of German BioImaging – Society for Microscopy and Image Analysis (GerBI-GMB), cross-institutional working groups and third-party funded projects were initiated in recent years to advance the bioimaging community's capability and capacity for FAIR bioimage data management. Here, we provide an imaging-core-facility-centric perspective outlining the experience and current strategies in Germany to facilitate the practical adoption of the FAIR principles, closely aligned with the international bioimaging community. We highlight which tools and services are ready to be implemented and what future directions for FAIR bioimage data have to offer.

FAIR principles for this data type. While we include generally applicable aspects of RDM throughout the text, we mainly refer to 'bioimaging RDM' or 'bioimage data management' in this article.
The above-mentioned experiences have stimulated core facilities to initiate activities to establish bioimaging RDM practices. One of the first examples in Germany is the open discussion group Research Data Management for Microscopy (RDM4mic), focusing on the data management system OMERO (OME Remote Objects). 11,12 With support from GerBI-GMB, these activities led to additional projects. The Information Infrastructure for BioImage Data project (I3D:bio, https://www.i3dbio.de) started with a focus on facilitating the implementation of OMERO at universities and research institutions in Germany. 13 Moreover, more than 20 German institutions formed a consortium dedicated to bioimage data management within the National Research Data Infrastructure (NFDI). 14 This consortium, NFDI4BIOIMAGE, aims at fostering new standards for image data handling and metadata curation (https://nfdi4bioimage.de) and is embedded within a network of consortia (https://nfdi.de). These and further projects (e.g., FoundingGIDE, https://cordis.europa.eu/project/id/101130216) aim to align strategies for bioimaging RDM while accounting for local and discipline-specific needs. Based on the experience from these projects, this article
• summarises current challenges for bioimaging RDM from the perspective of core facilities (PART I),
• provides action items for core facilities to advance their bioimaging RDM capabilities and services (PART II),
• invites readers to collaboratively drive the cultural change toward expanding and acknowledging the role of core-facility-supported RDM and bioimage data stewards in scientific discovery (PART III).

PART I: THE BIOIMAGE DATA LIFE CYCLE
Research data have the potential for scientific discovery beyond their original acquisition purpose when handled according to the FAIR principles. 9,15 To structure all relevant RDM topics throughout every step of experimental research, we use the data life cycle concept applied to bioimaging data. 14,16,17 We highlight issues commonly experienced in core facilities and describe our view on present and potential future roles for core facilities in bioimaging RDM. We exemplify tools and describe future directions for data handling.

FIGURE 1 The research data life cycle as applied to bioimage data. As outlined in the main text, the researcher-centred life cycle highlights six steps for conducting bioimaging experiments. Core facilities play a role in all steps of the life cycle, starting with planning an experiment. However, we have highlighted four steps in which core facilities are mainly involved as essential partners, as indicated by the colours. From publication to long-term archiving, the responsibilities are increasingly shared with external providers or generic RDM support services, as the gradient indicates. Around the life cycle, tools, systems, and other aspects relevant to researchers and core facilities for applying data management practices are positioned in arbitrary order. Many of these aspects are relevant for multiple life cycle steps, as indicated by the colour code.
(i) The first step of a typical bioimage data life cycle is planning. When FAIR image data are openly accessible in public repositories, searching for and reusing existing image data can be considered at this step, next to planning the following ones: (ii) data acquisition, (iii) data storage and organisation, (iv) data processing and analysis, (v) publication and (vi) data archiving (Figure 1). [19][20][21][22] Usually, core facility staff help with instrument choice. As soon as new bioimage data are to be acquired, they advise users on how data should be handled and moved from quality-controlled acquisition (2.2, image acquisition) to storage location(s) (2.3, storage and organisation), through processing and analysis (2.4, image analysis), to archiving or public deposition (2.5, publication, and 2.6, archiving). Each step involves potential issues and relevant stakeholder interactions. Problems are averted if the core facility is consulted upfront, which often does not happen, either because users underestimate the complexity of the techniques and the hurdles to be taken to generate reliable results 23 or simply because the core facility's capacity may be too limited to devote enough attention and resources to the project. 5 Scientists with little bioimaging experience might insufficiently consider the consequences of a suboptimal choice of equipment, instrument settings and sources of bias. 24,25 These factors influence whether an image accurately represents the biological phenomenon and determine the necessary workload from image acquisition to publication-ready results. Improvements in planning and preparation avoid acquiring excessive and unusable data. Additionally, when users do not consider the requirements for image analysis before acquisition, image data sets are likely to be difficult or unsuitable to process for technical reasons. 26,27

1.1 Role of a core facility for experiment planning

Experienced core facility staff can help to acquire high-quality imaging data and avoid storage-intensive imaging modalities when they are not needed, 28 for example, favouring a high-content plate-based acquisition over repeated acquisitions with classical slides. Core facility staff can help identify and overcome technical feasibility concerns, 2 for example, by guiding instrument choice in per-project consultations and by estimating required storage volumes (2.2, data acquisition). Ideally, users draft a data management plan (DMP). 5,29 However, researchers and core facility staff seem to make limited use of DMPs in practice. 7,30 DMP templates that have been approved or provided by the core facility, if available within an institutional DMP tool, can guide scientists through questions regarding image data handling appropriate to their local IT environment. Core facility staff are familiar with the institutional resources for bioimaging, including acquisition machines, IT resources for storage and data transfer, workstations for processing and analysis, and services by academic libraries such as providing persistent identifiers (PIDs). Thus, investing in DMP templates can help structure project consultations, alleviating the workload in the long run while taking into account the prior experience of the researcher planning an experiment. Writing a DMP does not mean that everything is fixed: a DMP is regarded as a living document, adapted as necessary in a transparent manner. 31 Tools for writing a DMP can be found online (https://rdmkit.elixir-europe.org/data_management_plan).
Core facility staff can teach researchers about the potential for data reuse. 33,34 Preceding their own experiments, researchers can search for data similar to their own project and for suitable bioimage analysis tools to test their own analysis workflows (2.4, image analysis). Searching data in repositories and archives also educates about the importance of metadata for finding and retrieving data after publication. Both technical and biological metadata documentation can be planned before an experiment. 35,36 Core facility staff are experts in technical metadata, but it is out of the core facility's scope to provide all research-domain-specific metadata standards. However, users can learn from core facility staff about community-established guidelines. Highlighted examples are the Recommended Metadata for Biological Images (REMBI), 37 the NBO tiered guidelines 38 and recommendations for certain fields and modalities like MITI for multiplexed tissue imaging, 39 MIHCSME for high-content screening, 40 3D-MMS for 3D volume microscopy 41 or the Brain Imaging Data Structure (BIDS). 42 [44][45] Tabular metadata collection formats are often used. 7,40 Optionally, core facilities can enforce metadata enrichment, especially for large data that are otherwise hard to manage. A mandatory metadata checklist could become part of the core facility's usage policy, for example, as recommended for high-content screening (HCS) data within the Dutch bioimaging network.
40 As data management planning and data handling best practices accumulate locally over time, they can become a useful resource for new facility users. Due to their mostly well-established cross-institutional networks, core facility staff can also point towards third-party resources where needed (2.3, storage and organisation, and 2.4, image analysis). Many federated resources exist that researchers may not initially be aware of. Some examples from the European area are the European life science infrastructure ELIXIR, the German Network for Bioinformatics (de.NBI) and the developing European Open Science Cloud (EOSC). Core facility professionals disseminate new and updated community-approved practices, leverage peer-to-peer support or moderate help requests in forums like https://forum.image.sc. Table 1 summarises important aspects of data management planning.
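A metadata checklist, as recommended above, can be enforced with a few lines of code. The sketch below validates a metadata record against required fields loosely inspired by REMBI; the key names are illustrative assumptions, not the official REMBI schema:

```python
# Minimal metadata checklist validator (keys are illustrative, not the REMBI standard).
REQUIRED_KEYS = {
    "study": ["title", "description"],
    "biosample": ["organism", "preparation"],
    "image_acquisition": ["instrument", "objective", "channels"],
}

def missing_fields(metadata: dict) -> list:
    """Return dotted paths of required fields absent from the metadata record."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        entry = metadata.get(section, {})
        for key in keys:
            if key not in entry:
                missing.append(f"{section}.{key}")
    return missing

record = {
    "study": {"title": "Pilot screen", "description": "Test run"},
    "biosample": {"organism": "Mus musculus"},
    "image_acquisition": {"instrument": "confocal", "objective": "63x/1.4",
                          "channels": ["DAPI", "GFP"]},
}
print(missing_fields(record))  # -> ['biosample.preparation']
```

Such a check could run as part of a core facility's data intake workflow, rejecting uploads until all mandatory annotations are present.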

Data acquisition
TABLE 1 Common issues and solutions regarding bioimage data management planning.

Common issue: Underestimating the complexity of bioimage data handling. Proposed solution: Consulting and taking advice from core facility staff. Using DMP templates that have been approved or provided by core facilities.

Common issue: Reusing existing data is not considered. Proposed solution: Data search and review in public repositories and institutional archives can become part of experiment planning. Reusing data from repositories can help users familiarise themselves with the data types and test workflows before acquiring their own data. Suitable archives can be found on re3data.org or on FAIRsharing.org. Bioimaging-specific archives include, for example, the BioImage Archive (BIA) and the Image Data Resource (IDR). Institutional archives may be another data source.

Common issue: Lack of adaptation strategies for changing demands throughout the experiment. Proposed solution: Writing a data management plan and using it as a living document to adapt where required and to record changes.

Common issue: Suboptimal instrument choice. Proposed solution: User training and guidance on instrument choice and setup, as well as recommendations on sample preparation. Consulting the core facility and taking advice on instrument and software choice.

Common issue: Image analysis and data processing are not planned prior to the acquisition. Proposed solution: Consulting with core facility staff, image analysts and data stewards before starting a new project. Searching for example data and protocols that have previously established the desired procedure to test image analysis a priori. Planning suitable data structures and metadata for analysis and running a pilot experiment before adjusting and scaling up.

Common issue: The data volumes, storage space and data transfer requirements are not considered before the acquisition. Proposed solution: Gathering comprehensive information about typical image sizes of the core facility's instruments. Highlighting limits to memory load and data transfer with large files. Estimating expected data volumes and required storage and ensuring resource availability.

Common issue: The importance of metadata collection is overlooked. Proposed solution: Preparing a metadata annotation checklist for the data even before acquisition, for example, based on REMBI. Optionally, enforcing annotation as part of a usage policy.

Common issue: Lack of a strategy for the publication of image data. Proposed solution: Considering data publication and candidate public repositories, and reviewing submission guidelines. Defining authorship and licensing criteria with collaboration partners early in the project.

Ideally, when users collect data, they have planned their experiment thoroughly enough to ensure the data are comprehensive
and reliable by recording all the relevant (meta)data that describe the experiment and the (biological) sample.
The complexity of this task may not be fully appreciated upfront, for example, when applying an advanced microscopy technique to an unknown sample for the first time. Reliability presupposes valid quality measures to ensure an appropriate instrument setup and calibration, for which most researchers rely on their core facility. Some scientists state that they would not share and reuse data because they do not trust the quality of a dataset they did not acquire. 46 While not standard practice today, a record of the quality measures (e.g., a QC and calibration protocol) would optimally be linked via a persistent identifier (e.g., a DOI). Most commercial acquisition software records technical metadata in non-standard formats. 47 Due to proprietary file formats, this metadata, like the measurement data itself, is often only accessible with vendor-specific software or through file format conversion or translation. However, the translation fidelity differs between file formats, and the maintenance of libraries such as Bio-Formats does not scale with the variety of proprietary formats (https://www.openmicroscopy.org/2016/01/06/format-support.html). 47 What's more, on-the-fly translation creates a computational bottleneck. 48 For non-bioimaging specialists, these issues are usually a challenge to overcome.
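Once technical metadata is translated into an open representation such as OME-XML, it becomes readable with standard tools rather than vendor software. A minimal sketch using only Python's standard library; the XML fragment and its attribute values are hand-written for illustration, not taken from a real file:

```python
import xml.etree.ElementTree as ET

# Hand-written OME-XML-like fragment (values are illustrative).
ome_xml = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="demo">
    <Pixels ID="Pixels:0" SizeX="512" SizeY="512" SizeC="2"
            PhysicalSizeX="0.1" PhysicalSizeXUnit="µm" Type="uint16"/>
  </Image>
</OME>"""

NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}
root = ET.fromstring(ome_xml)
pixels = root.find("./ome:Image/ome:Pixels", NS)  # navigate to the Pixels element
pixel_size = float(pixels.get("PhysicalSizeX"))   # technical metadata as plain attributes
print(pixel_size, pixels.get("PhysicalSizeXUnit"))  # -> 0.1 µm
```

The same attributes are typically opaque inside a proprietary container; an open, schema-backed format makes them accessible to any downstream script or validation step.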

2.2.1 Role of a core facility in data management for image acquisition

Many core facilities have adopted regular quality control measures and instrument calibration as a standard operating procedure, an essential contribution to data reliability and reusability. International standardisation of quality control measures and the provision of the above-mentioned quality measure records would be an asset. 49,50 [58][59] This technical metadata should be available along with the shared FAIR data. Core facilities are essential in providing training on these aspects. 7,8 With its fast storage component, the usually vendor-provided acquisition computer functions as the initial location to write data to disk. Using a network drive for initial storage is not recommended, as network instability could result in data corruption. Subject to individual core facility policies, users should clear the acquisition computer and transfer data after acquisition to a suitable storage location, optimally through network transfer (2.3, storage and organisation), avoiding portable drives, which would increase the risk of spreading malware. 60 Vendors usually optimise their proprietary software and file formats for fast writing to disk. Many vendors offer free versions of their software to open their file formats. However, data accessibility and interoperability are enhanced when data can be recorded in or faithfully converted to open file formats. 21 Bio-Formats-based conversion to, for example, OME-TIFF has long been the de facto standard solution. 47 But new file formats are required for working with large files in cloud and object storage environments, for which the OME-Zarr format is under development.
48,61 Core facility staff should familiarise themselves with next-generation file formats (NGFF) like OME-Zarr, learn to use conversion tools in relevant use cases and stay abreast of the development. Thanks to NGFF, data chunking makes large multidimensional images cloud-compatible and ready for streaming. As such, OME-Zarr allows users random access to any plane or part of large multidimensional images over the Internet without the necessity of loading the whole image. This is particularly important for modalities like light-sheet microscopy, HCS and volume electron microscopy, and bears strong potential for a broadly adoptable new standard. 61,62
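The chunking idea behind OME-Zarr can be illustrated with plain files: each chunk of an image lives in its own object, keyed by its grid position, so a reader fetches only the chunks it needs. A toy, stdlib-only sketch of this layout (real OME-Zarr stores are written with the zarr library and carry NGFF metadata; the file naming here merely mimics its chunk keys):

```python
import os, tempfile

CHUNK = 2  # chunk edge length in pixels (toy value)

def write_chunked(store, image):
    """Split a square 2D image (list of rows) into CHUNK x CHUNK tiles, one file per tile."""
    n = len(image)
    for cy in range(n // CHUNK):
        for cx in range(n // CHUNK):
            tile = bytes(image[cy * CHUNK + y][cx * CHUNK + x]
                         for y in range(CHUNK) for x in range(CHUNK))
            with open(os.path.join(store, f"{cy}.{cx}"), "wb") as f:
                f.write(tile)

def read_pixel(store, y, x):
    """Read one pixel by loading only the chunk file that contains it."""
    with open(os.path.join(store, f"{y // CHUNK}.{x // CHUNK}"), "rb") as f:
        tile = f.read()
    return tile[(y % CHUNK) * CHUNK + (x % CHUNK)]

store = tempfile.mkdtemp()
image = [[y * 4 + x for x in range(4)] for y in range(4)]  # 4x4 test image
write_chunked(store, image)
print(read_pixel(store, 3, 1))  # -> 13, without reading the other three chunks
```

Over HTTP or object storage, each chunk key becomes a separate request, which is what makes streaming arbitrary planes of a terabyte-scale volume feasible.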

Storage and organisation
Users face various challenges regarding the requirements for storage efficiency, interoperability, computational performance, accessibility and data transfer, particularly for handling data from advanced bioimaging techniques like light-sheet microscopy, HCS or whole-slide imagers. 22,60 Effective storage and organisation of large-scale data require hardware infrastructure, workflow optimisation and sharing protocols. Network availability and bandwidth are pivotal for facilitating large file transfer and access. High-speed networks and efficient data transfer protocols minimise latency and maximise throughput, particularly in distributed environments. Specific prerequisites may depend on project-specific factors like collaboration requirements, data volumes and the computational tasks performed on the data. Storage technology for various applications has evolved significantly over the past decades, 63 and a large heterogeneity of storage modalities is found in the research landscape. While individual research groups still often rely on NAS servers or hard drives for data storage, diverse institutional offers might exist, for example, file systems, tape systems for cold storage, distributed or virtual storage and object storage. Aspects regarding storage reliability technologies (e.g., RAID or erasure coding) are to be considered. 60,63 The complexity behind the growing trend towards working 'in the cloud' is sometimes underestimated by scientists without in-depth IT knowledge.
64,65 Besides providing sufficient storage space and network capacity, the storage modality must be appropriate for the data type to allow more than merely holding the data in a certain location. Image data formats and software can influence how efficiently large multidimensional images, for example, 3D time-lapse multichannel images, are accessed. Especially when working with data through a network, large N-dimensional arrays may require a whole-file download, depending on the format. Bioimage data, like other research data, are frequently accessed throughout a project (also referred to as hot data, as opposed to archival data, referred to as cold data). For processing and analysis, regular computer RAM (random access memory) capacities readily fall short. 48,66 Central services (e.g., Nextcloud instances) may be inadequate for sharing large data sets. In bioimaging, data volumes can reach terabytes within one or a few acquisition sessions, exceeding the capabilities of most standard file-sharing solutions. 22 Research processes encompass numerous steps involving various data types. Data conversion may be necessary for data integration across different platforms. Processing and analysis software produces output files in diverse formats, including images, tables, texts and annotation files. Compressed images are generated regularly for presentations or reports. This multifaceted nature of data handling contributes to potential data fragmentation, making it challenging to maintain a comprehensive overview.

TABLE 2 Common RDM issues and solutions for data acquisition.

Common issue: Quality control and instrument calibration are not documented. Proposed solution: A core facility can make the QC measures and records of instrument calibration accessible for the users or provide links to published protocols that are applied for QC at the respective instrument.

Common issue: Ad-hoc metadata documentation during acquisition is not transformed into structured annotations. Proposed solution: Raise awareness to document technical metadata and validate its completeness. If new metadata items are collected, the DMP and structured annotations (e.g., in OMERO) should be amended.

Common issue: Users leave data on acquisition machine computers. Proposed solution: The core facility policy should enforce data transfer from the computers. Orphaned data can be moved, uploaded to a data management system like OMERO (offering a period for users to handle the data) or deleted.

Common issue: Sample preparation metadata is not added to acquired image files as structured annotations or linked metadata files. Proposed solution: Sample preparation can be linked to data, for example, by linking to ELNs or public protocols in the metadata. For example, in OMERO, the dataset description or the Key-Value Pairs could contain such links.

Common issue: Technical metadata record correctness is not validated. Proposed solution: Sometimes, technical metadata is incomplete or not correctly conserved in file format translations. Essential metadata for the experiment should be checked for newly handled formats.

Keeping track of all copies and linked files while
navigating through processing, sharing, figure drafting and publication can be an error-prone task. 67,68 Most imaging facilities allow users to access and use the instruments after initial training and are hence termed 'self-service facilities'. In other cases, imaging experiments are performed by facility staff, who have control over data acquisition and data provision to users. Especially in self-service core facilities, the prevailing practice is to transfer data to users' storage locations immediately after acquisition, rendering users solely responsible for organising and storing the data. 5 Some research groups specify their data management individually per project. Hence, data structuring and labelling can vary significantly within institutions, and this variability hampers data reusability. Repeatedly, new core facility users must recapture images when attempting to restart previous projects because the original data are no longer identifiable.
In our experience, only a small number of users, usually with a background in bioinformatics or computer science, have a deeper understanding of storage as outlined above. Most users ask for sufficient space and data security but, understandably, rely fully on their IT department or core facility for any details. This creates a challenge when neither the user nor the IT staff know how these details can affect working with the researchers' specific data types.
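Estimating expected data volumes, one of the planning items from Table 1, is a quick back-of-the-envelope calculation that core facilities can base on bookings and typical file sizes per instrument. A minimal sketch; all figures are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope storage estimate per instrument.
# Booking counts and average volumes per booking are invented for illustration.
instruments = {
    # name: (bookings per month, average GB written per booking)
    "confocal": (60, 15),
    "light-sheet": (8, 800),
    "widefield": (90, 4),
}

def monthly_demand_gb(instruments):
    """Sum the expected data volume across all instruments for one month."""
    return sum(bookings * gb for bookings, gb in instruments.values())

demand = monthly_demand_gb(instruments)
print(f"{demand} GB/month, ~{demand * 12 / 1024:.1f} TB/year")  # -> 7660 GB/month, ~89.8 TB/year
```

Even such a rough figure makes the conversation with IT departments concrete: a single light-sheet system can dominate the budget despite having far fewer bookings than the other instruments.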

2.3.1 Role of a bioimaging core facility in storage and organisation

Full-service or hybrid-service facilities, but also self-service core facilities, are confronted with storage provision and data organisation challenges. To this end, some core facilities favour gaining full autonomy with respect to their IT resources. 28 One can purchase commercially available packages of acquisition systems, analysis software and workstations, including storage. However, besides a vendor lock-in risk, any additional hardware and software offered by the core facility creates a maintenance overhead that requires a sufficient budget. 2,5 Most facilities inevitably rely on central resources. In our experience, it is important to seek strategic alignment at the institutional level to reduce costs, facilitate interoperable solutions and share maintenance responsibilities. Besides researchers, relevant partners regarding storage, electronic lab notebooks (ELNs) or Laboratory Information Management Systems (LIMS) are other core facilities, IT departments, academic libraries and central administrations. Core facilities can estimate storage needs based on the number of users, the instrument bookings and the average file sizes generated per system. Creating custom in-house storage solutions might seem appealing, but a fragmented landscape of solutions can impede usability and interoperability. 69 A bioimaging-specific RDM platform like OMERO, combining storage and organisation, is an asset acknowledged by scientists, although such platforms have not been broadly implemented so far.
[71][72][73][74][75][76][77][78][79] For example, while OMERO or XNAT could (but not necessarily should) be installed by a core facility alone, a more generic system like iRODS (https://irods.org) is instead an option as an institution-wide strategic decision for the overall RDM of all data types.

TABLE 3 Important considerations about storage and organisation for bioimage data.

Common issue: Fragmentation of data. Proposed solution: Core facility staff can allocate data to storage and linkage pipelines or workflows. A DMP can help to keep track of data.

Common issue: The file formats impede data integration. Proposed solution: Data can be converted into suitable formats with the help of core facility staff to avoid loss of information regarding data and metadata.

Common issue: Hurdles to data integration due to the design of the infrastructure; different data types/modalities cannot be accessed and analysed together. Proposed solution: Using a holistic approach that addresses hardware infrastructure, network capacity, workflow optimisation and sharing protocols. Avoiding isolated local systems and ensuring the compatibility of RDM solutions or defining bridges between different solutions. Standardisation of integration processes can avoid confusion and inefficiency. Ensuring that performance and infrastructure scalability do not become a bottleneck for data integration.

Common issue: Performance bottlenecks when accessing data. Proposed solution: Core facility staff can help close the gaps in mutual understanding between IT and researchers in terms of the architectural structure of image data and the resulting infrastructure requirements.

Common issue: Lack of central (bioimaging) RDM infrastructure. Proposed solution: Core facility staff can educate users to adopt community-approved RDM strategies in file systems or can contribute to aligning strategies at the institution to save costs, facilitate interoperable solutions and share maintenance responsibilities.

The most widely adopted platform for bioimaging in Germany is OMERO, 7 which can serve as a long-term
storage, depending on the total data volumes. OMERO's (and other platforms') advantage is that data are not frequently moved or copied; users work collaboratively on the data in the same location and export only when necessary. 80 OMERO's user management allows defining data ownership and access rights, which helps avoid orphaned data. If required, connecting an OMERO server via a high-speed cable connection to image analysis workstations can meet user needs for fast computing with large data files. 81 Importantly, finding data in any location depends on metadata. OMERO allows adding metadata as structured annotations while also making the instrument metadata accessible. In our experience, an OMERO installation on a flexible resource management backend, like a virtual-machine-based OpenStack environment with scalable mounted storage, is an asset for core facilities and researchers alike.
Where a bioimaging RDM platform is lacking, core facility staff can help users to adopt community-approved data structuring in file systems. Examples are the ISA framework, including tools, a data model and serialisation for (meta)data, 82,83 or BIDS for Microscopy. 42 Version control software (e.g., Git, GitLab, DataLad) might be an option for programming-savvy users. 84 File naming conventions with tokens can be discussed in project consultations. Where sufficient storage with fast data access is not available, users might consider using third-party, federated storage and compute resources that exist at the regional, national and international levels (e.g., Galaxy https://imaging.usegalaxy.eu, de.NBI https://www.denbi.de, ELIXIR https://elixir-europe.org, EUDAT https://www.eudat.eu, EOSC https://open-science-cloud.ec.europa.eu). Table 3 summarises considerations of data storage and organisation.
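A file naming convention with tokens is only useful if it can be checked mechanically. The sketch below validates names against a hypothetical scheme `<project>_<user>_<YYYYMMDD>_<modality>.<ext>`; the token vocabulary and allowed extensions are illustrative assumptions a facility would adapt in consultation with its users:

```python
import re

# Hypothetical convention: <project>_<user>_<YYYYMMDD>_<modality>.<ext>
# Token names and the allowed modality/extension lists are made up for illustration.
PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+)_"
    r"(?P<user>[a-z]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<modality>confocal|lightsheet|widefield)\."
    r"(?P<ext>ome\.tiff?|czi|lif|nd2)$"
)

def parse_name(filename):
    """Return the token dict for a conforming name, or None if it violates the convention."""
    m = PATTERN.match(filename)
    return m.groupdict() if m else None

print(parse_name("p53-screen_mmeier_20240312_confocal.ome.tif"))
print(parse_name("final_v2 (copy).tif"))  # -> None (rejected, ambiguous name)
```

Because each token is a named group, the same pattern that validates names can also harvest basic metadata (user, date, modality) from legacy folders when migrating to a structured system.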

Image analysis
Bioimage analysis is standard practice to obtain scientific results as objectively, reliably, and reproducibly as possible.
[90][91][92][93][94] Notably, data management is crucial in image analysis: accessing large data sets, sharing copies of data or results, maintaining records of processed data and their provenance, and associating results with the corresponding raw data for later exploration. 95 This becomes even more pivotal in the context of automated analysis pipelines, where the code heavily relies on well-structured data sources. While a common approach involves using file systems with a combination of folders, naming conventions and inventory spreadsheets to organise data, cross-referencing integrity is not assured. These challenges create a significant workload for core facility staff assisting researchers with their image analysis projects.

Role of a core facility in RDM for image analysis
While some institutions build on independent bioimage analysis facilities, 92 core facilities may provide support and training with image analysis. 97,98 Beyond that role, core facility staff can advise researchers on how to document their analysis, using file structures compliant with accepted standards (ISA, BIDS or similar), recording analysis steps, for example, with the macro recorder in ImageJ/Fiji, 99,100 or using Galaxy Imaging 101,102 for robust and reproducible workflows recorded in RO-Crates. 103 Other examples are the BioImage Analysis Desktop (BAND) provided by the EMBL (https://band.embl.de) or the AI4LIFE project (https://ai4life.eurobioimaging.eu). An overview of resources can be found in David et al. 104 and in a 2023 biologist's guide to planning quantitative imaging. 105 These, however, do not substitute for the data management support provided by core facilities.
To ensure data quality and integrity, database technologies have introduced methods to enhance information validity and accuracy. This includes concepts such as primary keys for unique item identification or normalisation processes. 106 Database tables used in RDM systems adhere to these data structuring principles by design. This is why solutions such as OMERO or Cytomine are well suited to handle the complexity of microscopy data: researchers can benefit from the flexibility to view and organise images according to their preferences by leveraging the object-oriented data organisation of such systems, and they can use annotations like tags, key-value pairs (dictionaries) or tables of typed data, each with specific purposes. Such structured data integrates well into image analysis pipelines, eased by the Application Programming Interfaces (APIs) of these tools. 107 Our experience with OMERO is that it eliminates the need to transfer data between collaborators, decreasing the total data volume and preventing divergences of processed data versions: the data stored centrally on a server are shared via links and accessed from different physical locations and software clients. [111][112] Once an analysis is complete, the results are uploaded back to the server and kept in a format defined by the OME model. 113,114 The development of integrated tools in the OMERO.web client is supported by the bidirectional links between data sources (images) and derived results (ROIs, tables), improving the user experience and simplifying the process of sharing and validating results. 112 ROIs are displayed in the image viewer (OMERO.iviewer) and overlaid in figures (OMERO.figure). Tables are plotted, and data points can be traced back to the source to identify outliers (OMERO.parade, OMERO.mdv). Common issues and proposed solutions regarding the dependency of image analysis on data management are listed in Table 4.
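The database principles named above, primary keys and referential links between source images and derived results, can be sketched with stdlib SQLite. The table and column names below are invented for illustration and are not the OMERO schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE image (
    id    INTEGER PRIMARY KEY,   -- unique identifier for each acquisition
    name  TEXT NOT NULL
);
CREATE TABLE roi_measurement (
    id       INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES image(id),  -- link back to the raw data
    label    TEXT,
    area_um2 REAL
);
""")
con.execute("INSERT INTO image VALUES (1, 'plate1_wellA01')")
con.executemany(
    "INSERT INTO roi_measurement (image_id, label, area_um2) VALUES (?, ?, ?)",
    [(1, "nucleus", 54.2), (1, "nucleus", 61.7)],
)

# Trace every measurement back to its source image: the 'bidirectional link'.
rows = con.execute("""
    SELECT image.name, roi_measurement.label, roi_measurement.area_um2
    FROM roi_measurement JOIN image ON image.id = roi_measurement.image_id
""").fetchall()
print(rows)
```

In contrast to an inventory spreadsheet, the foreign-key constraint makes the raw-data reference part of the schema itself, so an outlier in a results table can always be traced to a concrete image record.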

Publication
'Data is available on request' statements in publications were found to be often unreliable in practice, 121,122,123 and data are not easy to find without a public metadata record available to search engines. 119,120
Researchers usually focus on optimising experiments to achieve the best results during their projects. Making this process understandable and reproducible implies providing access to the complete documentation, from sample preparation through data acquisition to publication. 125,126 A principle of Open Science is to share data as openly as possible and keep them only as closed as necessary. Where not restricted by law, open data with metadata linked to a research publication enhance trust in research quality, as also indicated by higher citation impact. 127,128 Scientists face several challenges: How can images and the findings deduced from them be represented accurately in a manuscript? How should the methods section for microscopy experiments be written correctly? How can access to the original data be enabled with little to no restraint? What access and reuse rights should be granted, and who owns the data and may choose an appropriate licence? These aspects can be complicated if, for example, the data producer has left, the data went through many hands and lost essential metadata, or guidelines on image data publication are not known or are lacking.

Role of a core facility in image data publication
It is the researcher's and author's duty to present data accurately in a publication. Helping researchers follow rigorous bioimaging RDM practices from the start avoids frustration and reduces the workload at the publication stage. With the improving capabilities of repositories, new journal submission guidelines, the development of new microscopy techniques and novel standards in research domains, good publication practice keeps evolving. Uncertainty about data ownership is an impediment to data sharing, 7 and core facility staff can help to identify who is authorised to license the data, which in turn depends on the collaborators, data privacy concerns, intellectual property rights and funding agencies' regulations. 129 Institutional policy can reinforce open data sharing. 130,131 A trend towards more open data sharing and reuse is apparent among early career researchers. 132,133 Guidance on accurate image presentation and publication is available. 135,136 Core facilities help to disseminate such knowledge and can cross-check the proper data representation. Only presenting 'representative examples' in a manuscript should be avoided. 122,127 Instead, scientists should publish the original data behind figures, for example, full data sets used to derive quantitative results by image analysis. 32,122 Many journals and funding agencies now require public data deposition. Data publications, even independent of classical publications, are also becoming an option for large image data sets due to the advancement of repositories (for a list, see Ref. 120). Core facility staff can aid with data formatting and repository choice.

TABLE 4  RDM aspects around image analysis.

| Common issue | Proposed solution |
| --- | --- |
| Tracking of data to process with folders, file names and spreadsheets | Using a bioimaging RDM platform to communicate the status of data to process and to exchange results |
| Analysis pipelines depend on a rigid data organisation in folder structures | Using established standard file organisations and naming conventions, or leveraging object storage and structured annotations to interrogate data flexibly via APIs without duplicating data |
| Analysis pipelines each implement their own data input and output formats | Analysis pipelines access data in management platforms through an API for data input and for uploading analysis output |
| Image files, analysis files and annotations lack links and structure | Using standard file structures or leveraging database links of RDM platforms for bioimaging (e.g., OMERO), preserving the consistency of the relations between data and analysis results |
| Analysis results exploration requires dedicated software installation | Data exploration from a remote location through a browser that only requires an Internet/intranet connection (e.g., OMERO.web) |
Beyond mentioning core facilities in acknowledgement sections, there are reasons to regard core facility staff's work as a scientifically relevant contribution. Organising and annotating bioimage data improves processing and analysis and fosters scientific rigour, reliability and reusability. Hence, we advocate that core facility staff should become coauthors of the data publication entry in a repository (e.g., BIA) where appropriate, even if they are not coauthors of the journal publication on the overall findings. 137,138 This will help strengthen the core facility's position to the benefit of researchers by showing a measurable impact of core facilities not only as generic support facilities but as partners in scientific discovery. For core facility staff members, it supports building a career in scientific support infrastructures based on a track record.

Archiving research data
Good scientific practice requires that the source data of a research publication be preserved over the long term, for example, for at least ten years after publication according to the German Research Foundation's code of conduct. 139 If the primary purpose is to hold a faithful data record, local archiving appears sufficient. But data can become trapped in institutional silos, inaccessible for further research. In some cases, research data are only stored on hard drives, where they are inaccessible to others and at risk of permanent loss due to hardware failures. Since researchers move between positions while building their careers, retrieving and understanding 'old' data becomes challenging for principal investigators and colleagues. 140 Data privacy restraints or economic considerations may prohibit open data publication. But even if kept behind closed doors, data archiving is more than storing data for a long time. Archiving must ensure that data can be found, retrieved and reused on demand. Only then will 'cold' data that are not frequently accessed remain preserved as FAIR data. 141 This requires data selection and cleaning, a strategy for data structuring and metadata annotation, and regulations on access and reuse set by institutional policy. Original data must usually be conserved. Derived data may be deleted if they can be regenerated from raw data using the accurately documented analysis pipeline, again highlighting the importance of reproducible image analysis (2.4, image analysis). Many institutions provide central archiving systems through the IT department or academic libraries, but some systems may be insufficient for large bioimaging data. Special software implementations may be in place to orchestrate research data archiving, for example, Tivoli Storage Manager (now IBM Spectrum/Storage Protect) for tape storage. 142 Often, institutions establish long-term preservation systems compliant with the Open Archival Information System (OAIS) reference model, integrating data ingest, storage, management, access and administration (https://www.iso.org/standard/57284.html). A combination of solutions may be implemented in local or federated environments, for example, in EUDAT B2SAFE (https://www.eudat.eu/service-catalogue/b2safe). Example solutions are iRODS, Coscine (https://about.coscine.de/en) or the EOSC project ARCHIVER (https://www.archiver-project.eu). Implementing these systems is beyond the scope of an imaging core facility but should be pursued as an institutional strategy.

2.6.1 Role of core facilities in image data archiving

Data archiving goes beyond the specific needs of bioimaging data, and solving this task is not primarily a core facility's responsibility. Core facility staff can nevertheless support archiving in several ways: by emphasising proper archiving practices towards users, by guiding users to the institutional archiving systems, and by communicating bioimaging-specific needs to research data policymakers and infrastructure providers. An example would be a strategy to identify 'cold' data in an imaging data management system such as OMERO 11 or Cytomine 70,143 and transfer it to archival systems. Technical solutions for this task are under development, such as an integration between iRODS and OMERO (https://github.com/irodscontrib/irods_working_group_imaging). The development of FAIR Digital Objects for bioimaging, wrapping data, metadata and accompanying research assets into shippable packages, is ongoing. 144 As noted above, core facilities can guide metadata annotation, include quality control protocols and support data cleaning and selection. For example, open file formats like OME-TIFF or OME-Zarr are more suitable for long-term archiving than proprietary vendor formats that may not be supported in the future. 134 Finally, the desired best practice would be archiving data in a suitable public archive as the authoritative faithful data record (2.5, publication). Table 5 summarises issues and proposed solutions for both publication and archiving.
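A 'cold data' strategy of this kind can start from something as simple as file access times. The sketch below is purely illustrative: a real integration would query the OMERO or iRODS metadata layer rather than walk a directory tree, and the one-year threshold is an arbitrary example:

```python
import os
import tempfile
import time
from pathlib import Path

def find_cold_files(root: Path, max_idle_days: float):
    """Yield files under root whose last access is older than max_idle_days."""
    cutoff = time.time() - max_idle_days * 86400
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path

# Demo with temporary files; access times are faked with os.utime.
root = Path(tempfile.mkdtemp())
(root / "hot.tif").touch()
(root / "cold.tif").touch()
long_ago = time.time() - 400 * 86400
os.utime(root / "cold.tif", (long_ago, long_ago))  # pretend: untouched for >1 year
print(sorted(p.name for p in find_cold_files(root, max_idle_days=365)))  # ['cold.tif']
```

Flagged files would then be candidates for transfer to the institutional archival system, with their annotations kept in the RDM platform so the data remain findable.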

Closing the life cycle
Data reuse closes the research data life cycle. 146,147 The reuse potential of quality-controlled, annotated data that are findable and accessible in open, interoperable formats was demonstrated by Williams et al., 118 and FAIR open data are expected to advance bioimage informatics. 148 Especially in the era of AI-based analysis, segmentation and hypothesis building, well-annotated and quality-controlled research data are essential for training models and obtaining reliable results, 15,149 and data have been reused for novel algorithm development. 150,151

PART II: ACTION ITEMS FOR CORE FACILITIES TO ADVANCE IN BIOIMAGING RDM
We explore strategies and actions for core facility managers, core facility staff and science managers tasked with advancing the capacity and capability for bioimaging RDM. While one-size-fits-all solutions are unavailable, we report on considerations from practical experience within the framework of GerBI-GMB and the funded RDM projects in which we are active.

Foster deployment of a bioimaging RDM platform in your institution
The complexity and sizes of bioimage data constitute specific requirements for bioimaging RDM platforms, and several systems have been developed by imaging communities, often with global support. OMERO has a history of more than 20 years in development, is among the best-known and most widely used systems, and is supported by developers worldwide. 7,11,12 Cytomine, established in the field of histopathology, offers comparable features. 70,143 BISQUE is a combined image organisation and analysis platform 72 but appears to have a smaller user base compared with, for example, OMERO 7 (https://forum.image.sc/t/data-management-bisque/42370/2). Originating from the neuroscience field, XNAT (https://xnat.org) is a versatile image data management tool predominantly used in medical and preclinical imaging. 75,152 Moreover, commercial offers exist to enable microscopy data management.
In Germany, many core facilities chose OMERO as an aligned approach to implementing bioimaging-specific data management. 7,13,80,81 Our recommendations are thus largely based on the experience of setting up and introducing OMERO instances in Germany, though the scheme may apply more generally.
Review the status: Establishing and disseminating new practices among researchers takes time, needs trust and costs resources. At the start, core facilities need to understand where they stand and where they want to go.
• What are the largest RDM concerns in a facility? Pointed examples should be selected so that nonexperts can understand the nature of the issues.
TABLE 5  Concepts and considerations for publishing and archiving bioimage data.

| Common issue | Proposed solution |
| --- | --- |
| Copies of raw data are left on acquisition machines | The core facility's user policy defines and enforces interim data storage, transfer and deletion |
| Data are only stored on hard drives, together with paper notebooks as documentation | Enriching data with essential metadata to find, identify, understand and potentially reuse them; contacting institutional archiving providers or uploading data to a repository |
| Data are published or archived 'as is' | Data must be curated, selected and quality-controlled before archiving; the data to be archived can be planned upfront and noted in a data management plan; file formats and metadata should be considered; community checklists can be used for publication |
| Valuable data are only archived locally | Considering public repository deposition of the data, with or independent of a research article, to enable findability and reusability |
| The capacity or quota of the local archival system is insufficient for large bioimage data files | Upload to a suitable public repository as the faithful data record after publication |
• Gain an overview of general concerns versus special cases. Intrafacility communication is important for this task.

Design the infrastructure concept
With the stakeholders identified, discrete aspects of a robust and scalable infrastructure can be defined. Various software options and commercial services can be evaluated, and a decision can be made based on the stakeholder process and the difficulty of the tasks to solve. We propose a scheme for defining the technical details and responsibilities for the topics to address (Figure 2). It focuses on operational implementation, running and maintenance. Therefore, the stakeholders in focus are only those that will be involved over the long term once the infrastructure is installed and an operational service is established.
• Storage and server location: Central server infrastructure is typically offered by the IT department. Software providers specify minimum requirements that depend on the expected use. These influence the installation's design (physical server or virtual machine, load balancing, network bandwidth, etc.). We recommend a virtual machine environment, for example, OpenStack, where the required storage is mounted and setups can be adapted dynamically. Core facility staff can learn to manage the virtual machine and estimate initial storage needs. With OMERO, the option to mount researchers' own, responsibly maintained storage to the server has advantages (costs, storage space allocation) and disadvantages (the risk of breaking links lies on the researchers' side, as does responsibility for backup and security).
• Network architecture: The need for frequent access to data should be discussed with the researchers and communicated to the network provider. When loading centrally hosted large data via the network, several bottlenecks slowing down the process can be avoided. Using file servers mounted to a bioimaging RDM system (so-called in-place import) is an option to consider when working with large amounts of data. For image analysis, a cable-connected remote analysis workstation with software preinstalled is an option. 81 A large number of simultaneous connections might challenge the system. On the other hand, microscopes generate large amounts of data that require sufficient network bandwidth for upload.
• Data life cycle management: Considerations include, for example, identifying unused data that should be moved to long-term archiving, or OME-TIFF/OME-Zarr conversion before uploading to the bioimaging RDM system. Storage usage, annotations, data access, etc. can be monitored regularly, but storage quotas cannot be directly enforced in OMERO.
• Compliance and regulations: User management, server configurations and access influence the system's security level. For example, is the upload of privacy-protected data possible? How and when are data deleted? Who is responsible for compliance with data privacy regulations? Compliance with the usage policy, institutional data policy and general laws has to be considered. In some cases, institutional groups like the staff council have to be involved.
• Repositories and persistent identification: Depending on configuration and policy, a system like OMERO can function as an institution-internal or even public repository. Academic libraries might be important partners, for example, for persistent identifiers. 154 Data might also be moved from an OMERO instance to a public repository to increase findability or comply with publisher requirements.
• Sustainability: Bottlenecks, limitations of personnel capacity, and direct costs for hardware and software must be considered from the start. Sustainability strategies should be discussed. Synergies with other stakeholders, such as central RDM teams, and third-party funding can be leveraged.
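Estimating initial storage needs, as mentioned under 'Storage and server location', can start from simple growth arithmetic. The figures below are entirely hypothetical and only illustrate the kind of back-of-the-envelope projection a facility might make before requesting capacity:

```python
def projected_storage_tb(microscopes: dict, growth_per_year: float, years: int) -> list:
    """Project cumulative storage (TB) assuming yearly output grows at a fixed rate."""
    yearly = sum(microscopes.values())  # TB produced across all systems in year 1
    totals, cumulative = [], 0.0
    for y in range(years):
        cumulative += yearly * (1 + growth_per_year) ** y
        totals.append(round(cumulative, 1))
    return totals

# Hypothetical facility: estimated output in TB/year per acquisition system.
systems = {"confocal_1": 8, "confocal_2": 8, "lightsheet": 40, "widefield": 4}
print(projected_storage_tb(systems, growth_per_year=0.2, years=3))  # [60.0, 132.0, 218.4]
```

Such a projection also makes the cost conversation with IT and management concrete: the growth rate, not the first-year volume, dominates what must be provisioned.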

Planning the process
Implementation should comprise (i) a consultation phase to establish stakeholder contacts and understand perspectives. A task team should lead the efforts once a funding plan is agreed on. (ii) A setup phase in which the infrastructure is installed and tested by the team to become familiar with the systems and assign responsibilities. Updating OMERO, installing extensions and plugins, changing data ownership or approving new users can be performed by an IT staff member, a core facility member or a data steward, depending on availability and individual skills. Supporting annotation and best practices of image analysis with OMERO will, in contrast, rather become a core facility task. The test instance allows the team to make mistakes and learn before the operational instance is installed. (iii) A pilot phase in which a limited number of voluntary users with a limited variety of data types test the prototype operational system for applicability within their research. Trustful collaboration with the pilot users, and their acknowledgement, is very important. For the core facility, this is a key phase to understand the dynamics of the system, establish the training needs and learn to manage user expectations. The pilot phase duration is flexible but should not be too short; for example, a pilot phase of one year, during which gradually more pilot users are admitted. (iv) Launch of the operational system. Opening the infrastructure for all users is the most relevant but not the ultimate milestone. With increasing user numbers, new concerns are to be expected. No system is optimal for all cases. (v) Evaluation and readjustment. Reviewing how using the bioimaging RDM system has changed the previously documented user concerns provides success indicators. A successful implementation benefits the researchers and the strategic position of the core facility.

PART III: DATA STEWARDSHIP FOR BIOIMAGING DATA IN CORE FACILITIES
Image acquisition and analysis are typically aided by imaging experts and image analysts in core facilities (in small facilities, this is often a dual role for one person). Additionally, the size and multidimensionality of many image data formats require expertise in handling data infrastructures, which is usually acquired by core facility staff over the years. Most core facilities have established local storage facilities for image data that allow central access. At the local level, this may satisfy criteria for data accessibility and, to some degree, reusability. To achieve public accessibility and reusability, additional aspects of FAIR data management must be integrated into routine image data handling. This means transforming the theoretical FAIR framework and guiding principles into discrete, actionable practices. 10,155 The international bioimaging community facilitates FAIR data globally, and data stewardship is a key component. 156 The role of the 'data steward' has developed with various understandings depending on data types and research fields. 159 Best practices in bioimage data management are still developing and must be tested and 'negotiated' in the global research community. We regard data stewards as a vital component in implementing novel practices so that the community can evaluate workflows, new tools and standards in everyday research practice. In our experience, such support increases the readiness of researchers to invest in FAIR data management. One reason for not sharing data is the perception of a lack of skills, time and technical resources. 7,46,132 Data stewards train scientists at all career levels and, thus, document and gain experience with best practices that they can apply to new use cases. Examples of data reuse for new research have to be collected to showcase the scientific benefit of bioimaging RDM. Initiatives at the national and international levels, like Germany's NFDI, Euro-BioImaging and bioimaging communities worldwide, focus on advancing the above-mentioned implementation and iterative testing in the disciplinary and methodological research communities. They provide cross-institutional personnel capacity for bioimaging-specific data stewards who foster FAIR data sharing in collaboration with core facilities. Example studies supported by NFDI4BIOIMAGE and I3D:bio are Nöth et al. 138,160 and Jannasch et al. 161 We do not assume that individual core facilities must sustain data steward positions individually. However, data stewards will likely remain key to the research process at large. Hence, new ways of sustainably funding data stewards, at levels of domain specificity yet to be determined, will become important for research infrastructures.

CONCLUSION AND OUTLOOK
Research data management and the FAIR principles are not an end in themselves. There is an initial risk of perceiving RDM practices primarily as a burden, apparently without immediate, comprehensible benefit. However, the potential that lies in professionalising bioimaging RDM is enormous: it supports researchers in getting more out of their own data, facilitating the systematic handling of (large amounts of) data and thus enabling the democratisation of the research data value if such data are publicly available. It enables the integration of different data types and, therefore, more comprehensive perspectives on a research question. FAIR data management is an essential contribution to successfully applying advanced image analysis approaches, including artificial-intelligence-based procedures. This applies in particular to complex and large imaging data, where new and unbiased algorithms might reveal unexpected patterns. Thus, FAIR image data management paves the way to a new understanding of spatiotemporal analysis, which can be described as integrative image data science. For core facilities, increasing their capacity and capability for professional RDM in bioimaging alleviates the workload that today partially stems from the lack of RDM routines. More importantly, it adds a layer underlining the value and impact of core facility work for rigour and trust in scientific discovery. Core facilities hold a strategically important position at the interface between the relevant stakeholders to make FAIR bioimage data state of the art in research routines.

AUTHOR CONTRIBUTIONS
CS, TB, JD, SK, EFM and SWP drafted the concept, wrote and edited the manuscript or sections of it. SK and JD drafted the figures. TW and RN contributed to the concept and reviewed the manuscript.

ACKNOWLEDGEMENTS
We thank the I3D:bio use case partners for collaborating on implementing bioimaging RDM systems within the project, the members of the RDM4Mic group, German BioImaging – Society for Microscopy and Image Analysis, and international partners. We also thank the members and supporters of the NFDI4BIOIMAGE consortium. In particular, we thank Josh Moore, Janina Hanne and Michele Bortolomeazzi for discussing the article's scope and content. We express our gratitude to the reviewers for feedback and comments to improve the manuscript and to the Scientific Editor, Kurt Anderson, for support.
Open access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

REFERENCES
• Maintenance plan: Sustained support and hardware maintenance should be secured by IT. A plan to monitor performance issues must be conceived. Maintenance responsibility for the bioimaging RDM system and image analysis stations can lie with core facility staff. Responsibilities must be clearly defined to avoid misunderstandings regarding long-term operation and support.
• Data types: Different file sizes and structures of bioimaging data affect the performance of remote access and computing. Data integration with links to other storage environments might be required. Different storage architectures exist, with different strengths and limitations.
• Access and user management: Depending on the configuration, remote access may be enabled over the Internet or restricted to intranet connections. The user management in OMERO allows default groups and the integration of institutional user and identity management (e.g., LDAP).
• User integration and training: Users need training to adopt new solutions. Immediate benefits should be highlighted (viewing images, making figures), and reusable training material can be leveraged, for example, the OMERO guide (https://omero-guides.readthedocs.io/en/latest) or I3D:bio's OMERO training material. 153
• Data acquisition and integration: Best practices for uploading newly acquired and existing data can be provided. Large data might require a supported, command-line-tool-based upload and an appropriate network connection.
• Data analysis: Users' needs for image processing and analysis must be discussed. APIs often enable interoperability with OMERO. If commercial software is required or the data transfer is limiting, in-place-imported data might be used, or cable-connected remote stations need to be installed.

FIGURE 2  Designing a bioimaging RDM concept. The figure focuses on the operational aspects of installing and maintaining the infrastructure. Different stakeholders may formulate requirements and take over responsibilities for the various tasks depending on local circumstances, configurations, and personnel availability and skill. Responsibilities may lie with different stakeholders or change over time.

Table 2 summarises data management aspects related to image acquisition.

• What works well? The ability to implement new solutions is underlined by previous success. Colleagues with prior experience can provide support. One can learn from examples and understand local peculiarities as opposed to those elsewhere. Contacts within national or regional core facility networks (e.g., German BioImaging, BioImaging North America, Euro-BioImaging, France BioImaging, BioImaging UK, BioImaging NL, Microscopy Australia, etc.) are an asset.
• Central RDM teams: If available, local RDM and Open Science teams are valuable partners. At some institutions, general-level data stewards can be partners in implementing a new system (Part III).
• Administration and management: With defined goals and established partnerships, approaching management boards can help to make the case. The management board's perspective plays a role with respect to sustainability and support. What is the future strategic plan, and how would establishing a bioimaging RDM infrastructure align with it? Outline that RDM goals could include finding tangible solutions for important aspects:
• Reducing costs through improved data flow: centralised maintenance of storage capacity and organisation, data security and interoperability with other stakeholder resources.
• Building on established solutions, avoiding isolated silos with lock-in problems.
• Reacting dynamically to changing demands.
• Facilitating scientific excellence by improving data quality, integrity, reach, collaboration and compliance with funding agency and publisher guidelines.