The Open Brain Consent: Informing research participants and obtaining consent to share brain imaging data

Abstract Having the means to share research data openly is essential to modern science. For human research, a key aspect in this endeavor is obtaining consent from participants, not just to take part in a study, which is a basic ethical principle, but also to share their data with the scientific community. To ensure that the participants' privacy is respected, national and/or supranational regulations and laws are in place. It is, however, not always clear to researchers what the implications of those are, nor how to comply with them. The Open Brain Consent (https://open-brain-consent.readthedocs.io) is an international initiative that aims to provide researchers in the brain imaging community with information about data sharing options and tools. We present here a short history of this project and its latest developments, and share pointers to consent forms, including a template consent form that is compliant with the EU general data protection regulation. We also share pointers to an associated data user agreement that is not only useful in the EU context, but also for any researchers dealing with personal (clinical) data elsewhere.


| GOAL AND BACKGROUND
Petabytes of brain imaging data are collected for research purposes every year, yet only a small fraction becomes publicly available despite evidence for the benefits of sharing such data sets (Milham et al., 2018). One reason, among others, is that openly sharing human brain imaging data requires conforming to established ethical and legal norms, in particular with respect to ensuring that research participants' privacy is respected. Ethical and legal requirements are usually validated by institutional review boards (also known as research ethics committees), which operate under national, federal, and/or supranational regulations. In the case of brain imaging, ethical and legal [...] In some scientific disciplines, for example, genetics (Khan, Capps, Sum, Kuswanto, & Sim, 2014), consent is widely discussed and analyzed, and templates for participant consent forms are available and commonly used, for example, for clinical trials (https://www.who.int/ethics/review-committee/informed_consent/en/). To date, similar work has not been undertaken for brain imaging studies. The goal of the Open Brain Consent initiative is to facilitate brain imaging data sharing by providing practical tools that enable data sharing while respecting research participants' privacy. It consists primarily of providing widely acceptable information/consent forms that allow processing and deposition of data into appropriate archives for future (re)use. Additionally, the project website references tools/pipelines to minimize the risk of re-identification and provides additional information about the various regulations to help brain imaging researchers.

| PROJECT HISTORY AND CONTRIBUTION MODEL
The Open Brain Consent project was started in 2014 to provide (a) a collection of existing samples of consent forms allowing data sharing, (b) a reference "ultimate" consent form, and (c) tools helpful for pseudonymization, making brain imaging data easier to share. The goal of having a template consent form was, and still is, to establish a recommended wording for a consent form based on collected examples that represent community-wide expertise. At that time, the OpenfMRI archive (later developed into OpenNeuro) (Poldrack et al., 2013) was confronted with issues related to the rights to share the growing number of data sets being submitted. To address them, OpenfMRI established a recommended wording, which was contributed to the Open Brain Consent project in 2015. Since then, many researchers have joined the project to provide translations into a number of languages and to expand the list of sample forms and tools. In 2018, the advent of the European General Data Protection Regulation (GDPR: https://gdpr-info.eu) left many researchers unsure about the sharing of brain imaging data, since anonymous data can be shared freely, but personal data cannot. An online discussion ensued concerning the status of brain imaging data, and work began to revise the "Ultimate" Open Brain Consent form to make the sharing of brain imaging data GDPR compliant. This work took place in particular during the Organization for Human Brain Mapping (https://www.humanbrainmapping.org)

| ETHICAL CONCERNS WHEN SHARING BIOMEDICAL AND BRAIN IMAGING DATA
As more brain imaging data and biomedical data are shared openly, concerns have been raised in several publications about risks to data privacy. From a legal and ethical standpoint, risks to research participants' privacy must be identified and mitigated. This necessitates, on one hand, that procedures for data de-identification are in place (from pseudonymization to full anonymization) along with means for individuals to exercise control over the use of their personal data. On the other hand, it requires retaining as much information as possible in the data, allowing researchers to use the data to answer specific research questions. Thus, a balance needs to be struck, and that balance is influenced, in part, by the risks of re-identification based on current technological possibilities and limitations. For instance, it has been shown that it is possible to identify participants in the 1,000 Genomes Project by combining publicly available demographic information from the American census and public information from the peoplefinder.com website with anonymized genomic data sets (Gymrek, McGuire, Golan, Halperin, & Erlich, 2013). This work, however, relied on having been given secured access to the genomic data and being able to code and use advanced cryptographic algorithms; hence, it can be argued that the risk of identification remains low. By contrast, Rocher, Hendrickx, and de Montjoye (2019) estimated the likelihood of re-identification of individuals at around 95% by combining biomedical data and information from postcodes and census using relatively simple statistical models available in open source environments such as R or Python. The cost and know-how required in that case are low, and the risk of re-identification is thus higher.
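The way combined attributes single people out can be illustrated with a minimal sketch. This is not the model used by Rocher et al. (2019); it only counts, in a small made-up table, how many records become unique as more quasi-identifiers are combined. The field names (`postcode`, `birth_year`, `sex`) and all values are hypothetical.

```python
from collections import Counter


def singled_out_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifiers is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Entirely made-up toy records (hypothetical fields and values).
toy_records = [
    {"postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1980, "sex": "M"},
    {"postcode": "AB1", "birth_year": 1975, "sex": "F"},
    {"postcode": "CD2", "birth_year": 1980, "sex": "F"},
    {"postcode": "CD2", "birth_year": 1980, "sex": "F"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "M"},
]

# Each attribute alone may single out nobody; combinations quickly do.
for fields in (["sex"], ["postcode", "sex"], ["postcode", "birth_year", "sex"]):
    print(fields, round(singled_out_fraction(toy_records, fields), 2))
```

In this toy table, no record is unique on sex alone, a third are unique on postcode and sex, and two thirds are unique once birth year is added, which is the qualitative effect described above.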
Brain imaging data are often collected along with a range of associated biomedical and/or clinical data, which represent additional identifying features. Even if additional biomedical data are not provided, there are brain imaging specific concerns, especially for magnetic resonance imaging (MRI) data. From a standard anatomical MRI of a participant's head, the facial features can be reconstructed in 3D and matched to publicly accessible photos. Various approaches have been proposed to "deface" MRI data, from blurring to zeroing (some examples of defacing algorithms are presented in Figure 1). Such approaches cause data loss and, if performed too coarsely, can affect the outcome of analysis pipelines (de Sitter et al., 2020). In addition, recent advances in machine learning have cast doubt on the efficacy of this approach.
Abramian and Eklund (2019) have been able to "reface" single slice data with relative success (60 to 75%) using machine learning (employing a Generative Adversarial Network), and it is reasonable to anticipate that methods like these will improve and become more widely available in the future. Beyond re-identification using direct identifiers, the GDPR highlights that singling out is a precondition to identification, and it should therefore be minimized. Identification can be straightforward with an anatomical MRI in which the face is available, since faces are likely unique (Sheehan & Nachman, 2014), but singling out individuals from defaced data is also possible based on the gyral patterns that are unique to every individual (Duan et al., 2020), like fingerprints. From MRI data that do not include facial information or detailed anatomy, such as functional MRI data, it is still possible to single out individuals. For example, Ravindra and Grama (2019) were able to single out participants across multiple data sets, using task performance and connectivity patterns, with a success rate of 90%. Altogether, these results suggest that biomedical data, and brain MRI data in particular, are at risk of re-identification (that is, they can in all likelihood not be fully anonymized) and should therefore be considered personal data under the GDPR. Acknowledging that risks to personal data privacy exist for brain imaging data, identifying them, and putting mechanisms in place to mitigate them are therefore essential, as is informing each participant throughout the process: these are core steps for the Open Brain Consent working group. The key elements are to (a) have a consent form that only deals with data sharing; (b) inform participants about the data storage, privacy measures (e.g., pseudonymization procedure), and control over usage (e.g., withdrawal); and (c) provide information on how data will be shared, specifically outside the EU.
These key elements must be included to promote secondary use of the data (Staunton, Slokenberga, & Mascalzoni, 2019). The main difference from the non-EU-specific consent form is that further information about privacy and usage control is provided. For researchers from the EU and affiliated countries, we therefore recommend having, in addition to their study consent form, a separate data sharing consent form based on this template.

| Data user agreement
As part of information on how data will be shared, we recommend using a data user agreement (DUA) rather than a license, and a template DUA is also provided. Both, the consent and the DUA, are avail- the ones deciding who has access to them (Bishop, 2016). Having said that, there are also practical and legal reasons for not using automated systems, for example, how to ensure the identity of a signatory of the DUA. If the DUA is not correctly signed by a duly identified controller, then this may render the DUA legally invalid. There are, however, solutions to this as well, for example, using electronic signatures or registered user accounts.

| DISCUSSION
The Open Brain Consent project aims at facilitating human brain imaging data sharing. When sharing these data as openly as possible, researchers are confronted with ethical and legal issues. While ethical issues are internationally recognized and discussed, they are translated into law differently across countries, creating confusion. Here we tried to reconcile these two aspects by offering two generic consent template forms that should help researchers comply with the law in most situations.
Recent technological advances, not only in gathering data and linking databases, but also in statistical modeling and machine learning, increase the risk of re-identification of pseudonymized data. As a result, it is essential to provide up-to-date information to research participants about data privacy (both privacy risks and right of control), which is included in the consent forms. Within the EU context, data that were previously thought to be anonymous are now considered personal. Although pseudonymization of biomedical data is still necessary and encouraged, it does not change the data status from personal to anonymous. Thus, compliance with the GDPR is required and, depending on national regulation, secured access (with or without a DUA) might be necessary. We provide information/consent templates and a DUA template for these different cases, which we believe will improve researchers' likelihood of getting approval from their institutional review boards/ethics committees to share brain imaging data on web-serviced data repositories.
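Why pseudonymized data remain personal can be seen in a minimal sketch of keyed pseudonymization. This is an illustration under stated assumptions, not a procedure prescribed by the Open Brain Consent project: the function name `pseudonymize`, the key, the identifiers, and the `sub-` label prefix are all hypothetical choices.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    # The "sub-" prefix and 12-character length are arbitrary illustrative choices.
    return "sub-" + digest.hexdigest()[:12]


# Hypothetical key and identifiers, for illustration only.
site_key = b"keep-this-key-separate-from-the-shared-data"
participants = ["jane.doe@example.org", "john.smith@example.org"]

# Labels that would accompany the shared data set.
shared_labels = [pseudonymize(p, site_key) for p in participants]

# Whoever holds the key and the original identifiers can rebuild the link table,
# so the shared labels still relate to identifiable persons.
relink = {pseudonymize(p, site_key): p for p in participants}
recovered = [relink[label] for label in shared_labels]

print(shared_labels)
print(recovered == participants)  # True: the link can be restored
```

Because the link can be restored by the key holder, the labels protect against casual identification but do not make the data anonymous, which matches the distinction drawn above.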
More recent data platform technologies rely on distributed data storage and/or processing models. A data set collected at multiple sites could be stored and processed at multiple locations, and yet accessed via a single query, provided that a user is authorized to access the data (see e.g., http://datalad.org). It remains to be seen how a DUA could be implemented for such a distributed model. In other cases, data analysis can be performed (with local or remote execution) using algorithms implementing federated learning (Sheller et al., 2020) and differential privacy concepts (redaction threshold, noise addition, query limitations; Plis et al., 2016). In such scenarios, privacy concerns are greatly reduced and the consent template should be modified accordingly, in particular regarding data confidentiality. Finally, other initiatives rely on local data processing and sharing of aggregate/derivative data only (Plis et al., 2016; Thompson et al., 2014). If individuals cannot be singled out in the shared results, a DUA is not necessary since raw/individual data remain with the data processor and re-identification becomes impossible.
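The three differential privacy concepts named above (redaction threshold, noise addition, query limitations) can be sketched for a simple counting query. This is a simplified illustration, not the implementation used by any of the cited platforms: the names `private_count` and `QueryBudget`, the threshold, and the `epsilon` value are assumptions, and redacting on the true count is a common practical safeguard rather than part of a formal differential privacy guarantee.

```python
import random


class QueryBudget:
    """Query limitation: allow only a fixed number of queries per user."""

    def __init__(self, max_queries):
        self.remaining = max_queries

    def spend(self):
        if self.remaining <= 0:
            raise PermissionError("query limit reached")
        self.remaining -= 1


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, threshold, budget, rng):
    """Return a noisy count, or None if the count is too small to release."""
    budget.spend()
    if true_count < threshold:
        return None  # redaction threshold: small cells are not released
    # A counting query changes by at most 1 per participant (sensitivity 1),
    # so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
budget = QueryBudget(max_queries=2)

print(private_count(120, epsilon=1.0, threshold=5, budget=budget, rng=rng))
print(private_count(3, epsilon=1.0, threshold=5, budget=budget, rng=rng))  # None
```

A third call with the same `budget` would raise `PermissionError`, which is how the query limit prevents averaging away the noise through repeated queries.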
While we believe standardized templates such as these from the Open Brain Consent working group play an important role in advancing transparent research practices, they do not provide a complete solution to the complex challenges involved in sharing research data.
For example, are data from brain imaging techniques other than MRI also at risk of re-identification? Since many brain imaging data sets include various demographics, clinical metadata, and perhaps even multimodal imaging data, these are likely at risk too.

CONFLICT OF INTERESTS
We declare no conflict of interest related to this work.

DATA AVAILABILITY STATEMENT
All material used here is distributed freely under a CC-BY license.