Consent and anonymization in research involving biobanks

Differing terms and norms present serious barriers to an international framework


Biological samples, such as tissues, blood and cells, are an increasingly important tool for research into human diseases and their genetic and physiological causes. To facilitate storage and access, many of these samples are now kept in biobanks. In 1999, the number of human biological samples in such collections in the USA alone amounted to several hundred million, roughly one sample per US citizen (Eiseman, 2000; Azarow et al, 2003), and the number is increasing rapidly. Three-quarters of the clinical trials that drug companies submit to the US Food and Drug Administration (Rockville, MD, USA) for approval now include a provision for sampling and storing human tissue for future genetic analysis (Abbott, 2003). At the same time, population biobanks are booming, as more and more countries establish new sample collections (Kaiser, 2002). Among the best known are the Icelandic Health Sector Database; the Estonian Genome Project; the UK Biobank; the CARTaGENE Project in Quebec, Canada; the Banco Nacional de ADN in Spain; the International HapMap Project; and several US biobanks, such as the National Children's Study, the Marshfield Clinic's Personalized Medicine Research Project and the National Health and Nutrition Examination Survey.


This boom in biobanks has spawned a ‘boomlet’ of regulations and guidelines, which has created controversy, particularly about the importance and definition of informed consent. Participants' consent is usually required before biobank samples can be used in research, but the nature of this consent, and how it is obtained, vary widely. Many European guidelines take the view that general consent to the use of samples for future, as yet unspecified, research projects is acceptable; US and Canadian policy follows a more rigorous standard of consent. Until 2004, both Europe and the USA considered coded and linked anonymized samples, in which a code links the sample to its donor, to be identifiable and therefore to require participants' consent for future use. In 2004, however, the US Office for Human Research Protections (OHRP; Rockville, MD, USA) narrowed the range of research that requires consent by expanding the definition of non-identifiable samples and data to include coded samples whose key is withheld from the investigators. The growing lack of international consensus interferes with the efficiency of biomedical research that makes use of biobanks in several countries. We therefore argue in favour of using general consent, together with the right to opt out and approval by Institutional Review Boards (IRBs), and against an enlargement of the term ‘non-identifiable’.

The term biobank is relatively new. It appeared in PubMed for the first time in 1996 (Loft & Poulsen, 1996) but was not used with any frequency until 2000. Although the term is used to describe various biological repositories, it originally referred to large population banks of human tissue and related data. In this article, biobank refers to any collection of human biological material—organs, tissue, blood, cells and other body fluids—that contains at least traces of DNA or RNA that would allow genetic analysis.


A clear distinction needs to be drawn between the storage of tissue samples and the storage of data that are either linked to the samples or derived from them. These data comprise information about the donor of the material, such as demographic characteristics, the type of disease associated with the sample, the outcome of the disease, treatment and so on. In addition, DNA and RNA themselves carry information, which is why the terms ‘genetic database’ and ‘population database’ are regularly used as synonyms for ‘biobank’.

The ethical issues raised by biobanks have stirred extraordinary passions (Barbour, 2003) and stimulated a growing number of publications, and the creation of new biobanks and the expansion of existing repositories have spawned new guidelines. One direct reaction to the heated controversies surrounding the Icelandic biobank is the Declaration on Ethical Considerations Regarding Health Databases, produced by the World Medical Association (WMA; Ferney-Voltaire, France; WMA, 2002). Other examples are guidelines from the UK Medical Research Council (MRC, 2001), the US National Bioethics Advisory Commission (NBAC, 1999), the Council of Europe Committee of Ministers (COE, 2006), OHRP (2004) and the Australian National Health and Medical Research Council (1999). Various scientific associations have drawn up their own guidelines on DNA and tissue banking, and the Council for International Organizations of Medical Sciences, recognizing the importance of biobanks for epidemiological research, revised its guidelines to integrate relevant issues from the biobank debate (CIOMS, 2005). The United Nations Educational, Scientific and Cultural Organization (Paris, France) adopted the International Declaration on Human Genetic Data in October 2003 (UNESCO, 2003), and France, Germany, Canada and Switzerland have all issued their own guidelines for biobanks or genetic databases (CCNE, 2003; Nationaler Ethikrat, 2004; Commission de l'Éthique de la Science et de la Technologie, 2003; Schweizer Akademie der Medizinischen Wissenschaften, 2006).

These guidelines contain clearly divergent recommendations in important areas, which hampers international collaboration. Not only do different systems exist for the collection of data and the processing of samples, but the guidelines also reflect fundamentally different ethical frameworks (Knoppers, 2005). Consequently, scientists have criticized how little has been done to maximize the potential of biobanks by ensuring that samples and information can be shared between them (Pearson, 2004). What are the reasons for this profusion of guidelines, and why is it apparently so difficult to devise a single universal framework? Given that such a framework exists for clinical research ethics, why is the regulation of biobanks so varied?


It should be noted that the most important ethical questions differ between prospective biobanks and existing biobanks, which contain samples stored before the discussion of ethical issues began. A characteristic of most prospective biobanks is that samples and data are collected for long-term future use, not just for a single project. Typical examples are the UK Biobank and the Marshfield Clinic's Personalized Medicine Research Project in Wisconsin, USA, both of which are used to study gene–environment interactions. Research projects that make use of these biobanks typically use DNA from blood or other tissues, data from the participants' present and future medical records, and data from screening questionnaires or physical and laboratory examinations. These are combined with information about lifestyle and environmental factors that can be updated regularly by sending participants new questionnaires. When such collections of samples and related data are established, it is often impossible to anticipate what studies might emerge, which leaves open the question of participants' consent to those future studies. Indeed, a major ethical problem for prospective biobanks is how to obtain participants' consent when it is not known what they are consenting to in terms of future research. The question of the importance and meaning of informed consent is one of the main reasons why international guidelines on biobanks lack consensus.

The doctrine of informed consent has been a central component of research ethics since human-rights abuses, such as the experiments on concentration camp inmates in Nazi Germany and the Tuskegee Syphilis Experiment, in which US physicians left infected men untreated in order to observe the course of the disease, provoked worldwide abhorrence and regulation. Informed consent, the requirement to inform participants in a research study about all planned experiments and to obtain their voluntary agreement, has accordingly become the gold standard of research ethics (Kegley, 2004).

However, when research participants provide tissue and information to prospective biobanks, they cannot give informed consent to future research projects that have yet to be specified. According to classical research ethics (Annas et al, 1995), participants should therefore be contacted to give consent for each new research project after having been informed about its details. This approach is not only costly (Korn, 1999) but also endangers the scientific value of the entire biobank, because a considerable proportion of participants would probably be lost for future studies: some cannot be located, and others do not respond for various reasons, including the simple nuisance of reading all the details and replying to repeated letters asking for new consent.
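The attrition argument can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each recontact round independently secures renewed consent from a fixed fraction of the participants still enrolled; the 80% response rate and the five rounds are hypothetical figures, not data from any biobank, and `retained_fraction` is an invented helper.

```python
def retained_fraction(response_rate: float, rounds: int) -> float:
    """Fraction of the original cohort still consenting after `rounds`
    recontact cycles, assuming (hypothetically) that each cycle independently
    retains `response_rate` of the participants who remain."""
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must lie between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return response_rate ** rounds


# Hypothetical scenario: five new projects, each requiring fresh consent,
# with 80% of the remaining participants responding each time.
for n in range(6):
    print(f"after {n} recontact rounds: {retained_fraction(0.8, n):.1%} retained")
```

Under these assumed figures, only about a third of the cohort would remain after five projects, and the loss compounds with every additional study.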

Many European guidelines take the view that general or broad consent, which is distinguished from blanket consent, is acceptable for “unspecified future research use” of samples (CDBI, 2006). For example, the Council of Europe's Steering Committee on Bioethics (CDBI) states in an explanatory memorandum that “When biological materials of human origin and personal data are collected it is best practice to ask the sources for their consent to future use, even in cases where the specifics of the future research projects are unknown” (CDBI, 2006). German guidelines similarly endorse general consent (Nationaler Ethikrat, 2004), as do the recommendations of the UK Human Genetics Commission (2002) and laws in Sweden, Iceland and Estonia, in which a “broad description of the purpose is allowed” (Kaye et al, 2004).

This form of general consent is considered acceptable if two conditions are fulfilled: all future projects are approved by a research ethics committee or “competent body” (COE, 2006), and participants retain the right to withdraw their samples at any time. The European Society of Human Genetics writes that “… individuals may be asked to consent for a broader use. In that case, there is no need to recontact individuals although the subjects should be able to communicate should they wish to withdraw” (European Society of Human Genetics, 2003). The same document describes withdrawal, sometimes also called ‘opting out’, as follows: “[i]ndividuals should be given the right to withdraw at any time from the research, including destruction of their sample”. Similarly, the Recommendation of the COE Committee of Ministers states that individuals have the right to withdraw consent at any time (COE, 2006). The Ethics and Governance Framework of the UK Biobank (2003a) incorporates all three elements: general consent, approval by an ethics committee and the right to withdraw.
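The three elements of this ‘general consent’ model can be read as a simple conjunctive rule. The sketch below is a hypothetical illustration only: `DonorRecord` and `may_use_sample` are invented names, each condition is reduced to a yes/no flag, and real governance frameworks involve considerably more detail.

```python
from dataclasses import dataclass


@dataclass
class DonorRecord:
    gave_general_consent: bool  # broad consent to unspecified future research
    withdrawn: bool = False     # donor has exercised the right to opt out


def may_use_sample(donor: DonorRecord, project_approved_by_ethics_committee: bool) -> bool:
    """Simplified reading of the general-consent model: a stored sample may be
    used for a new project without recontact only if the donor gave general
    consent, has not withdrawn, and the project has ethics-committee approval."""
    return (
        donor.gave_general_consent
        and not donor.withdrawn
        and project_approved_by_ethics_committee
    )


donor = DonorRecord(gave_general_consent=True)
print(may_use_sample(donor, project_approved_by_ethics_committee=True))   # True
donor.withdrawn = True  # opting out excludes the sample from all future projects
print(may_use_sample(donor, project_approved_by_ethics_committee=True))   # False
```

Note that no recontact step appears anywhere in the rule: the donor's continuing protection rests on the ethics committee and on the standing right to withdraw.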


Clearly, this approach, which can be dubbed the ‘European solution’, changes classical health research ethics. When it comes to biomedical research using biobanks, classical informed consent is abandoned in favour of general consent—a less strict standard. Some Asian countries have a similar approach: Japanese guidelines, for example, contain the idea of “comprehensive consent” (Council for Science and Technology, 2000). Not so in the USA.

Although there is some support for general consent (Grizzle et al, 1999), the prevailing opinion in the USA upholds the classical standard of informed consent, which can be called the ‘American solution’. In the USA and Canada, the model most often recommended is so-called multi-layered consent, which asks research participants to make different choices on a detailed form. There is a tendency to obtain limited consent, restricted to a single disease or to a specific description of future research projects. An example is the Framingham Heart Study, directed by the US National Heart, Lung, and Blood Institute (Bethesda, MD, USA), which had obtained consent for DNA testing. Following legal advice, additional consent forms were designed to obtain new, written informed consent from all participants before RNA testing could be carried out.

As one would expect, the strict requirement of informed consent is a burden for research. It is thus not surprising that in the American context, there has been a search for alternative solutions. In 1999, the guidelines of the US National Bioethics Advisory Commission (NBAC) proposed, among other things, a strategy of waivers, the criteria for which are already defined by federal regulations. According to the proposal, the requirement of informed consent can be waived if: “1) [t]he research involves no more than minimal risk to the subjects, 2) [t]he waiver or alteration will not affect adversely the rights and welfare of the subjects, 3) [t]he research could not be practicably carried out without the waiver or alteration, and 4) [w]henever appropriate, the subjects will be provided with additional pertinent information following their participation” (NBAC, 1999).
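The four waiver criteria quoted above are conjunctive: the list ends with “and”, so a waiver is available only if every criterion is met. A minimal sketch, assuming each criterion has already been reduced to a yes/no judgement by the reviewing body (in practice these are contested qualitative assessments); `WaiverAssessment` and `waiver_permitted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WaiverAssessment:
    # The four criteria from the NBAC (1999) proposal, as judged by a reviewer
    minimal_risk: bool                        # 1) no more than minimal risk
    no_adverse_effect_on_rights: bool         # 2) rights and welfare not adversely affected
    impracticable_without_waiver: bool        # 3) research not practicable without the waiver
    debriefing_provided_if_appropriate: bool  # 4) pertinent information given whenever appropriate


def waiver_permitted(a: WaiverAssessment) -> bool:
    """All four criteria must hold; failing any one rules the waiver out."""
    return all((
        a.minimal_risk,
        a.no_adverse_effect_on_rights,
        a.impracticable_without_waiver,
        a.debriefing_provided_if_appropriate,
    ))


# A study whose donors could easily be recontacted fails criterion 3.
study = WaiverAssessment(True, True, False, True)
print(waiver_permitted(study))  # False
```

Criterion 3 is the one most sensitive to the size and age of a biobank, since recontact becomes less practicable as a cohort grows and disperses.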

However, in the NBAC's view a waiver alone is not sufficient grounds to forgo seeking participants' consent. If a waiver is granted, “it is still appropriate to seek consent in order to show respect for the subject, unless it is impracticable to locate him or her in order to obtain it” (NBAC, 1999). As an “additional measure of protection”, the NBAC mentions the possibility for donors to withdraw from a study that has been granted a waiver of informed consent.

In 2004, the US Office for Human Research Protections proposed a different solution by broadening the definition of ‘non-identifiable’ (OHRP, 2004). US federal regulations contain provisions similar to those of the Declaration of Helsinki, which states that “Medical research involving human subjects includes research on identifiable human material or identifiable data” (WMA, 2004). It follows that research using non-identifiable samples creates no obligation to obtain informed consent or to have the protocol approved by an IRB or research ethics committee.

What does ‘identifiable’ mean in the context of biomedical research? Again, European and North American standards differ. A search of various guidelines for a definition of ‘identifiable’ reveals a multitude of different terms (see sidebar; Knoppers & Saginur, 2005). Almost every guideline uses its own terminology, although some traditions can be discerned: for example, the Council of Europe's recommendation adopts terminology from earlier guidelines of the UK Medical Research Council (MRC), which is not surprising given that the primary author of the first CDBI draft works with the MRC. In the American tradition, the OHRP uses terminology proposed in the NBAC guidelines. Clearly, communication barriers arise where the same term is used with different meanings in different guidelines (Fig 1), and readers should check carefully how the terms ‘anonymized’ and ‘coded’ are defined in each text.

Terms used for either data or samples

Non-exhaustive list of terms used in the literature to describe different degrees of anonymization of samples and data relevant to biobank research.

Completely anonymized
Unlinked anonymized
Irréversiblement anonymisé (irreversibly anonymized)
Irretrievably unlinked to an identifiable person
Anonymously coded
Permanently de-linked
Réversiblement anonymisé (reversibly anonymized)
Not traceable
Identifiably linked
Unlinked to an identifiable person
Directly identified
Fully identifiable
Linked to an identifiable person
Personal data

Figure 1. Communication barriers: the same terms are used with different meanings in various guidelines and journal articles (see COE, 2006; OHRP, 2004). COE, Council of Europe Committee of Ministers; OHRP, US Office for Human Research Protections.

The terminology used by the European documents (CDBI, 2006; COE, 2006) is based on five levels of anonymization for human samples: anonymous, unlinked anonymized (French translation: irréversiblement anonymisé), linked anonymized (réversiblement anonymisé), coded and identified. Samples that contain any trace of DNA are never truly anonymous, because the donor can always be identified through DNA fingerprinting: comparing DNA sequences at only 30–80 statistically independent single nucleotide polymorphisms (SNPs) suffices to define a single person uniquely (Lin et al, 2004). The term ‘anonymous’ is therefore appropriate only for archaeological samples. ‘Anonymized’ means that biological material is stored alongside associated information, such as the type of tumour, medical treatment, donor's age and so forth, but that all information that would allow identification of the research participant or patient has been removed, either irreversibly (unlinked anonymized) or reversibly (linked anonymized). Linked anonymized samples can be re-identified by means of a code to which, by definition, researchers and other users of the material do not have access. Coded samples have the same characteristics as linked (reversibly) anonymized samples, the only difference being that researchers and users do have access to the code. Finally, samples are considered identified if the information that allows identification, such as name and address, is associated directly with the tissue, for instance when the patient's name tag is attached to the sample. This is how pathology departments usually store clinical samples.
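Why even a modest SNP panel defeats ‘anonymity’ can be seen with a back-of-the-envelope random-match calculation. The sketch assumes independent biallelic SNPs in Hardy–Weinberg equilibrium, all with the same allele frequency p, and unrelated individuals; these are simplifying assumptions chosen for illustration, not the method used by Lin et al (2004). The world-population figure is a rough order of magnitude for scale only.

```python
def per_snp_match_probability(p: float) -> float:
    """Probability that two unrelated people share the same genotype at one
    biallelic SNP, assuming Hardy-Weinberg proportions with allele frequency p."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)  # AA, Aa, aa
    return sum(f * f for f in genotype_freqs)


def random_match_probability(n_snps: int, p: float) -> float:
    """Probability of a match at all n_snps, assuming the SNPs are independent
    and share the same allele frequency (a deliberate simplification)."""
    return per_snp_match_probability(p) ** n_snps


WORLD_POPULATION = 6.5e9  # rough mid-2000s order of magnitude, for scale only

for p in (0.5, 0.2):
    for n in (30, 80):
        pm = random_match_probability(n, p)
        expected = pm * WORLD_POPULATION
        print(f"p={p}, {n} SNPs: match probability {pm:.1e}, "
              f"expected chance matches worldwide {expected:.1e}")
```

Under these assumptions, 30 highly informative SNPs (p = 0.5) already make a chance match between unrelated people very unlikely, whereas with less informative SNPs (p = 0.2) a 30-SNP panel would still leave about a dozen chance matches worldwide and a larger panel is needed, which is in line with the 30–80 range cited above. The calculation ignores relatives, who share a much larger part of their genotype.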

In European documents, the term ‘anonymized’ can mean either unlinked or linked anonymized. In most US and English Canadian texts, ‘anonymized’ refers only to unlinked anonymized samples. Interestingly, however, the guidelines from Quebec have adopted the terminology of the French translation of earlier versions of the CDBI guidelines (CDBI, 2002) and distinguish between reversibly and irreversibly anonymized samples. In the European terminology, ‘coded’ always means that researchers or other users have access to the code, whereas the OHRP uses the term to refer to what Europeans and French Canadians call ‘linked anonymized’ samples: a link exists, but researchers do not have access to the code (Fig 1; OHRP, 2004). These discrepancies are not limited to differing definitions of the same term; more seriously, they extend to the whole regulatory framework.
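One way to see the communication barrier is to translate every term into the European reference categories before comparing texts. The lookup table below is a hypothetical aid that encodes only the usages summarized in this article (the label "US/English Canadian" simplifies "most US and English Canadian texts", and ‘anonymous’ is omitted because it cannot apply to DNA-bearing samples); it is not an official crosswalk, and `TERM_MEANINGS` and `same_meaning` are invented names.

```python
from enum import Enum


class EuropeanCategory(Enum):
    """Reference categories from the European documents (CDBI, 2006; COE, 2006)."""
    UNLINKED_ANONYMIZED = "link irreversibly destroyed"
    LINKED_ANONYMIZED = "link exists; users lack access to the code"
    CODED = "link exists; users have access to the code"
    IDENTIFIED = "identifiers stored with the sample"


E = EuropeanCategory

# What a term denotes, depending on whose text uses it (as summarized above).
TERM_MEANINGS = {
    ("European", "anonymized"): {E.UNLINKED_ANONYMIZED, E.LINKED_ANONYMIZED},
    ("US/English Canadian", "anonymized"): {E.UNLINKED_ANONYMIZED},
    ("Quebec", "irreversibly anonymized"): {E.UNLINKED_ANONYMIZED},
    ("Quebec", "reversibly anonymized"): {E.LINKED_ANONYMIZED},
    ("European", "coded"): {E.CODED},
    ("OHRP", "coded"): {E.LINKED_ANONYMIZED},
}


def same_meaning(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True if two (source, term) pairs denote the same set of European categories."""
    return TERM_MEANINGS[a] == TERM_MEANINGS[b]


print(same_meaning(("European", "coded"), ("OHRP", "coded")))                 # False
print(same_meaning(("OHRP", "coded"), ("Quebec", "reversibly anonymized")))  # True
```

The same word, ‘coded’, maps to different categories depending on the source, while different words (OHRP's ‘coded’ and Quebec's ‘réversiblement anonymisé’) map to the same one.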

Until the OHRP revised its guidance in 2004, all important regulations in the USA and Europe agreed on one point: coded and linked anonymized samples were considered identifiable, both by the NBAC (1999) and in Europe (CDBI, 2002; COE, 2006), because in both cases a link exists. Only if this link was irreversibly destroyed were samples and data considered non-identifiable, so that research using them was not regarded as human subject research in the sense of the Declaration of Helsinki. In its new guidance, however, the OHRP enlarged the definition of non-identifiable as follows: “OHRP considers private information or specimens not to be individually identifiable when they cannot be linked to specific individuals by the investigator(s) either directly or indirectly through coding systems” (OHRP, 2004). This is the case if “the investigators and the holder of the key enter into an agreement prohibiting the release of the key to the investigators under any circumstances, until the individuals are deceased (note that the [Department of Health and Human Services] regulations do not require the IRB to review and approve this agreement).” It is also the case if “there are IRB-approved written policies and operating procedures for a repository or data management center that prohibit the release of the key to the investigators under any circumstances, until the individuals are deceased” or if “there are other legal requirements prohibiting the release of the key to the investigators, until the individuals are deceased.”
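The divergence can be captured as two predicates over the same sample. The sketch below is a simplified, hypothetical reading of the two positions described above: `StoredSample` and both functions are invented names, and the single flag `key_release_prohibited` stands in for any of the three OHRP conditions (an agreement, IRB-approved policies, or other legal requirements barring release of the key until the donors' death).

```python
from dataclasses import dataclass


@dataclass
class StoredSample:
    identifiers_attached: bool    # name, address etc. stored with the sample
    link_exists: bool             # a code/key still connects sample and donor
    investigator_has_key: bool    # investigators can use the key themselves
    key_release_prohibited: bool  # agreement, IRB-approved policy or legal bar


def identifiable_european(s: StoredSample) -> bool:
    """CDBI/COE (and NBAC 1999) view: any surviving link makes the sample identifiable."""
    return s.identifiers_attached or s.link_exists


def identifiable_ohrp_2004(s: StoredSample) -> bool:
    """Simplified reading of OHRP (2004): identifiable only if the investigators
    can link the sample to the donor, directly or through the coding system."""
    if s.identifiers_attached:
        return True
    if not s.link_exists:
        return False
    return s.investigator_has_key or not s.key_release_prohibited


# Linked anonymized sample: the key exists but is held elsewhere under an agreement.
linked = StoredSample(identifiers_attached=False, link_exists=True,
                      investigator_has_key=False, key_release_prohibited=True)
print(identifiable_european(linked))   # True  -> treated as identifiable
print(identifiable_ohrp_2004(linked))  # False -> treated as non-identifiable
```

The two predicates agree on every case except a surviving link whose key is contractually withheld, which is exactly the category the 2004 guidance moved out of human subject research.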

OHRP also specifies that “[t]his guidance applies to existing private information and specimens, as well as to private information and specimens to be collected in the future for purposes other than the currently proposed research.” The advantage of enlarging the definition of non-identifiable is obvious: high standards of informed consent can be maintained, while researchers are given a simple means to escape strict regulation by entering into agreements that bar them from accessing the code, without having to destroy the link. Through such simple arrangements, any type of future research is authorized without the need for consent or IRB approval.

As we have shown, the challenge posed by biobanks is immense: after more than 50 years of classical health research ethics, regulatory agencies have begun to question fundamental ethical milestones. Europeans have abandoned informed consent in favour of general consent, and Americans have enlarged the definition of non-identifiable samples and data. Because the two continents have chosen different ways to change research ethics, a global framework is becoming impossible.

What are the possible ways out of this dilemma? We propose that an analysis of the arguments for and against each solution will help to identify the most acceptable one. Enlarging the definition of ‘non-identifiable’ clearly facilitates research: there will be no costs for obtaining informed consent, and no delays, as there will be no need to obtain approval from an IRB or research ethics committee. Finally, and perhaps most importantly, the high standard of informed consent can be maintained, at least for identifiable data and tissue.


Conversely, there are a considerable number of arguments against an expanded definition of ‘non-identifiable’ and in favour of a less stringent standard of consent. First, biomedical research involving biobanks implies risks for identifiable groups and communities, because the anonymity of the individual does not imply the anonymity of the group. Second, if researchers use coded samples without having access to the code, as suggested by the OHRP (in European terminology, linked or reversibly anonymized samples), a link still exists. Using this link, it is possible to contact the donors at any time, and those who have access to the code might find it difficult not to contact a donor if doing so could prevent future harm, for example by alerting them to a hidden medical condition.

Third, there is the theoretical possibility that the code could be broken for less justifiable reasons. As a result, the ethical questions are not the same as in the case of unlinked (irreversibly) anonymized samples, and simply circumventing present regulations is not an adequate response to this problem. Fourth, approval by an IRB or ethics committee for future research projects is desirable to ensure efficient use of resources such as tissue. Fifth, one might question the sense of a solution whose main goal is to escape existing regulations so that most biobank research can take place without further surveillance.

Last, but not least, the preferences of tissue donors must be considered. Several empirical studies have shown that 90% of patients and research participants find general consent adequate (Wendler, 2006; Stegmayr & Asplund, 2003). For instance, many people eligible for the UK Biobank have endorsed general consent because they do not want to be recontacted repeatedly (UK Biobank, 2003b). Empirical studies also show that a substantial percentage of research participants and patients want to be able to approve the use of their samples, even if these are anonymized (Wendler & Emanuel, 2002). Existing and pending legislation reflects these preferences. In several European countries, the prevailing opinion is that some form of consent should be required to use anonymized samples, at least for genetic testing. Legislation in the Netherlands, Switzerland and France, and pending legislation in Belgium, contain provisions that require tissue donors to be informed and given the possibility to opt out as a prerequisite for any re-use of their anonymized samples (Trouet, 2004). An example of such a law is the Swiss Federal Law on the Genetic Testing of Humans. Article 20 of this law, which regulates the re-use of biological material, states that a sample shall be re-used only for the purposes already approved by the donor. Genetic tests shall be carried out on biological material obtained for a different purpose only if the material is anonymized and the donor (or his/her legal surrogate) has been informed about his/her rights and has not objected to its re-use (Swiss Confederation, 2004).
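The re-use rule of Article 20, as paraphrased above, has a clear decision structure. The function below is a hypothetical, simplified reading of that paraphrase only (not of the statute itself, which should be consulted for the exact conditions, and not legal advice); `may_reuse_sample` and its parameters are invented for illustration.

```python
def may_reuse_sample(purpose_approved_by_donor: bool,
                     is_genetic_test: bool,
                     anonymized: bool,
                     donor_informed_of_rights: bool,
                     donor_objected: bool) -> bool:
    """Simplified reading of Article 20 as paraphrased in the text."""
    if purpose_approved_by_donor:
        # Re-use for a purpose the donor already approved is permitted.
        return True
    if is_genetic_test:
        # Material obtained for another purpose: genetic testing only if the
        # sample is anonymized, the donor was informed and has not objected.
        return anonymized and donor_informed_of_rights and not donor_objected
    # Any other re-use for a purpose the donor has not approved is excluded.
    return False


# Anonymized diagnostic sample; donor informed and silent: an opt-out regime.
print(may_reuse_sample(False, True, True, True, False))  # True
# The same sample after the donor objects.
print(may_reuse_sample(False, True, True, True, True))   # False
```

The structure makes the opt-out character of the rule explicit: silence after being informed counts as permission, while anonymization alone does not.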


In summary, the arguments favour using general consent together with the right to opt out and IRB approval, and weigh against an enlargement of the term ‘non-identifiable’. The lack of international consensus about the regulatory framework of biobanks clearly interferes with the efficiency of research and delays projects (Barbour, 2003; Frank, 1999) or even halts them (Abbott, 2004; Normile, 2003), making international collaboration increasingly difficult. To maximize the benefit of biobanks and genetic databases for both research and public health, a single ethical framework is essential, and this requires harmonizing the terminology of anonymity. There is a clear need to discuss and formulate such a framework at the international level. The only way to achieve future progress in biobanking is through harmonization of the key terms and key norms.


Funding was provided by the Swiss National Science Foundation. This article has benefited from the first author's work as a member of a research collaboration between the Institute of Bioethics at the University of Geneva, the Institute for Medical Ethics of the Charité, Berlin, and the Department of Ethics, Trade, Human Rights, and Law (WHO), funded by the Geneva International Academic Network, entitled ‘Human Genetic Databases: towards a global ethical framework’, and in particular from conversations with the members of this project: Nikola Biller-Andorno, Andrea Boggio, Alex Capron, Agomoni Ganguli, and Alexandre Mauron.


Bernice S. Elger is in the Département de Médecine Communautaire, Médecine Légale (Institut de Bioéthique), Centre Médical Universitaire in Geneva, Switzerland. E-mail:

Arthur L. Caplan is Director of the Center for Bioethics, University of Pennsylvania Medical School, Philadelphia, PA, USA. E-mail: