Ethical aspects of participation in the Database of Genotypes and Phenotypes of the National Center for Biotechnology Information

The Cancer and Leukemia Group B Experience




The rapid pace of genetics research, coupled with evolving standards for informed consent, can create ethical challenges regarding future use of tissue or information from completed clinical trials. The Cancer and Leukemia Group B (CALGB) Oncology Cooperative Group faced an ethical dilemma regarding whether to share, with a national database, genetic data from a completed genome-wide association study (GWAS) that was conducted as part of a large, multicenter breast cancer clinical trial. The database in question was the Database of Genotypes and Phenotypes of the National Center for Biotechnology Information (dbGaP).


The CALGB Ethics Committee conducted a series of multidisciplinary meetings and teleconferences involving patient advocates, bioethicists, clinical researchers, and clinical oncologists to evaluate the ethical issues raised by this case and to identify lessons for improving informed consent to future genetics research in oncology trials.


The Ethics Committee recommended that GWAS data be provided to dbGaP consistent with documented consent for future use of tissue among trial participants. Ethical issues, including adequacy of informed consent to future research, limitations of privacy in modern genetics research, the potential impact of population-based genetics research on health disparities, and recontact of research participants for clinical care or further research, were identified as major ethical considerations in this area.


Although modern standards for informed consent should not prohibit research or sharing of data consistent with participants' intent and the public interest, there is an urgent need for national consensus on the appropriate use of archived tissue and standardized informed consent for future research among cancer clinical trial participants. Cancer 2012. © 2012 American Cancer Society.


In the rapidly moving field of cancer genetics, previously collected research samples can provide valuable tools to define the molecular basis of disease and predict response to therapy. Ideally, the participants in any study would provide informed consent both to the general principle of future use of their tissue and to the specific research that is planned, with an understanding of potential risks and benefits from that research. However, not all questions that scientists may want to address in the future and not all potential risks or benefits of future research can be anticipated at the time research samples are obtained. Although standards for informed consent for future use are evolving,1, 2 researchers can face uncertainty when considering the use of research samples or data collected from prior studies with inadequate or incomplete consent by today's standards.

This issue emerged regarding the use of specimens and genetic information obtained from a large randomized trial of adjuvant chemotherapy involving patients with early stage breast cancer conducted by the clinical trials cooperative group Cancer and Leukemia Group B (CALGB).3 Consent for participation in a correlative genome-wide association study (GWAS) was obtained from trial participants. The consent offered participants a chance to opt in or out of the GWAS component of the research and specified that the research involved preserving patients' DNA for use in future research studies to learn more about cancer. However, the consent did not indicate that data obtained by DNA analysis could be entered into a large database accessible by approved investigators from around the world.

Because of the enormous amount of information collected using GWAS and other similar, high-throughput genetic technologies, coupled with the potential for this information to provide new insights into disease etiology and response to therapy, in 2006, the National Institutes of Health (NIH) established a genetic data repository, the Database of Genotypes and Phenotypes of the National Center for Biotechnology Information (dbGaP), and required that all investigators receiving NIH support for GWAS studies contribute their data to this common resource. In establishing this resource, the NIH sought to develop a common repository for genetic information that would encourage collaboration and further scientific discovery. This mandatory data-sharing requirement has obvious scientific benefits, but it raises novel ethical and social issues regarding the balancing of research support through access to genotype-phenotype data and the protection of individual autonomy and privacy for those who consent to participate in protocol-specific genetic studies.

The optimal means of educating and informing participants regarding genetic and other forms of biospecimen research is the subject of considerable debate both nationally and internationally.4 Ideally, individuals who participate in studies, including GWAS, would be informed that data may be provided to dbGaP or similar data-sharing resources and would be given the opportunity to provide specific informed consent for this potential future use of their genetic information. Consent language as well as appropriate standards and procedures in this area have been refined and are undergoing continued improvement by engaging patient advocates, ethicists, and clinical researchers.1, 5-16 Recently proposed changes to the Common Rule calling for a standardized process for consent to future use of tissue and clarifying that new standards should not be applied to previously collected research samples may help alleviate some of the uncertainty in this area.17 Currently, however, there is a lack of clearly defined standards, and an unresolved challenge involves the management of GWAS results from trials that were conducted before the inclusion of specific consent for dbGaP data submission. When older consent language describes the potential for future research but does not adequately address all current concerns regarding privacy and limits of deidentification with regard to genetic data, should information be shared; and if so, under what conditions?

The CALGB Ethics Committee confronted these issues when asked to provide guidance on the ethics of sharing data within the dbGaP from an NIH-supported GWAS trial that was conducted before implementation of the current standards for detailed consent language on biospecimen research, genetics, and future use. In the current article, we present our view of this challenge as well as the rationale for our proposed solution, and we suggest a framework for considering this issue in future cases.


The Science of Genome-Wide Association Studies and the Role of the Database of Genotypes and Phenotypes

Genome-wide association studies

A GWAS is a study of the association between specific germline genetic variables and observable traits (eg, weight or height), conditions (eg, cancer or diabetes), or clinical outcomes of interest (eg, response or resistance to a given therapy). A GWAS relies on the identification of single nucleotide polymorphisms (SNPs) that define variability among individuals and can be associated with the risk of developing complex diseases. For example, GWAS research has identified SNPs associated with an increased risk of developing breast cancer and prostate cancer.18-22

A GWAS consists of 2 phases. In the discovery phase, a set of SNPs that correlate with the phenotypic variable of interest is identified, and these are subsequently tested in the validation phase among a separate study population.23 Because a GWAS evaluates potential associations between phenotypes of interest and as many as 500,000 or more SNPs in each study, significant rigor must be applied to the statistical analysis. In addition, because susceptibility loci can be uncommon, a GWAS may require samples from thousands of individuals with a phenotype (cases) to be compared with thousands of individuals without this phenotype (controls).23 Large, well-characterized patient cohorts are required to accurately identify SNPs associated with a disease or other clinical characteristics and to avoid false-positive results.24
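To make the need for statistical rigor concrete, the following is a minimal sketch of how testing 500,000 SNPs affects the significance threshold. It assumes a Bonferroni correction and a family-wise error rate of 0.05; the source does not state which correction was used in any particular GWAS, and the function names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every tested SNP is truly null."""
    return alpha * n_tests


# Assumed values for illustration: 500,000 SNPs (from the text) and a
# conventional family-wise error rate of 0.05 (an assumption, not from the text).
n_snps = 500_000
family_alpha = 0.05

per_snp_threshold = bonferroni_threshold(family_alpha, n_snps)
uncorrected_hits = expected_false_positives(family_alpha, n_snps)

print(f"Per-SNP threshold with Bonferroni correction: {per_snp_threshold:.1e}")
print(f"Expected false positives without correction: {uncorrected_hits:.0f}")
```

Under these assumptions, an uncorrected analysis would be expected to flag tens of thousands of SNPs by chance alone, which is one reason a separate validation population and large cohorts are needed.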

Therefore, clinical trials conducted by the National Cancer Institute-funded cancer cooperative groups are ideal for GWAS, because data on disease, comorbidities, and outcomes are routinely collected and are available for linkage to the genomic characteristics of the participants. In addition, participants are treated in a controlled manner with designated agents, increasing the value of GWAS to guide future research.

Database of Genotypes and Phenotypes

The dbGaP was developed by the NIH to collect and house the vast amount of new genetic information generated through GWAS and similar studies. The policies governing dbGaP facilitate data sharing that is required to advance research. For example, the dbGaP permits open access to aggregate information, which can be used to generate hypotheses and assess whether information at the individual patient level will be useful for further research. Access to individual patient genomic results that can be linked to a phenotype is limited to investigators who have an approved protocol to use this level of dbGaP data. Researchers who obtain access to individual deidentified information must be authorized, obey data-use restrictions, and comply with Data-Use Certification requirements.25 The dbGaP contains 4 types of data: 1) study documents; 2) phenotypic data; 3) genetic data from individuals, including pedigree information and fine mapping results; and 4) statistical analyses, including association and linkage analyses when available.25 Because GWAS studies generate a large volume of information26 that is valued for a wide range of scientific research, open-access publications of association results have been encouraged.27

Because of their size and depth of information at the individual patient level, dbGaP and other large national and international data banks are essential tools for understanding how genes influence our physical and psychological traits.13, 28 These analyses can be used to improve patient care. For example, the identification of patients at high risk for developing cancer could lead to advances in screening or chemoprevention.18-20, 22, 29-32 In addition, cancer treatment may be improved when correlations between genotype and clinical outcomes identify subgroups of patients who respond differently to conventional therapy, and these differences may be exploited to individualize choice of therapy or to select optimal doses based on individual differences in drug metabolism or other factors.33-36

Ethical issues raised by genome-wide association studies and the role of the Database of Genotypes and Phenotypes

Although the science of GWAS and the collaborative potential offered by dbGaP should contribute to advances in our understanding and treatment of cancer and other diseases, several ethical issues must be considered.


Issue 1: Informed Consent for Future Unspecified Research

It is very difficult for informed consent procedures to adequately address future research because of the rapid pace of technical progress in science. Important questions may arise long after patients initially provide consent and enroll in a clinical trial. Research questions, techniques, and even the potential risks to participants cannot always be completely defined in advance. Are we then limited to pursuing only those studies that can be well described at the time of consenting, or can we ethically inform study participants using a more generalized, 1-time consent for future research? Survey evidence suggests that many participants are comfortable with the concept of providing tissue for future, unspecified research.37 The NIH, CALGB, and many other organizations have evaluated policies for informing patients and have pursued the development of improved templates for biospecimen consent language.38-40

In a national study of US households, 84% to 90% of eligible participants consented to have their blood samples and DNA included in a national repository for genetic research (National Health and Nutrition Examination Survey), although participation was somewhat lower (73% to 83%) among women and African Americans.41, 42 Studies of research participants, family members, and Medicare recipients have demonstrated that >85% are comfortable providing informed consent to future unspecified research, supporting the development of consent forms offering a binary choice between authorizing and refusing participation to future unspecified research.37, 43, 44

Trinidad and colleagues specifically addressed public perceptions, beliefs, and attitudes regarding GWAS and the dbGaP mechanism in a 2008 focus group-based study among members of a Seattle health plan.45 The groups included research participants in a large dementia trial, surrogate decision makers for trial participants with cognitive decline, and health plan members unaffiliated with the study. This large study indicated that trial participants understood and were supportive of sharing deidentified outcome and genetic data in the interests of public benefit, efficient research, and potential benefits to future patients. Surrogate decision makers were supportive of study participants' preferences for broad sharing of data but were less uniform in their support of this practice. Most focus group members were not in favor of providing repository data to for-profit entities.45 That study and others demonstrated broad public support, both for the general concept of consent to future unspecified use of biospecimens and genetic data and for the specific practices, such as data sharing, used by dbGaP.

Issue 2: Risks to Privacy in Modern Genetics Research

A second issue is how to ensure that research participants understand the potential for loss of privacy inherent in modern genetics research. The ability to infer identifying characteristics from GWAS data raises important issues regarding consent for these studies and the sharing of information in a national database. The extent of data gathered during a GWAS makes the private information coded in the human tissue potentially identifiable. An individual can be identified within the large set of public data from a GWAS, using only a small subset of an individual's genome.46 Greenbaum et al noted that a unique individual can be identified with as few as 75 SNPs. Theoretically, information in the dbGaP could be identified through comparison with genetic information obtained from a personal item, a separate database (such as the National Geographic Genographic Project), tissue, or even by comparison with DNA from a relative.47
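A back-of-the-envelope calculation suggests why so few SNPs can suffice. The sketch below is not the method used by Greenbaum et al; it assumes 75 statistically independent biallelic SNPs, genotype frequencies in Hardy-Weinberg proportions, a minor allele frequency of 0.5 at every SNP, and a round world population of 7 billion. All of these are simplifying assumptions chosen for illustration.

```python
def genotype_match_probability(minor_allele_freq: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions for the 3 possible genotypes.
    """
    p = minor_allele_freq
    q = 1.0 - p
    genotype_freqs = (q * q, 2.0 * p * q, p * p)
    return sum(f * f for f in genotype_freqs)


def profile_match_probability(minor_allele_freq: float, n_snps: int) -> float:
    """Chance that two unrelated people match at every one of n independent SNPs."""
    return genotype_match_probability(minor_allele_freq) ** n_snps


# Illustrative assumptions (not taken from the article, apart from the 75 SNPs).
n_snps = 75
maf = 0.5
world_population = 7_000_000_000

per_snp_match = genotype_match_probability(maf)
full_match = profile_match_probability(maf, n_snps)
expected_coincidental_matches = full_match * world_population

print(f"Match probability at one SNP: {per_snp_match:.3f}")
print(f"Match probability across {n_snps} SNPs: {full_match:.2e}")
print(f"Expected coincidental matches worldwide: {expected_coincidental_matches:.2e}")
```

Under these assumptions, the expected number of unrelated people who coincidentally share a 75-SNP profile is far below 1, so a matching profile points to a single individual. Real SNPs are correlated and relatives share genotypes, so the true figures differ, but the qualitative conclusion is the same.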

Germline genetic traits (eg, SNPs) are inherited from parents and shared with siblings and other family members.30 SNPs not only measure heritable events but also can identify the race and geographic birthplace of the individual.48 Consequently, genetic data to be included in the dbGaP may have implications for others in the individual's family, racial, or ethnic group.49 A recent analysis of published aggregate results demonstrated that these data also may be used to reconstruct the original disease status of an individual participant.46, 50-53 Unfortunately, there is no standard consent form language for all studies associated with genetic database research.11 Several approaches to informed consent are currently being considered, and each has potential advantages and disadvantages, as presented in Table 1.

Table 1. Categorization of Consent for Future Research
| Types of Consent for Research on Biologic Samples | Benefit | Risk | References |
| --- | --- | --- | --- |
| Implied consent with an option to opt out | May facilitate research by increasing participation; passive consent assumes willingness to participate and requires action to alter this default option | Research will be conducted without the clear, expressed interest and informed consent of the participant; some individuals may not recognize or understand the opt-out option | van Diest 2002,54 Coebergh 2006,55 Furness 2003,56 Bryant 200857 |
| One-time general consent for future research | Allows greatest opportunity to use banked tissue or genetic information to address important research questions as they arise; studies suggest strong public support for 1-time general consent for future research | Participants are not aware of specific uses of their tissue or genetic information; any discussion of risks of research is necessarily vague | Furness & Nicholson 2004,8 Rothstein 2005,14 Chen 2005,37 McQuillan 2003,41 Wendler 2006,58 Jack & Womack 2003,59 Stegmayr & Asplund 2002,60 Wheeler 2007,61 Malone 2002,62 Hamajima 199863 |
| One-time specific consent with opportunity to recontact for future research | Allows research within a prespecified area or question; greater opportunity to discuss potential risks and benefits with participants; facilitates opportunity to address additional questions by obtaining consent for recontact at time of initial tissue collection | May be burdensome to patients and researchers, and potentially requires contact multiple times over many years; potential to cause distress in family members if individual is dead when investigators seek consent for new study | Helft 2007,12 Caulfield 200364 |
| Separate consent at the start of each research project | Ensures informed consent for each study | Greatest burden for investigators and may hamper future research; may raise ethical challenges regarding providing research results of unclear clinical significance at the time of recontact | Furness & Nicholson 2004,8 Vermeulen 200965 |

Issue 3: Potential for Harm to Under-Represented Populations

For research that may define disease risk or benefit from therapy, it is important to be aware of those who participate and those who decline participation in a study. If a patient population is under-represented, then the results of genetic studies may not be applicable to that specific group.66 This can result in disparities in health outcomes if a genetics-based strategy for screening or treatment is adopted but is confined to, or only benefits, a subgroup of patients with specific genetic characteristics.67 Failure to include a diverse patient population in a GWAS also may miss opportunities to identify genetic determinants of response to therapy among minority populations that can change outcomes, as demonstrated for Native American children who were included in a GWAS for acute lymphoblastic leukemia.68 In that study, patients with Native American genetic ancestry had a greater risk of relapse that was eliminated among those who received treatment with an 8-week course of delayed intensification chemotherapy.

Issue 4: Previously Unidentified Health Risks Uncovered as a Result of Study Participation

Finally, GWAS and other genetic studies always raise the question of what to do if knowledge of a previously unrecognized health risk is identified among participants in the course of research. If a patient consents to future use of their blood sample and research later determines that they carry a genetic risk for early development of Alzheimer disease, then should they be contacted? Does the answer change if we identify a risk for disease with clear screening or treatment interventions as opposed to one that simply conveys prognostic information?39 The clinical validity and relevance of the information, the specifics described in the process of informed consent, and the preferences of the research participant are all important considerations.69-71

The purpose of research is not to provide results to individual participants but to advance science. When considering the disclosure of results to individual participants, the most important question is whether the information generated in the research has clinical significance. In a comprehensive review of this subject, Botkin concluded that research data that have not been validated or that have unknown or no clinical significance do not need to be disclosed.49 However, these decisions must be made on a case-by-case basis.72, 73 If disclosure of research results is sought by investigators, then institutional review board approval must be obtained, and such results should be disclosed by a clinician or investigator who is competent to discuss any clinical implications with the recipient at that time.74

Issue 5: Specific Challenges With Older Data

While consent language and procedures are being addressed prospectively, what do we do with data from studies with inadequate consent by today's standards? On the most basic level, we must choose between using the data in the interest of science and potential benefit to future patients and withholding the data in the absence of clear consent by the research participants. In withholding data, theoretically, we may be protecting the interests of the research participants by honoring the letter of their consent; however, whether this truly best promotes the interests of those who consented to some form of future use of their genetic information is debatable. The stakes for this decision are particularly high when the question is whether or not to share genetic information in a national database (dbGaP) that expands both the potential benefits and risks of the use of genetic information.

The Case of Cancer and Leukemia Group B Trial 40101

In 2009, the CALGB Ethics Subcommittee was asked to provide an opinion concerning whether GWAS information from CALGB study 40101 could and should be provided to the dbGaP. This question emerged in part because of the establishment of an NIH policy in January 2008 requiring all investigators who receive NIH support for GWAS and similar studies to submit descriptive information about their studies to the NIH GWAS data repository, with exceptions granted only on a case-by-case basis. Although criteria for the approval of funding without participation in the dbGaP are not specified, the NIH has recognized that the language of informed consent from prior studies may be vague, may suggest a need for reconsent, or may preclude sharing with the dbGaP.75 Submitters to the dbGaP are compelled to identify any limits on research use of the data that are specifically set by the individual research participants through their informed consent. Although the Data Access Committee of the US National Center for Biotechnology Information provides stringent oversight of the dbGaP and controls who can view and use the data, it does not independently review the informed consent documents for individual studies that contribute to the database. The burden of ensuring that sharing data within the dbGaP is consistent with the informed consent of individual participants falls on those who conduct the initial research and analysis and the institutional review board responsible for oversight of the research.

CALGB 40101 was initiated in 2001, 5 years before establishment of the dbGaP. The consent form did not anticipate the requirement for a national data repository and, consequently, did not specifically request consent for sharing of the participants' data in this manner. The informed consent document, however, did specify participation in potential future genetic and pharmacogenomic studies that would be conducted by CALGB investigators. Specifically, it informed participants that, because CALGB could not possibly know which breast cancer studies may be appropriate in the future, CALGB would like to store the participant's DNA for future studies; that future investigators would have to apply to CALGB, have their proposed research study reviewed, and obtain the approval of CALGB; that there would be no charge for participating in future research studies; and that participation in the described study and/or in the DNA specimen bank for future studies would be entirely voluntary.

The question facing the Ethics Committee was whether sharing genetic and phenotypic data within the dbGaP would violate the rights of CALGB 40101 participants as research participants. To address this question, a multidisciplinary committee, composed of patient advocates, researchers, clinicians, and bioethicists, carefully debated the relative interests of research participants and their families, society, and future patients and also considered the interests and responsibilities of the CALGB. Although the committee assumed that there was substantial public interest in sharing of data within the dbGaP, it started with the premise that the primary obligation of CALGB was to the interests of the research participants, and that neither the organization's nor the public's interest in science would be served by violating their trust. The committee also considered the finding that the participants in question had voluntarily chosen to have their samples used for genetic analysis in the hopes of contributing to improvements in understanding and treating breast cancer and other diseases, and that failure to maximize the potential of their contributions also could be deemed a violation of their interests. Thus, the committee concluded that several issues are ethically relevant: 1) the terms and conditions of informed consent to the GWAS component of CALGB 40101, 2) the extent to which the potential risks and consequences of genetics research had been clearly explained to participants, and 3) the current feasibility or desirability of informing and reconsenting individual participants before sharing their data with the dbGaP.

On the first point, the consent for CALGB 40101 clearly stated that participation in the GWAS component was voluntary. Trial participants were given the opportunity to opt in or out of tissue sample collection for genetic analysis and were specifically asked whether samples could be used for future research. In addition to consenting at the beginning of the trial, they were advised of the right to discontinue participation at any time.76 They also were informed specifically that any future research would be reviewed and approved by CALGB.

Second, participants were informed of some potential risks of genetics research in terms of loss of privacy. In its review, the CALGB Ethics Committee determined that the information in the consent form was consistent with standards for genetics research at the time of the study. Despite this, the committee also observed that the participants in CALGB 40101 were not adequately informed of potential data sharing in the dbGaP, privacy risks, or potential for recontact regarding previously unidentified health risks, according to current standards.

On the third point, the Ethics Committee initially considered it feasible to recontact and reconsent a significant number of CALGB 40101 participants. This was because the population consisted of women with early stage breast cancer who had a relatively low risk of recurrence and who were undergoing ongoing follow-up with the study. Although it is recognized that recontacting research participants would be expensive, time-consuming, and potentially disquieting among patients or families who have experienced an adverse outcome, it would provide the greatest opportunity to ensure that the actions of researchers are consistent with the understanding of the research participants.77 Another very important concern, however, argued against the plan for reconsenting. If this were undertaken, then reconsenting inevitably would result in inclusion in the dbGaP of only a subset of data from CALGB 40101 participants; ie, those available for recontact and willing to provide informed consent. Seeking reconsent several years after the study intervention may select for those who were doing well from a disease or toxicity perspective or possibly may select for those who had symptoms of disease recurrence or severe toxicity. This potential for bias and a nonrandom impact of important clinical outcomes on willingness to participate in sharing data with the dbGaP made reconsent highly undesirable from a scientific perspective. Therefore, the committee concluded that researchers would need to include all of the data, or none, in the dbGaP submission.
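The bias concern can be illustrated with simple arithmetic. The sketch below uses entirely hypothetical numbers (a 20% recurrence rate and reconsent rates of 50% and 80% for participants with and without recurrence); none of these figures come from CALGB 40101. It shows how the recurrence rate observed among reconsenting participants drifts from the true rate when willingness to reconsent depends on outcome.

```python
def observed_event_rate(
    true_rate: float,
    consent_if_event: float,
    consent_if_no_event: float,
) -> float:
    """Event rate seen among reconsenting participants only.

    true_rate: fraction of the full cohort with the clinical event.
    consent_if_event / consent_if_no_event: fraction of each outcome
    group that can be recontacted and agrees to reconsent.
    """
    consenting_events = true_rate * consent_if_event
    consenting_non_events = (1.0 - true_rate) * consent_if_no_event
    total_consenting = consenting_events + consenting_non_events
    if total_consenting == 0:
        raise ValueError("no participants reconsented")
    return consenting_events / total_consenting


# Hypothetical scenario: participants with recurrence are less likely to reconsent.
true_recurrence = 0.20
biased_rate = observed_event_rate(true_recurrence, 0.50, 0.80)

# Comparison: reconsent is unrelated to outcome, so no bias is introduced.
unbiased_rate = observed_event_rate(true_recurrence, 0.70, 0.70)

print(f"True recurrence rate: {true_recurrence:.3f}")
print(f"Observed rate, outcome-dependent reconsent: {biased_rate:.3f}")
print(f"Observed rate, outcome-independent reconsent: {unbiased_rate:.3f}")
```

In this hypothetical scenario the shared subset would understate recurrence, and any genotype-outcome association estimated from it could be distorted. The distortion disappears only when reconsent is unrelated to outcome, which cannot be assumed in practice.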

In defining a course of action that was most consistent with the expressed consent of the research participants, the committee agreed on 2 major points. All agreed that CALGB 40101 study participants had donated tissues for future genetic research of a nature that was consistent with the goals of the dbGaP research resource. In addition, the committee concluded that there was no clear requirement for additional consent to share data within the dbGaP, provided that the terms for future use of data, including ongoing oversight by CALGB, would be honored. After careful review and deliberation, the committee unanimously agreed that sharing deidentified data with the dbGaP was consistent with the participants' consent to future research and consistent with the obligation of CALGB to pursue scientific benefit at minimal additional risk to the research participants. In addition to providing public benefit, the committee also recognized that participants or their families might directly benefit from knowledge gained in the course of wider use of these data, particularly because the majority of these patients were expected to be free of breast cancer recurrence and able to potentially benefit from any further advances in knowledge concerning prevention or treatment. The committee recommended informing CALGB 40101 GWAS participants that their data would be shared in a deidentified fashion with the dbGaP and inviting them to contact the investigators with any concerns or questions, because this novel use of data was not clearly detailed in the initial consent forms.

The members of the CALGB Ethics Committee also affirmed that any future cancer clinical trial planning to submit participant data to the dbGaP must involve an informed consent process that fully informs participants of the risks, rationale, and requirements of this type of research. A collaborative effort among patient advocates, ethicists, and clinical researchers to create reasonable consent form language for future use of biospecimens is urgently needed, and such efforts are underway.78 Moving forward, the possibility remains that any current consent form may seem archaic and insufficient in the future; however, if future use is planned, then reasonable efforts must be made to try to inform today's participants of tomorrow's uncertainties.

In conclusion, this case review illustrates ethical challenges in ensuring informed consent for genomic research. The use of tissue or information from older studies for the dbGaP or for other scientific purposes unforeseen at the time of initial consent must be considered on a trial-by-trial basis. Key considerations for such evaluation are presented in Table 2. The initial consent form should be reviewed by a team of qualified individuals representing diverse perspectives (including patient advocates, ethicists, and researchers) and should be evaluated for consistency with the current proposed use of tissue. Although not all groups may have access to a multidisciplinary group for consultation on these issues, all dbGaP submissions must be approved by the responsible institutional review board, which will have community and scientific representation. The release of information from genetic research to a national database is strongly in the public interest. Although there are many potential benefits of enhanced access to research results, there are also risks. Thus, as genomic technology advances, there may be a time when individuals can be identified by their genomic information alone. This inherent risk of participation in modern genetic research must be disclosed in future studies.

Table 2. Key Considerations for Sharing Data From Older Studies
1. Is there potential societal benefit from sharing data from the study?
2. Is there informed consent for future use of tissue or data?
3. Is the uncertainty of risks from future unspecified research explained in the consent form?
4. Is potential for sharing data with other investigators explicitly acknowledged or prohibited?
5. Is data sharing consistent with the terms of the initial informed consent?
6. Is reconsent of participants in the initial study feasible?
7. Will scientific bias be introduced if a subset of participants decline to reconsent?
8. Will expressed interests of participants be better honored through data sharing or withholding data?

The recommendations of the CALGB Ethics Committee were based on an effort to respect the rights and interests of research participants in the context of evolving standards and techniques for genetics research and in the absence of clear guidance on the use of genetic information from older studies. We present this case and our deliberations and recommendations in an effort to spark national debate on this subject. There is an urgent need to establish a national consensus on the use of archived specimens and to improve informed consent policies to address the use of specimens with future, unforeseen technologic advancements.


Dr. Peppercorn is supported by the Greenwall Foundation Faculty Scholars Program in Bioethics.


The authors made no disclosures.