An article by Melissa Gymrek and colleagues, published this January in Science, described how the researchers used surname inferences from commercial genealogy databases and Internet searches to deduce the identities of nearly fifty research participants whose supposedly private data were stored in large, publicly available datasets. This news comes just months after the Presidential Commission for the Study of Bioethical Issues published a report that expressed serious concerns about personal privacy and security in whole genome sequencing. The bioethics commission (on which we serve as chair and vice-chair) highlighted the importance of reconciling the enormous public benefits anticipated from research in this area with the potential risks to individuals’ privacy. It also offered several policy proposals to help balance the promise of scientific progress with privacy and respect for persons.
The human subjects research protections laid out in the federal regulations are triggered by the identifiability of data. The participants in the Gymrek et al. study were not “readily identifiable” in the regulatory sense; however, their data proved far more easily identifiable than expected. With rapidly evolving technology, a precise definition of identifiability may be impossible. But if we move the debate from the rhetoric of identifiability to the ethical principles of public beneficence and the centrality of respecting all persons, we find that the real ethical focus must be on promoting generalizable scientific progress while at all times respecting individual privacy.