Privacy and Information Policy
One traditional answer to the challenge of surveillance is found in the information science, computer science, legal, and ethics literature focused on privacy. Participatory personal data projects gather, store, and process large amounts of information, creating massive databases of individuals’ locations, movements, images, sound clips, text annotations, and even health data. Location information can reveal habits and routines that are socially sensitive (would you want such information shared with a boss or a friend?) and may be linked with, or have an impact on, legally protected medical information. Three major branches of literature address privacy problems relevant to participatory personal data: behavioral studies of users’ interactions with personal data and data collection systems; conceptual, ethical, and legal investigations into privacy as a human interest; and design and technical methods for ensuring privacy protections.
Much of our understanding of current privacy norms stems from work that analyzes people's privacy preferences and behaviors (Altman, 1977; Capurro, 2005; Palen & Dourish, 2003). Westin's (1970) foundational research benchmarked American public opinion on privacy. Over several decades, Westin used large surveys to document a spectrum running from “privacy fundamentalists” (very concerned) through “pragmatists” (sometimes concerned) to the “unconcerned.” More recently, Sheehan (2002) confirmed a similar spectrum among Internet users. The Pew Internet & American Life Project has produced several reports on privacy preferences based on large U.S. surveys of adults (Madden, Fox, Smith, & Vitak, 2007) and teenagers (Lenhart & Madden, 2007). These reports continue to find privacy concerns, even among teens (who popular wisdom assumes have abandoned privacy as a value). Pew finds, for example, that teens practice privacy-preserving behaviors such as limiting online information and taking an active part in identity management. A number of information science studies have attempted to describe such behaviors using more detailed scales for online privacy preferences. For example, both Yao, Rice, and Wallis (2007) and Buchanan, Paine, Joinson, and Reips (2007) suggest factors by which to measure online privacy concern. Yao et al. (2007) focus on psychological variables, while Buchanan et al. (2007) incorporate different social aspects of privacy such as accessibility, physical privacy, and the benefits of surrendering privacy.
A persistent problem in such surveys of privacy preferences, however, is that individuals frequently report preferences they do not act upon in practice. There is evidence that many privacy studies prime respondents to think about privacy violations, making them more likely to report privacy concerns (John, Acquisti, & Loewenstein, 2009). Such studies also rest on the problematic assumption that people act on a rational privacy interest (Acquisti & Grossklags, 2008). Studies that observe people's real-world use of systems attempt to correct these problems. Raento and Oulasvirta (2008), for example, present results from field trials of social awareness software that used smartphones to show contacts’ locations, length of stay, and activities. The authors found that
… users are not worried not [sic] so much about losing their privacy rather about presenting themselves appropriately according to situationally arising demands. (Raento & Oulasvirta, 2008, p. 529)
This demonstrated concern for contextual privacy and identity management has been reiterated in both theoretical (Nissenbaum, 2009) and descriptive research (boyd & Hargittai, 2010). Nissenbaum (2004) labels this concern for fluid and variable disclosure “contextual privacy” and argues that its absence leads not only to exposure, but also to decreased individual autonomy and freedom, damage to human relationships, and, eventually, the degradation of democracy. Nissenbaum (2009) suggests that individuals’ sense of appropriate disclosure, as well as an understanding of information flow developed by experience within a space, contribute to individual discretion. Contextual privacy suggests that individuals may be willing to disclose highly personal information on social networking sites because they believe they understand the information flow of those sites (Lange, 2007).
Because of complexities and inconsistencies in individual privacy behaviors, policy and legal researchers have sought to move away from user choices and individual risks and toward new regulations to encourage social protections for privacy (J.E. Cohen, 2008; Rule, 2004; Swarthout, 1967; Waldo et al., 2007). U.S. law does not treat personal data as the property of the data subject. Instead, legal regimes give control of, and responsibility for, personal data to the institution that collected the data (Waldo et al., 2007). Fair information practices are the ethical standards for collection and sharing that those institutions are asked to follow. Originally codified in the 1970s, these practices are still considered “the gold standard for privacy protection” (Waldo et al., 2007, p. 48), and they have been voluntarily adopted by other nations as well as private entities. Fair information practices such as notice and awareness, choice and consent, access and participation, integrity and security, and enforcement and redress can certainly apply to participatory personal data. But if privacy concerns expand to include processes of enforcing personal boundaries (Shapiro, 1998), negotiating social situations (Camp & Connelly, 2008; Palen & Dourish, 2003), and portraying fluid identities (Phillips, 2002, 2005), fair information practices formulated for protecting corporate and government data may be insufficient for personal collections.
Other researchers suggest that concerns about data capture extend beyond the protection of individual privacy. Curry, Phillips, and Regan (2004) write that data capture makes places and populations increasingly visible or legible. Increasing knowledge about the actions and movements of people can lead to function creep, in which data gathered for one purpose are put to new, unanticipated uses. For example, collections of demographic data can enable social discrimination through practices such as price gouging or the delivery of unequal services. Could participatory personal data gathered to track an individual's health concern or document a community's assets be repurposed to deny health insurance or set higher prices for goods and services?
All of this cross-disciplinary attention points to the fact that building participatory personal data systems that protect privacy remains a challenge. Human–computer interaction research considers ways that systems might notify or interact with users to help them understand privacy risks (Anthony et al., 2007; Bellotti, 1998; Nguyen & Mynatt, 2002). Computer science and engineering research develops methods to obscure, hide, or anonymize data in order to give users privacy options (Ackerman & Cranor, 1999; Agrawal & Srikant, 2000; Fienberg, 2006; Frikken & Atallah, 2004; Ganti, Pham, Tsai, & Abdelzaher, 2008; Iachello & Hong, 2007). Anonymization of data, in particular, is a hotly debated issue in the privacy literature. Many scholars argue that possibilities for re-identification make anonymization insufficient for privacy protection (Narayanan & Shmatikov, 2008; Ohm, 2009). Other researchers are pursuing new anonymization techniques to respond to these concerns (Benitez & Malin, 2010; Malin & Sweeney, 2004).
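To make the anonymization debate concrete, the sketch below illustrates one widely discussed style of technique: generalizing GPS fixes to coarse grid cells and suppressing any cell reported by fewer than k participants. This is a minimal illustration with invented names and thresholds, not a reconstruction of any cited algorithm, and, as the re-identification literature cautions, such measures alone may not guarantee anonymity.

```python
from collections import Counter

def coarsen(lat, lon, cell_deg=0.01):
    """Generalize a GPS fix to the center of a grid cell (~1 km at 0.01 degrees)."""
    snap = lambda v: (v // cell_deg) * cell_deg + cell_deg / 2
    return (round(snap(lat), 4), round(snap(lon), 4))

def releasable_cells(fixes_by_user, k=5, cell_deg=0.01):
    """Release only grid cells visited by at least k distinct participants.

    Sparsely visited cells (a home, an office) are suppressed, because
    uniquely visited places are exactly what re-identification attacks
    exploit. Note the trade-off: a larger k or coarser cells protect
    more but describe less.
    """
    counts = Counter()
    for fixes in fixes_by_user.values():
        for cell in {coarsen(lat, lon, cell_deg) for lat, lon in fixes}:
            counts[cell] += 1
    return {cell for cell, n in counts.items() if n >= k}
```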
Privacy approaches for participatory personal data systems draw on a number of these developments (Christin, Reinhardt, Kanhere, & Hollick, 2011). These include limiting sensing by granularity, time of day, location, or social surroundings; offering capture and sharing options that match diverse user preferences; collecting and contributing data without revealing identifying information; planning data retention and deletion; and controlling access to stored data. All of these methods focus on privacy by design: building features into systems to help users manage their personal data and sharing decisions (Spiekermann & Cranor, 2009). Privacy by design is a promising avenue of research for participatory personal data, and advocacy organizations such as the Center for Democracy & Technology are currently pushing mobile application developers to take responsibility for privacy in their design practices (Center for Democracy & Technology, 2011).
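A minimal sketch of how such limits might be enforced at capture time follows; the class and parameter names are hypothetical rather than drawn from any cited system. The privacy-by-design point is that the policy runs on the device, before data ever leave the participant’s hands.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime

def dist_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in meters (equirectangular; adequate for small zones)."""
    dx = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    dy = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(dx, dy)

@dataclass
class CapturePolicy:
    """Hypothetical per-participant sensing policy."""
    record_hours: tuple = (8, 20)   # sense only between 08:00 and 20:00
    blocked_zones: list = field(default_factory=list)  # [(lat, lon, radius_m), ...]
    retention_days: int = 30        # raw samples deleted after this window

    def allows(self, ts: datetime, lat: float, lon: float) -> bool:
        """Decide at capture time whether a sample may be recorded at all."""
        if not (self.record_hours[0] <= ts.hour < self.record_hours[1]):
            return False
        return all(dist_m(lat, lon, zlat, zlon) > radius
                   for zlat, zlon, radius in self.blocked_zones)
```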
Privacy, of course, is only a relative value, and can frustrate other social goods. As Kang (1998) points out, commerce can suffer from strong privacy rights, as there is less information for both producers and consumers in the marketplace. Perhaps worse, truthfulness, openness, and accountability can suffer at the hands of strict privacy protections (Allen, 2003). Research using participatory personal data directly confronts this trade-off between privacy, truthfulness, and accuracy. For example, researchers are developing algorithms for participatory personal data collection that allow users to replace sensitive location data with believable but fake data, effectively lying within the system (Ganti et al., 2008; Mun et al., 2009). What is good for privacy may not always be good for accuracy or accountability. Investigating privacy and policy for participatory personal data will include weighing these trade-offs. It will also include integrating elements from contextual privacy and usable system design to present a range of appropriate privacy-preserving choices without unduly burdening participants. And finally, it will mean crafting new policy—institutional as well as national—to protect participants from function creep or discrimination based on their data.
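The substitution idea can be illustrated with a toy sketch (hypothetical names and logic; the cited schemes are far more careful about keeping decoys believable): fixes inside a participant-designated sensitive zone are replaced with a plausible point just outside it rather than dropped, since a gap in the trace is itself revealing.

```python
import math
import random

EARTH_R = 6371000.0  # earth radius in meters

def substitute_fix(lat, lon, zone, offset_m=250):
    """Replace a fix inside a sensitive circular zone with a decoy just outside it.

    `zone` is (center_lat, center_lon, radius_m). Reporting a believable
    nearby point, instead of suppressing the sample, hides the sensitive
    visit without leaving a telltale hole in the data, at a deliberate
    cost to accuracy.
    """
    zlat, zlon, radius_m = zone
    dx = math.radians(lon - zlon) * math.cos(math.radians((lat + zlat) / 2))
    dy = math.radians(lat - zlat)
    if EARTH_R * math.hypot(dx, dy) > radius_m:
        return lat, lon  # outside the zone: report truthfully
    d = (radius_m + offset_m) / EARTH_R  # angular distance to the decoy
    bearing = random.uniform(0, 2 * math.pi)
    new_lat = zlat + math.degrees(d * math.cos(bearing))
    new_lon = zlon + math.degrees(d * math.sin(bearing) / math.cos(math.radians(zlat)))
    return new_lat, new_lon
```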
Information Access and Equity
Privacy is not the only research tradition in information science that can inform a discussion about participatory personal data. These data also raise challenges for information access and equity. Who controls data capture, analysis, and presentation? Who instigates projects and sets research goals? Who owns the data or benefits from project data? Accumulating and manipulating information is a form of power in a global information economy (Castells, 1999; Lievrouw & Farb, 2003). Participatory personal data projects invoke this power by enabling previously impossible data gathering and interpretation. How do participatory personal data project designers, clients, and users decide in whose hands this power will reside?
The relationship between information, power, and equity has long been a topic of interest in the information studies literature (Lievrouw & Farb, 2003). A large literature on the digital divide has focused on access to information, and ways that social demographics limit or enhance information access (Bertot, 2003). Participatory personal data evoke these basic questions of accessibility. Anecdotal evidence from popular news reports suggests that hobbyist self-quantifiers are largely American and European, white, and upper-middle class (Dembosky, 2011). Participants in health or urban planning data projects, however, may span a much greater socioeconomic range. Indeed, mobile devices are some of the most accessible information technologies on earth (Kinkade & Verclas, 2008), spanning national, ethnic, and socioeconomic groups.
However, there are challenges beyond accessibility. Lievrouw and Farb (2003) suggest that researchers concerned with information equity take a different approach, emphasizing the subjective and context-dependent nature of information needs and access, even among members of one social group. How do participatory personal data answer context-specific information needs? When individuals are generating the information in question, equity comes to hinge on who benefits from this information capture. Will it be individuals and informal communities, or more organized corporations and governments? Sociologists have proposed that loosely organized publics help to balance power held by formal organizations and governments (Fish, Murillo, Nguyen, Panofsky, & Kelty, 2011). But the rise of participatory culture has challenged this traditional understanding, organizing publics and intermeshing them with organizations. For example, participatory personal data projects exhibit elements of both organizations and publics. Research organizations such as UCLA's Center for Embedded Networked Sensing (CENS, now part of Mobilize Labs: http://mobilizelabs.org/) partner with community groups to actively recruit informal groups of participants into participatory personal data projects. Examples include health projects that recruited young mothers not only for data collection, but also for focus groups about research design, as well as a community documentation project that engaged neighbors in Los Angeles’ Boyle Heights community. Will organizations like CENS hold the power that design, data, categories, and social sorting can bring, or can that power be distributed back to the publics who collect the data? Because data collection methods using mobile devices can range from participatory to opportunistic, it is unclear how much control individuals will have over what data are collected, how they are stored, and what inferences are drawn.
It is important to note that increasing participation does not, by itself, solve problems of power and equity. As Kreiss, Finn, and Turner (2011) point out, there are limits on the emancipatory potential of peer production. And participatory projects have been criticized for a range of failures, from struggling to create true participation (Elwood, 2006) to being outright disingenuous in their approach and goals (Cooke & Kothari, 2001). The intersection of information systems, values, and culture is also important to consider. Cultural expectations and norms are deeply embedded in the design of information systems, shaping everything from the representation of relationships within databases (Srinivasan, 2004, 2007) to the explanations drawn from data (Byrne & Alexander, 2006; Corburn, 2003). The design process is never value-neutral, and questions of what, and whose, values are embodied by software and system architecture have been controversial for decades (Friedman, 1997). Affordances built into a technology may privilege some uses (and users) while marginalizing others. Design areas where bias can become particularly embedded include user interfaces (Friedman & Nissenbaum, 1997), access and input/output devices (Perry, Macken, Scott, & McKinley, 1997), and sorting and categorization mechanisms (Bowker & Star, 2000; Suchman, 1997). The intersections between culture, meaning, and information systems have spurred researchers to experiment with culturally specific databases, media archives, and information systems for indigenous, diasporic, and marginalized communities (Boast, Bravo, & Srinivasan, 2007; Monash University School of Information Management and Systems, 2006; Srinivasan, 2007; Srinivasan & Shilton, 2006). Such “alternative design” projects seek to investigate, expose, redirect, or even eliminate biases that arise in mainstream design projects (Nieusma, 2004).
Participatory personal data projects, however, often adopt a universal rather than relativist vision, taking “everyone” as intended users. What does it mean to design for everyone? As Suchman (1997) points out, designing technology is the process of designing not just artifacts, but also the practices that will be associated with those artifacts. What do designers, implicitly or explicitly, intend the practices associated with participatory personal data projects to be? And how will such practices fit into, clash against, or potentially even reshape diverse cultural contexts?
Management, Curation, and Preservation
Privacy, access, and equity challenges are all affected by an overarching information concern: how participatory personal data projects are managed, curated, and preserved. Metadata creation and ongoing management are necessary to ensure the access control, filtering, and security needed to maintain privacy for participatory personal data. Accessibility and interpretability of the data, by individuals as well as by governments and corporations, will depend on their organization, retrieval, and visualization. And whether and how data are preserved—or forgotten—will depend on curation mechanisms heavily reliant on metadata and data structures (Borgman, 2007).
Participatory personal data echo many of the same management concerns found in large scientific data sets (D. Cohen, Fraistat, Kirschenbaum, & Scheinfeldt, 2009; Gray et al., 2005; Hey & Trefethen, 2005). Participatory personal data may consist of large quantities of samples recorded every second or minute for days or months. The data are frequently quantitative measurements that depend on machine processing and descriptive metadata for human comprehension. These characteristics demand new techniques for organization and management; developing methods for organizing, analyzing, and extracting meaning from large, diverse, and largely quantitative datasets is an emerging challenge for the information sciences (Borgman, Wallis, Mayernik, & Pepe, 2007).
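As a simple illustration of how much descriptive metadata a single sample needs before it becomes humanly interpretable, consider the hypothetical record layout below (the field names are invented for illustration, not taken from any cited project).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SensorSample:
    """One participatory-sensing measurement plus the metadata that makes
    it manageable: a bare float is meaningless without units, provenance,
    collection context, and the participant's sharing terms."""
    value: float                        # raw quantitative measurement
    unit: str                           # e.g., "dB(A)" or "ppm"
    sensor_id: str                      # which device produced the reading
    timestamp: str                      # ISO 8601, e.g., "2012-03-01T14:05:00Z"
    location: Optional[Tuple[float, float]] = None  # (lat, lon), possibly coarsened
    campaign: str = ""                  # project and protocol that collected it
    consent_scope: str = "private"      # sharing level granted by the participant
    retain_until: Optional[str] = None  # retention deadline for curation and deletion
```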
Always-on, sensitive data capture brings up a number of theoretical and normative questions about whether and how these data should persist over time. Where and how will these data be curated and preserved? What are the benefits of preserving people's movements, habits, and routines? And what problems might the ubiquitous nature of this memory raise? Participatory personal data present an institutional and logistical challenge for preservation. As with all digital material, methods for long-term preservation are costly and labor-intensive (Galloway, 2004). The distribution of these data across multiple stakeholders, including individuals, research organizations, corporations, and governments, also challenges traditional preservation models based upon clearly defined collecting institutions (Abraham, 1991; Lee & Tibbo, 2007). It is also unclear which institutions will be responsible for preserving data held by individuals and loosely organized publics. Determining who is responsible for authoring, managing, and curating data distributed among individuals will challenge our existing notions of institutional data repositories and professional data management.
Perhaps more difficult is the question of whether we should preserve participatory personal data at all. Historically, a major role of archival institutions was appraisal: selecting for retention only the tiny portion of records deemed historically valuable (Boles, 1991; Cook, 1991). But the explosion of data generation, paired with cheap storage and cloud computing, raises the possibility of saving much more evidence of daily life. This possibility has become a subject of both celebration (Bell & Gemmell, 2007) and debate (Blanchette & Johnson, 2002). Collections of granular personal data have been invoked to promise everything from improved health care (Hayes et al., 2007, 2008) to memory banks that “allow one to vividly relive an event with sounds and images, enhancing personal reflection” (Bell & Gemmell, 2007, p. 58). And new kinds of personal documentation could help to counteract the power structures that control current archival and memory practices, in which the narratives of powerful groups and people are reified while others are marginalized (Ketelaar, 2002; McKemmish, Gilliland-Swetland, & Ketelaar, 2005; Shilton & Srinivasan, 2007).
As more data are collected and indefinitely retained, however, there may be pernicious social consequences. Blanchette and Johnson (2002) point out that U.S. law has instituted a number of social structures to aid in “social forgetting,” or enabling a clean slate. These include bankruptcy law, credit reports, and the clearing of the records of juvenile offenders. As information systems increasingly banish forgetting, we may face the unintended loss of the fresh start. Drawing on this argument, Bannon (2006) suggests that building systems that forget might encourage new forms of creativity. He argues that an emphasis on augmenting one human capacity, memory, has obscured an equally important capacity: that of forgetting. He proposes that designers think about ways that information systems might serve as “forgetting support technologies” (2006, p. 5). Mayer-Schoenberger (2007) presents a similar argument, advocating for a combination of policies and forgetful technologies that would allow digital data to decay (a minimal version of this idea is sketched at the end of this section). Information professionals interested in questions of data preservation will find difficult challenges for appraisal and curation in participatory personal data. Determining the nature and organization of the cyberinfrastructure that will support participatory personal data will affect many of these questions about privacy, equity, and memory.
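As a concrete gloss on such forgetting support technologies, here is a minimal sketch of progressive data decay, in which records lose detail as they age and eventually disappear. The thresholds and field names are invented to illustrate graduated retention, not taken from any cited proposal; `sample` is assumed to be a dict with a timezone-aware `captured_at` datetime plus `lat` and `lon` values.

```python
from datetime import datetime, timezone

def decayed(sample, now=None):
    """Progressively 'forget' a location sample as it ages.

    Fresh samples keep full detail; month-old samples keep only the day
    and a coarse area; year-old samples are deleted outright.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - sample["captured_at"]).days
    if age_days > 365:
        return None  # fully forgotten
    if age_days > 30:
        return {     # degraded: only coarse, less identifying detail survives
            "day": sample["captured_at"].date().isoformat(),
            "area": (round(sample["lat"], 1), round(sample["lon"], 1)),
        }
    return sample    # recent: full fidelity
```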