Racial Bias in Policing: Why We Know Less Than We Should


Phillip Atiba Goff, Department of Psychology, University of California, Los Angeles, 1285 Franz Hall, Los Angeles, CA 90095-1563. Tel: (310) 206-3467 [e-mail: goff@psych.ucla.edu].


There is a shocking dearth of scientific certainty about how to assess racial bias in policing. Specifically lacking is an examination of the causal relationship between officer psychological attitudes and their interactions with minority suspects. Do officer racial attitudes lead to more racially biased police behavior? Why do we, as psychologists and scientists, know less than we should about psychological attitudes and their effects on police behavior in the field? To answer this question, we first review what researchers have learned given the available types of existing data: crime data, officer data, and public opinion data. Next, we discuss how insufficient access and lack of rigorous design have detracted from thorough research on racial bias in policing. Finally, we detail how new opportunities for social scientists have the potential to overcome these barriers and conduct rigorous psychological research on equity in policing.

It would not surprise even casual observers of U.S. racial politics to learn that there is still a profound gap between Whites, Blacks, and Latinos in employment rates (DeFreitas, 1991; Schwartzman, 1997; Wilson, 1996). Nor would it surprise many to be told that there are large racial disparities in wealth accruement (Gilens, 1999; Oliver & Shapiro, 1995; Shapiro, 2004), housing (Jargowsky, 1998; Massey & Denton, 1993), health care (Budrys, 2010), or educational attainment (Bowen & Bok, 2000; Massey, Charles, Lundy, & Fischer, 2003; Steele, 2010). What would be surprising is to learn that we did not really know whether or not there is sinister racial discrimination in one of these areas. Given how long racism has been part of our social fabric, it would be shocking to think that there remained uncertainty about how to tell whether or not racial bias troubled one of our most important social institutions.

Yet, that is precisely the position we find ourselves in with regard to racial bias in policing. It would be hard to argue that municipal law enforcement is not an important public institution. Police are often the most visible state representatives, meaning that communities can even experience racial bias in law enforcement as state-sponsored oppression (Alexander, 2010; Muhammad, 2010; Tyler & Huo, 2002). Despite this critically visible role, the best efforts of criminologists and federal agencies have primarily succeeded in documenting limited forms of racial disparities, and have largely failed to produce compelling evidence of racial bias. The inability to distinguish disparity from discrimination has, in turn, hampered both scientists and practitioners wishing to ensure equity in policing. In other words, the dearth of knowledge about racial bias in policing hampers the pursuit of equality.

The Example of Racial Profiling

No topic better personifies the controversies of racism in law enforcement, or the problems with measuring it, than the topic of racial profiling. Broadly, racial profiling is defined as “using race as a factor in conducting stops, searches, and other investigative procedures” (Bush, 2001). The term was coined to refer to law enforcement entities that used a suspect's race to develop a “profile” of who was criminal, and is now used by practitioners to describe factors that lead to racial biases, particularly in police stops. However, to the lay public, the term has become a catchall for policies or practices they feel to be biased. Regardless of the definition, there is not significant scholarly debate about whether or not racial profiling exists, nor much legal or moral debate about whether or not it is good: racial profiling exists and it is harmful (Bush, 2001; Gross & Livingston, 2002). Evidence regarding the existence of racial disparities in law enforcement is ubiquitous: Blacks are approximately four times more likely than are Whites to be targeted for police use of force (Walker, Spohn, & DeLone, 2007); Blacks and Latinos are stopped and incarcerated at significantly higher rates than their representation in the population, particularly for drug-related crimes (Federal Bureau of Investigation, 2009; Sidanius & Pratto, 1999; Tyler & Huo, 2002); and Blacks and Latinos are significantly more likely to fear unjust treatment by the police (Sidanius & Pratto, 1999; Toby, 2000; Tyler & Huo, 2002; Walker, 2005; Weitzer & Tuch, 1999, 2006). A majority of Americans believe that police bias against Blacks is either very or fairly common, and nearly three quarters of Blacks feel that the police treat Blacks more harshly (Weitzer & Tuch, 1999).
These concerns with biased policing extend beyond the borders of the United States, with researchers documenting fear of police bias in much of Europe (e.g., Bowling & Phillips, 2003; Hasisi & Weitzer, 2007; Sollund, 2006), Australia (Christie, Petrie, & Timmins, 1996), and Asia (Loper, 2001).

These disparities are accompanied by evidence in the United States (Sidanius, Liu, Shaw, & Pratto, 1994; Sidanius, van Laar, Levin, & Sinclair, 2003) and abroad (Christie et al., 1996; Sidanius & Pratto, 1999) that officers have more racially biased and xenophobic attitudes than the population at large. This suggests that the specter of racial prejudice in law enforcement is not only an American concern, but a fundamental concern of democratic societies (Alpert & Dunham, 2004). Racial bias in police enforcement, therefore, presents a significant threat to the legitimacy of law enforcement in all communities (Tyler & Huo, 2002), and on its face constitutes a threat to core democratic principles.

Given the near consensus that racial profiling is real and damaging, debate about how to define the term is far less vexing than the issue of assessing whether or not a given department is engaged in discriminatory practices. The first difficulty in making this determination is that necessary policing data regarding race are often not publicly available. This lack of access to racial profiling data is why, first in 1997, and then in every Congressional session since 2000, members of Congress have tried repeatedly, and failed as often, to pass legislation requiring municipal law enforcement to collect and report the racial demographics of individuals they stop (Traffic Stops Statistics Act, 1997 and 2000; End Racial Profiling Act, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, and 2010). Unfortunately, law enforcement officials are often loath to collect the data voluntarily or to report them, for reasons that are described later in this article. As a result, private advocates, the federal government, and municipal law enforcement collectively spend hundreds of millions of dollars annually litigating and complying with litigation regarding racial profiling record keeping (Megerian, 2009; Ross & Parke, 2009). Consequently, there is no national database on racial profiling and no reliable estimate of racial disparities in informal police contact across the nation.

However, the lack of publicly available data on racial profiling is only a minor part of the problem involved in measuring racial profiling. The core of the issue centers on the inability of scientists and researchers to answer one basic question: If one had unlimited and complete access to all types of police data, how would one use it to measure racial bias?

Though it is suspected that racial discrimination is present in law enforcement, what we as scientists do not know is how to prove where it occurs, when it exists, or how prevalent it is across police departments. Our inability to make concrete statements about racial profiling is due to the fact that, to determine whether or not any given law enforcement entity is engaged in racial profiling, one must differentiate between racial disparities and racial discrimination (Banks, 2003; Blank, Dabady, & Citro, 2004; Goff et al., 2010; Ridgeway & MacDonald, 2010).

Racial disparities—or numerical differences between racial groups on an outcome of interest such as stops or arrests—may occur for a myriad of reasons. Although it is likely that some police officers and departments stop and arrest more minority suspects due to individual or departmental racial biases, it is also possible that the racial biases of the broader society disadvantage minorities to such a degree that it produces these biased outcomes. In response to societal disadvantage, it is reasonable to suspect that racially stigmatized groups may find crime a preferable—or the only feasible—route to survival. What this assertion means to the measurement of racial profiling is that criminal behavior may well be higher among stigmatized racial groups, which observed racial disparities in stops or arrests of minorities by law enforcement might then reflect (Goff et al., 2010). Therefore, for instance, comparing the percentage of Blacks in Philadelphia's population to the percentage of Blacks stopped by the Philadelphia Police Department may reveal racial disparities, but those disparities would not reveal whether or not the Philadelphia Police Department discriminates against Blacks.

In order to solve this measurement conundrum, researchers would need access to large police data sets and the ability to conduct careful analyses, supplemented by controlled experiments. The necessary access to complete records, required trust between researchers and police departments, and methodological rigor necessary to accomplish this feat have, thus far, been limited. Throughout this article, we detail the reasons why these barriers exist, and, importantly, what can be done to overcome such obstacles.

Outline of This Article

Given the recent attention to acute racial disparities in incarceration rates (Alexander, 2010; Loury, 2008; Pew Center on the States, 2008; Western, 2006), courtroom outcomes (Edelman, 2006; Kennedy, 1998; Mills, 1999; Sidanius & Pratto, 1999; Walker, 2004), and the death penalty (Baldus, Pulaski, & Woodworth, 1990; Eberhardt, Davies, Purdie-Vaughns, & Johnson, 2006; Edelman, 2006; Goff, Eberhardt, Williams, & Jackson, 2008; Ogletree & Sarat, 2006), it seems almost impossible to think there is not similar scholarship about race in policing. What one might not realize is that the system most responsible for entry into the criminal justice system (municipal policing) differs substantially from the systems responsible for individuals after they have entered it. For instance, the important outcomes of courtroom decisions, correctional facilities, and execution take place in public contexts, are recorded systematically by the state, and are matters of public record. None of these are true in policing.

As an example, the decision to stop an individual can be at once terribly consequential and impossible to document. Courtesy during informal street contacts between police officers and citizens plays an important role in community perception that, in turn, can create the context for actual racial conflict (Tyler & Huo, 2002). This interaction, too, is difficult to record without the help of video cameras. Similarly, abuses of police power commonly occur in situations with few witnesses. In these contexts, the officer is unlikely to report on herself and corroborating evidence can be difficult to produce. Even when incidents of officer misconduct are uncovered, it is often difficult to compel departments to produce aggregate data on these issues (if they have them at all), as police and sheriffs' departments have compelling interests in keeping certain internal data out of public sight.

Given these obstacles alone, it is easier to understand why we know so much less about racial bias in policing than we do in other public institutions. These obstacles only represent part of the total impediments. In an attempt to navigate this bedeviling lack of information, this article is divided into three sections: (1) What we do and do not yet know, (2) Why we do not know more than we do, and (3) Reasons for hope: What we hope to know soon. The first section outlines what criminologists, federal agencies, and, increasingly, sociologists and economists have been able to divine so far. This section is itself divided into three subsections corresponding to the types of data that are routinely analyzed, namely crime data, officer data, and public opinion data. Within each, we will outline what has been uncovered and what remains unresolved.

In the second section, we review some broader factors that have limited our knowledge of racial bias in policing. This section falls into two subcategories, namely insufficient access to data and insufficient methodological rigor.

Finally, we highlight innovations in the study of race and policing that provide reasons for optimism. The current climate of expanding collaborations between municipal law enforcement and independent social scientists allows for a great diversity of analytic approaches and policy solutions. Taken together, the goal of this article is to make clear the challenges before empirical social scientists who wish to study racial bias in law enforcement, and to suggest at least one model for negotiating them.

What We Do and Do Not Yet Know: Crime Data, Officer Data, and Public Opinion

Despite numerous obstacles to data collection, criminologists have long sought to shed light on the relationships between race and crime (Walker et al., 2007). In that attempt, they have collected data that fall, roughly, into three categories: (1) crime data, (2) officer data, and (3) public opinion data. These three categories will serve as an outline for our review of both “what we know” about bias in law enforcement, and “what we do not yet know.”

Crime Data

Criminologists use aggregate crime data far more than any other kind of quantitative data. Broadly, crime data refer to data collected by municipal law enforcement on the types of crimes that occur within a jurisdiction. Municipal-level data are often crucial for understanding the ways in which racial disparities are created and maintained—and for correcting those biases. Litigating claims of racial discrimination, for instance, would be far more difficult on a national level. Individuals who are targets of biased policing are also more likely to make convincing arguments based on the behavior of local law enforcement, as opposed to national-level trends.

Still, national data are crucial for studying the factors that produce and moderate racial disparities in policing. Without the ability to examine variations in demographics, policies, and outcomes, it is difficult to develop principles and theoretical frameworks with which to predict racial disparities and diagnose racial discrimination. Consequently, the lack of national data on racial disparities is a significant obstacle to the study of bias in policing. What national data sets are available result from municipal data that are gathered in one (or more) of four possible ways: Uniform Crime Report (UCR) format, National Incident-Based Reporting System (NIBRS) format, National Crime Victimization Survey (NCVS) format, and department-specific formats. Unfortunately, each has severe limitations with regard to diagnosing even racial disparities. We will briefly describe each format below.

Types of crime data and their limitations

Municipal law enforcement collects data about types of arrests and reports them annually, in the aggregate, to the Federal Bureau of Investigation (FBI). These data are themselves aggregated annually by the FBI and published as the Uniform Crime Report (UCR; Federal Bureau of Investigation, 2009; Maltz, 1977; Robinson, 1911). As a result of the federal statute that requires the collection of these data, many police departments use the UCR format for the maintenance of their own data. This practice means that, if someone is seeking data from a particular department, they are likely to receive these in a format consistent with UCR reporting standards (Gabbidon & Greene, 2005). Consequently, it is common to refer to these data as UCR data, whether they are at the level of a given municipality or at the national level.

These data have provided significant insights into national-level racial disparities. For instance, UCR data have revealed that Blacks and Latinos are significantly more likely to be arrested for violent and drug-related crimes (Federal Bureau of Investigation, 2003; Sidanius & Pratto, 1999; Walker et al., 2007). Despite the utility of having a relatively uniform reporting system, researchers and practitioners alike acknowledge a number of shortcomings of UCR data, chief among them that they fail to collect information (1) on police contacts that do not fall easily into existing UCR categories or (2) on victims.

To address this first concern, the FBI endorses a new model for collecting crime-related data that would focus on incidents of crime, rather than arrests. The NIBRS requires municipal law enforcement to change the way they collect and report criminal data, with all calls for service and recorded criminal events receiving a unique entry, thereby changing the unit of analysis from the arrestee to the arresting incident. Though the FBI and the International Association of Chiefs of Police (IACP) both endorsed this new structure in 1988, to date, only 21 states use the NIBRS format, severely limiting the effectiveness of the innovation (Finkelhor & Ormrod, 2000; Gabbidon & Greene, 2005). Law enforcement's resistance to using NIBRS, in addition to the significant logistical hassle of changing to a new data management system, stems from a concern that, if they switch from UCR to NIBRS reporting, then the headline in the paper is likely to be “crime goes up,” despite observational evidence to the contrary (Rantala & Edwards, 2000). Consequently, whatever the potential benefits of NIBRS-style reporting might be, they are mitigated by the perceived costs to law enforcement of changing systems, which leaves NIBRS data less comprehensive than UCR statistics.

In much the same way that NIBRS addresses concerns of underreporting, the NCVS attempts to address concerns that victims of crime are not considered in UCR data (Mosher, Miethe, & Phillips, 2002). These data have revealed important racial disparities in the rates of criminal victimization, providing the evidence that Blacks and Latinos are more likely to be the targets of violent and property crimes than are Whites (Bureau of Justice Statistics, 2001a; Walker et al., 2007). However, while this data set is the best available, it is neither comprehensive (as it is based on sampling estimates) nor is it linked to NIBRS or UCR data. As a result, researchers must analyze data about victimization separately from data about perpetrators.

These disjoint national data sets force researchers—particularly ones interested in racial disparities—to focus primarily on locally collected data. Though individual departmental records can vary widely, they often have incident as well as victim data tied together in offense reports. Importantly, departmental data often also have information about the number of individuals who are stopped, detained, or searched by police—the most common interactions civilians are likely to have with law enforcement—none of which are captured in any of the national data sets (Bureau of Justice Statistics, 2002; Walker et al., 2007). In other words, only departmental data capture information that would reveal racial profiling.

Similarly, departmental data can be connected to individual officers, allowing researchers to determine whether or not the race, sex, age, or experience of individual officers plays a role in aggregate outcomes—something not possible with the existing national data sets. To the extent that scientists and researchers can obtain access to them, these departmental data have been the most useful in producing new knowledge about issues such as racial profiling and racial disparities in use of force (Goff et al., 2010; Ridgeway & MacDonald, 2010). In fact, at this stage, departmental data offer the best hope of investigating the police biases that scholars, concerned communities, and progressive law enforcement deem most critical (Fridell, 2004; Wilson, Dunham, & Alpert, 2004). For instance, voluntarily reported departmental data on use of force have revealed that Blacks are presently four times more likely than Whites to be targeted for use of police force, down from eight to one a quarter century ago (Bureau of Justice Statistics, 2001b; Walker et al., 2007).

However, despite the persistent racial disparities that all levels of analyses and types of data reveal, it is not as simple in the domain of law enforcement to conclude when and where discrimination enters. If Latinos are arrested at twice their representation in a given population, does that mean that there are too many or too few officers in their neighborhoods? Similarly, if Blacks are stopped at twice their representation in a given population, is that because they are committing more crimes (as those who face discrimination in employment, housing, health care, wealth accruement, and education might), or because the police are biased against them? Alternatively, disparities may arise because those who pass laws and direct police enforcement efforts (i.e., municipal executives) direct law enforcement to engage in functionally discriminatory behavior, or because chiefs and sheriffs choose to deploy their officers more aggressively in Black and Latino neighborhoods.

Although it would be naïve to imagine that officers and departmental policies play no role in the creation of racial disparities, it is quite difficult to distinguish between racial disparities in policing and racial discrimination at the individual officer, departmental, and national levels. That is, is racial discrimination in law enforcement the cause of racial disparities or are those disparities a symptom of racial discrimination in other domains? Many criminologists remain agnostic regarding these questions and some of the best-intentioned professionals are left ill-equipped to identify bias where it occurs. Despite these limitations, creative criminologists (and, increasingly, economists and sociologists) have found ways to sidestep some of the issues that we have outlined earlier. For these scholars, departmental data have offered the most promise of identifying departmental bias. Below, we will review the different ways in which scholars have approached the difficult issue of measuring racial profiling, all of which use crime data gathered from departments.

The problem of measuring racial profiling

To review the specific methodologies that others have used for analyzing police bias, we return again to the example of racial profiling. As detailed earlier, despite repeated efforts to pass federal legislation that would mandate a national database on racial profiling (e.g., The Traffic Stop Statistics Act, 1997; End Racial Profiling Act of 2010: HR 5748, originally introduced in 2001), so-called “racial profiling data” tend to be kept at the municipal or state level, with 25 states enacting some form of racial profiling data collection (Racial Profiling Data Collection Resource Center, n.d.). Consequently, analyses of racial disparities in stops must take on the idiosyncrasies of the jurisdiction under study. While this narrow scope can be problematic, this limitation is not the largest barrier to quality analyses of racial bias in police stops.

Rather, the largest barrier to an accurate accounting of racial bias in police stops is the difficulty scholars have identifying the appropriate way to analyze the data that are collected. More specifically, while some departments keep racial demographic information on vehicle and pedestrian stops, these data only permit an analysis of racial disparities—not racial discrimination (Banks, 2003; Blank et al., 2004; Goff et al., 2010; Ridgeway & MacDonald, 2010). Again, the difference between observed racial disparities versus racial discrimination as the cause is critical. As described earlier, the central difficulty in measuring racial profiling is that, if one believes a police department is engaged in racial profiling, it is reasonable to assume that they are stopping too many Blacks and/or Latinos. The question then becomes, how does one measure “too many”? How does one know if observed disparities are truly due to officer racial discrimination, as opposed to a plethora of other potential causes?

A seemingly common sense approach to the racial profiling question would involve comparing the racial demographics of the stops to the racial demographics of a population. In other words, one wants to create a fraction, with the percentage of Latinos (or Blacks, etc.) stopped as the numerator and the percentage of Latinos in the population as the denominator. Using this analytic technique, also known as population benchmarking, researchers hypothesize that a municipality with a 25% Latino population will produce vehicle stops that are also around 25% Latino. Any deviation from this ratio of population demographics to police stop rates is assumed to be due to police racial bias or profiling. However, this metric is flawed for several reasons, which we detail below.

First, as argued earlier, if racial discrimination exists in all other important social institutions (i.e., education, employment, health care, housing, and wealth accruement), then it is highly probable that these racial inequalities will produce a disproportionate incentive to commit crime among targeted populations of non-Whites. In other words, if racial discrimination encourages a group to engage in criminal behavior, then it is likely that racial disparities in stops can also be a symptom of wider discrimination, rather than a product of police biases (Goff et al., 2010).

Second, using the general population—or even the residential population of a given area—as a benchmark is problematic for a variety of reasons. For example, when assessing car stops, it is not clear that the residents of a given municipality are represented among the driving population in proportion to their racial demographics or among the pedestrian population in the case of pedestrian stops. Similarly, it is often the case that large urban areas are business and social hubs for surrounding municipalities, meaning that the foot and vehicle traffic in a large city is likely to include large numbers of nonresidents. Moreover, in areas with significant undocumented populations, census data are unlikely to reflect the actual racial makeup of the city. Therefore, the population demographic number used to compare the racial demographics in stops against is itself a flawed comparison metric.
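The population benchmarking fraction, and its sensitivity to the choice of denominator, can be illustrated with a minimal sketch. All counts below are hypothetical and are not drawn from any department's data; the function name `benchmark_ratio` and the notion of a separately estimated "driving population" are assumptions introduced only for illustration.

```python
def benchmark_ratio(group_stops, total_stops, group_benchmark, total_benchmark):
    """Share of stops involving a group divided by that group's share of a benchmark."""
    stop_share = group_stops / total_stops
    benchmark_share = group_benchmark / total_benchmark
    return stop_share / benchmark_share


# Hypothetical city: 35% of recorded vehicle stops involve Latino drivers.
latino_stops, all_stops = 350, 1_000

# Benchmark 1: residential population (25% Latino), as in simple population benchmarking.
ratio_residents = benchmark_ratio(latino_stops, all_stops, 25_000, 100_000)

# Benchmark 2: a hypothetical estimate of who is actually on the road,
# including commuters and residents missing from census counts (35% Latino).
ratio_drivers = benchmark_ratio(latino_stops, all_stops, 42_000, 120_000)

print(f"ratio against residents: {ratio_residents:.2f}")
print(f"ratio against drivers:   {ratio_drivers:.2f}")
```

With these invented numbers, the same stop records look disproportionate against the residential benchmark and proportionate against the driving benchmark. Neither ratio, on its own, says anything about why the stops occurred.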

Despite these and other flaws, collecting “racial profiling data” usually means keeping track of the racial demographics of stops, and both municipal and federal “racial profiling analyses” frequently use population benchmarking as a standard technique. Thankfully, because population benchmarking is so imprecise, scholars and police have searched for a replacement to populations as the relevant benchmark—or denominator. This search is also known as the “denominator problem” (Walker, 2001) or the benchmarking problem.

Many researchers have wrestled with the “denominator problem,” and have found creative alternatives to simple population benchmarking. Specifically, six methods, with a range of popularity and effectiveness, use departmental crime data to examine racial profiling and have become popular in recent years. They are (1) adjusted neighborhood benchmarking, (2) arrest data, (3) DMV and vehicle registrations data, (4) so-called “blind enforcement mechanics,” (5) observational data, and (6) consent search or “outcomes tests” analyses. Below, we will outline each of these methodologies, their advantages and disadvantages, and summarize their effects on how scholars approach analyses of racial bias in law enforcement.

Adjusted neighborhood benchmarking

This approach encompasses a number of analytical techniques that attempt to solve the denominator problem by more accurately quantifying the number of residents who might be stopped. This method frequently involves benchmarking stops within smaller geographic areas and using the neighborhood or census tract demographics rather than municipal demographics to produce appropriate benchmarks (Fridell, 2004; Goff et al., 2010; Ridgeway & MacDonald, 2010). This technique is designed to reduce the disproportional impact of targeted enforcement techniques on the racial demographics of stops data. That is, when departments choose to engage in increased vehicle stops within a given neighborhood (often as a crime reduction tool), it is likely to drive up the number of stops in that neighborhood. If the neighborhood is majority Black or Latino, then, even if stops in that neighborhood are racially proportional to the population, the enforcement pattern will result in a higher proportion of Blacks or Latinos stopped citywide than their representation in the population.

An adjusted neighborhood benchmarking approach alleviates this concern by matching stops to neighborhoods rather than to an entire city or county. Therefore, by narrowing the unit of analysis to smaller areas, researchers are better able to account for the racial compositions of the area, yielding a more precise “denominator.” This technique, used in the infamous RAND report on New York City's stop and frisk practices (Ridgeway, 2006), nevertheless has many of the same drawbacks as less nuanced population benchmarking techniques. First, it does not take into account commuter or undocumented populations, which are again likely to change the “denominator” used in population benchmarking. Further, neighborhood benchmarking is also unable to distinguish between police bias and previous discrimination as an explanation of the data. That is, even if it is established that racial biases are responsible for observed disparities in stop rates, population benchmarking can never answer the question of whether officer racial bias caused the disparities. And, perhaps most importantly, this technique is unable to determine whether targeted enforcement patterns might be motivated by the racial demographics of certain areas, potentially absolving police departments of responsibility for policies that produce racially harmful results. That is, if department policies dispatch unequal numbers of patrol units to particular racial areas, observed disparities in stops are likely to ensue, but they would not necessarily be indicative of individual officer bias.
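The aggregation effect that motivates neighborhood-level benchmarking can be shown with a small worked sketch. The two neighborhoods and every count below are hypothetical; the `Neighborhood` record and the `ratio` helper are illustrative assumptions, not part of any published method.

```python
from dataclasses import dataclass


@dataclass
class Neighborhood:
    name: str
    population: int        # all residents
    group_population: int  # residents belonging to the group of interest
    stops: int             # all recorded stops
    group_stops: int       # stops involving the group of interest


def ratio(group_stops, stops, group_population, population):
    """Group's share of stops divided by the group's share of the population."""
    return (group_stops / stops) / (group_population / population)


# Hypothetical data: neighborhood A is 80% Black and heavily patrolled;
# neighborhood B is 10% Black and lightly patrolled.
neighborhoods = [
    Neighborhood("A", population=10_000, group_population=8_000, stops=1_000, group_stops=800),
    Neighborhood("B", population=30_000, group_population=3_000, stops=500, group_stops=50),
]

# Neighborhood-level benchmark: each area is compared with its own residents.
per_neighborhood = {
    n.name: ratio(n.group_stops, n.stops, n.group_population, n.population)
    for n in neighborhoods
}

# Citywide benchmark: pool everything before comparing.
citywide = ratio(
    sum(n.group_stops for n in neighborhoods),
    sum(n.stops for n in neighborhoods),
    sum(n.group_population for n in neighborhoods),
    sum(n.population for n in neighborhoods),
)

print(per_neighborhood)
print(f"citywide ratio: {citywide:.2f}")
```

In this invented example, stops are exactly proportional to population inside each neighborhood, yet the pooled citywide ratio is roughly two to one because far more stops are made in neighborhood A. The sketch does not address why A is patrolled more heavily, which is the question the neighborhood approach leaves open.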

Arrest data

The use of arrest data as the denominator for racial profiling statistics was mostly popular among conservative journalists and pundits for a short time, and not with more rigorous scholars (Harris, 2002). Comparing the racial demographics of stops to the racial demographics of arrest data may seem reasonable if one is searching for the ideal denominator—the demographics of individuals who commit crimes. But upon even cursory inspection, this technique violates multiple rules of logic and statistical inference. Since a police “stop” is often a precursor to an arrest, arrest records may be racially skewed. That is, if stops are biased, then arrests may be biased. Consequently, using (street-level) arrest data to benchmark stops is highly suspect.

DMV and vehicle registration data

Some researchers have attempted to solve the denominator problem by restricting the benchmark population to those who are licensed to drive or who own a vehicle. While this seems a reasonable modification for analyses of vehicle stops, it does no better than unadjusted population benchmarking at identifying the appropriate commuter population (i.e., those who live outside a jurisdiction but regularly drive through it), and suffers from similar problems with regard to causal attributions for observed disparities. Of course, this technique cannot be used to address disparities in pedestrian stops, another arena of potential racial profiling by police officers.

“Blind enforcement mechanics”

This technique is a far more promising approach than the ones previously reviewed. By using functionally race-blind mechanisms to estimate the relevant population, it would seem possible to create a better benchmark. Examples of so-called “blind enforcement mechanics” include comparing traffic stops made in the daylight (when officers might be able to tell the race of the suspect) to ones made at night (when it is, ostensibly, harder to infer the race of suspects). Using radars, airplanes, and video traps to estimate the racial demographics of those speeding is another example. And, perhaps most promising of all, using no-fault accidents and red-light cameras to estimate the appropriate racial benchmarks has become increasingly popular.
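The daylight versus darkness comparison can be sketched as a simple odds ratio. The counts are invented for illustration, and `odds_ratio` is a hypothetical helper, not a method taken from the studies cited here. An odds ratio near 1 would mean the racial mix of stops is about the same whether or not officers can plausibly see the driver.

```python
def odds_ratio(minority_day: int, other_day: int,
               minority_night: int, other_night: int) -> float:
    """Odds that a stopped driver is a minority in daylight, relative to darkness."""
    return (minority_day / other_day) / (minority_night / other_night)


# Hypothetical stop counts, split by lighting condition.
daylight = {"minority": 300, "other": 700}
darkness = {"minority": 250, "other": 750}

ratio = odds_ratio(daylight["minority"], daylight["other"],
                   darkness["minority"], darkness["other"])
print(f"Daylight-to-darkness odds ratio: {ratio:.2f}")
```

A ratio above 1 in real data would be consistent with race influencing stops when it is visible, but it would not by itself rule out differences in who drives at which hours.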

Using “no fault” accident information, for instance, has the advantage of providing a reasonably unbiased account of the demographics of those who are on the road, since “no fault” accidents are, ostensibly, equally likely to affect everyone. Red-light cameras have the ability to record vehicle information from heavily traveled areas which is linked to vehicle owner demographics, thus providing an estimate of benchmark demographics by providing racial information on those who actually break at least one traffic rule.

Again, however, these techniques are limited. “No fault” accidents are likely to underrecord individuals who are in the country illegally or civilians with criminal records who are wary of engaging with law enforcement. Individuals without insurance are also less likely to report accidents, which is likely to skew the racial demographics of “no fault” accidents further. Similarly, because red-light cameras are expensive, they tend to be placed in high-traffic areas and dangerous intersections rather than in quieter residential neighborhoods or suburbs, both of which tend to have fewer traffic lights in general (Walker et al., 2007). Consequently, using red-light cameras is likely to produce artificially high numbers of Blacks and Latinos, who are concentrated in inner cities (Jargowsky, 1998; Massey & Denton, 1993; Wilson, 1996). These techniques are also ill-equipped to address the issue of pedestrian stops (with street-corner cameras suffering from even more severe limitations than red-light cameras) and suffer from the same inability to distinguish between police discrimination and broader racial disparities.

Observational data

Using a technique that is similar to, though more robust than, recordings from red-light cameras, a few dedicated researchers have attempted to estimate the population of potential vehicular law violators by simply observing them in extraordinary numbers (Engel & Calnon, 2004; Ridgeway & MacDonald, 2010; Walker et al., 2007). If observers are well trained, this approach has the benefits of better location sampling and can be adapted (with some difficulty) to pedestrian contexts. However, creating a benchmark from observational data is exceptionally time-consuming and prohibitively expensive.

Consent search data

One notable exception to many of the limitations found with most crime data analyses of racial profiling is research using consent search analyses or “outcomes testing.” These techniques are considered the contemporary gold standard in both internal benchmarking and aggregate benchmarking analyses. The first of these approaches is “consent search” analyses that focus on officer or departmental “hit rates.” The logic behind these searches is as follows: since consent searches are at the discretion of the officer, they provide a unique insight into potential officer biases. If officers are policing in an unbiased manner, then one would expect that the percentage of searches that produce contraband among Black or Latino targets—the “hit rate”—should be similar to the percentage of searches among White targets. If, on the other hand, hit rates for Black suspects are significantly lower than for White suspects, this ratio might be an indication of officer or departmental bias (Dominitz & Knowles, 2006; Knowles, Persico, & Todd, 2001; Persico & Todd, 2006; Sanga, 2009; Smith & Petrocelli, 2001).
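The hit-rate comparison described above can be sketched in a few lines. The search and contraband counts are invented, the function names are hypothetical, and the two-proportion z-test with a normal approximation is one common way to compare proportions, not necessarily the test used in the cited studies.

```python
import math


def hit_rate(hits: int, searches: int) -> float:
    """Fraction of consent searches that produced contraband."""
    return hits / searches


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two hit rates."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return z, p_value


# Hypothetical counts: (searches that found contraband, total consent searches).
black_hits, black_searches = 100, 500
white_hits, white_searches = 120, 400

z, p = two_proportion_z(black_hits, black_searches, white_hits, white_searches)
print(f"Black hit rate: {hit_rate(black_hits, black_searches):.2f}")
print(f"White hit rate: {hit_rate(white_hits, white_searches):.2f}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

A significantly lower hit rate for one group is the pattern the outcomes-testing logic treats as a possible sign of bias; it remains subject to the interpretive difficulties this section goes on to describe.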

This technique has the potential benefit of providing both initial evidence of an unacceptable disparity and simultaneously demonstrating ways in which police efficiency can be improved. Similarly, other outcomes-based analyses of post-stops data (i.e., number of arrests that result from stops, quality of the interaction, etc.) have a greater degree of accuracy without having to engage with the denominator problem. However, there are still significant limitations of outcomes testing.

Some scholars still bemoan the fact that consent search data (and other outcomes data) often neglect geographic variation in searches (Sanga, 2009). Others argue that racial differences in hit rates can occur for spurious statistical reasons (Bjerk, 2007). Still others raise concerns about racial differences in when civilians give consent.

For instance, some have suggested that there are racial differences in community awareness that consent searches also require the consent of civilians (Sklansky, 1997). If this is true, then Whites who are hiding contraband may feel more comfortable refusing police search requests than Blacks or Latinos, thus escalating the likely hit rate for non-Whites. Conversely, if officers know that Blacks and Latinos are less likely to refuse searches, this could increase incentives to engage in so-called “pretextual stops,” the practice of stopping someone for a minor infraction in hopes of finding something more substantive during the interaction. This practice, of course, would reduce the hit rate for non-Whites.

Importantly, the strength of this analytic technique is that it does not focus on the decision to stop an individual but, rather, on something that happens after a stop. Consequently, while it may be a superior metric of the racial biases of officers and departments, it does not answer the question of whether or not individuals are stopped because of their race. Similarly, though outcomes testing measurements do not suffer from the same denominator problem as the previously discussed approaches, there are still difficulties interpreting consent search analyses. For instance, while a departmental policy may result in more searches, and proportionally fewer “hits,” within a particular community, this pattern may be the result of a deliberate enforcement strategy rather than an indication of ineffective policing. Similarly, emerging research suggests that non-Whites may feel anxious during police encounters in response to the fear that they will be labeled as criminals—even if they know they are not criminals (Najdowski & Goff, 2011). This apprehension can result in behaviors that would appear to observers as if they are guilty of something—likely resulting in the desire by an officer to search the individual.

Summary of popular approaches to racial profiling analyses

Each of these approaches seeks to approximate the racial demographics of the criminal population, and each does so imperfectly. Yet, despite having so many options available, no scholar or practitioner has suggested that there is a “one size fits all” solution to the problem. In fact, every significant review of racial profiling analytic approaches has stressed both the need for agency-specific approaches and the need to look beyond simple stops data to ensure equity in law enforcement (Fridell, 2004; Goff et al., 2010; Harcourt, 2006; Harris, 2002; Ridgeway & MacDonald, 2010; Walker et al., 2007). This assertion represents a scholarly acknowledgment of the methodological imperfections of existing measurement techniques.

It is important to note, also, that each of the above techniques assumes that the proper level of analysis is the level of the institution (Harcourt, 2006).1 Of course, institutions and departments are an important level of analysis. It is much easier for communities to seek redress from a department with a demonstrated pattern of racially disparate treatment than for any individual to demonstrate that a given stop was motivated by racial prejudice. Still, however reasonable it is to suspect that some police departments engage in aggregate discrimination, it is at least as reasonable to suspect that police departments (of a suitable size) contain individual officers who engage in racially biased policing. Yet, none of the above techniques are well positioned to address this possibility. This assertion is not to say that criminologists have not acknowledged both the existence of institutional-level bias and the need to study both “rotten apples” (i.e., biased officers) and “rotten apple barrels” (i.e., biases in police culture and/or policy; Walker, 2001; 2005; Walker & Alpert, 2000). Still, the metrics for studying racial bias in law enforcement are poorly fitted for a quantitative analysis of any given department's level of bias—much less a comparison between departments. In the next section, therefore, we address the relatively smaller literature that addresses the possibility of individual-level biases: research on officer attitudes.

Officer Data

Perhaps due to criminology's origins in clinical psychology, there are numerous personological approaches to policing (Adlam, 1982; Balch, 1972; Bennett & Greenstein, 1975; Evans, Coman, & Stanley, 1992; Fenster & Locke, 1973; Hanewicz, 1978; Hogan & Kurtines, 1975; Lester, Babcock, Cassisi, Genz, & Butler, 1980; McNamara, 1967; Mills & Bohannon, 1980; Niederhoffer, 1967; Sherman, 1980; Sidanius & Pratto, 1999; Skolnick, 1977; Toby, 2000; Walker, 1992). Yet, despite the extensive literature on police personality, there is relatively less literature on individual officer-level biases as opposed to the above literature on profiling at a higher institutional level. What research there is tends to fall in one of two categories: officer racial attitudes research and internal benchmarking analyses.

The first of these approaches involves simply measuring the racial attitudes of officers. Though few and far between, these data have created a consensus that law enforcement in the United States shares the racial biases of civilians, though there is a tendency for law enforcement to be slightly more racially prejudiced than the population at large (Bayley & Mendelsohn, 1969; Eberhardt, Goff, Purdie-Vaughns, & Davies, 2004; Sidanius & Pratto, 1999; but, see Correll et al., 2007 for an exception). This finding would seem to indicate that officers are prone to some level of bias-based policing. But many researchers and practitioners are quick to point out that there is a difference between biased attitudes and discriminatory behavior (Correll et al., 2007; Eberhardt et al., 2004; Ogloff, 2000; Walker et al., 2007).

This is a distinction that is also well known, though too often forgotten, within social psychology, with attitudes traditionally predicting less than 10% of the variance of both behaviors in general and racially discriminatory behaviors in particular (Dovidio, 2001; LaPiere, 1934; Wicker, 1969). Additionally, even if racially biased attitudes produce some level of biased behavior, it is not at all clear how much biased behavior they produce. A lack of real-world behavioral metrics, therefore, prevents the officer racial attitude research from having a more significant effect on the science or policies surrounding racial bias in policing.
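As a brief worked illustration of what the 10% figure implies, the standard relationship between a correlation coefficient and variance explained can be used. The symbol $r$ is introduced here for illustration and assumes a single attitude measure linearly related to a single behavioral measure; it is not taken from the cited studies.

```latex
r^{2} < 0.10 \quad\Longrightarrow\quad |r| < \sqrt{0.10} \approx 0.32
```

Under that assumption, an attitude measure explaining less than 10% of the variance in behavior corresponds to a correlation weaker than roughly 0.32 in absolute value.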

The second significant category of research on officers is research on internal benchmarking (Fridell, 2004; Ridgeway & MacDonald, 2010; Walker, 2001, 2005; Walker et al., 2007; Wilson et al., 2004). Internal benchmarking analyses eschew population denominators for officer performance denominators, essentially comparing one officer's performance in the field to that of her or his peers. These analyses have the advantage of identifying individual officers who are more prone than their peers to stop or use force against Blacks or Latinos as opposed to Whites, effectively permitting researchers to avoid questions of suspect criminality by analyzing variance among officers from the same patrols, same neighborhoods, and within the same department.
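A minimal sketch of the peer comparison behind internal benchmarking follows. The officers, counts, and the cutoff of 2.0 are all invented for illustration, and `peer_z_scores` is a hypothetical helper. Each officer's share of stops involving Black or Latino civilians is compared with the mean and standard deviation of the other officers on the same (hypothetical) patrol.

```python
from statistics import mean, stdev

# Hypothetical officers on the same patrol:
# (stops of Black or Latino civilians, total stops)
stops = {
    "O1": (20, 100),
    "O2": (22, 100),
    "O3": (19, 100),
    "O4": (21, 100),
    "O5": (45, 100),
}


def peer_z_scores(stop_counts: dict) -> dict:
    """Standardize each officer's minority-stop share against the other officers."""
    shares = {o: minority / total for o, (minority, total) in stop_counts.items()}
    scores = {}
    for officer, share in shares.items():
        peers = [s for o, s in shares.items() if o != officer]
        scores[officer] = (share - mean(peers)) / stdev(peers)
    return scores


THRESHOLD = 2.0  # arbitrary illustrative cutoff, not a recommended standard
scores = peer_z_scores(stops)
flagged = [officer for officer, z in scores.items() if z > THRESHOLD]
print("Officers above the peer threshold:", flagged)
```

Because every officer is compared only with peers working the same assignment, no population denominator is needed; the sketch says nothing about why a flagged officer differs from those peers.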

This technique is essential if departments wish to identify the officers whose behavior needs correction. In fact, many believe that so-called early warning systems—that predict future behavior based on past behavior—are the best hope to ensure equity and effectiveness in policing (Walker, 2005). However, this technique tends not to include predictors (i.e., attitudes) nor to link officer data to institutional-level variables, making it difficult to target an intervention or discern the role that the institution might be playing in the production of the bad behavior. Additionally, because early warning systems vary so widely and cannot be validated without considerable aggregate data, they remain limited in their capacity to predict behavior across departments.

Taken together, while officer data represent an essential level of analysis, researchers have yet to connect predictors (such as attitudes) to real-world behaviors. Consequently, as with crime data, there is scant research that uses officer data to produce advances in our understanding of biased police behaviors.

Public Opinion Data

As criminology evolved from the study of the criminally insane to the study of how crime functions in society (Garland, 1985; Ogloff, 2000; Rafter, 2008), there emerged a growing interest in how the public felt about public safety (Walker et al., 2007). This recognition of the importance of public perception about crime has translated into a contemporary interest in public opinion data, particularly data gathered from national polls.

Public opinion on public safety is an essential part of the policing puzzle, since a department cannot gain public cooperation if the communities being policed feel that law enforcement engage in racially biased policing (Tyler & Huo, 2002; Walker et al., 2007; Weitzer & Tuch, 2006). That said, opinion data are fundamentally about perceptions of racial inequality in policing and not about the realities of racial inequality in policing. In addition, because journalists and/or professional pollsters frequently gather these data, rather than scholars (cf. Walker et al., 2007), there are often questions by rigorous scientists about inappropriate methodologies and an inability to track opinions over time. For the same reason, it also tends to be the case that some large cities (e.g., Washington DC) are overrepresented in public opinion data on race and policing.

Were they tied to data on police behavior, public opinion data might reveal important relationships between public perceptions and the behavior of officers. However, as this technique has yet to be applied to the issue of race and policing, this relationship remains a question for empirical exploration. As it stands, while public opinion data are crucial information for scholars and practitioners who want to understand how race and policing are lived, they do little to determine whether police are actually engaging in biased practices.

Reviewing What We Know and Do Not Yet Know

Taken together, though criminologists and economists have conducted significant work using existing crime data to increase our knowledge of racial inequalities in policing, there are few examples of research that is able to distinguish between racial disparities and racial discrimination, and essentially none that can specifically tie discrimination to officer racial attitudes. Even the most promising of today's popular techniques, outcomes testing, lacks a national sample and has not been used in a manner that can account for the possible contributions of officer-level and institutional-level variables.

Consequently, as much as we have learned about racial disparities in the criminal justice system, we remain surprisingly uninformed with regard to racial discrimination in policing. Many analyses of race and policing are agnostic with regard to what causes racial disparities simply because the data are ill-suited for strong causal inferences. Therefore, in the absence of causal explanation, social scientists are underprepared to suggest remedies for disparities that trouble our national conscience and the efficiency of our public safety systems (Hochschild, 1995; Tyler & Huo, 2002). Why, then, has this knowledge gap been so difficult to fill and for so long? In the next section, we review why that dearth of knowledge has lasted longer in the domain of policing than in other important public contexts.

Why We Know Less Than We Should: A Failure of Access, Rigor, and Good Metrics

As we have discussed, the type and quality of data that are most widely available to researchers represent a significant obstacle to measuring racial bias. However, problems with data are not the only obstacles researchers and practitioners face. Clashes in culture, a history of mistrust, and conflicting interests continue to trouble researchers and practitioners who want to answer many basic questions about race and policing. In this section, we review the two largest obstacles that move beyond limitations in databases. These are limited independent access to data and the resulting lack of methodological rigor.

A Lack of Access

It seems reasonable that, since departmental data seem to be the only way to get important information about race and crime, the most direct pathway to increasing our knowledge about race and policing would be to aggregate departmental data. While this strategy holds significant promise, previous attempts by researchers have met with mixed results. For instance, law enforcement has recently bemoaned the lack of data available on the use of excessive force by officers (Adams et al., 1999; Alpert & Dunham, 2004). In response to this concern, the IACP, the largest organization of police chiefs from around the world, received federal support to create a national database on use of force incidents, which would serve to produce the most comprehensive review of officer use of force to date.

Unfortunately, despite the fact that a trusted law enforcement affinity group conducted the analyses, that individual agencies were able to have their names redacted, and that law enforcement stood to benefit from the report, only a small minority of agencies provided data for the relevant 10-year period. Those agencies that did provide data often did not provide data for each year of the 10 requested for study, which also limited its usefulness. Though the IACP attempted to standardize the data it received, variations in departmental records resulted in some agencies not having relevant data, leaving large sections of the data set incomplete. Ultimately, this left the cumulative IACP report to conclude that, “The information that we are most confident of is of limited value. In many cases, it does not tell us what we really need to know, because it does not focus squarely on the important issues or is subject to competing interpretation” (Adams et al., 1999, p. 2).

Because those administering the project were not able to check the quality of incoming data, the IACP was left with a database that was unable to answer critical questions about the excessive use of force or officers’ discretionary tendencies, much less the issue of racial bias. Given the expense and, at best, tepid results of similar undertakings, the majority of scientific examinations of departmental bias have occurred at the level of a single department, or a small number of departments within a region.

Despite the growing numbers of collaborations (Charlotte-Mecklenburg Police Department, International Unit, 2004; Walker et al., 2007), nearly all law enforcement collaborations with researchers face hurdles simply because of the differences between police culture and academic culture. These differences can derail projects just before data are ready to be presented, thwart projects before they begin, or discourage each side from contemplating them in the first place. Below, we review some of these cultural challenges that make gaining access to necessary data more difficult than in other contexts by reviewing, first, the culture of law enforcement and, second, the culture of the academy.

Law enforcement culture

Municipal law enforcement tends to be organized in a paramilitary hierarchy that prioritizes discipline and obedience among its employees (Crank, 1998; Paoline, 2003; Skolnick, 1994; Walker, 1993, 2001; Walker & Alpert, 2000). Because patrol officers are responsible for each other's physical safety, and supervisors are responsible for the safety of their subordinates, there is a natural tendency for police and sheriffs to form close bonds of professional friendship (Crank, 1998; Kelling & Coles, 1996; Paoline, 2003; Skolnick, 1994; Walker, 2001). The result is a tendency for law enforcement to form close-knit organizational cultures that provide support for in-group members and are wary of out-group members (Crank, 1998; Paoline, 2003; Walker, 2001; Walker et al., 2007; Westley, 1970). Consequently, it is no surprise that there is a history of law enforcement's suspicion of outsiders' interests in law enforcement records (Walker, 2001; Wortley & Tanner, 2003).

For many departments, this cultural tendency to be suspicious of outside researchers is amplified by legal and financial concerns. Cities and state governments routinely spend millions of dollars on litigation, which often begins with adversarial requests for data about race, and on the resulting rulings. As of July 1, 2009, the New Jersey State Police had reportedly spent at least $137.5 million complying with a consent decree requiring it to monitor traffic stops for racial profiling (Megerian, 2009). Similarly, the Los Angeles Police Department consent decree was estimated at a cost of between $30 and $50 million annually, whereas the recent Cincinnati consent decree cost approximately $13 million to set up and over $20 million annually to ensure compliance (Ross & Parke, 2009). These costs are common nationwide and do not include the cost of the litigations and investigations that occur separately and almost uniformly precede the consent decrees themselves. Given polling data that suggest a majority of Americans believe that police departments are racist (Sidanius & Pratto, 1999; Tyler & Huo, 2002; Walker et al., 2007), a department may be culturally, legally, and financially predisposed to limit outsider access to data. These concerns provide a compelling rationale for law enforcement to be wary of researcher collaborations on racial topics, and to want assurances that the research will serve the interests of the department, as well as the community, before any data are shared.

It follows that, when law enforcement do allow formal research partnerships, it is often to provide specific services for the department. For example, police departments often contract with professional consulting agencies in order to answer questions about racial bias. While some of these agencies have excellent research credentials, the fact that they are being paid to conduct research casts a pall over the integrity of their results, leading community members as well as other researchers to distrust any results. If, as is often the case, researchers’ salaries are based off of repeat business, then it invites the suspicion that the results have been tailored to the desires of the chiefs. This association has caused otherwise independent research to be tainted with suspicions that those conducting the research may have been motivated to arrive at particular conclusions—before the research was even undertaken (Baker, 2007). Regardless of the community response, to the extent that private research organizations’ access to law enforcement data excludes independent research access, methodological innovations in this arena are bound to suffer.

Academic culture

Despite the concerns of some law enforcement executives about university research partnerships, the Police Executive Research Forum (PERF), a private research organization, joined with scholars to recommend these partnerships as an approach to analyzing racial bias in policing (Fridell, 2004; Walker et al., 2007). These partnerships have increased in depth and frequency in recent years, but more slowly than many would like. While law enforcement's hesitancy to allow outsiders access has been part of the reason for these slow developing collaborations, academic culture also has had a part to play. Three of the largest such researcher-based impediments are a lack of deliverables, productivity concerns, and ambivalence about the theoretical benefits of research in this area, each of which we review below.

The issue with lack of deliverables is that social scientists are frequently not used to working on projects that require them to produce something more from their analyses than peer-reviewed journal articles or books. Though law enforcement may support these goals in principle, scholarly research projects often require departments to devote significant personnel resources to the culling of data while opening themselves up to possible financial and legal liability. Consequently, police and sheriffs' departments often enter into research collaborations only with the promise of a targeted report and policy suggestions, if not solutions, on the other side of the project. For academics, this additional work product is sometimes considered prohibitively time-consuming and constitutes an unwelcome burden that can discourage otherwise interested researchers.

This same cultural orientation toward a need to devote extensive energies to publishing gives rise to the second concern: productivity. The productivity concern is likely to afflict both social scientists used to working with secondary data sets (i.e., economists, demographers, political scientists, and sociologists) and those more accustomed to collecting original data sets (i.e., political scientists, psychologists, and sociologists). Because secondary data sets are scarce or incomplete, it is reasonable to imagine that an interested sociologist may have concerns that potential law enforcement collaborations would result in the production of data that were insufficient to make causal inferences. Similarly, researchers that depend on original survey data for their research may fear that getting access to law enforcement participants is a time-consuming (if not futile) exercise. Both possibilities are likely to cause scholars in traditional social science disciplines to hesitate before engaging the issue of race in law enforcement—particularly if they are early in their career (Dovidio & Esses, 2007; Goff, in press; Pettigrew, 1998).

The lack of existing data on these issues may also have a compound effect on researcher interest in the issue of race and policing by way of creating ambivalence about the theoretical merits of studying race and policing. That is, because many are unaware of what data are available, it may be difficult to imagine what of theoretical interest could be gleaned from studying the problem.

Taken together, these elements of academic culture conspire with elements in police culture to reduce the number and scope of collaborative research projects that investigate racial bias in law enforcement. Unfortunately, a lack of research within a domain creates an academic momentum of its own, such that those who wish to influence research in their discipline are less likely to engage in policing research because there is not a critical mass of individuals who will read the work. The result is not only our current lack of knowledge, but also an inferior quality of research when it does take place.

Insufficient Methodological Rigor

A critique of the methodological rigor with which social science has approached racial bias in law enforcement is not a critique of the individuals who have engaged in that science. Rather, it is the natural outgrowth of the inferior data that scientists have had available to conduct the appropriate analyses to distinguish between racial disparities and racial discrimination. To conduct the ideal analyses, there would at least need to be access to racial demographic data from a wider array of departments on a wider array of issues. As a PERF report indicates, there are best-practice recommendations for how to record so-called “racial profiling data” that have yet to be implemented (Fridell, 2004). This lack of implementation has prevented more ambitious research designs that would permit multilevel analyses that move beyond either the institutional-level or officer-level analyses that have been described earlier. These types of analyses would be able to isolate the role of the department and the role of the officer in producing disparities. Instead, because of the data available, papers on base rates or disparities dominate the field, and research is not designed to make strong causal inferences.

This outcome is not only the responsibility of law enforcement, however. Researchers must articulate better the requirements for rigorous research and should collaborate with willing law enforcement partners to make it a reality. This idea requires a vision for grander possibilities in the relationships between social science and law enforcement. These barriers are why it is rare to see researchers even mention or describe to law enforcement departments the ideal methodology needed to conduct various studies, much less see those methodologies put into practice. One of the first attempts at articulating the type of data required to conduct rigorous scientific research on racial bias in policing was the “Contract for Policing Justice,” a report issued by the Consortium for Police Leadership in Equity (CPLE) in 2010 (Goff et al., 2010). More will be said subsequently about this report, but, to summarize, there is a need for research designs that are able to distinguish between disparity and discrimination.

Learning from Each Other

Identifying a problem, of course, is the first step to solving it. For practitioners and researchers to develop a better understanding of how and when racial discrimination occurs in law enforcement, entities that bridge these two cultures are essential. The National Institute of Justice, the Community Oriented Policing Services Office, and the Bureau of Justice Assistance are the established federal entities that have carried this burden thus far. However, limited budgets and the threat of further budget cuts mean that nongovernmental organizations must also do their part. Fortunately, the efforts of concerned practitioners and researchers have begun to pay off in recent years. The following section outlines some of these successes and suggests fruitful directions for the future.

Reasons for Optimism: What We Hope to Know Soon

Given the seemingly intractable differences in cultures and the methodological limitations of existing analytical approaches, perhaps the state of science surrounding racial bias in law enforcement seems bleak. Differences in law enforcement and academic cultures conspire to limit access to data and rigorous research designs. How, then, can progress be made? This hopeless impression would be a gross mischaracterization of the field, however, as innovations and new collaborations have created exciting new scholarship and research opportunities in recent years. This new commitment is driven, primarily, by the desire of law enforcement executives to take advantage of potential social science insights through new partnerships and a commitment on the part of both parties to create access and reach across cultural divides. Recent police/social science conferences demonstrate this growing desire, with chiefs and sheriffs from across the country participating in the Stanford “Policing Racial Bias” conferences in 2004 and 2007, the ongoing Harvard Kennedy School of Government Executive Sessions on Policing and Public Safety, and the CPLE's policing equity summits in 2009 and 2010. These events have allowed over 40 of the largest law enforcement agencies in the United States and Canada to create informal networks with leading scholars on the topics of race and racial bias, including representatives from Denver Police Department, Houston Police Department, Toronto Police Services, Los Angeles Police Department, Los Angeles County Sheriffs Department, Saint Louis Police Department, Salt Lake City Police Department, and San Jose Police Department, to name just a few.

The commingling of these worlds has led to increasing collaborations, particularly between social psychologists and law enforcement. For instance, inspired by the tragic Amadou Diallo shooting in 1999, a number of social psychologists were interested in the possibility that officers might misidentify harmless objects as weapons—when Blacks held the objects (Correll, Park, Judd, & Wittenbrink, 2002; Kahn & Davies, 2011; Payne, 2001). This research revealed that undergraduates playing a computer game were more likely to shoot “unarmed” Black suspects than “unarmed” White suspects when under time pressure. This finding suggested the possibility that implicit bias could lead to split-second differences that cost Black civilians their lives. However, testing undergraduates in the laboratory is far different from testing sworn officers.

Consequently, coinciding with the first Policing Racial Bias Symposium at Stanford, Correll and colleagues were able to demonstrate that officers were actually less likely to produce race-related errors than were civilians on the same shooter simulation (Correll et al., 2007). In other words, officers were less likely to shoot unarmed Black men than were civilians, and revealed no racial bias in “shooter” errors. Plant and Peruche suggested that this reduction in errors seems to be related to officers learning that there is not a correlation between race and danger (Plant & Peruche, 2005; Plant, Peruche, & Butz, 2005). On the other hand, if officers have had significant negative interactions with Blacks, they are likely to show heightened bias on simulation tasks, suggesting the potential for police bias in contexts with tense police/community relations (Peruche & Plant, 2006).

The ability to recruit actual officers in research also bolstered previous research on the link between perceptions of race and criminality. For instance, Eberhardt and colleagues demonstrated that the association between Blacks and criminality is so strong that officers actually misidentified criminal suspects in favor of more stereotypical (rather than less stereotypical) targets (Eberhardt et al., 2004). This demonstrates that officers use race as a cue to determine criminality—just as undergraduates had in laboratories previously.

In addition to providing evidence that contemporary forms of bias influence important policing outcomes, these collaborations suggested that the principles of social psychological experimentation could apply to the world of policing. This insight opened a new door for understanding racial bias in police behaviors on the street. It was with this goal in mind that the Denver project was undertaken.

The Denver Project

In 1999, the Bureau of Justice Statistics declared, “The impact of differences in police organizations, including administrative policies, hiring, training, discipline, and use of technology, on excessive force is unknown” (Adams et al., 1999, p. ix). It was with this sentiment in mind that the Denver Police Department (DPD) began reaching out to social psychologists with the purpose of understanding the role of racial prejudice in everyday policing. Rather than focusing research on behaviors that might or might not constitute bias, the goal was to detect the role of objectionable factors in producing troubling behavior. That is, if an officer's level of anti-Black racial prejudice is predictive of that officer's aggressiveness toward Black citizens, the DPD would find that to be objectionable and would seek to eliminate it. The first step in this process, then, would be the identification of sociocognitive and situational elements that were associated with problematic behaviors (such as differential stop rates or use of force).

The DPD contacted researchers at the University of Colorado, Boulder, in order to test the possibility that race might play a role in their officers’ behavior. The study tested DPD officers and citizens of Denver in a computer task (Correll et al., 2007). Participants were shown images of Black and White male targets holding either guns or harmless objects (e.g., a wallet or a cell phone). The object of the computer task was to press the “shoot” button whenever an image contained someone holding a gun, but to avoid mistakenly shooting images containing someone holding a harmless object.

The study permitted two measures of implicit bias. The first was a reaction time measure. Participants who were faster to shoot the Black target with a gun than a White target with a gun demonstrated a bias in their reaction time (as did those who were slower to shoot Blacks than Whites). On this measure, officers were as biased as citizens against Black targets. The second measure of bias was found in the decision to shoot. Mistakenly shooting more unarmed Black targets than unarmed White targets also demonstrates a bias. Importantly, while citizens mistakenly shot more Black targets than White targets, DPD officers showed no racial bias in the decision to shoot. Taken together, the study revealed that, while DPD officers do possess racial biases, those biases seem to be inhibited when officers make an important policing decision.
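To make the two measures concrete, the following is a minimal sketch of how they might be computed from trial-level data. It is not the scoring procedure used by Correll et al. (2007); the `Trial` record, the function names, and the example trials are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One trial of a shooter-style task (hypothetical record layout)."""
    target_race: str  # "Black" or "White"
    armed: bool       # True if the target held a gun
    shot: bool        # True if the participant pressed "shoot"
    rt_ms: float      # reaction time in milliseconds


def reaction_time_bias(trials):
    """Mean time to shoot armed White targets minus armed Black targets.

    A positive value means the participant was faster to shoot armed
    Black targets; a negative value means the reverse.
    """
    def mean_rt(race):
        return mean(t.rt_ms for t in trials
                    if t.target_race == race and t.armed and t.shot)
    return mean_rt("White") - mean_rt("Black")


def shoot_error_bias(trials):
    """Rate of mistaken shots at unarmed Black targets minus the same
    rate for unarmed White targets."""
    def error_rate(race):
        unarmed = [t for t in trials
                   if t.target_race == race and not t.armed]
        return sum(t.shot for t in unarmed) / len(unarmed)
    return error_rate("Black") - error_rate("White")


# Invented example trials for a single participant.
trials = [
    Trial("Black", True, True, 520.0),
    Trial("Black", True, True, 540.0),
    Trial("White", True, True, 560.0),
    Trial("White", True, True, 580.0),
    Trial("Black", False, True, 600.0),
    Trial("Black", False, False, 650.0),
    Trial("White", False, False, 640.0),
    Trial("White", False, False, 660.0),
]

rt_bias = reaction_time_bias(trials)
error_bias = shoot_error_bias(trials)
```

In these invented data the participant differs by race on both measures. The pattern reported for DPD officers would correspond to a nonzero reaction time difference alongside an error difference near zero.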

Having established both the existence of implicit prejudice and officers' ability to avoid acting on that prejudice in the lab, the DPD then contacted another set of researchers to conduct research on how officer attitudes influenced their behaviors in the field (Goff, 2011). The first step of the project was executed in two phases and required unprecedented access to police personnel and records. The researchers conducted psychological experiments with officers (measuring racial prejudice among a number of other psychological factors) and then linked individual officers' psychological data to data in their personnel files including, but not limited to, citizen complaint history, disciplinary history, performance evaluations, and use of force history. This study marked the first time that researchers were able to pair psychological predictors with police behaviors. By linking psychological indicators with problem behaviors, the research team was able to determine what factors were associated with racially disparate outcomes, and thereby to distinguish between racial disparities and discrimination in actual police behavior. In this design, racial bias can be measured simply by looking at the officers who are outside the normal distribution of behaviors as determined by their fellow officers. This internal benchmarking technique provides a statistical test of so-called “bad apples.”
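The logic of internal benchmarking can be illustrated with a short sketch. The version below is an assumption made for illustration, not the Denver project's actual analysis: it compares each officer's rate on some behavior with the mean and standard deviation of that officer's peers, and flags officers who fall far above the peer norm. The function name, the leave-one-out comparison, the threshold of two standard deviations, and the rates themselves are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(rates, threshold=2.0):
    """Return officers whose rate is far above that of their peers.

    `rates` maps an officer identifier to a behavioral rate (for
    example, uses of force per 100 stops). Each officer is compared
    with all *other* officers (a leave-one-out benchmark, assumed here
    so that an extreme officer does not inflate their own comparison
    group). At least three officers are required, because stdev()
    needs two or more peer values.
    """
    flagged = []
    for officer, rate in rates.items():
        peers = [r for other, r in rates.items() if other != officer]
        peer_mean, peer_sd = mean(peers), stdev(peers)
        if peer_sd == 0:
            continue  # peers are identical; z-score undefined in this sketch
        z = (rate - peer_mean) / peer_sd
        if z > threshold:
            flagged.append(officer)
    return sorted(flagged)


# Invented rates for five hypothetical officers.
rates = {"A": 2.0, "B": 2.5, "C": 1.5, "D": 2.0, "E": 9.0}
flagged = flag_outliers(rates)
```

A real analysis would also need to account for differences in assignment, shift, and neighborhood before treating a flagged officer as anything more than a statistical outlier.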

Although this analytic technique is not able to account for the possibility that an entire department holds high levels of police bias, it is able to identify the factors most strongly associated with elevated levels of bias within a department, something that was not possible previously. Additionally, if racially disparate application of an important behavior (such as higher rates of force used against Blacks) is associated with racial prejudice, then this would indicate that the behavior is subject to racial bias and that the department should devote resources to remedying the current state of affairs. Importantly, this research design largely sidesteps the issue of base rates by focusing, instead, on the degree to which associated factors (such as prejudice) are predictive. The degree to which these factors are, on their face, objectionable determines the degree to which a department may feel morally obligated to address the issue with its officers. Moreover, regardless of whether or not a factor is morally objectionable, adopting a social psychological perspective on the question of disparate treatment makes it possible to reduce disparate treatment without casting blame on individual officers or police agencies.

The second phase of the research involved an examination of police training. Having revealed several factors (discussed, in part, below) that were strongly associated with biased policing, the researchers then tested newly selected police recruits on those same factors. Recruits were tested before they had started their training in the academy, after they had completed their academy training, and again after they had completed their field training. These longitudinal data, collected over several academy classes, have allowed us to determine the degree to which police training augments or inhibits police bias. By testing a representative sample of Denver citizens, the research is also able to determine whether or not the DPD is recruiting new officers who are more or less likely to engage in unbiased policing than the average citizen. It is important to note that the DPD training targeting biased policing is the “Tools for Tolerance” training designed by the Simon Wiesenthal Center, the most popular diversity training module in U.S. law enforcement. Results in this second phase can then be used to inform future trainings both in Denver and across the country and, for the first time, to create diversity-related trainings that target particular outcomes.

Taken together, this two-phase research design allowed the DPD and investigators to come to an understanding of what seems to predict problem behavior, and whether or not police training is helping or hurting. This design allows for direct testing of the relationship between bias and police discrimination, and it establishes two key criteria for discrimination. These are the “bad apples” criterion, regarding officers who engage in behavior that is statistically outside the norm for their department, and the “strong association” criterion, regarding objectionable predictors (e.g., prejudice) that are strongly associated with problem behaviors (e.g., disparate use of force).

While data from the training portion of the data collection are still being analyzed, one key finding from the matching of officers' psychological attitudes to their field behavior bears repeating here. While racial prejudice was a factor in predicting racially disparate police behaviors, it had less of an influence than officers' vulnerabilities to identity-based threats. Specifically, officers' concern with appearing racist (Goff, 2011) and male officers' concern with demonstrating their manhood were better predictors of differential use of force against non-White citizens. It may seem illogical to suggest that a police officer's fear of being seen as racist could cause him or her to use excessive force against Black citizens, but this finding can be explained by the fact that officers are trained to remain in control of a situation in order to maintain a safe environment. Officers have two forms of authority at their disposal to effect that control: moral authority and physical authority. Moral authority rests on officers' belief that they should occupy the moral high ground in nearly all interactions, and on citizens' tendency to respect both that perception and the authority inherent in the officer's position. When a citizen fails to respect that moral authority, police are empowered to use physical means to control a situation. Of course, when an officer is confronted with a group of Black or Latino suspects who accuse him or her of being racist, moral authority is not an option. Consequently, interactions between officers and non-White suspects may become physical as a result of the officer's fear of having lost moral authority, an ironic consequence of an officer's egalitarian concerns in this policing context. This result suggests that situations, rather than prejudice, may be responsible for racial disparities in the field.

These intriguing early successes of the Denver project, however, should be treated cautiously. A combination of laboratory experiments and field data must continue to shed light on the relationship between psychological factors and police behaviors. Just as importantly, as promising as these data may be, it is crucial to determine how well the findings in Denver generalize to the broader culture of policing.


Having heard about the successes in Denver and facilitated by the efforts of research innovators like Jennifer Eberhardt, police chiefs began to reach out to social psychologists (and quantitative social scientists in general) to conduct similar research. To coordinate these collaborations, a number of researchers and law enforcement executives created the CPLE. The aim of the research consortium was to conduct rigorous and original empirical research on equity issues within law enforcement contexts.

The CPLE was also organized to avoid some of the common pitfalls outlined earlier, particularly around culture clashes between law enforcement and the academy. For instance, most CPLE collaborations are structured by legal documents that protect the rights of police departments and researchers. Partner law enforcement departments are able to spell out concrete deliverables that they would appreciate receiving from the collaboration, and are protected from researchers speaking out about the process in the press without the consent of law enforcement. Similarly, researchers usually give police departments advance notice before submitting research for publication, and departments are able to opt to have their names removed from the publication prior to anything being submitted. Researchers, on the other hand, are protected against law enforcement interests attempting to influence how data are described or what, when, or where the data are published. In an attempt to demonstrate objectivity to both the scientific and municipal communities, the CPLE also does not accept funding from municipal law enforcement partners.

This collaborative model has many advantages. For instance, with multiple partner departments (currently, the CPLE lists 14 partner departments; see Note 2), the attitude-behavior matching approach undertaken in the Denver project will be replicated in multiple cities throughout the country. This expansion will permit both a test of the generalizability of the Denver example and a possible hierarchical modeling approach to the problem, eventually permitting an analysis of departmental influence on racial disparities. Similarly, with access to such large and diverse data sets, it is possible to tie racial attitudes to outcomes tests, analyze individual officer search decisions using signal detection analysis, and link officer attitudes about their department to community perceptions and officer behaviors. Considering the wealth of opportunities occasioned by such broad law enforcement/researcher collaborations, deciding what to do first may be as important (and difficult) as conducting the research at all.

This is why, in the summer of 2010, 26 of the largest law enforcement agencies in the United States and Canada, along with researchers, federal stakeholders, and advocacy groups, convened to set a research agenda on equity in law enforcement (Goff et al., 2010). The resulting document, The Contract for Policing Justice, provides a roadmap for research in three areas: racially disparate impact, immigration policy issues, and organizational equity concerns. Within the first category, researchers and law enforcement executives agreed that there was not a single “silver bullet” method to use when assessing racial bias in policing. Rather, an approach that uses multiple indicators seems superior. By examining the relationship between consent search “hit rates,” officer psychological dimensions, and careful benchmarking techniques, it may be possible to diagnose whether or not (and/or the degree to which) a department engages in biased policing. This symptomological approach would require access to large numbers of officers and departmental data within a significant number of departments—something the signatories of The Contract for Policing Justice have already secured.

Taken together, this promising emerging research, the resulting research agenda, and countless budding police/research collaborations are reason to hope that the road ahead contains an increase in our knowledge about racial equity in policing. With continued dedication, perhaps both the cultures of law enforcement and the academy may evolve to make collaboration even easier. In the meantime, it is heartening to see that, with continued persistence, methods for distinguishing disparities from discrimination may be in sight. And, with the continued support of both researchers and police leadership, we may soon know something closer to what we should.

What's Next? Policy Solutions for a More Equitable Police Profession

From 2001 to 2005, the Community Oriented Policing Services Office commissioned three reports on racial profiling that outlined both practical issues in racially biased policing and much of the best criminological research on the topic (Fridell et al., 2001; Fridell, 2004, 2005). These guides represented the beginnings of a federal initiative to pool the knowledge of practitioners and researchers on racial profiling for the betterment of both. That project, however, remains unfinished. New research on social cognition and contemporary psychological mechanisms of racial bias must still be incorporated into policing best practices (Goff, in press; Goff et al., 2010). Similarly, the burgeoning overlap between traditional social scientists and law enforcement practitioners is still in need of infrastructural support to aid in carrying out the extensive research plans.

To that end, expanded federal and private funding for police/researcher collaborations will be crucial to the growth of our understanding about police bias. Additionally, a series of federally funded workshops to elaborate on the recently negotiated national agenda for equity in law enforcement would capitalize on the growing momentum in the sector. Out of the glare of Congress, it is possible that practitioners (who want to reduce racial bias in their departments) and researchers (who want to measure it) could produce databases that would be less politically contentious than the one called for by the End Racial Profiling Act, and yet more productive in the measurement and reduction of racial bias in policing. Similarly, a well-funded independent commission on bias in law enforcement (which would not be tasked with identifying levels of bias within individual agencies, but would rather serve as a central agency to promote productive collaborative research) could provide both the methodological tools and the representative sample necessary to solve the measurement problems outlined earlier.

We recommend the following concrete steps for law enforcement to take in order to improve data on racial bias in policing and for practitioners to develop more informative research on racial disparities and discrimination in law enforcement:

  1. Without demographic information about stops and other important contact outcomes (e.g., arrests, use of force, and complaints), it is not possible to measure or reduce racial disparities. Consequently, police departments should uniformly collect demographic data for every relevant statistic kept.
  2. The attitude/behavior matching approach used in Denver is an important tool for understanding the relative role of officer bias in the production of racial disparities. A representative national sample using this approach could provide invaluable insight into if, when, and how much officer biases influence civilian outcomes.
  3. Similarly, a multisite investigation of officers’ attitudes and behaviors could be paired with an analysis of departmental policies to produce a hierarchical model of racial disparities. To the degree that policies predict disparity, as opposed to efficiency, they could also be identified and altered.
  4. In both of the above research methodologies, disparities could be measured using multiple outcomes tests such as consent search analyses (i.e., “hit rates”) and racial distributions of stops, use of force, and complaints. Importantly, these analyses should account for contextual variables by controlling for geographic location, time of day, officer and suspect height and weight, and number of officers present, among other factors.
  5. As predictors of racial disparities are revealed, researchers should also continue working with law enforcement to provide interventions (e.g., trainings) that reduce disparities.
  6. Perhaps most importantly, throughout these collaborative relationships between law enforcement and researchers, neighborhoods and communities must be included in the process in order to ensure full democratic participation in the pursuit of police equity.
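As an illustration of the consent search analysis named in the fourth recommendation, the sketch below computes a “hit rate” for each group, defined here as the share of consent searches that found contraband. The function names and the counts are hypothetical, and the sketch omits the contextual controls (location, time of day, number of officers present) that the recommendation calls for.

```python
def hit_rates(searches):
    """Compute the consent search hit rate for each group.

    `searches` maps a group label to a pair:
    (number of consent searches, number of searches finding contraband).
    Groups with no searches are skipped.
    """
    return {
        group: found / total
        for group, (total, found) in searches.items()
        if total > 0
    }


def hit_rate_gap(searches, group_a, group_b):
    """Hit rate for group_a minus hit rate for group_b."""
    rates = hit_rates(searches)
    return rates[group_a] - rates[group_b]


# Invented counts for illustration only.
counts = {"Black": (200, 40), "White": (100, 30)}
rates = hit_rates(counts)
gap = hit_rate_gap(counts, "Black", "White")
```

In outcomes tests of this kind, a lower hit rate for one group is commonly read as a sign that members of that group are searched on weaker grounds. A gap alone does not establish discrimination, which is why the recommendation pairs hit rates with other indicators.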

Despite the sizable obstacles to understanding as much as we should about policing racial bias, these steps (in addition to the momentum already in place) represent a genuine opportunity to measure—and reduce—racism in policing. With the continued necessary support, the relationships that have begun producing innovative research can turn into ones that produce community-changing police policies. And, at the very least, we may come to know as much as we should about equality in policing.


  1. Though, ostensibly, outcomes tests can be conducted at the officer level of analysis, such analyses are too rare at present to constitute a research literature.

  2. More information is available online, at http://www.policingequity.org.

PHILLIP ATIBA GOFF received his Ph.D. in Social Psychology from Stanford University in 2005. From there he joined the faculty at The Pennsylvania State University, where he initially developed the research paradigm that would pave the way for the Consortium for Police Leadership in Equity, which he co-founded with Tracie L. Keesee, Ph.D. (a Division Chief in the Denver Police Department) in 2008. While on leave at the Russell Sage Foundation, Dr. Goff joined the psychology department faculty at the University of California, Los Angeles in 2008, where he continues conducting both theoretical and translational research on contemporary racial and gender inequality. His principal research interest is to identify mechanisms that account for the apparent contradiction between declining racial bigotry and persisting racial inequality.

KIMBERLY BARSAMIAN KAHN received her Ph.D. in Social Psychology from the University of California, Los Angeles in 2010. From there, she enjoyed a post-doctoral fellowship year at the Center for Social Research and Intervention (CIS) at Lisbon University Institute (ISCTE-IUL) in Lisbon, Portugal. In 2011, Dr. Kahn joined the psychology department faculty at Portland State University where she is an assistant professor. Her principal research interest addresses forms of bias and discrimination that tend to go unacknowledged by the broader culture (e.g., discrimination based on racial stereotypicality or bias that targets those who themselves confront discrimination).