Qualitative Data Analysis for Health Services Research: Developing Taxonomy, Themes, and Theory


  • Elizabeth H. Bradley,

    1. Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, New Haven, CT 06520-8034,
    • Address correspondence to Elizabeth H. Bradley, Ph.D., Professor, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, New Haven, CT 06520-8034. Leslie A. Curry, Ph.D., Associate Professor of Medicine, is with the University of Connecticut School of Medicine, Farmington, CT. Kelly J. Devers, Ph.D., Associate Professor, is with the Departments of Health Administration and Family Medicine, Virginia Commonwealth University, Richmond, VA.

  • Leslie A. Curry,

    1. University of Connecticut School of Medicine, Farmington, CT
  • Kelly J. Devers

    1. Departments of Health Administration and Family Medicine, Virginia Commonwealth University, Richmond, VA


Objective. To provide practical strategies for conducting and evaluating analyses of qualitative data applicable for health services researchers.

Data Sources and Design. We draw on extant qualitative methodological literature to describe practical approaches to qualitative data analysis. Approaches to data analysis vary by discipline and analytic tradition; however, we focus on qualitative data analysis that has as a goal the generation of taxonomy, themes, and theory germane to health services research.

Principal Findings. We describe an approach to qualitative data analysis that applies the principles of inductive reasoning while also employing predetermined code types to guide data analysis and interpretation. These code types (conceptual, relationship, perspective, participant characteristics, and setting codes) define a structure that is appropriate for generation of taxonomy, themes, and theory. Conceptual codes and subcodes facilitate the development of taxonomies. Relationship and perspective codes facilitate the development of themes and theory. Intersectional analyses with data coded for participant characteristics and setting codes can facilitate comparative analyses.

Conclusions. Qualitative inquiry can improve the description and explanation of complex, real-world phenomena pertinent to health services research. Greater understanding of the processes of qualitative data analysis can be helpful for health services researchers as they use these methods themselves or collaborate with qualitative researchers from a wide range of disciplines.

Qualitative research is increasingly common in health services research (Shortell 1999; Sofaer 1999). Qualitative studies have been used, for example, to study culture change (Marshall et al. 2003; Craigie and Hobbs 2004), physician–patient relationships and primary care (Flocke, Miller, and Crabtree 2002; Gallagher et al. 2003; Sobo, Seid, and Reyes Gelhard 2006), diffusion of innovations and quality improvement strategies (Bradley et al. 2005; Crosson et al. 2005), novel interventions to improve care (Koops and Lindley 2002; Stapleton, Kirkham, and Thomas 2002; Dy et al. 2005), and managed care market trends (Scanlon et al. 2001; Devers et al. 2003). Despite substantial methodological papers and seminal texts (Glaser and Strauss 1967; Miles and Huberman 1994; Mays and Pope 1995; Strauss and Corbin 1998; Crabtree and Miller 1999; Devers 1999; Patton 1999; Devers and Frankel 2000; Giacomini and Cook 2000; Morse and Richards 2002) about designing qualitative projects and collecting qualitative data, less attention has been paid to the data analysis aspects of qualitative research. The purpose of this paper is to offer practical strategies for the analysis of qualitative data that may be generated from in-depth interviewing, focus groups, field observations, primary or secondary qualitative data (e.g., diaries, meeting minutes, annual reports), or a combination of these data collection approaches.


Qualitative research is well suited for understanding phenomena within their context, uncovering links among concepts and behaviors, and generating and refining theory (Glaser and Strauss 1967; Miles and Huberman 1994; Crabtree and Miller 1999; Morse 1999; Ragin 1999; Sofaer 1999; Patton 2002; Campbell and Gregor 2004; Quinn 2005). Distinct from qualitative work, quantitative research seeks to count occurrences, establish statistical links among variables, and generalize findings to the population from which the sample was drawn. Although qualitative and quantitative methods have historically been viewed as mutually exclusive, rigid distinctions are increasingly recognized as inappropriate and counterproductive (Ragin 1999; Sofaer 1999; Creswell 2003; Skocpol 2003). Mixed methods approaches (Creswell 2003) may include both methods employed simultaneously or sequentially, as appropriate.


There is immense diversity in the disciplinary and theoretical orientation, methods, and types of findings generated by qualitative research (Yardley 2000). The many traditions of qualitative research include, but are not limited to, cultural ethnography (Agar 1996; Quinn 2005), institutional ethnography (Campbell and Gregor 2004), comparative historical analyses (Skocpol 2003), case studies (Yin 1994), focus groups (Krueger and Casey 2000), in-depth interviews (Glaser and Strauss 1967; McCracken 1988; Patton 2002; Quinn 2005), participant and nonparticipant observations (Spradley 1980), and hybrid approaches that include parts or wholes of multiple study types. Consistent with the pluralism in theoretical traditions, methods, and study designs, many experts (Feldman 1995; Greenhalgh and Taylor 1997; Sofaer 1999; Yardley 2000; Morse and Richards 2002) have argued that there cannot and should not be a uniform approach to qualitative methods. Nevertheless, some approaches to qualitative data analysis are useful in health services research. In this paper, we focus on strategies for analysis of qualitative data that are especially applicable in the generation of taxonomy, themes, and theory (Table 1). Taxonomy is a formal system for classifying multifaceted, complex phenomena (Patton 2002) according to a set of common conceptual domains and dimensions. Taxonomies promote increased clarity in defining and hence comparing diverse, complex interventions (Sofaer 1999), which are common in health policy and management. Themes are recurrent unifying concepts or statements (Boyatzis 1998) about the subject of inquiry. Themes are fundamental concepts (Ryan and Bernard 2003) that characterize specific experiences of individual participants by the more general insights that are apparent from the whole of the data. Theory is a set of general, modifiable propositions that help explain, predict, and interpret events or phenomena of interest (Dubin 1969; Patton 2002). 
Theory is important for understanding potential causal links and confounding variables, for understanding the context within which a phenomenon occurs, and for providing a potential framework for guiding subsequent empirical research.

Table 1.   Selected Types of Results from Qualitative Data Analysis
Taxonomy | Formal system for classifying multifaceted, complex phenomena according to a set of common conceptual domains and dimensions | Increase clarity in defining and comparing complex phenomena
Themes | Recurrent unifying concepts or statements about the subject of inquiry | Characterize experiences of individual participants by general insights from the whole of the data
Theory | A set of general propositions that help explain, predict, and interpret events or phenomena of interest | Identify possible levers for affecting specific outcomes; guide further examination of explicit hypotheses derived from theory



There is no singularly appropriate way to conduct qualitative data analysis, although there is general agreement that analysis is an ongoing, iterative process that begins in the early stages of data collection and continues throughout the study. Qualitative data analysis, wherein one is making sense of the data collected, may seem particularly mysterious (Campbell and Gregor 2004). The following steps represent a systematic approach that allows for open discovery of emergent concepts with a focus on generating taxonomy, themes, or theory.

Reading for Overall Understanding

Immersion in the data to comprehend its meaning in its entirety (Crabtree and Miller 1999; Pope, Ziebland, and Mays 2000) is an important first step in the analysis. Reviewing data without coding helps identify emergent themes without losing the connections between concepts and their context.

Coding Qualitative Data

Once the data have been reviewed and there is a general understanding of the scope and contexts of the key experiences under study, coding provides the analyst with a formal system to organize the data, uncovering and documenting additional links within and between concepts and experiences described in the data. Codes are tags (Miles and Huberman 1994) or labels, which are assigned to whole documents or segments of documents (i.e., paragraphs, sentences, or words) to help catalogue key concepts while preserving the context in which these concepts occur.

The coding process includes development, finalization, and application of the code structure. Some experts (Morse 1994; Morse and Richards 2002; Janesick 2003) argue that a single researcher conducting all the coding is both sufficient and preferred. This is particularly true in studies where being embedded in ongoing relationships with research participants is critical for the quality of the data collected. In such cases, the researcher is the instrument; data collection and analysis are so intertwined that they should be integrated in a single person who is the “choreographer” (Janesick 2003) of his/her own “dance.” Such an analysis may not be repeatable by others who have differing traditions and paradigms; therefore, disclosure (Gubrium and Holstein 1997) of the researcher's biases and philosophical approaches is important. In contrast, other experts recommend that the coding process involve a team of researchers with differing backgrounds (Denzin 1978; Mays and Pope 1995; Patton 1999; Pope, Ziebland, and Mays 2000) to improve the breadth and depth of the analysis and subsequent findings. Cross-training is important in the use of such teams.

Developing the Code Structure

The development of the code structure is an iterative and lengthy process, which begins in the data collection phase. There is substantial diversity in how to develop the code structure. This debate (Glaser 1992; Heath and Cowley 2004) centers on whether coding should be more inductive or more deductive. Regardless of approach, a well-crafted, clear, and comprehensive code structure promotes the quality of subsequent analysis (Miles and Huberman 1994).

Grounded Theory Approach to Developing Code Structure

For grounded theorists, the recommended approach to developing a set of codes is purely inductive. This approach limits researchers from erroneously “forcing” a preconceived result (Glaser 1992). Data are reviewed line by line in detail and as a concept becomes apparent, a code is assigned. Upon further review of data, the analyst continues to assign codes that reflect the concepts that emerge, highlighting and coding lines, paragraphs, or segments that illustrate the chosen concept. As more data are reviewed, the specifications of codes are developed and refined to fit the data. To ascertain whether a code is appropriately assigned, the analyst compares text segments to segments that have been previously assigned the same code and decides whether they reflect the same concept. Using this “constant comparison” method (Glaser and Strauss 1967), the researchers refine dimensions of existing codes and identify new codes. Through this process, the code structure evolves inductively, reflecting “the ground,” i.e., the experiences of participants.

More Deductive Approaches to Developing Code Structure

Some qualitative research experts (Miles and Huberman 1994) describe a more deductive approach, which starts with an organizing framework for the codes. In this approach, the initial step defines a structure of initial codes before line-by-line review of the data. Preliminary codes can help researchers integrate concepts already well known in the extant literature. For example, a deductive approach to studying health service use might begin with predetermined codes for predisposing, enabling, and need factors based on the behavioral model (Andersen 1995). Great care must be taken to avoid forcing data into these categories simply because a code exists for them; however, such a “start list” (Miles and Huberman 1994) does allow new inquiries to benefit from and build on previous insights in the field.

An Integrated Approach to Developing Code Structure

An integrated approach employs both inductive (ground-up) development of codes as well as a deductive organizing framework for code types (start list). Previous researchers have identified various code types (Lofland 1971; Lincoln and Guba 1985; Strauss and Corbin 1990; Miles and Huberman 1994); however, five code types (Table 2) are helpful in generating taxonomy, themes, and theory, all of which have practical relevance for health services research. These code types are (1) conceptual codes and subcodes identifying key concept domains and essential dimensions of these concept domains, (2) relationship codes identifying links between other concepts coded with conceptual codes, (3) participant perspective codes, which identify if the participant is positive, negative, or indifferent about a particular experience or part of an experience, (4) participant characteristic codes, and (5) setting codes.

Table 2.   Code Types and Applications
Code Types | Characterization | Application/Purpose
Conceptual codes/subcodes | Key conceptual domains and essential conceptual dimensions of the domains | Developing taxonomies; useful in themes and theory
Relationship codes | Links among conceptual codes/subcodes | Generating themes and theory
Participant perspective | Directional views (positive, negative, or indifferent) of participants | Generating themes and theory
Participant characteristics | Characteristics that identify participants, such as age, gender, insurance type, socioeconomic status, etc. | Comparing key concepts across types of participants
Setting codes | Characteristics that identify settings, such as intervention versus nonintervention group, fee-for-service versus prepaid insurance, etc. | Comparing key concepts across types of settings
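
As a minimal sketch of how the five code types in Table 2 might be represented when coded data are managed programmatically, the following Python fragment tags text segments with typed codes and retrieves segments by code type. The class names, field names, and example segments are hypothetical illustrations, not part of the approach described in the literature cited here, and qualitative analysis software may organize codes quite differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CodeType(Enum):
    """The five code types listed in Table 2."""
    CONCEPTUAL = "conceptual"
    RELATIONSHIP = "relationship"
    PERSPECTIVE = "participant perspective"
    PARTICIPANT = "participant characteristic"
    SETTING = "setting"


@dataclass(frozen=True)
class Code:
    name: str
    code_type: CodeType
    parent: Optional[str] = None  # name of the parent code, set only for subcodes


@dataclass
class Segment:
    source: str                       # e.g., a transcript identifier (hypothetical)
    text: str                         # the coded passage, kept with its context
    codes: List[Code] = field(default_factory=list)


def segments_with(segments, code_type):
    """Return the segments tagged with at least one code of the given type."""
    return [s for s in segments if any(c.code_type is code_type for c in s.codes)]


# Hypothetical coded data for illustration only.
leadership = Code("nursing leadership", CodeType.CONCEPTUAL)
positive = Code("positive", CodeType.PERSPECTIVE)
intervention = Code("intervention group", CodeType.SETTING)

segments = [
    Segment("interview_01", "There is strong nursing leadership here.",
            [leadership, positive, intervention]),
    Segment("interview_02", "We rarely look at the data reports.",
            [Code("use of data", CodeType.CONCEPTUAL)]),
]

perspective_segments = segments_with(segments, CodeType.PERSPECTIVE)
```

Keeping the full passage in each segment, rather than only the code labels, reflects the point made above that codes should catalogue concepts while preserving the context in which they occur.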

Finalizing and Applying the Code Structure

The codes and code structure can be considered finalized at the point of theoretical saturation (Glaser and Strauss 1967; Glaser 1992; Patton 2002). This is the point at which no new concepts emerge from the review of successive data from a theoretically sensitive sample of participants, i.e., a sample that is diverse in pertinent characteristics and experiences. Theoretical saturation will take longer to reach for more multifaceted areas of inquiry with greater diversity among participants. If, during analysis, a conceptual gap is identified, the researcher should expand the sample and continue data collection to clarify and refine emerging concepts and codes. For instance, if an observation or interview elicits information about a concept that has not been heard before or that contradicts previous understandings, the researchers should expand the sample to include participants and experiences that help them understand this new concept more fully. This use of the codes to guide data collection is known as theoretical sampling and is central to conducting qualitative research.
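
One way to monitor progress toward saturation is to count how many previously unseen codes each successive transcript contributes. The sketch below is only a bookkeeping aid under stated assumptions: the function names, the example codes, and the `window` parameter (how many consecutive transcripts must add no new codes) are hypothetical, and the sources cited above do not prescribe a numeric threshold. Judging saturation remains an analytic decision that also depends on the diversity of the sample.

```python
def new_codes_per_transcript(coded_transcripts):
    """For each transcript, in review order, count codes not seen in earlier ones."""
    seen = set()
    counts = []
    for codes in coded_transcripts:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def reached_saturation(coded_transcripts, window=3):
    """True if the last `window` transcripts contributed no new codes.

    `window` is an illustrative assumption, not a published rule.
    """
    counts = new_codes_per_transcript(coded_transcripts)
    return len(counts) >= window and all(n == 0 for n in counts[-window:])


# Hypothetical codes assigned to five transcripts, in the order reviewed.
transcripts = [
    {"organizational goals", "clinician leadership"},
    {"organizational goals", "use of data"},
    {"clinician leadership", "use of data"},
    {"organizational goals"},
    {"use of data", "clinician leadership"},
]

counts = new_codes_per_transcript(transcripts)
saturated = reached_saturation(transcripts, window=3)
```

A run of zeros in `counts` suggests, but does not establish, that the code structure is stabilizing; a late transcript that adds a new code would signal a conceptual gap of the kind that calls for theoretical sampling.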

Applying the Finalized Code Structure

The application of the finalized code structure to the data is an important step of analysis. One approach to applying the finalized code structure to the data is to have two to three members of the research team re-review all the data, independently applying the codes from the finalized code structure. Then, the team meets in a group to review discrepancies, resolving differences by in-depth discussion and negotiated consensus. The result is a single, agreed-upon application of the final codes to all parts of the data. This approach is reasonable and frequently used in the published literature. Another approach to applying the finalized code structure is to establish the reliability of multiple coders from the research team with a selected group of data. Once coders have been established to be reliable with one another, one of the coders completes the remainder of the coding independently. This approach can be more time efficient than the approach that requires the multiple coders to recode all data with the final code structure and then resolve disagreement by joint consensus. Intercoder reliability (Miles and Huberman 1994) can be evaluated by selecting new data (for instance, two to three transcripts that were not analyzed as part of the code development phase before theoretical saturation) and having two researchers code these data, using the finalized code structure. The two researchers code the transcripts independently and then compare their coding. One calculates the percentage of all coded segments to which both researchers assigned the same codes; some experts (Miles and Huberman 1994) have proposed 80 percent agreement as a rule of thumb for reasonable reliability.
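
The percentage agreement described above can be computed directly. In the following sketch, each coder's work is represented as a mapping from a segment identifier to the set of codes assigned to that segment; the identifiers, codes, and the choice to count a segment as agreed only when both code sets are identical are assumptions made for illustration.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of segments to which both coders assigned identical code sets.

    Both arguments map segment identifiers to sets of code names and must
    cover the same segments.
    """
    if coder_a.keys() != coder_b.keys():
        raise ValueError("Both coders must code the same set of segments.")
    if not coder_a:
        raise ValueError("At least one coded segment is required.")
    agreed = sum(1 for seg in coder_a if coder_a[seg] == coder_b[seg])
    return 100.0 * agreed / len(coder_a)


# Hypothetical codings of five segments by two researchers.
coder_a = {
    "s1": {"organizational goals"},
    "s2": {"use of data"},
    "s3": {"clinician leadership", "positive"},
    "s4": {"administrative support"},
    "s5": {"contextual factors"},
}
coder_b = {
    "s1": {"organizational goals"},
    "s2": {"use of data"},
    "s3": {"clinician leadership", "positive"},
    "s4": {"administrative support"},
    "s5": {"use of data"},  # the one disagreement
}

agreement = percent_agreement(coder_a, coder_b)
meets_rule_of_thumb = agreement >= 80.0  # 80 percent rule of thumb cited above
```

Here the coders agree on four of five segments, giving 80 percent agreement. A stricter or looser definition of agreement (for example, partial overlap of code sets) would change the figure, so the definition used should be reported along with the result.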

The approach in each of the steps of qualitative data analysis reflects a balance of differing views among researchers. Formality, including quantifying intercoder reliability, may improve the ability of those less trained in qualitative methods to understand and value evidence generated from qualitative studies. However, overly mechanistic approaches or reliance on inexperienced qualitative analysts may dampen the insights from qualitative research (Morgan 1997). Formal rules and processes should not replace analytic thought itself. In any project, if the codes are not conceptually rich and are oversimplified in their separation from the context of their occurrence, the insights from the inquiry will be limited.



We focus on three types of output from qualitative studies: taxonomy, themes, and theory. These outputs can be helpful in a number of ways including, but not limited to, the fostering of improved measurement of multifaceted interventions; the generation of hypotheses about causal links among service quality, cost, or access; and the revealing of insights into how the context of an event might influence various health-related outcomes.


Taxonomy is a system for classifying multifaceted, complex phenomena according to common conceptual domains and dimensions. In health services research, we are often evaluating multifaceted interventions, implemented in the real world rather than controlled conditions. Qualitative methods provide a sophisticated approach to specifying the complexity rather than simple dichotomous characterizations of interventions (i.e., treatment versus control) common in quantitative research (Sofaer 1999). Furthermore, a common language or taxonomy that distills complex interventions into their essential components is paramount to comparing alternative interventions and promoting clear communication. Examples of taxonomy include classification systems for health maintenance organizations (Welch, Hillman, and Pauly 1990), integrated health systems (Gillies et al. 1993; Bazzoli et al. 1999), goal-setting for older adults with dementia (Bogardus, Bradley, and Tinetti 1998), and quality improvement efforts in the hospital setting (Bradley et al. 2001).

How does one move from the phase of applying the finalized code structure to generating and reporting taxonomy? If one has applied the code types as described above, then the structure of the taxonomy will mirror closely the conceptual codes and subcodes. Conceptual codes define key domains that characterize the phenomenon; conceptual subcodes define common dimensions within those key domains. Within each dimension, there may be further subdimensions depending on the complexity of the inquiry. Importantly, taxonomies identify domains and dimensions that are broad in nature. For example, in a taxonomy classifying quality improvement (Bradley et al. 2001), we defined six domains that comprise quality improvement efforts in the hospital setting: organizational goals, administrative support, clinician leadership, performance improvement initiatives, use of data, and contextual factors. Within the domain of organizational goals, there were four dimensions (i.e., content, specificity, challenge, sharedness of the goals). For each domain and dimension, the code represents the abstract concept, not the specific statement about that concept. For instance, a domain might be “nursing leadership,” as opposed to the statement, “there is strong nursing leadership here.” The difference is important to recognize as taxonomies describe a discrete set of axes or domains that characterize multifaceted phenomena.
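
The correspondence between conceptual codes and taxonomy domains, and between subcodes and dimensions, can be sketched as a simple nested structure. The example below uses the six domains and the four dimensions of organizational goals named above; the function name and the (name, parent) representation are hypothetical, and the dimensions of the other five domains are left empty because they are not listed in this paper.

```python
def build_taxonomy(conceptual_codes):
    """Arrange conceptual codes (domains) and subcodes (dimensions) as a mapping.

    conceptual_codes: list of (name, parent) pairs; parent is None for a
    domain and the domain's name for a dimension.
    """
    conceptual_codes = list(conceptual_codes)
    taxonomy = {name: [] for name, parent in conceptual_codes if parent is None}
    for name, parent in conceptual_codes:
        if parent is None:
            continue
        if parent not in taxonomy:
            raise ValueError(f"Subcode {name!r} refers to unknown domain {parent!r}.")
        taxonomy[parent].append(name)
    return taxonomy


# Domains and dimensions as named in the text (Bradley et al. 2001).
codes = [
    ("organizational goals", None),
    ("administrative support", None),
    ("clinician leadership", None),
    ("performance improvement initiatives", None),
    ("use of data", None),
    ("contextual factors", None),
    ("content", "organizational goals"),
    ("specificity", "organizational goals"),
    ("challenge", "organizational goals"),
    ("sharedness", "organizational goals"),
]

taxonomy = build_taxonomy(codes)
```

Note that the entries are abstract concepts ("clinician leadership"), not participant statements about those concepts, which is the distinction drawn in the preceding paragraph.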


Themes are general propositions that emerge from diverse and detail-rich experiences of participants and provide recurrent and unifying ideas regarding the subject of inquiry. Themes typically evolve not only from the conceptual codes and subcodes as in the case of taxonomy but also from the relationship codes, which tag data that link concepts to each other. For example, in a study of health services integration (Gillies et al. 1993), three concepts were identified that might form a taxonomy of integration approaches: functional integration, physician integration, and clinical integration. However, the study also suggests that success in functional and, ideally, physician integration is required before full clinical integration can be achieved. This latter statement might be called a theme, a statement or proposition about how health system integration proceeds. The statement does more than just identify conceptual domains; it also suggests a relationship among the concepts. Similarly, a study of managing a safety-net emergency department (Dohan 2002) identified themes of patients using the emergency department for relief from social, not health, problems and the extreme financial stress that is part of every day in the department. The study also revealed how these tensions were managed, i.e., by defining patients as “interesting cases” and fostering an organizational obligation to provide uncompensated care.

Another approach to developing themes is to conduct a comparative analysis of concepts coded in different participant groups or setting codes. The researcher retrieves data coded with both a conceptual or relationship code and with a participant characteristic code (e.g., fee-for-service Medicare versus traditional Medicare). The comparison can assess whether certain concepts, relationships among concepts, or positive/negative perspectives are more apparent or are experienced differently in one group than in another. These kinds of comparisons are sometimes performed informally by researchers reading and comparing statements and observations; however, formal mechanisms including the use of truth tables (Ragin 1987, 1999) and explanatory effects matrices (Miles and Huberman 1994) to catalogue the presence of selected concepts among comparison groups have also been implemented.
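
As a rough illustration of cataloguing the presence of selected concepts among comparison groups, the sketch below builds a simple presence table from coded segments. It is a simplified stand-in under stated assumptions and does not implement the truth table method of Ragin or the explanatory effects matrices of Miles and Huberman; the function name, data layout, groups, and codes are hypothetical.

```python
def concept_presence_by_group(segments, concepts, groups):
    """Record, for each concept, whether it was coded in each comparison group.

    segments: list of dicts with a "group" label (a participant characteristic
    or setting code) and a set of conceptual or relationship "codes".
    """
    table = {c: {g: False for g in groups} for c in concepts}
    for seg in segments:
        group = seg["group"]
        for code in seg["codes"]:
            if code in table and group in table[code]:
                table[code][group] = True
    return table


# Hypothetical coded segments from two settings.
segments = [
    {"group": "intervention", "codes": {"clinician leadership", "use of data"}},
    {"group": "intervention", "codes": {"use of data"}},
    {"group": "nonintervention", "codes": {"use of data"}},
]

presence = concept_presence_by_group(
    segments,
    concepts=["clinician leadership", "use of data"],
    groups=["intervention", "nonintervention"],
)
```

A table like this only flags where a concept appears; the researcher still needs to read the retrieved passages to judge whether the concept is experienced differently across groups.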


Theory emphasizes the nature of correlative or causal relationships, often delving into the systematic reasons for the events, experiences, and phenomena of inquiry. Theory predicts and explains phenomena (Kaplan 1964; Merton 1967; Weick 1995). Data tagged by relationship codes are essential to generating and reporting theory. A comprehensive theory will integrate data tagged with conceptual codes and subcodes as well as with relationship and perspective codes. Comparative analysis about group-specific differences is also sometimes used to develop theory.

Theory development can be less bewildering with consistent cataloguing of relationships among concepts, using the constant comparison method to generate inductively conceptual codes and subcodes as well as relationship codes. The process for developing theory is, nonetheless, diverse depending on the subject, the context, and the experience of the researcher. As an illustration of theory development, in a study of barriers to pediatric health care (Sobo, Seid, and Reyes Gelhard 2006), parents identified a set of six barriers that can limit access to and use of critical pediatric services. The study then linked these barriers into a theory about the interaction of necessary skills and prerequisites, realization of access, the site of care, and parent/patient outcomes. Through its theoretical development, the study also suggests a new paradigm for understanding the biomedical health care system, likening it to a cultural system in which parents and patients needed to learn (or be acculturated) to function competently.


Qualitative research methodologies can generate rich information about health care including, but not limited to, patient preferences, medical decision making, culturally determined values and health beliefs, consumer satisfaction, health-seeking behaviors, and health disparities. Furthermore, qualitative methods can reveal critical insights to inform development, translation, and dissemination of interventions to address health system shortcomings. A clear understanding of such methodologies can help the field adopt and integrate qualitative approaches when they are appropriate. Taxonomies, themes, and theory produced with rigorous qualitative methods can be particularly useful in health services research. Taxonomies improve our description and hence, measurement and evaluation, of real-world phenomena by allowing for multiple domains and dimensions of multifaceted interventions. Themes and theory guide our research to explain and predict various outcomes within diverse contexts of the health care system. In this paper, we highlight an integrated approach to qualitative data analysis, which applies the principles of inductive reasoning and the constant comparison method (Glaser and Strauss 1967) while employing predetermined code types (conceptual, relationship, perspective, participant characteristics, and setting codes) to analyze data. A vast body of methodological work conducted over decades has produced impressive innovation and advancement in qualitative research techniques. This paper has sought to translate qualitative data analysis strategies and approaches from this methodological literature to enhance their accessibility and use for improving health services research.


Dr. Bradley is supported by the Patrick and Catherine Weldon Donaghue Medical Research Foundation and the Claude D. Pepper Older Americans Independence Center at Yale University. The authors are grateful to Emily Cherlin, MSW, for her research assistance on this project.