Health research is increasingly being conducted within multi-disciplinary teams. Collaborations can include sociologists, anthropologists, psychologists, clinical academics, economists, epidemiologists and statisticians. Service user researchers (those who have experienced the service or treatment being researched as patients or recipients) are increasingly becoming a part of teams. Working collaboratively means that a single research question can be explored from multiple perspectives. Different perspectives and standpoints inevitably produce different ways of understanding and interpreting phenomena. This means that a research question can be explored and understood with greater depth, breadth and richness.
Whilst service user researchers are increasingly being involved in collaborative research as interviewers, they/we are less likely to be involved in the analysis of data. This means that they/we can have no influence on how data are interpreted once they have been collected, and so an alternative and important way of viewing the data will be wholly absent. If included in data interpretation, service users’ different standpoints can generate new ways of understanding and explaining phenomena. For instance, a recent study of psychiatric inpatient detention found that analyses by service user researchers focussed on the experiences and feelings of participants, whilst university researchers focussed on processes and procedures.
Multiple coding is broadly accepted as an important means of increasing the quality of qualitative data analysis. It can also, however, provide an important method for engaging service users in team-based qualitative data analysis. In this paper, we use an empirical account of collaborative data analysis to reflect on the extent to which multiple coding enabled the voice of a service user researcher to be heard. We begin by describing multiple coding and its main alternative, inter-rater reliability. We then give a descriptive account of a study of Cognitive Behavioural Therapy for psychosis (CBTp), which included a service user researcher. We discuss the ways in which multiple coding enables collaborative research teams to engage service users in qualitative data analysis, concluding that without multiple coding, a significant, different and novel perspective on the data will be wholly absent.
There are many different terms used to describe people who use mental health services. These terms can vary depending on geographical location and perspectives on mental health and distress. We have elected to use the term ‘service user’ throughout this article as it relates to the receipt of CBTp. Other terms include patient, consumer and survivor.
There are two main methods that engage multiple analysts in team-based qualitative data analysis: multiple coding and inter-rater reliability. In multiple coding, analysts code the same data individually and then discuss their findings to understand the similarities and differences in their interpretations. This typically reveals new insights and explanations, creating an enriched account of the data. Thus, ‘The greatest potential of multiple coding lies in its capacity to furnish alternative interpretations’. This is particularly valuable where different interpretations are generated through alternative standpoints, as in multi-disciplinary team-based analysis.
The main alternative to multiple coding is inter-rater reliability, a technique originally used in quantitative research to establish whether two or more independent raters evaluate data in the same way. In qualitative research, it involves raters independently coding a section of transcript and calculating their level of agreement. In some cases, a kappa score is calculated, which corrects the observed agreement for the agreement expected by chance; weighted versions of kappa additionally allow disagreements to be weighted by their severity. The aim is to minimize differences so that reliability between raters can be asserted. Thus, the key difference between multiple coding and inter-rater reliability is that in the former, points of non-consensus are discussed to enrich understanding of the data, whilst in the latter these disagreements are simply observed and counted.
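To make the contrast concrete, the sketch below computes simple percentage agreement and Cohen's kappa for two raters who each assign one code to every transcript segment. It is a minimal illustration only: the code labels and the ten coded segments are hypothetical and were not part of the CBTp study, and only the unweighted form of kappa is shown. Kappa takes the observed agreement p_o and corrects it for the chance agreement p_e implied by each rater's code frequencies: kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of segments to which both raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of segments")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning one nominal code per segment."""
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: for each code, the product of the two raters' marginal proportions.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(rater_a) | set(rater_b))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes applied by two raters to the same ten transcript segments.
rater_a = ["control", "mood", "mood", "coping", "control",
           "mood", "coping", "control", "mood", "coping"]
rater_b = ["control", "mood", "coping", "coping", "control",
           "mood", "coping", "mood", "mood", "coping"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")      # 0.70
```

Such figures record how often the raters agree, not why they disagree: in this example the two disagreements simply lower the score rather than being discussed.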
Numerous arguments have been made against the use of inter-rater reliability in qualitative data analysis. One is that a simplified coding frame achieves high reliability at the cost of richness and complexity, making the resulting analysis superficial. It has also been suggested that the knowledge a person builds over the life of a study cannot be conveyed to an independent rater. Yardley contends that agreeing coding rules constrains the imaginative and rich interpretative aspects of qualitative analysis. She further argues that two people can be trained to code a transcript in the same way, yet the result would be no more than an interpretation agreed by two people.
Inter-rater reliability is at times the most appropriate option. An obvious example is where researchers wish to code and count a very large data set at a highly manifest level, for example using content analysis. Under these circumstances, training multiple coders to reliably apply the same coding frame can be appropriate. Morse has also argued that qualitative research that aims to describe real or concrete phenomena (what she terms direct data) can usefully employ inter-rater reliability. She continues:
But as we move into research that uses more interpretive types of analyses via indirect data, such methods of verification are no longer pertinent and may even invalidate the analysis, keeping it superficial.
Given the complexities of health conditions and services, it is likely that researchers will want to understand the data beyond its manifest or superficial content. Furthermore, much of the data that qualitative researchers employ is – what Morse describes as – semi-direct or indirect. Inter-rater reliability with such data takes place at the cost of richness and depth and is not the best method for exploring latent content or etic (researcher rather than participant defined) constructs. Moreover, inter-rater reliability is unable to harness the benefits of working in teams that contain multiple perspectives.
A descriptive account of multiple coding: The CBTp study
The CBTp study aimed to create an outcome measure of CBT for psychosis that reflected the perspectives and priorities of both therapists and service users. To do this, a consultation exercise was first held with CBTp therapists to identify their priorities. A round-table discussion with five experts in the field of CBTp generated a list of topics for discussion in focus groups with service users. The topics were refined and finalized following written feedback from three of the original experts and from eight practising therapists working in a specialist clinic for CBTp (the Psychological Interventions Clinic for outpatients with Psychosis – PICuP – based at the South London and Maudsley NHS Foundation Trust, London). This produced six topics for discussion in the focus groups: (i) mood and emotions; (ii) empowerment, self-confidence and self-esteem; (iii) understanding and managing problems; (iv) day-to-day functioning and quality of life; (v) experiences of psychosis; and (vi) relationships.
The sampling frame comprised service users who had attended the PICuP clinic for CBTp within a fixed 24-month period (n = 76). Twenty-five were excluded because they were under 18, were deemed too unwell or distressed to participate, had discontinued therapy or had received therapy from the focus group facilitator. Of the 51 people deemed eligible, 12 (23.5%) agreed to participate. Participants were therefore a convenience sample of volunteer service users.
All participants gave informed written consent prior to participating. A total of three focus groups (including one pilot) were held to discuss people's priorities for CBTp outcomes. During the first half of each focus group, participants discussed their own priorities for CBTp outcomes. In the second half, they discussed the importance and relevance of the six therapist-generated topics. Participants gave written feedback on their priorities during the closing stages of each group. Groups were facilitated by a service user researcher (AS) and a clinical academic (KG). Groups were audio-taped, and additional notes were taken by a psychology assistant (SW).
The data (excluding the pilot) were analysed following the analytic strategy described below. Once the outcome measure was drafted, it was subject to a number of further stages of development.
Ethics approval for the study was granted by the South London and Maudsley NHS Trust and the Institute of Psychiatry Ethical Committee (243/03).
The analytic strategy
The broad goal of the analytic strategy was to generate CBTp outcomes that could form the basis of an outcome measure. Analysis involved three core analysts: a psychology assistant (SW), a clinical academic with experience of delivering CBTp (KG) and a service user researcher with experience of receiving CBT (AS). Thematic analysis was employed to generate results that were succinct, focussed on the research aims (generating items for an outcome measure) and grounded in participants’ experiences and perspectives.[14, 15]
An analysis strategy was written and agreed, and two focus groups were independently analysed by each analyst (the pilot was used to test the analytic strategy and was not used in the final analysis). A qualitative data analysis software package was used (MAXqda, VERBI Software, Research GmbH, Berlin, Germany) to enable efficient and systematic storage, organization and retrieval of data.
The first step was to generate deductive codes. These were drawn from existing literature and from the results of the consultation exercise with CBTp academic experts and therapists, which was led by KG. Deductive codes were entered into the software package, and the transcripts were repeatedly read for instances of these codes. Data were further interrogated for similarities and patterns, leading to inductive codes. Throughout this process, memos were used in two key ways. First, memos were attached to code titles to describe their evolving meaning. This was particularly relevant to deductive, therapist-defined codes, whose original meaning may have altered through the analysis. Second, memos were embedded in the text to capture analysts’ early ideas about the data.
Each analyst then examined the content of each code and memo and produced a careful summary comprising: emerging ideas about the patterns and connections within the data, including where codes and concepts linked; illustrative quotes, with a particular focus on disagreements and contrasting experiences (negative instances); and ideas for areas that would usefully be clarified or followed up in later stages of the research.
The process of multiple coding
The final stage of the analysis was to engage fully with the process of multiple coding. Each analyst brought their descriptions of and reflections about the data to an all-day meeting. We began by sharing our experiences of conducting the analysis. One analyst (KG) then described their codes in detail as the starting point for discussion. At the end of this process, the remaining analysts shared any codes that had not yet been discussed.
Much of our discussions revolved around the similarities, differences, connections and patterns within and between our codes. This was akin to ‘constant comparison’ but rather than being undertaken as a lone activity, comparisons were made between our analyses. Our discussions revealed high levels of agreement between analysts, and we were able to reach a consensus about the data. This synthesis was enabled by careful listening, lengthy and full discussion, a joint reflection on shared codes and ideas and an interweaving of our interpretations.
Discussions also revealed different opinions on how ideas and themes should be grouped together. Where these grouping differences arose, they were discussed until a list and description of codes felt to encapsulate the data was achieved. Where there were genuine differences of opinion, these were debated in a way that enabled us to move towards a fuller, shared understanding of the data. This meant debating alternative explanations and interrogating our interpretations for validity and resonance with the data. To do this, we each reflected on our judgements and accepted competing explanations where they had better resonance or fit with our participants’ discussions.
Where a code was only identified by one analyst, it was explained and discussed. If there was consensus that the code had not already been captured, it was added to the final list. As the aim of the analysis was to move towards a list of codes or themes that could be considered outcomes for CBTp, all codes were included, even where only one analyst had identified them. Codes were also generally included where only one participant had raised the issue.
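The inclusion rule described above amounts to simple set bookkeeping, sketched below. This is an illustration only: the analysts' code lists are hypothetical placeholders, and the judgement about whether a single-analyst code had already been captured (made through team discussion in the study) is supplied as an input rather than computed.

```python
from collections import Counter


def merge_code_lists(analyses, judged_already_captured):
    """Merge analysts' code lists following the inclusion rule in the text.

    analyses: mapping of analyst -> set of code labels they identified.
    judged_already_captured: single-analyst codes the team agreed were
        already covered by another code (a human judgement, passed in).
    Returns (codes raised by only one analyst, final code list).
    """
    counts = Counter(code for codes in analyses.values() for code in set(codes))
    single_analyst = {code for code, k in counts.items() if k == 1}
    # Every code is kept, including single-analyst codes, unless the team
    # agreed that a single-analyst code duplicated one already on the list.
    final = set(counts) - (single_analyst & set(judged_already_captured))
    return single_analyst, final


# Hypothetical placeholder codes; not the study's data.
analyses = {
    "KG": {"code A", "code B", "code C"},
    "SW": {"code A", "code B", "code D"},
    "AS": {"code A", "code E"},
}
to_discuss, final_list = merge_code_lists(analyses, judged_already_captured={"code D"})
print(sorted(to_discuss))  # ['code C', 'code D', 'code E']
print(sorted(final_list))  # ['code A', 'code B', 'code C', 'code E']
```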
Through discussion and debate, codes were changed, adapted, redefined, refined, integrated and abandoned until we arrived at a list of codes that, as far as possible, reflected our shared perception of participants’ discussions.
On occasion, we were unable to reach consensus. Our deferred resolution was to take these points of non-consensus, along with areas that we felt would benefit from further clarification, to individual interviews with focus group participants (excluding members of the pilot focus group). In total, nine focus group participants took part in two rounds of semi-structured interviews (although one participant was unable to attend the second interview and so gave written feedback instead). The focus groups, first interviews and second interviews were separated by intervals of one month to allow time for analysis and preparation. As well as commenting on points of non-consensus from the thematic analysis, interviewees considered whether the final list of outcomes made sense of the focus group discussions; the importance and feasibility of outcomes; the specific language used for outcomes; and the format of the measure.

A modified Delphi Technique (DT) was employed to help move towards a group consensus on the final items for the measure. In the first interviews, participants commented on and rated the importance of items. In the second interviews, participants were shown their own importance rating for each item alongside the group mean importance rating, and then re-rated the importance of each item.
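The feedback step of the modified Delphi Technique can be sketched as follows. This is a hypothetical illustration: the participants, items and ratings are invented, and a 1 to 5 importance scale is assumed because the scale used in the study is not stated here.

```python
def delphi_feedback(ratings):
    """Pair each participant's first-round rating with the group mean for that item.

    ratings: mapping participant -> {item: importance rating}.
    Returns mapping participant -> {item: (own rating, group mean rating)}.
    """
    items = {item for per_person in ratings.values() for item in per_person}
    group_mean = {}
    for item in items:
        scores = [r[item] for r in ratings.values() if item in r]
        group_mean[item] = sum(scores) / len(scores)
    return {
        person: {item: (score, group_mean[item]) for item, score in per_person.items()}
        for person, per_person in ratings.items()
    }


# Hypothetical first-interview ratings on an assumed 1-5 importance scale.
round_1 = {
    "P1": {"item 1": 5, "item 2": 2},
    "P2": {"item 1": 3, "item 2": 4},
    "P3": {"item 1": 4, "item 2": 3},
}
feedback = delphi_feedback(round_1)
print(feedback["P1"])  # {'item 1': (5, 4.0), 'item 2': (2, 3.0)}
```

Each participant would then see their own rating next to the group mean before re-rating the item in the second interview.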
The results of multiple coding
The primary results of the discussions can be found in Table 1. Agreement between analysts was high. For example, analysts identified broadly equivalent numbers of codes under the categories of ‘more positive thoughts, feelings and experiences’ and ‘understanding self and experiences’. In many instances, codes had been given identical labels, such as ‘stopping symptoms’ or ‘seeing things differently’. This was partly because coding was predominantly descriptive (enabling description and organization of data) and manifest (directly observable rather than underlying): we were not seeking underlying meanings, but organizing and describing participants’ desired outcomes for CBTp. Moreover, the aim of multiple coding was to reach a consensus about the data. This consensus was built through discussion and reflection, which aimed to move us towards a shared understanding of the data based on our independent interpretations.
Table 1. Categorization of outcomes, identified outcomes and numbers of outcomes identified by each analyst
| Category | Identified outcomes | Clinical researcher (KG) | Psychology assistant (SW) | Service user researcher (AS) |
|---|---|---|---|---|
| Feeling normal/existing in the world | Feeling normal, being able to exist in the world | 2 | 1 | 2 |
| Greater control of self, thoughts and symptoms | Gaining ownership of your own thoughts, stopping symptoms, being more in control, seeing thoughts and feelings as different from reality, being able to switch off from overwhelming mental activity, managing crises | 6 | 5 | 3 |
| More positive thoughts, feelings and experiences | Peace of mind, improved mood, having more positive ways of thinking, developing hope, feeling more confident, feeling safe, seeing things differently, changed experience of the world, beliefs have less negative power | 7 | 7 | 8 |
| Coping strategies | For life in general, for everyday distressing experiences, developing a different approach to problems | 3 | 3 | 3 |
| Path to recovery | Turning point/breakthrough, preventing relapse, getting back on track | 3 | 0 | 2 |
| Understanding self and experiences | Understanding yourself, understanding your experiences, making sense of experiences | 2 | 3 | 2 |
| Heterogeneity | Outcomes and experiences are individual | 1 | 0 | 1 |
A small number of differences between analysts were also apparent. Most notably, the clinical researcher, and to a lesser extent the psychology assistant, identified more outcomes related to control over self, thoughts and symptoms than the service user researcher. Furthermore, during the discussions, we agreed that many participants expressed a strong desire for CBTp to teach them coping strategies. However, we were unable to agree which coping strategies participants prioritised. As Table 2 demonstrates, all analysts identified coping strategies relating to everyday living as priorities. Beyond this, the clinical academic and psychology assistant further identified coping strategies for a range of symptoms and emotions, whilst the service user researcher and psychology assistant identified coping strategies for experiences.
Table 2. Coping strategies perceived as priorities by each analyst
| Analyst | Coping strategies perceived as priorities |
|---|---|
| Clinical researcher (KG) | Life in general, living in the world, everyday, voices, delusions, fear, stress and anxiety, emotions/feeling nothing |
| Psychology assistant (SW) | Existing in the world, voices, experiences, emotions |
| Service user researcher (AS) | Life in general, day-to-day living, experiences |
As a result of the individual interviews with participants, we were able to resolve our disagreements about some codes. This was because the participants decided whether the outcomes made sense, were reflective of their discussions, were relevant and important to them, and should be included. A number of items were dropped – such as ‘understanding that thoughts are not facts’ – because they did not make sense to participants, and the wording of numerous items was modified. The Delphi analysis demonstrated a progression towards a consensus, with a significantly smaller difference between individual and group ratings of outcomes in the second interviews compared to the first. This led to a final draft outcome measure which was subjected to psychometric testing.
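One simple way to express the Delphi convergence described above is the mean absolute difference between each individual rating and the group mean for the same item, compared across the two interview rounds. The sketch below assumes this particular metric and uses invented ratings on an assumed 1 to 5 scale; the paper does not state which difference measure or significance test was applied, so the sketch reports only the descriptive comparison.

```python
def mean_abs_difference(ratings):
    """Average |individual rating - group mean| over all participant-item pairs.

    ratings: mapping participant -> {item: importance rating}.
    """
    items = {item for per_person in ratings.values() for item in per_person}
    diffs = []
    for item in items:
        scores = [r[item] for r in ratings.values() if item in r]
        group_mean = sum(scores) / len(scores)
        diffs.extend(abs(s - group_mean) for s in scores)
    return sum(diffs) / len(diffs)


# Hypothetical ratings from the first and second interviews (assumed 1-5 scale).
round_1 = {"P1": {"item 1": 5, "item 2": 2},
           "P2": {"item 1": 3, "item 2": 4},
           "P3": {"item 1": 4, "item 2": 3}}
round_2 = {"P1": {"item 1": 4, "item 2": 3},
           "P2": {"item 1": 4, "item 2": 3},
           "P3": {"item 1": 5, "item 2": 3}}

before, after = mean_abs_difference(round_1), mean_abs_difference(round_2)
print(f"round 1: {before:.2f}, round 2: {after:.2f}")  # round 1: 0.67, round 2: 0.22
```

A smaller value in the second round indicates that individual ratings moved closer to the group mean; whether such a shift is statistically significant would require a paired test on the underlying ratings.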
The experience of multiple coding
There was an initial need to develop an individual and in-depth understanding of the data, which helped make subsequent discussions rich, meaningful and grounded. Because these individual analyses followed a written analysis strategy, they were conducted in a similar and systematic way. We were then able to engage in a lengthy discussion that explored our similarities and differences, enabling us to move towards a consensus on the final CBTp outcomes. Depth and detail were retained by exploring patterns and connections across our analyses and how themes and ideas could be interwoven.
We found that having three analysts meant that more themes were identified. This is because, rather than the analysis resting on one person, each member of the team contributed to the identification and development of themes, with the whole becoming greater than the sum of its parts. This supports Ahuvia's statement that ‘the theoretical sensitivity of a group of researchers working together is likely to be higher than any member of the group working alone’.
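The counts in Table 1 can be tallied to illustrate this point. The sketch below assumes that the three count columns of Table 1 are, in order, the clinical researcher (KG), the psychology assistant (SW) and the service user researcher (AS), an ordering consistent with the text's description of the 'control' and 'path to recovery' rows. It also assumes that the 'identified outcomes' column lists the combined set of outcomes across analysts, and simply counts the outcomes listed in each row.

```python
# Rows of Table 1: (outcomes listed in the row, KG count, SW count, AS count).
# The first number is counted from the "identified outcomes" column (an assumption
# that this column is the combined list across analysts).
TABLE_1 = {
    "Feeling normal/existing in the world": (2, 2, 1, 2),
    "Greater control of self, thoughts and symptoms": (6, 6, 5, 3),
    "More positive thoughts, feelings and experiences": (9, 7, 7, 8),
    "Coping strategies": (3, 3, 3, 3),
    "Path to recovery": (3, 3, 0, 2),
    "Understanding self and experiences": (3, 2, 3, 2),
    "Heterogeneity": (1, 1, 0, 1),
}


def column_totals(table):
    """Sum each column across all categories."""
    listed, kg, sw, as_ = (sum(col) for col in zip(*table.values()))
    return {"listed": listed, "KG": kg, "SW": sw, "AS": as_}


totals = column_totals(TABLE_1)
print(totals)  # {'listed': 27, 'KG': 24, 'SW': 19, 'AS': 21}
```

On these assumptions, the combined list (27 outcomes) is larger than any single analyst's tally (at most 24), which is consistent with the observation that three analysts together identified more themes than any one alone.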
Although the differences between us were far outweighed by our consensus, it was also apparent that our backgrounds influenced how we understood the data. For example, the clinical and service user researchers each identified the value of listening and talking, key to the therapeutic alliance. Yet only the clinical researcher and psychology assistant recognised the importance of distancing, a meta-cognitive process. Although our aim was to build consensus, there were a few occasions when our interpretations of the data were so different we were unable to reach agreement. These points of non-consensus were critical in exposing where our perspectives bore most strongly on the interpretation of the data, and where our experiences made our standpoints and worldviews incommensurate. Our deferred resolution was to take these points of non-consensus back to individual interviews with research participants to enable them to explain what they meant in their own words. This passed decision-making power to the participants, and we experienced it as a positive way of resolving our disagreements.
Clinicians typically occupy a position of dominance and power in relation to service users. Moreover, clinical researchers may struggle to acknowledge the ‘double identity’ of service user researchers; that is, that service user researchers have research expertise as well as experience of using mental health services. This means that the service user researcher's voice could have been overwhelmed in the process of multiple coding, because the process relies so heavily on each analyst's willingness to listen, debate and concede. We believe that in our team, the research expertise of all members was recognised as valid, and so all voices contributed to data interpretation. Arriving at a full and rich individual understanding of the data before attempting to synthesize our findings was important in giving weight and legitimacy to each voice. Similarly, Barry and colleagues have commented:
reviewing individual analysis efforts and reanalyzing as a group improves the rigor of the analysis and provides a safeguard against any team member dominating the analysis.
To understand our competing interpretations, it was important that we understood one another's standpoint. Therefore, we shared our perspectives on CBTp at the outset of the study. We revisited these standpoints during our discussions and considered how they were influencing our understanding of the data. This reflexive approach was important in helping us to appreciate alternative understandings.
We found that a number of factors made multiple coding feasible. First, we were a core team of three analysts, meaning that we each had time to explore and express our perspectives within the meeting. Second, we analysed a small data set (two focus groups). Finally, we allowed ample time for our task, setting aside a full day. This meant that we were able to fully utilise the strengths of multiple coding and had the time to reach broad consensus. Yet despite a full day's discussion, we were unable to agree on participants’ priorities for coping strategies or on the specific language of the CBTp outcomes, and this stage of the process was deferred. We had not anticipated this, and it demonstrated the importance of allowing sufficient time for multiple coding. This is perhaps a general feature of collaborative data analysis; as Barry et al. comment:
Several writers talk about how time-consuming the team approach is and the financial implications of this… Reaching consensus and exploring everyone's ideas are time-consuming.
Hearing the voices of service users in collaborative analyses
The CBTp study involved researchers from different backgrounds working collaboratively. We wanted to ensure that all perspectives, including that of the service user researcher, influenced the interpretation of data. Moreover, rather than counting the differences between us, we wanted to explore our differences to arrive at a truly collaborative understanding of the data. Multiple coding is a common means of improving the quality of qualitative data analysis. But we found that it is also a key means of enabling service users’ perspectives to influence the way that data are interpreted and represented, once collected.
Driedger and colleagues have commented that convergent interviewing – a method that also draws on multiple perspectives in data analysis – increases the credibility of analysis. This is because codes are socially constructed between researchers and participants, and convergent interviewing – like multiple coding – enables analysts:
to be accountable for our assumptions and theoretical perspectives at the forefront and provides the opportunity to make such assumptions known.
Exploring and understanding alternative perspectives and standpoints in multi-disciplinary teams increases the credibility of interpretations. This is because alternative and competing explanations have been considered before consensus is reached or, occasionally, before non-consensus is agreed. Furthermore, including service user researchers in multi-disciplinary teams can in and of itself increase the ecological validity of research through the inclusion of a perspective and understanding born of lived experience. Trivedi and Wykes have similarly reflected on the benefits that multiple standpoint-based interpretations can bring to data analysis:
although the type of data analysis may be fixed, the interpretations of data may vary considerably depending on who is doing the interpreting.
Despite finding high levels of consensus, our discussions also exposed points of fracture or non-consensus. These fracture points were suggestive of the role of clinical and personal experiences in analysts’ sensitization to data: the clinical academic was trained to understand from the perspective of clinical practice, and so was sensitised to hearing the comments that fell within that framework (most notably those relating to symptoms). Conversely, the service user researcher drew on experiences of receiving CBT and a social model of understanding mental distress, and so was sensitised to hearing data within that context (most notably her emphasis on experiences). Both the service user researcher and clinical psychologist were familiar with the concept of recovery, whilst this was not identified by the less experienced psychology graduate. This finding is supported by two additional studies that have compared analyses between clinical or university academics and service user researchers.
First, Cotterell, a palliative care nurse researcher, worked with a group of service users with life-limiting conditions to understand more about the experiences and palliative care needs of others with such conditions. Cotterell found that his analysis produced different results from those of the service user group: he focussed on the professional, whereas the service users focussed on the emotional and the critical. To Cotterell, this demonstrated that service users are ‘“agents of knowledge” and, despite substantial difficulties, can also be willing and able to contribute to new knowledge production’.
Second, in a study of service users’ experiences of involuntary psychiatric detention, Gillard and colleagues compared the analyses of three university researchers and three service user researchers. Whilst there were no differences in the types of interview questions asked by each group, there were significant differences in how they interpreted and coded data: service user researchers focussed on the experiences and feelings surrounding detention and coercion, whilst the university researchers focussed on process, medication and patient behaviours. They argue that neglecting either perspective in the analysis would result in an incomplete picture of service users’ lived experiences of detention.
Gillard and colleagues' findings have resonance with our study. Yet, rather than conducting and contrasting separate analyses, our aim was to reach a shared understanding of the data informed by our multiple identities. Whilst discussions inevitably revealed fracture points, we attempted to resolve these through discussion and debate and our overall levels of agreement were high.
It is important to stress that service user researchers can have different roles within an analytic strategy. The aim may be to explore whether service user and clinical researchers offer the same accounts of the data, as in Cotterell's study. Alternatively, the aim may be to test whether service user and other researchers interpret data differently, as in Gillard's paper [23]. For us, the key aim was to arrive at a collaborative understanding of the data. Each of these aims calls for a different approach to collaborative coding. Multiple coding has a particular role to play in exploring differences and in arriving at a consensus that is enriched by multiple perspectives.
Multiple coding or inter-rater reliability?
Most qualitative researchers would acknowledge that when attempting to represent the expressed realities of our participants, we are working through a lens of experience and personal history. We can attempt to be explicit about our biases and worldview through reflexive practice in order to understand how they might affect our interpretations and emphases, but we cannot leave them at the door altogether. Working in a multi-disciplinary team carries the advantage that data are interpreted from multiple perspectives. Multiple coding can harness this multiplicity by enabling discussion of individual team members’ understandings of the same data. Through these discussions, the richness of the data and the creativity and imagination needed for the craft of analysis remain integral. The analysts explore and discuss data from the position of having immersed themselves in it, and therefore have a rich understanding of it. In this way, the appropriate use of multiple coding enriches the analytic process.
If this project had employed standard inter-rater reliability alone, we would have been able to report that three raters coded a section of transcript with a specified percentage level of agreement. This would have demonstrated that three people were able to agree about which piece of text belongs in which code, but would have revealed nothing about the subtle differences in meaning between them or about how these decisions were reached. Multiple coding, by contrast, appears to hold particular advantages for those working collaboratively. By engaging fully in multiple coding, we did not lose the craft of analysis; analysis remained interpretive, evolving, thoughtful and grounded in the data. The process enriched our shared understanding and improved both the content and the number of the codes we were able to identify. Finally, multiple coding enabled us to build a consensus about the data that drew on multiple perspectives. This will become increasingly important as health research is more frequently conducted in multi-disciplinary teams that include service user researchers.
Strengths and limitations
The participants in this study were a convenience sample of volunteers from a clinic for CBTp. This means that the items generated for the outcome measure may not be generalizable. However, the aim of this paper is not to present the results of our study but to discuss the method used to generate our findings.
The data used in the study were predominantly descriptive and manifest because we were aiming to identify outcomes for CBTp. Although such data may be suitable for inter-rater reliability, we hope to have demonstrated the unique added value of multiple coding even here. This suggests that multiple coding can be employed in different types of studies and with different types of qualitative data to engage service users in collaborative analyses.
The CBTp study employed multiple coding to build a consensus about the data. Consequently, we did not test whether there were differences between analysts or between methods of qualitative team-based data analysis. Therefore, our work is limited to a discussion of our chosen method, multiple coding.
Service users are increasingly being included in multi-disciplinary teams throughout health research. Their role is often confined to that of interviewer and there is growing evidence that this is of benefit both to those being interviewed and to the research more broadly. However, we believe that this role – whilst valuable – is limited. This is because service users bring unique perspectives and experiences to data interpretation. Multi-disciplinary research teams must find ways of including service user researchers in interpretive work so that their role is not reduced to that of data collector. In this study, we have demonstrated that multiple coding is an appropriate means of achieving this.
We would like to acknowledge the important contribution of the Steering Group to the design and development of the CBTp study: Phillipa Garety, Elizabeth Kuipers and Emmanuelle Peters, based at the Institute of Psychiatry, and Jan Scott, based at Newcastle University. Til Wykes and Diana Rose acknowledge the support provided by the NIHR Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust and Institute of Psychiatry, King's College London.
This work was supported by a South London and Maudsley NHS Foundation Trust Research and Development grant.
Conflict of interest
No conflicts of interest have been declared.