The OPTION scale: measuring the extent that clinicians involve patients in decision-making tasks

Authors


Glyn Elwyn
Professor, Primary Care Group
School of Medicine
University of Wales Swansea
Grove Building
Singleton Park
Swansea SA2 8PP
UK
E-mail: g.elwyn@swansea.ac.uk

Abstract

Objective  To examine the psychometric properties of a revised scale, named ‘observing patient involvement in decision making’ (OPTION), by analysing its reapplication to a sample of routine primary care consultations. The OPTION instrument assesses to what degree clinicians involve patients in decision making.

Design  Cross-sectional assessment of medical interaction by two calibrated raters.

Setting  Primary care.

Participants  Twenty-one general practitioners provided 186 consultations for assessment.

Measurements  Observational score using the OPTION instrument.

Results  Compared with the first version of the OPTION scale, the revised scale, which uses a magnitude rather than an attitude scale, resulted in improved reliability and in lower scores for the levels of involvement achieved by the practitioners when applied to the same data set. Factor analysis confirms that it is acceptable to regard the scale as a single construct. Although there is moderate variability when raters are assessed on an item-by-item basis, agreement between raters at the level of the overall OPTION score is high (the intraclass correlation coefficient for the total OPTION score was 0.77), a level that is acceptable for the evaluation of a set of consultations per practitioner (e.g. between 5 and 10 consultations), where aggregate scores would be used to determine overall performance.

Conclusions  We conclude that OPTION is sufficiently reliable to be used for formal assessment at the level of the whole instrument (all 12 items).

Introduction

Although there is an increasing call for clinicians to involve patients in decision making about health-care interventions, it is by no means clear how this communication strategy should be achieved and measured. A review of the literature revealed that there was a lack of validated tools for this purpose.1 The overall intent of the communication strategy, often known as ‘shared decision making’,2 is for the patient to be made aware that there are important decisions to be considered, and that these decisions cannot be taken by the clinician alone. Patients will need to be encouraged and assisted to take on the task of understanding the relevant information and to share their values and views with clinicians. The principles of shared decision making have been described and reviewed.3 The skills (competences) have been elucidated and discussed.4

It should be acknowledged that a debate exists as to whether ‘shared decision making’ can or should be undertaken in all clinical interactions. One school of thought argues that shared decision making should only be implemented where a ‘genuine’ choice exists, and refers to a classification of clinical situations into those that should follow a standard of care, those that should follow a guideline, and those where options may be legitimately discussed with patients. On this view, the measurement of shared decision making could only take place where the clinical situation warrants the provision of options. In contrast, the ‘observing patient involvement in decision making’ (OPTION) scale takes a different conceptual stance and accepts that it is difficult, if not impossible, to judge where and when patients will want to partake in decisions. Some will not want to learn of options even when there is evidence of definite uncertainty. What is important conceptually is that the possibility of shared decision making is enhanced, and for this to happen, clinicians have to involve patients in the process of understanding the nature of the problem, understanding that there are uncertainties and different likelihoods of harms and benefits, and finally appreciating that the patient can, if they wish, influence the decision itself. The OPTION scale regards these steps as constituting the process of involving patients in decision making. It does not purport to measure shared decision making, although the authors are convinced that a shared decision-making process could take place in consultations with low OPTION scores.

A previous publication described how a scale that measured the extent to which clinicians involved patients in decision making was developed and validated (the OPTION scale).5 This work described the theoretical construct and provided details of the scale design stages and item formulation.6 However, it was recognized that aspects of the instrument required further attention.5 In particular, difficulties had been encountered with the scaling characteristics and with the phraseology and order of some items. An attitudinal scale had been used, and this had led to overuse of a midpoint that represented uncertainty about the evaluation. The aim of this paper is to address these aspects and to report an improved, definitive instrument for measuring the extent to which clinicians involve patients in decision-making processes – a tool that is available to researchers and educators for use in research and skill development.

Methods

The study examined the psychometric characteristics of a revised OPTION scale applied to a sample of audiotaped consultations collected from the routine clinics of 21 general practitioners. Approval to conduct the work was obtained from the Gwent Local Research Ethics Committee. To conduct the validation study, the revised scale was used by two non-clinical lecturers in social sciences who remained independent of each other and of the research team, and who were trained in its use. These two raters had used the previous version of the OPTION scale to assess the same set of consultation recordings.5 The recordings were taken from the recruitment phase of a research study, specifically a trial of shared decision making and risk communication. As part of the recruitment process, general practitioners in Gwent, South Wales, were asked to audiotape consecutive consultations during a routine consulting session in general practice. To be eligible, the practitioners had to have worked in general practice for at least 1 year but less than 10 years. A potential sample of 104 practitioners in 49 practices was invited to participate. As far as we are aware, these volunteer practitioners were naïve to the concepts that we were measuring and had not been exposed to any training or educational interventions that could have influenced their proficiency in this area. Details of the recruitment process have been published elsewhere.5 To test inter-rater reliability, the two raters rated all the consultations independently; a random sample of 21 consultations (one per clinician) was then selected for repeated rating by the two raters in order to examine intra-rater reliability.

The revised scale

The main difference between the previously published scale and the scale used in this study was the manner in which the observable competences were rated. Raters were previously asked to indicate whether they agreed or disagreed (on a five-point scale ranging from strongly agree to strongly disagree) that certain skills (such as ways to deal with a problem, the patient's preferred approach to receiving information, etc.) were observed during the consultations. It was reported that this led to difficulties, whereby raters used the score of ‘3’ to indicate indecision regarding the competences. It was apparent that this uncertainty had multiple origins: there was uncertainty about whether the competence was present or not, and also about whether the activity was undertaken with a high degree of skill. We suspect that this tendency to use the midpoint for both types of uncertainty inflated the OPTION scores, as it is a recognized problem of attitudinal scales.7 To address this problem, the revised scale was designed to measure the magnitude of skill rather than the attitude towards the described competences, and a detailed manual was developed that defined the levels of observed behaviour to be scored. The score ‘0’ was allocated to the situation where the competence described was not observed; the other scores (1 to 4) were allocated to increasing levels of achievement for the described competence (see Table 1). Minor alterations to wording and some re-sequencing of items were also made.

Table 1.  The revised scale scoring guidance

Scale score  Definition
0   The behaviour is not observed
1   A minimal attempt is made to exhibit the behaviour
2   The behaviour is observed and a minimum skill level is achieved
3   The behaviour is exhibited to a good standard
4   The behaviour is observed and executed to a high standard
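
For illustration, the scale's arithmetic can be sketched in code: the 12 item scores (each 0 to 4) are summed and, for reporting, rescaled to the 0–100 range used for the transformed scores in Table 3. This is a minimal sketch; the function name is ours, and the linear sum-then-rescale step is an assumption consistent with the ‘0 = min, 100 = max’ scaling reported below.

```python
def option_total(item_scores):
    """Combine 12 OPTION item scores (each 0-4, per Table 1) into one score.

    The linear rescaling (sum / 48 * 100) is an assumption consistent with
    the transformed scores (0 = min, 100 = max) reported in Table 3.
    """
    assert len(item_scores) == 12, "OPTION has 12 items"
    assert all(0 <= s <= 4 for s in item_scores), "each item is scored 0-4"
    return sum(item_scores) / 48 * 100  # 12 items x maximum score of 4 = 48

# Example: a consultation scored 1 on four items and 0 on the rest
print(option_total([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # ~8.33
```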


Analysis

The data were analysed by studying the responses to each item. Scale reliability was assessed with inter-item and item-total correlations, summarized by Cronbach's α. Rater agreement was assessed using Cohen's kappa (on a five-point scale and on a two-point binary scale), and inter- and intra-rater agreement were assessed using the intraclass correlation coefficient (ICC). ICC scores above 0.40, 0.60 and 0.80 were interpreted as fair, moderate and substantial agreement, respectively.8 Exploratory factor analysis (oblique rotation, with an eigenvalue cut-off of 1.1) was used to determine factor loadings.9 Further assessment using a forced one-factor analysis (oblique rotation) was also performed.
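
As an illustration of these analyses, the sketch below shows how the three statistics could be computed in Python. The function names are ours, and the ICC(2,1) formulation (two-way random effects, absolute agreement, single rating) is an assumption, since the paper does not specify which ICC model was used.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def cronbach_alpha(items):
    """items: (n_consultations, n_items) array of one rater's item scores."""
    n, k = items.shape
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def icc_2_1(ratings):
    """ratings: (n_targets, n_raters) array.

    ICC(2,1): two-way random effects, absolute agreement, single rating
    (Shrout and Fleiss) -- an assumption, as the paper does not name the model.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_m = ratings.mean(axis=1)  # per-consultation means
    col_m = ratings.mean(axis=0)  # per-rater means
    msr = k * ((row_m - grand) ** 2).sum() / (n - 1)  # between consultations
    msc = n * ((col_m - grand) ** 2).sum() / (k - 1)  # between raters
    mse = ((ratings - row_m[:, None] - col_m[None, :] + grand) ** 2).sum() / (
        (n - 1) * (k - 1))                            # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Per-item, chance-corrected agreement between the two raters:
# cohen_kappa_score(rater1_scores_for_item, rater2_scores_for_item)
```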

Results

Sample

Of the potential sample pool of 104 practitioners, 21 general practitioners (from separate practices) provided a tape of a routine clinic prior to receiving detailed information about the proposed research trial.10 These practitioners represented a slightly younger group than the sampling frame (average age 38 years); the male to female ratio was identical to the sampling frame (38% female). Of the general practitioners recruited, 76% (16/21) had been successful in the membership examination of the Royal College of General Practitioners, compared with an overall membership level of 54% in the sampling frame, and might therefore have been expected to show the skills associated with a more recently trained and motivated group. Of the 242 consecutive patients approached in all practices, 12 (5%) declined to have the consultation recorded (the maximum refusal in any one practice was three patients in a series of 15 patients).

The remaining 230 consultations were assessed and, after removing consultations with technical recording problems, 186 consultations were available for analysis (an average of 8.8 consultations per practitioner). There were no age or sex differences between the consultations excluded because of poor recordings and those included for analysis. One practitioner recorded five consultations, but the majority recorded eight or more. Consultations with women were twice as frequent in the sample, and 66% of the patients seen were between 30 and 70 years of age. The demographic and clinical characteristics of the recorded consultations have been reported previously;5 in summary, there were 126 female and 60 male patients; patient ages ranged from 4 months to 83 years; the mean duration of consultations was 8.2 min; and the majority of consultations dealt with respiratory, musculoskeletal, dermatological and psychological problems, the typical spectrum of general practice.

Rating patterns

All items, with the exception of items 8 and 9, showed a predominance of zero scores (see Table 2 for baseline scores). Items 8 and 9 showed the most variation across the scale, although results were still confined to the lower scores. None of the items had a score that exceeded ‘2’ for any consultation. There were no missing scores. Compared with previously published results,5 these results show greater scoring consistency and fewer missing values.

Table 2.  OPTION item response (%) for two observers, Cohen's kappa and intraclass correlation (ICC)

For each item, the observer rows give the percentage of consultations scored 0, 1, 2, 3 and 4, in that order; the kappa* and ICC values apply to the item as a whole.

 1. The clinician draws attention to an identified problem as one that requires a decision-making process
    Observer 1: 96.4, 3.6, 0, 0, 0. Kappa 0.53 (0.52); ICC 0.33
    Observer 2: 93.6, 4.5, 1.8, 0, 0

 2. The clinician states that there is more than one way to deal with the identified problem (‘equipoise’)
    Observer 1: 91.8, 6.4, 1.8, 0, 0. Kappa 0.88 (0.88); ICC 0.93
    Observer 2: 91.8, 6.4, 1.8, 0, 0

 3. The clinician assesses the patient's preferred approach to receiving information to assist decision making
    Observer 1: 99.1, 0.9, 0, 0, 0. Kappa 0.98 (0.98); ICC 0.98
    Observer 2: 100, 0, 0, 0, 0

 4. The clinician lists ‘options’, which can include the choice of ‘no action’
    Observer 1: 90, 9.1, 0.9, 0, 0. Kappa 0.64 (0.76); ICC 0.77
    Observer 2: 84.5, 11.8, 3.6, 0, 0

 5. The clinician explains the pros and cons of options to the patient (taking ‘no action’ is an option)
    Observer 1: 95.5, 4.5, 0, 0, 0. Kappa 0.70 (0.70); ICC 0.70
    Observer 2: 91.8, 8.2, 0, 0, 0

 6. The clinician explores the patient's expectations (or ideas) about how the problem(s) are to be managed
    Observer 1: 95.5, 4.5, 0, 0, 0. Kappa 0.56 (0.56); ICC 0.56
    Observer 2: 98.2, 1.8, 0, 0, 0

 7. The clinician explores the patient's concerns (fears) about how problem(s) are to be managed
    Observer 1: 95.5, 3.6, 0.9, 0, 0. Kappa 0.51 (0.59); ICC 0.61
    Observer 2: 92.7, 7.3, 0, 0, 0

 8. The clinician checks that the patient has understood the information
    Observer 1: 91.8, 8.2, 0, 0, 0. Kappa 0.08 (0.10); ICC 0.11
    Observer 2: 36.4, 61.8, 1.8, 0, 0

 9. The clinician offers the patient explicit opportunities to ask questions during the decision-making process
    Observer 1: 56.4, 43.6, 0, 0, 0. Kappa 0.45 (0.48); ICC 0.48
    Observer 2: 60, 38.2, 1.8, 0, 0

10. The clinician elicits the patient's preferred level of involvement in decision making
    Observer 1: 99.1, 0.9, 0, 0, 0. Kappa 0.98 (0.98); ICC 0.98
    Observer 2: 100, 0, 0, 0, 0

11. The clinician indicates the need for a decision-making (or deferring) stage
    Observer 1: 100, 0, 0, 0, 0. Kappa 0.84 (0.84); ICC 0.84
    Observer 2: 91.8, 8.2, 0, 0, 0

12. The clinician indicates the need to review the decision (or deferment)
    Observer 1: 79.1, 20, 0.9, 0, 0. Kappa 0.67 (0.67); ICC 0.61
    Observer 2: 79.1, 20.9, 0, 0, 0

*Kappa scores are for five-point scale agreement. Scores in brackets are for agreement across binary scale points (no involvement/involvement).

Factor analysis

As described above, exploratory factor analysis (oblique rotation, with an eigenvalue cut-off of 1.1) was used to determine factor loadings.9 The scree plot showed the presence of two factors; the distribution of items between the factors gave a Cronbach's α of 0.80 for the first factor and 0.44 for the second. Cronbach's α based on all 12 items was 0.68. The 12-item single-factor solution explained 28% of the variability. Items 1 and 8 had low inter-rater reliability, and their removal produced a small improvement in scale performance (32% of variability explained). We could not determine a pattern to the item loadings on the two factors, and because the scale was developed to match an agreed set of competences,4 a judgement was made that it was best to use a single factor and retain the 12-item scale in its entirety.
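
For readers wishing to reproduce this kind of analysis, the sketch below uses the third-party factor_analyzer package with an oblimin (oblique) rotation. The package choice and the dummy data are our assumptions; the paper does not name the software it used.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # third-party package

# Dummy stand-in for the real data: 186 consultations x 12 item scores (0-4)
rng = np.random.default_rng(0)
item_matrix = rng.integers(0, 5, size=(186, 12)).astype(float)

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")  # oblique rotation
fa.fit(item_matrix)
eigenvalues, _ = fa.get_eigenvalues()  # compare against the 1.1 cut-off
loadings = fa.loadings_                # item loadings on the two factors

# Forced one-factor solution for comparison
fa1 = FactorAnalyzer(n_factors=1, rotation=None)
fa1.fit(item_matrix)
variance_explained = fa1.get_factor_variance()[1]  # proportion of variance
```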

Inter-rater reliability

With the exception of item 8, the five-point scale Cohen's κ scores ranged from 0.45 to 0.98, indicating acceptable inter-rater agreement after correcting for chance.8 Aggregating the rating scores to produce a two-point binary scale showed similar kappa values (see Table 2). The inter-rater ICC for the total OPTION score was 0.77, with values ranging from 0.11 to 0.98 for the individual items, which again showed good levels of agreement, with the exception of items 1 and 8. These results compare well with the inter-rater reliability of the 2001 data, but the overall scores for the practitioners are lower than previously rated, as a result of changing from attitude to magnitude measures (see Table 3 and Figure 1 for results and graphical representation). For all 12 items, the mean Cohen κ score was 0.66, indicating acceptable inter-rater agreement for this type of instrument after correcting for chance.11 Item 8 had the lowest kappa and ICC scores, and item 9 also showed a difference in inter-rater reliability. Items 8 and 9 therefore need attention to their definitions in a revised manual and a focused calibration of raters; no changes have been made to item wording, however. Compared with previously published results,5 these scores indicate a marginal improvement in reliability, identical kappa scores and an increase in the inter-rater ICC from 0.62 to 0.77.
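
The binary aggregation referred to above simply collapses each five-point item score into ‘no involvement’ (score 0) versus ‘involvement’ (any score above 0) before computing kappa. A minimal sketch, with illustrative data only:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Illustrative scores for one item (0-4 scale), one array per observer
rater1 = np.array([0, 0, 1, 2, 0, 1, 0, 0])
rater2 = np.array([0, 1, 1, 2, 0, 0, 0, 0])

kappa_five_point = cohen_kappa_score(rater1, rater2)
# Collapse to binary: no involvement (0) vs involvement (>0)
kappa_binary = cohen_kappa_score((rater1 > 0).astype(int),
                                 (rater2 > 0).astype(int))
```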

Table 3.  Mean transformed OPTION 2003 and 2001 scores (0 = min, 100 = max) for each practitioner

GP number | Mean OPTION 2003 score (95% CI) | Mean OPTION 2001 score (95% CI) | Number of consultations
 1 | 5.42 (2.07, 8.76)   | 25.42 (15.25, 35.58) | 5
 4 | 2.08 (0.73, 3.44)   | 18.23 (11.26, 25.20) | 4
20 | 1.04 (−0.87, 2.96)  | 13.54 (7.64, 19.44)  | 5
21 | 10.00 (4.47, 15.53) | 37.08 (28.67, 45.49) | 5
22 | 3.33 (0.85, 5.82)   | 19.32 (3.91, 34.73)  | 5
25 | 1.88 (−1.35, 5.10)  | 15.42 (2.93, 27.90)  | 5
27 | 6.04 (−1.41, 13.49) | 31.36 (13.01, 49.71) | 5
29 | 2.71 (−1.02, 6.43)  | 21.67 (−1.35, 44.68) | 5
30 | 3.13 (−0.42, 6.67)  | 19.58 (12.59, 26.57) | 5
31 | 3.33 (1.65, 5.02)   | 23.15 (16.14, 30.11) | 5
32 | 1.88 (−2.64, 6.39)  | 12.71 (−6.22, 31.64) | 5
37 | 3.75 (−0.50, 8.00)  | 23.54 (9.53, 37.55)  | 5
42 | 2.29 (0.61, 3.98)   | Not rated            | 5
43 | 2.29 (−0.02, 4.61)  | 22.50 (15.50, 29.50) | 5
46 | 1.46 (−0.02, 2.93)  | 31.00 (19.41, 42.60) | 5
47 | 2.08 (0.04, 4.13)   | 21.50 (9.98, 33.01)  | 5
50 | 1.25 (−0.88, 3.38)  | 15.63 (13.21, 18.04) | 5
52 | 3.96 (0.62, 7.31)   | 29.17 (19.23, 39.10) | 5
53 | 2.50 (1.34, 3.66)   | 23.96 (16.36, 31.56) | 5
54 | 5.21 (−2.61, 13.02) | 30.42 (11.73, 49.10) | 5
55 | 1.04 (1.04, 1.04)   | 17.92 (14.70, 21.14) | 5
57 | 3.13 (0.71, 5.54)   | 20.63 (14.13, 27.12) | 5
Figure 1.  OPTION scores (2001 and 2003 scale).

Test–retest reliability

The test–retest data were based on a reduced sample of one consultation per practitioner (n = 21), which the raters scored on a second occasion. The retest results confirmed a predominance of low scores. The inter-rater ICC for the total OPTION score was 0.53, lower than the 0.77 achieved for the full set of consultations using this scale. At the individual item level, Cohen's κ measured on a five-point scale ranged from −0.05 to 1, indicating good agreement for some items but poor agreement for others. Intra-rater ICC scores at the individual item level ranged from −0.05 to 0.66 for observer 1, and from 0 to 0.66 for observer 2. Despite the weak ICCs for individual items, the ICCs for total OPTION scores showed a good level of agreement for both observer 1 (0.82) and observer 2 (0.65). As with the initial ratings, the test–retest data confirm that the OPTION instrument cannot be regarded as reliable at the individual item level (see ICC scores in Table X); when summed, however, total OPTION ICC scores indicate substantial agreement according to suggested interpretations.8

Discussion

Principal findings

When compared with the first version of the OPTION scale, the revised scale, when applied to the same data set, resulted in a small improvement in the scale's reliability and in lower scores for the levels of involvement achieved by the practitioners. Factor analysis confirms that it is acceptable to regard the scale as a single construct. We conclude, therefore, that OPTION is sufficiently reliable to be used for formal assessment at the level of the whole instrument (all 12 items), but that the scale can be used with more flexibility in professional education settings.

Although there is moderate variability when raters are assessed on an item-by-item basis, agreement between raters at the level of the overall OPTION score is high (the ICC for the total OPTION score was 0.77), a level that is acceptable for the evaluation of a set of consultations per practitioner (e.g. between 5 and 10 consultations), where aggregate scores would be used to determine overall performance. It should be noted that the scale is only scored at, or close to, the ‘floor’ level: the vast majority of scores given were ‘0’ or ‘1’. Although it could be argued that this is a weakness of the scale, in that it does not display sensitivity to existing practice, we prefer to argue, based on parallel work in discourse analysis, that the scale reflects the reality of current routine practice. Clinical encounters do not typically contain examples of practitioners displaying the skills of shared decision making. We therefore consider the scale both reliable and valid for use in research contexts. We have demonstrated the scale's ability to show an increase in skill levels, as exhibited in a controlled trial reported elsewhere.10 We briefly considered the characteristics of a binary scale, where a ‘yes’ or ‘no’ answer could be given to the observable competences for each item (Table 2). A simplified binary scale could be used, for instance, in educational contexts, where scores could be given out of a maximum of 12, indicating success or otherwise at addressing each specific competence, with the possibility of rapid feedback in a formative learning context.

Strengths and weaknesses

The major strength of this study is that, for the first time, a scale has been developed that can be used as a valid and reliable measure of shared decision making in clinical encounters. It builds on a rigorous development path, addresses the weaknesses of the previous instrument and replicates the assessment of a set of consultations taken from day-to-day practice using the same calibrated raters. Double rating is recommended if the instrument is to be used for research at the level of individual consultations. If, however, the aim is to achieve an overall ‘involvement’ score at the practitioner level, and provided there are at least five consultations available per clinician, we consider the scale reliable enough for single ratings to be undertaken. Changing the scale from an assessment of attitude to an assessment of magnitude (observable skills) has added to our confidence in the assessment of skill attainment at the practitioner level.

The major weakness is the recognized clustering of low scores: scores of ‘0’ or ‘1’ predominate. It could be argued that the scale has been poorly designed and that it has no ability to discriminate between practitioners who are working in routine contexts to the best of their ability. It could also be argued that, on the basis of these data, there could be no confidence that the scale would be sensitive to skills at a higher level or to increases in skill attainment. However, data from a controlled trial (using the previously published scale) demonstrate that the scale is capable of detecting changing skill levels.10 Although, in ideal circumstances, we would re-establish these findings with a larger sample and in different settings, we conclude that these results are based on the accurate use of a valid and reliable scale in routine clinical contexts – that practitioners with no previous training in shared decision making achieve very low levels of patient involvement in decision making.

Findings in context

A systematic review of instruments that aimed to measure shared decision making did not reveal any reliable, valid instruments in this field.1 It is therefore difficult to undertake comparative studies in order to establish concurrent and criterion validity. It may be possible in the future to undertake comparative studies with recently published instruments that aim to undertake similar evaluations.12 It will be important to continue to study the characteristics of the OPTION tool in different clinical contexts, as it is known that different clinical specialities have a different cultural ethos. If the OPTION tool were used to evaluate the consultations of clinical geneticists, for example, their scores might well be different, given that consultations in genetics are of significantly greater duration and that there is an increased awareness of the need to involve patients in the genetic assessment process.13

It is important to bear in mind that these results were observed in general practitioners working in day-to-day settings, who had had no special exposure to the concepts of shared decision making, and that the duration of the consultations was, on average, 8 min. It is becoming widely recognized that achieving greater levels of patient involvement requires additional time.10 In other words, current practice militates against involving patients in decisions, and even with the most highly skilled communicators in primary care, we would be surprised if substantially higher levels of patient involvement could be achieved without at least a 50% increase in consultation duration (i.e. 10–12 min per consultation). It is important, however, to concede that time, although necessary, is not sufficient on its own.14 There is evidence from the membership examination of the Royal College of General Practitioners that confirms this view: given the opportunity to provide their ‘best’ consultations for assessment, and where it is known that shared decision making will be among the most valued criteria, doctors fail to demonstrate these competences.15 Such findings are confirmed when more qualitative methods are used to analyse medical practice.16–18

Implications

The implications of this work can be summarized as follows. This instrument provides a means of assessing to what degree clinicians involve patients in decision making. To meet the increasing call to inform patients about the harms as well as the benefits of interventions, the lack of the necessary communication skills and the barriers to their development and implementation need to be addressed at the policy level. How best to develop these skills is a matter for educationalists at both undergraduate and postgraduate levels, while the uncertainties about the outcomes of involving patients will require further investigation.19 As part of this work, the OPTION tool provides a means of assessing progress.
