Patient-Reported Quality of Care for Osteoarthritis: Development and Testing of the OsteoArthritis Quality Indicator Questionnaire


National Resource Center for Rehabilitation in Rheumatology, Department of Rheumatology, Diakonhjemmet Hospital, PO Box 23 Vindern, N-0319 Oslo, Norway.



To develop and test a new instrument for patient self-reported quality of osteoarthritis (OA) care, and to provide quality indicator (QI) pass rates in a Norwegian OA cohort.


The OsteoArthritis Quality Indicator (OA-QI) questionnaire was developed using published QIs, expert panels, and patient interviews. Self-reported data were collected from 359 persons in a Norwegian OA cohort, and test–retest reliability and validity were assessed. Separate QI pass rates and summary QI pass rates were calculated.


The 17-item questionnaire includes QIs related to patient education and information, regular provider assessments, referrals, and pharmacologic treatment. The patient self-reported questionnaire was completed with minimal respondent burden. Support for content validity was confirmed by 2 patient research partners and 2 expert panels. All 10 predefined hypotheses relating to construct validity were confirmed. Test–retest kappa coefficients ranged from 0.20–0.80 and the percentage of exact agreement ranged from 62–90%. The mean pass rate for individual QIs was 31% (range 5–49%). The median summary QI pass rate was 27% (interquartile range 12–50%), with lower summary pass rates for nonpharmacologic compared to pharmacologic treatments.


To our knowledge, this is the first instrument developed to measure patient-reported QI pass rates for OA care. This study indicates that the OA-QI questionnaire is acceptable to persons with OA, and its short format makes it suitable for population surveys. The low patient self-reported QI pass rates in this study suggest a potential for quality improvement in OA care.


Osteoarthritis (OA) is a joint disease characterized by pain, disability, and impaired quality of life. The OA prevalence increases with age and is growing due to the aging of the population and the epidemic of obesity ([1]). OA is one of the leading causes of pain and disability for the adult population worldwide ([2]), and the costs of treatment and work-related losses are a considerable economic burden ([3]).

OA treatment involves a wide range of pharmacologic and nonpharmacologic interventions. At present, no disease-modifying interventions are available, and pharmacologic treatment is mainly aimed at alleviating symptoms to prevent inactivity and functional loss. International recommendations and standards of care have been developed to improve OA management ([2, 4-8]). According to the guidelines from the National Institute for Health and Clinical Excellence (NICE), patient education, exercise, and weight reduction represent core interventions, whereas other pharmacologic and nonpharmacologic treatments, including acetaminophen, nonsteroidal antiinflammatory drugs (NSAIDs), functional assessments, assistive devices, and surgery, are considered adjunct treatments ([2]).

Transferring best evidence into practice is often challenging. Despite efforts to disseminate and implement OA evidence-based recommendations among health care providers, research studies reveal low consistency with published recommendations ([9-11]) and suggest that OA care is suboptimal ([12-18]). In order to further determine, monitor, and improve quality of care, quality indicators (QIs) based on standards of care may be used. Such QIs represent a minimally acceptable standard of care and describe process elements of care that should occur for a particular type of patient or clinical circumstance ([19]). The QIs can be used to evaluate whether the patient's care is consistent with the indicators, and at population levels, results may be reported as QI pass rates.

As part of a recent trend to improve the quality of care provided to patients with chronic diseases, a small number of QI lists for OA care have been developed ([20]), e.g., Assessing Care of Vulnerable Elders (ACOVE) QIs ([21]), Arthritis Foundation (AF) QIs ([22]), and the recent Health Care QIs (HCQIs) for OA from the project ([23]). The ACOVE and AF QIs were developed by an expert panel following systematic reviews using modified versions of the RAND/University of California at Los Angeles Appropriateness Method ([24]). The HCQIs for OA were developed from the standards of care on and refined by a group of researchers and patient representatives. All were developed to measure QI pass rates using medical records or health care provider questionnaires. The patient perspective of pass rates of QIs for OA has not been investigated. Patient self-reported QI pass rates may have some limitations with regard to accuracy and recall bias, but they provide an important aspect of the quality of OA care, since they mirror the care as perceived by the recipients. Self-reported QI questionnaires can include aspects of care that may be less reliably assessed from medical records, e.g., patient information or functional assessments. Therefore, a patient self-reported QI questionnaire may be a valuable supplement to existing QI instruments for OA. To our knowledge, this article presents the first validated instrument for evaluating patient self-reported QI pass rates for OA care.

The purpose of this study was 2-fold: first, to develop and test a new instrument for patient self-reported quality of OA care, the OsteoArthritis Quality Indicator (OA-QI) questionnaire, and second, to assess QI pass rates as reported by persons in a Norwegian OA cohort study.

Box 1. Significance & Innovations

  • The OsteoArthritis Quality Indicator questionnaire is the first validated instrument to measure patient-reported quality indicator pass rates for osteoarthritis (OA) care.
  • The low quality indicator pass rates suggest a potential for quality improvement in OA care in a Norwegian population-based OA cohort.
  • The median summary quality indicator pass rates for nonpharmacologic treatments were lower than for pharmacologic treatments.


Phase 1: instrument development

Literature search

Studies reporting QIs for OA care published between 2000 and 2010 were identified via structured searches of 4 electronic databases (Medline, Embase, CINAHL, and AMED) using the search terms quality of health care, standards of care, quality indicators (Health Care), performance indicator, guidelines (Standards), osteoarthritis, degenerative arthritis, and arthritis care. The searches resulted in 565 potentially relevant articles. The first author (NØ) screened titles and abstracts, and 26 articles were read in full text.

Identification of QIs

The identified QIs were compared and arranged by content. The QIs were critically judged in relation to responder comprehensibility, and questions were developed by a group of researchers (NØ, AG, MG, KBH) working within rheumatology and having experience with questionnaire design.


The instrument was piloted by means of a self-administered questionnaire with 13 persons with OA, followed by a brief interview that was designed to assess their understanding of the questionnaire. These were the first 13 persons attending the clinical examination described below.

Phase 2: data collection, reliability, and validity testing

Study sample and overview of the data collection

This study sample was recruited through a larger study, the Musculoskeletal Pain in Ullensaker (MUST) study. The MUST study is a population-based postal survey of all 12,371 inhabitants between ages 40 and 79 years in Ullensaker Municipality that is followed by a clinical examination of persons who self-report OA in their hands, knees, and/or hips. The currently ongoing clinical examination takes place at Diakonhjemmet Hospital and includes medical and cardiovascular assessments, functional tests, blood and urine samples, imaging, and validated self-reported questionnaires. Persons with self-reported OA attend the clinical examination a few months after the survey.

The study sample received written information about the study, and written consents were given before the baseline data collection was initiated. The study was approved by the Norwegian Regional Committee for Medical and Health Research Ethics (ref. 2009/1703a) and the Norwegian Data Inspectorate.


Demographic data were collected as part of the postal survey. Before attending the clinical examination, the invited persons completed another postal questionnaire that included the OA-QI questionnaire, the Knee Injury and Osteoarthritis Outcome Score ([25]), the Hip Disability and Osteoarthritis Outcome Score ([26]), the Australian/Canadian Osteoarthritis Hand Index ([27]), and questions regarding medication use and health care utilization in the past year.

Clinical examination

The American College of Rheumatology (ACR) clinical classification criteria for knee and hand OA were assessed during the clinical examination ([28, 29]). Classification of radiographic hip OA was defined as verified reduced minimal joint space according to the Danielsson criteria ([30]). Weight and height data were measured to calculate body mass index (BMI), and self-reported data on year of OA diagnosis and symptom onset were collected. Joint pain in the past week was reported on an 11-point numerical rating scale, and symptomatic OA (“joint pain, aching, or stiffness on most days in the previous month”) was recorded as “yes” or “no.”

Test–retest reliability

Two weeks after the clinical examination, the 99 persons examined between February 2011 and May 2011 were asked to complete a retest questionnaire, including the OA-QI items and the question, “Since attending the clinical examination, have you received any information, advice, or treatment for your osteoarthritis?”


The content validity of the OA-QI was assessed by 2 patient research partners and 2 expert panels; the first panel comprised experienced health professionals and researchers at Diakonhjemmet Hospital, and the second panel comprised a multidisciplinary OA research group. Missing data were examined as an indication of item relevance and acceptability. The authors devised 10 a priori hypotheses that were used to assess construct validity. The hypotheses reflect anticipated response patterns among contrasting subgroups in relation to BMI, pain level, functional ability, medication use, etc. For instance, we anticipated that the subgroup responding “no pain/discomfort” on QI 12 about pain assessment would report lower pain levels than the subgroup responding “yes” or “no.” According to Terwee et al ([31]), at least 75% of the predefined hypotheses should be confirmed.
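As an illustration of how a contrasting-subgroup hypothesis of this kind can be checked, the following minimal sketch computes a pooled-variance Student's t statistic for two independent groups. The function name and the pain scores are hypothetical and invented for illustration; they are not study data, and the sketch does not reproduce the authors' analysis software.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic (equal variances assumed) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical 0-10 pain scores, invented for illustration only.
pain_yes_or_no = [4, 5, 6, 3, 5, 4]    # answered "yes" or "no" on QI 12
pain_no_discomfort = [1, 0, 2, 1, 1]   # answered "no pain/discomfort" on QI 12

t_value = pooled_t_statistic(pain_yes_or_no, pain_no_discomfort)
degrees_of_freedom = len(pain_yes_or_no) + len(pain_no_discomfort) - 2
```

A positive t value here means that the "yes"/"no" group reports more pain than the "no pain/discomfort" group, which is the direction the hypothesis anticipates. The P value would then be read from a t distribution with the stated degrees of freedom.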

Statistical analyses

The baseline variables for the test–retest subsample were compared with the remainder of the total sample using Student's t-test. Test–retest reliability was assessed by calculating total proportions of agreement with Cohen's kappa (with 95% confidence intervals [95% CIs]) and the percentage of exact agreement, i.e., the percentage of occasions on which the score was identical between the test and the retest. The statistical significance of predefined validity hypotheses was tested using Student's t-test, Fisher's exact test, or linear regression analyses, depending on the variable properties. The analyses were repeated with the sample split into 2 groups, i.e., persons with lower secondary school (9 years) and persons with upper secondary school (12 years) or university education.
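The two reliability measures named above can be sketched as follows. This is a minimal illustration, assuming unweighted Cohen's kappa over the raw response categories. The function names, the "na" label for the "not applicable"/"don't remember" option, and the answers themselves are hypothetical.

```python
from collections import Counter


def exact_agreement(test, retest):
    """Proportion of respondents giving the identical answer on both occasions."""
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def cohens_kappa(test, retest):
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(test)
    observed = exact_agreement(test, retest)
    test_counts, retest_counts = Counter(test), Counter(retest)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(test_counts[c] * retest_counts[c] for c in test_counts) / (n * n)
    # Undefined (division by zero) if both occasions use one single category throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical answers from 10 respondents to one item, invented for illustration.
test_answers = ["yes", "yes", "no", "no", "no", "na", "yes", "no", "na", "no"]
retest_answers = ["yes", "no", "no", "no", "na", "na", "yes", "no", "no", "no"]

pea = exact_agreement(test_answers, retest_answers)   # 7 of 10 answers identical
kappa = cohens_kappa(test_answers, retest_answers)
```

In this invented example the exact agreement is 70%, whereas kappa is 0.50, because 40% agreement would be expected by chance given the marginal response frequencies. This is why the two measures can diverge for items where one response option dominates.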

QI pass rates were calculated for each QI separately (with 95% CIs) for the study sample as a whole, where the numerator represents the number of indicators passed (those reporting “yes”) and the denominator represents the number of eligible persons (those reporting “yes” or “no”). Correspondingly, summary pass rates for each person were calculated as the total number of QIs they passed divided by the total number of QIs for which they were eligible. Summary pass rates for persons with hand, hip, or knee OA (excluding those with OA in 2 sites) were calculated. Additionally, summary pass rates for core versus adjunct treatment according to the NICE guidelines (core treatment QIs 1–8, adjunct treatment QIs 9–11 and 13–17) ([2]) and for pharmacologic (QIs 13–16) versus nonpharmacologic (QIs 1–11) treatments were calculated. Due to the skewed distributions of the summary scores, percentages are shown as the median with interquartile ranges (IQRs). Missing data were ignored in analyses; therefore, the number of persons included in the analyses varies.
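The pass rate definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the response encoding ("yes", "no", "na" for not applicable/don't remember, None for missing), the function names, and the four respondents are hypothetical, and the quartile method used by `statistics.quantiles` is not necessarily the one used by the authors.

```python
from statistics import median, quantiles


def qi_pass_rate(answers):
    """Pass rate: "yes" answers divided by eligible ("yes" or "no") answers."""
    eligible = [a for a in answers if a in ("yes", "no")]
    # Not applicable, don't remember, and missing answers are left out of the denominator.
    return sum(a == "yes" for a in eligible) / len(eligible) if eligible else None


def summary_pass_rate(person_answers):
    """Summary pass rate for one person across the QIs for which they were eligible."""
    return qi_pass_rate(person_answers.values())


# Hypothetical respondents (item number -> answer), invented for illustration.
respondents = [
    {1: "yes", 2: "no", 7: "na", 12: "yes"},
    {1: "no", 2: "no", 7: "yes", 12: None},
    {1: "no", 2: "yes", 7: "no", 12: "no"},
    {1: "na", 2: "na", 7: "na", 12: "na"},   # eligible for no QI, so no summary rate
]

item_1_rate = qi_pass_rate([r[1] for r in respondents])   # 1 "yes" of 3 eligible
summaries = [s for s in map(summary_pass_rate, respondents) if s is not None]
median_summary = median(summaries)
q1, _, q3 = quantiles(summaries, n=4)   # interquartile range runs from q1 to q3
```

Restricting the loop to a subset of item numbers (e.g., QIs 13–16 versus QIs 1–11) would give the pharmacologic and nonpharmacologic summary pass rates in the same way.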


Instrument development

Literature search

Nine relevant studies were identified by the literature searches. Four studies were excluded: 1 was an overview of existing QI lists ([20]) and 3 were revised and presented in more recent publications ([21, 22, 32]). The remaining 5 studies included 8, 9, 4, 13, and 8 QIs, respectively ([12-14, 33, 34]).

Identification of QIs

Due to some overlap, 22 unique QIs were identified. Of these, one QI about overweight and primary prevention of OA was considered not relevant for the OA-QI. The study authors considered 9 QIs addressing pharmacologic therapy to be too detailed and unsuitable for patient self-report. One QI addressed whether they had been offered a maximum dose of acetaminophen before any other oral agent. Four QIs comprised assessment and information about the risk of gastrointestinal (GI) bleeding, liver toxicity, and other GI symptoms. The other 4 QIs were related to concomitant treatment with misoprostol or a proton-pump inhibitor in cases of GI bleeding risk and daily NSAID and/or aspirin use. The content of the remaining 12 QIs was used to formulate questions, which were translated into Norwegian by the authors. To avoid multiple aspects within a single question, 1 QI was split into 4 questions, and 2 other QIs were split into 2 questions each.

The 17 questions covered a single A4 page, with “yes”/“no” and “not applicable”/“don't remember” as response options. Six questions addressed patient education and information about disease development, treatment alternatives, self-management, lifestyle changes, weight management, and physical activity. Regular provider assessments were addressed in 4 questions. Four questions were related to pharmacologic treatment and 3 addressed different referrals. The final version of the OA-QI has been subject to a forward and backward translation procedure into English by 4 bilingual persons, including 2 with English and 2 with Norwegian as their native language (Table 1).

Table 1. The OsteoArthritis Quality Indicator Questionnaire

Questions on the treatment of your osteoarthritis

There are several different treatment alternatives for osteoarthritis. We would like to know what treatment, information, or advice that you have been given for your osteoarthritis. For each question, please cross off one of the boxes provided.

Response options for questions 1–6: Yes / No / Don't remember
  1. Have you been given information about how the disease usually develops over time?
  2. Have you been given information about different treatment alternatives?
  3. Have you been given information about how you can live with the disease?
  4. Have you been given information about how you can change your lifestyle?
  5. Have you been given information about the importance of physical activity and exercise?
  6. Have you been referred to someone who can advise you about physical activity and exercise? (e.g., a physiotherapist)

Response options for questions 7–8: Yes / No / Not overweight
  7. If you are overweight, have you been advised to lose weight?
  8. If you are overweight, have you been referred to someone who can help you to lose weight?

Response options for questions 9–11: Yes / No / No such problems
  9. If you have had problems related to daily activities, have these problems been assessed by health personnel in the past year?
  10. If you have problems with walking, has your need for a walking aid been assessed? (e.g., stick, crutch, or walker)
  11. If you have problems related to other daily activities, has your need for different appliances and aids been assessed? (e.g., splints, assistive technology for cooking or personal hygiene, a special chair)

Response options for questions 12–16: Yes / No / No pain/discomfort
  12. If you have pain, has it been assessed in the past year?
  13. If you have pain, was acetaminophen the first medicine that was recommended for your osteoarthritic pain?
  14. If you have prolonged severe pain, which is not relieved sufficiently by paracetamol, have you been offered stronger pain killers? (e.g., co-proxamol, co-dydramol, tramadol, co-codamol, dihydrocodeine, codeine)
  15. If you are taking antiinflammatory drugs, have you been given information about the effects and possible side effects of this medicine? (e.g., ibuprofen, Nurofen, Brufen, diclofenac, Voltarol, naproxen, Naprosyn, Celebrex)
  16. If you have experienced an acute deterioration of your symptoms, has a corticosteroid injection been considered?

Response options for question 17: Yes / No / Not severely troubled
  17. If you are severely troubled by your osteoarthritis, and exercise and medicine do not help, have you been referred and assessed for an operation? (e.g., joint replacement)


The OA-QI was acceptable, easy to understand, and quickly completed with minimal respondent burden. Three people had some difficulty remembering information from the past. One person wondered if everyone would know whether they were overweight or not, and another person did not find a suitable response category because he had pain, but had not considered discussing this with his doctor. Some minor changes were made to the layout and wording of the instrument based on these findings.

Data collection, reliability, and validity testing

Study sample

Of the 619 persons invited, 359 (58%) attended the clinical examination between May 2010 and December 2011. Eight persons were excluded from the analyses: 3 had erroneously self-reported an OA diagnosis and 5 OA-QI questionnaires were missing. The mean age of the 351 persons included in analyses was 64 years, and 70% were women (Table 2). Among those self-reporting hand or knee OA, 69% and 58% fulfilled the ACR criteria, respectively, whereas 84% of those self-reporting hip OA were classified with hip OA according to the Danielsson criteria. Of the total sample, 95% fulfilled ≥1 of these 3 OA classification criteria. The invited persons who did not want to attend the clinical examination had less education (P < 0.05), but were otherwise comparable to the study sample. There were no differences between the total sample and the test–retest subsample regarding baseline characteristics, except for significantly poorer self-reported knee and hip function in the total sample (P < 0.05).

Table 2. Characteristics of participants in the total sample and the test–retest subsample*

| Characteristic | Total sample (n = 351) | Test–retest subsample (n = 80) |
| --- | --- | --- |
| Women, no. (%) | 247 (70) | 55 (69) |
| Age, mean ± SD years | 64 ± 8.6 | 65 ± 7.9 |
| Measured BMI, mean ± SD kg/m² | 28.1 ± 4.4 | 27.7 ± 4.5 |
| Marital status, no. (%) | | |
| Married | 256 (73) | 60 (75) |
| Divorced/separated | 49 (14) | 11 (14) |
| Widowed | 36 (10) | 7 (9) |
| Single | 8 (2) | 2 (3) |
| Occupational status, no. (%) | | |
| Working full time | 87 (25) | 22 (28) |
| Working part time | 48 (14) | 11 (14) |
| Age retired | 151 (43) | 35 (44) |
| Disability pensioner | 57 (16) | 12 (15) |
| Other | 8 (2) | 0 |
| Education, no. (%) | | |
| Lower secondary school | 72 (21) | 17 (21) |
| Upper secondary school | 161 (47) | 36 (45) |
| University 1–4 years | 78 (23) | 17 (21) |
| University >4 years | 30 (9) | 10 (13) |
| Frequency of physical activity, no. (%) | | |
| Never | 7 (2) | 2 (3) |
| Less than once a week | 29 (8) | 8 (10) |
| Once a week | 69 (20) | 16 (20) |
| 2–3 times a week | 178 (51) | 38 (48) |
| Almost every day | 67 (19) | 16 (20) |
| Comorbidity, no. (%) | | |
| Other rheumatic diseases | 76 (22) | 10 (13) |
| Other chronic nonrheumatic diseases | 114 (33) | 26 (33) |
| No other rheumatic or chronic diseases | 185 (53) | 47 (59) |
| Self-reported OA site, no. (%) | | |
| Hand | 147 (42) | 35 (44) |
| Hip | 138 (39) | 26 (33) |
| Knee | 173 (49) | 35 (44) |
| Time since OA diagnosis, mean ± SD years | 9 ± 7.0 | 9 ± 6.9 |
| Time since OA symptom onset, mean ± SD years | 14 ± 10 | 13 ± 9.1 |
| Joint pain in the past week (11-point NRS), mean ± SD | 3.8 ± 2.3 | 3.5 ± 2.2 |
| Joint pain or stiffness in the last month, no. (%) reporting yes | 271 (77) | 59 (74) |
| AUSCAN function subscale score (n = 138), mean ± SD (a) | 13.4 ± 7.2 | 11.6 ± 6.1 |
| HOOS function subscale score (n = 135), mean ± SD (b) | 20.4 ± 13.2 | 15.3 ± 10.6 (c) |
| KOOS function subscale score (n = 170), mean ± SD (b) | 19.4 ± 14.2 | 14.4 ± 13.2 (c) |
| No. of health care visits in the past year, mean ± SD | | |
| General practitioner | 4.2 ± 3.4 | 3.7 ± 2.8 |
| Medical specialist | 1.4 ± 1.9 | 1.2 ± 1.6 |
| Physiotherapist | 9.8 ± 18.3 | 7.0 ± 14.8 |

  * BMI = body mass index; OA = osteoarthritis; NRS = numerical rating scale; AUSCAN = Australian/Canadian Osteoarthritis Hand Index; HOOS = Hip Disability and Osteoarthritis Outcome Score; KOOS = Knee Injury and Osteoarthritis Outcome Score.
  (a) Scored from 0 (best) to 30.
  (b) Function in daily living subscale is scored from 0 (best) to 68.
  (c) P < 0.05 by Student's t-test, grouping the variables total sample vs. test–retest subsample.

Test–retest reliability

Ninety (91%) of the 99 retest questionnaires were returned. Ten persons were excluded from the test–retest reliability analyses: 9 reported that they had received information, advice, or treatment between the 2 questionnaires, and 1 had erroneously self-reported an OA diagnosis. The kappa coefficients ranged from 0.20–0.80, and the percentage of exact agreement ranged from 62–90% (Table 3). The majority of disagreement between test and retest scores was related to interchanges between the “no” and “not applicable”/“don't remember” response options. The results were similar for those with lower secondary school and those with upper secondary school or university education.

Table 3. Test–retest reliability (n = 80)*

| Item | Quality indicator | Estimate (a) | 95% CI (a) | PEA, % |
| --- | --- | --- | --- | --- |
| 1 | Disease development | 0.46 | 0.24–0.67 | 81 |
| 2 | Treatment alternatives | 0.34 | 0.13–0.54 | 73 |
| 5 | Physical activity | 0.34 | 0.16–0.51 | 63 |
| 6 | Referral physical activity | 0.58 | 0.40–0.76 | 80 |
| 7 | Weight reduction | 0.70 | 0.56–0.83 | 82 |
| 8 | Referral weight reduction | 0.80 | 0.67–0.93 | 90 |
| 9 | Functional assessment | 0.37 | 0.20–0.54 | 62 |
| 10 | Walking aid assessment | 0.20 | 0.00–0.42 | 67 |
| 11 | Other aids assessment | 0.35 | 0.13–0.56 | 68 |
| 12 | Pain assessment | 0.43 | 0.25–0.61 | 68 |
| 14 | Stronger pain killers | 0.40 | 0.23–0.58 | 63 |
| 17 | Referral to orthopedic surgeon | 0.62 | 0.48–0.77 | 76 |

  * 95% CI = 95% confidence interval; PEA = percentage of exact agreement; NSAIDs = nonsteroidal antiinflammatory drugs.
  (a) Cohen's kappa.


The 2 research partners and the 2 expert panel groups judged the items to be relevant for the OA population and for the purpose of the instrument, thus confirming the content validity of the instrument. All 10 predefined hypotheses regarding anticipated response patterns were confirmed (P < 0.05) (Table 4), which provides support for the instrument's construct validity. In these analyses, the smallest group size was 6 subjects, but for all other contrasting groups, the number was between 33 and 309, which is regarded as fair to excellent in validity analyses ([35]). The results showed that the contrasting subgroups responded as anticipated on QIs, e.g., persons responding “not overweight” had lower BMIs than those responding “yes”/“no” on QIs 7 and 8. Further, low scores on patient/doctor global assessment of OA disease were associated with high response frequency on the response alternatives of “no such problems/no pain/not severely troubled.” Higher health care utilization in the past year was associated with higher QI pass rates. The results were similar for the 2 education levels described above.

Table 4. Construct validity analyses of predefined hypotheses (n = 351)*

| No. | Hypothesis | Analysis value (a) | Hypothesis confirmed? |
| --- | --- | --- | --- |
| 1 | Persons responding “not overweight” on items 7 and 8 have lower BMI than persons responding “yes” or “no” (b) | t = 15.2, P < 0.001; n = 147 vs. 199 | Yes |
| 2 | Persons responding “no such problems” on items 9–11 report better function in daily living (HOOS, KOOS, AUSCAN) than persons responding “yes” or “no” (b) | t = 3.8–6.5, P < 0.001; n = 50–110 vs. 51–105 | Yes |
| 3 | Persons responding “no pain/discomfort” on item 12 report lower pain levels than persons responding “yes” or “no” (b) | t = 7.2, P < 0.001; n = 42 vs. 307 | Yes |
| 4 | Persons responding “yes” on item 13 more often report acetaminophen use than persons responding “no” or “no pain/discomfort” (c) | P < 0.001; n = 33–120 | Yes |
| 5 | Persons responding “yes” on item 14 more often report use of stronger pain killers than persons responding “no” or “no pain/discomfort” (c) | P < 0.001; n = 6–266 | Yes |
| 6 | Persons responding “not severely troubled” on item 17 report lower pain levels than persons responding “yes” or “no” (b) | t = 6.2, P < 0.001; n = 156 vs. 193 | Yes |
| 7 | Persons responding “not severely troubled” on item 17 report better function in daily living (HOOS, KOOS, AUSCAN) than persons responding “yes” or “no” (b) | t = 3.1–5.2, P < 0.01; n = 46–72 vs. 65–101 | Yes |
| 8 | Persons with low scores on patient global assessment of OA disease more often respond “no such problems/no pain/not severely troubled” on items 9–17 than persons with high scores (d) | t = 4.6–7.1, P < 0.001; n = 39–237 vs. 114–309; β = −0.4, P < 0.001 | Yes |
| 9 | Persons with low scores on doctor global assessment of OA disease more often respond “no such problems/no pain/not severely troubled” on items 9–17 than persons with high scores (d) | t = 2.4–3.6, P < 0.05 (except item 10: t = 1.4, NS); n = 39–235 vs. 112–307; β = −0.2, P < 0.001 | Yes |
| 10 | Persons with high health care utilization in the past year more often respond “yes” than persons with low health care utilization on all items (e) | β = 0.3, P < 0.01 | Yes |

  * BMI = body mass index; HOOS = Hip Disability and Osteoarthritis Outcome Score; KOOS = Knee Injury and Osteoarthritis Outcome Score; AUSCAN = Australian/Canadian Osteoarthritis Hand Index; OA = osteoarthritis; NS = not significant.
  (a) The number of subjects in the contrasting groups is given. When several analyses or a chi-square test are performed, the range of subjects is provided.
  (b) Student's t-test; grouping variable “not overweight/no such problems/no pain/discomfort/not severely troubled” vs. “yes”/“no.”
  (c) Fisher's exact test; “yes” vs. “no”/“no pain/discomfort” and medication use vs. nonuse.
  (d) Student's t-tests for items 9–17 separately; grouping variable “no such problems/no pain/discomfort/not severely troubled” vs. “yes”/“no.” Linear regression analysis; total number of “no such problems/no pain/not severely troubled” scores on items 1–17 as the outcome variable and patient/doctor global assessment scores as the independent variable.
  (e) Linear regression analysis; total number of “yes” scores on items 1–17 as the outcome variable and health care utilization as the independent variable.

QI pass rates

There were large variations in pass rates for separate QIs (mean ± SD 31% ± 13%) (Table 5). The QI about referral for weight reduction had by far the lowest pass rate (5%), whereas the highest pass rate (49%) was related to having received information about the importance of physical activity and exercise. For 8 of the 17 QIs, the pass rates were below 25%.

Table 5. Quality indicator (QI) pass rates (n = 351)*

| Item | OA-QI item | Missing data, no. | Not applicable/do not remember, no. | Eligible persons, no. (a) | QI pass rate, no. (%) (b) | 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Disease development | 1 | 28 | 322 | 60 (19) | 15–23 |
| 2 | Treatment alternatives | 0 | 27 | 324 | 77 (24) | 19–29 |
| 3 | Self-management | 0 | 34 | 317 | 69 (22) | 18–27 |
| 4 | Lifestyle | 1 | 34 | 316 | 61 (19) | 15–24 |
| 5 | Physical activity | 2 | 20 | 329 | 161 (49) | 44–54 |
| 6 | Referral physical activity | 5 | 10 | 336 | 146 (43) | 38–49 |
| 7 | Weight reduction | 3 | 148 | 200 | 68 (34) | 28–41 |
| 8 | Referral weight reduction | 2 | 149 | 200 | 9 (5) | 2–8 |
| 9 | Functional assessment | 1 | 150 | 200 | 48 (24) | 19–30 |
| 10 | Walking aid assessment | 0 | 237 | 114 | 25 (22) | 15–3 |
| 11 | Other aids assessment | 0 | 220 | 131 | 24 (18) | 13–26 |
| 12 | Pain assessment | 2 | 42 | 307 | 110 (36) | 31–41 |
| 13 | Acetaminophen | 3 | 39 | 309 | 141 (46) | 40–51 |
| 14 | Stronger pain killers | 1 | 63 | 287 | 107 (37) | 32–43 |
| 15 | NSAIDs | 4 | 52 | 295 | 137 (46) | 41–52 |
| 16 | Cortisone | 2 | 62 | 287 | 91 (32) | 27–37 |
| 17 | Referral to orthopedic surgeon | 2 | 156 | 193 | 90 (47) | 40–54 |

  * OA-QI = OsteoArthritis Quality Indicator; 95% CI = 95% confidence interval; NSAIDs = nonsteroidal antiinflammatory drugs.
  (a) Total study sample minus missing/not applicable/do not remember.
  (b) Reporting “yes” among eligible persons.
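The method used to calculate the 95% CIs for the individual pass rates is not stated in the text. As one possible approach, the sketch below computes a Wilson score interval for a proportion, using the item 1 counts from Table 5 (60 passes among 322 eligible persons) as input. The choice of the Wilson method and the function name are assumptions made for illustration, not the authors' documented procedure.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(passed, eligible, confidence=0.95):
    """Wilson score interval for a proportion (an assumed method, see lead-in)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    p = passed / eligible
    denominator = 1 + z * z / eligible
    center = (p + z * z / (2 * eligible)) / denominator
    half_width = z * sqrt(p * (1 - p) / eligible + z * z / (4 * eligible ** 2)) / denominator
    return center - half_width, center + half_width


# Item 1 in Table 5: 60 persons reporting "yes" among 322 eligible persons.
low, high = wilson_interval(60, 322)
```

Unlike the simple normal approximation, the Wilson interval stays within 0–100% and behaves reasonably for low pass rates such as that of item 8, which is why it is often preferred for proportions near the boundaries.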

The median score for summary QI pass rates was 27% (IQR 12–50%). There were large differences in pass rates for hand OA when compared to hip or knee OA (15% [IQR 6–43%, n = 39] versus 36% [IQR 17–62%, n = 62] and 33% [IQR 16–55%, n = 61], respectively). The median summary QI pass rates for core and adjunct treatments of the NICE guidelines were 20% (IQR 0–50%) and 31% (IQR 13–57%), respectively. The median summary QI pass rates for nonpharmacologic versus pharmacologic treatments were 20% (IQR 0–45%) and 33% (IQR 17–62%), respectively. There were very low levels of missing data for the individual QIs (Table 5).


The OA-QI questionnaire for patient self-reported quality of OA care was developed by an expert group through a process of a literature search, piloting with patient interviews, and expert panel assessments. The questionnaire is based on published QIs for OA care and was acceptable to an OA cohort showing negligible levels of missing data, evidence for validity, and moderate test–retest reliability. The QI pass rates in this study were low, suggesting suboptimal OA care.

The strengths of this study include the utilization of already published QIs that conform to recent standards for OA care ([2, 7, 8]). Therefore, the instrument reflects evidence-based recommendations promoting effective interventions. Second, instrument development involved patient research partners and expert panels, which ensured the content validity and feasibility of the instrument. Third, there were very low levels of missing data for the OA-QI questionnaire, meaning that most persons could be included in the analyses. Finally, by using patient report, the patients' own perception of treatment, information, and advice is measured.

However, self-reports might represent a weakness because there is a potential for recall bias, which might have led to an over- or underestimation of QI pass rates in this study. QI measurements based on medical records are not necessarily unbiased either, since general practitioners may interpret the content of the consultation differently from their patients; their coding may be restricted to what they see as the major problems and to aspects that were consequential in terms of action ([36]). Another limitation in this study is the generalizability of the results. It is not known whether the self-reporting of an OA diagnosis might have led to over- or underreporting of the diagnosis. Still, almost all of the included persons fulfilled the OA classification criterion for at least one site. On the other hand, this study sample might differ from other OA samples because the participants were recruited through a population-based study rather than through primary or secondary care. This might imply that persons with minor symptoms were also included in this study. Additionally, the 58% participation rate means that the study sample may be subject to selection bias. A potential limitation related to the developmental process of the OA-QI is that we did not account for the small differences in the developmental processes of the various QI sets, but aimed to consider and critically judge all published QIs in relation to responder comprehensibility.

The lack of a gold standard for assessing the validity of this type of questionnaire might be seen as a further limitation. The involvement of the 2 patient research partners was designed to ensure content validity. However, the questions were derived from existing QIs that were not designed for patient completion and may be unclear (e.g., QIs 3, 4, and 9–11) or have some overlap (e.g., QIs 4, 5, and 7). The involvement of a larger group of patient research partners from the stage of QI identification might have improved content and comprehensibility. The results of construct validity testing were excellent, with all a priori hypotheses, agreed upon at a meeting of the authors, being met. However, testing was constrained by the limited number of relevant variables that were available. Future studies should seek to assess the construct validity of the OA-QI through comparisons with other important variables, including aspects of treatment and information received by patients. Some of the items may be further validated through comparison with medical records, and this form of validity testing is recommended in future applications of the OA-QI.

There was considerable variation in the test–retest reliability estimates for the individual QIs. Some QIs, including QIs 4 and 10, had very low estimates, which limits their appropriateness for application as QIs. The general nature of QI 4 may have limited its reliability. Furthermore, it might have been more difficult for respondents to judge, or remember, whether they had received information about self-management compared to the more specific information about pharmacologic treatment. In addition, the test was completed before the clinical assessment in the MUST OA study and the retest was completed 2 weeks after the assessment. Although we sought to correct for this through the question about whether they had received information, advice, or treatment in the interim, the examination might have influenced the responses. The comparable test–retest reliability and validity results for the subsample with only lower secondary school education indicate that the questionnaire may also be applied to populations with low education levels. The test–retest reliability should be further assessed in future applications of the OA-QI together with possible methods to reduce potential vagueness and recall bias, including increased use of timeframes.
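The two test–retest statistics reported here, the percentage of exact agreement and the kappa coefficient, can diverge for a single item, which is one reason an item may show a low kappa despite high agreement. The following is a minimal sketch of both statistics for one yes/no item, assuming Cohen's unweighted kappa. The function names and the 20 responses are hypothetical illustrations, not data or code from this study.

```python
from collections import Counter


def exact_agreement(test, retest):
    """Proportion of respondents giving the same answer at test and retest."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equally long")
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def cohen_kappa(test, retest):
    """Cohen's unweighted kappa: observed agreement corrected for chance."""
    n = len(test)
    p_observed = exact_agreement(test, retest)
    test_counts = Counter(test)
    retest_counts = Counter(retest)
    categories = set(test_counts) | set(retest_counts)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_expected = sum(test_counts[c] * retest_counts[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical responses from 20 persons to one item (not study data):
# 16 answer "no" twice, 1 answers "yes" twice, 3 change their answer.
test = ["no"] * 16 + ["yes"] * 1 + ["yes"] * 2 + ["no"] * 1
retest = ["no"] * 16 + ["yes"] * 1 + ["no"] * 2 + ["yes"] * 1

print(f"exact agreement: {exact_agreement(test, retest):.2f}")  # 0.85
print(f"kappa: {cohen_kappa(test, retest):.2f}")  # 0.32
```

In this made-up example, 85% of respondents answer identically on both occasions, yet kappa is only about 0.32, because most of that agreement would be expected by chance when nearly everyone answers "no".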

The QI summary pass rates in our study were considerably lower for persons with hand OA than for those with hip or knee OA, which suggests that OA care may depend on the site of OA. Had this study sample included persons who were recently diagnosed, it would have been interesting to compare pass rates between those with early and those with more advanced OA. The mean QI pass rate in this study was equivalent to ([14, 17, 37]) or lower than ([12, 13, 18, 38]) pass rates in other study samples, with the exception of the low rates found by Li et al ([15]), who only included nonpharmacologic treatments. Compared to other studies, the pass rates of specific QIs in this study were to some extent lower for patient education and functional assessment QIs, but higher for physical activity information and for information about effects and possible side effects of NSAIDs. However, comparing reported QI pass rates should be done with caution because the study samples, settings, and methods differ.

In line with previous studies ([14, 37]), the QI summary pass rate for the core treatments in the NICE guidelines was lower than for adjunct treatments in this study. Correspondingly, the summary pass rate for nonpharmacologic treatments was lower than for pharmacologic treatments. This finding is in contrast to the recently revised ACR recommendations for OA care ([8]), in which nonpharmacologic treatment is strongly recommended and pharmacologic treatment is only conditionally recommended.
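To make the comparison of summary pass rates concrete, the sketch below computes a per-person summary pass rate for a chosen subset of QIs and then the median across persons. It assumes that a summary pass rate is the number of QIs a person passed divided by the number of QIs for which that person was eligible. The item labels, the grouping, and the responses are hypothetical and do not reproduce the OA-QI scoring or any study data.

```python
from statistics import median


def summary_pass_rate(responses, items=None):
    """Percentage of eligible QIs passed by one person.

    responses maps an item label to True (passed), False (eligible but
    not passed), or None (not eligible). items optionally restricts the
    calculation to a subset of QIs, e.g. nonpharmacologic items.
    Returns None when the person is eligible for none of the items.
    """
    labels = responses.keys() if items is None else items
    eligible = [responses[k] for k in labels if responses.get(k) is not None]
    if not eligible:
        return None
    return 100.0 * sum(eligible) / len(eligible)


# Hypothetical item grouping and responses (assumptions for illustration).
NONPHARMACOLOGIC = ["qi_a", "qi_b"]
PHARMACOLOGIC = ["qi_c", "qi_d"]

persons = [
    {"qi_a": True, "qi_b": False, "qi_c": True, "qi_d": None},
    {"qi_a": False, "qi_b": False, "qi_c": True, "qi_d": True},
    {"qi_a": False, "qi_b": None, "qi_c": None, "qi_d": None},
]


def median_rate(persons, items):
    """Median summary pass rate among persons eligible for at least one item."""
    rates = [summary_pass_rate(p, items) for p in persons]
    return median(r for r in rates if r is not None)


print(median_rate(persons, NONPHARMACOLOGIC))
print(median_rate(persons, PHARMACOLOGIC))
```

Persons who are not eligible for any item in a subset are left out of that subset's median rather than counted as zero, which is a design choice of this sketch.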

Comorbidity is common among persons with OA and may affect the QI pass rates through confusion about which information and treatment relate to which disease. Further, patient comorbidity represents a challenge for the general practitioner in relation to guideline adherence, since some recommendations for other diseases may lead to conflicting advice ([39]). In a systematic review of primary care studies using the ACOVE QIs for different conditions, OA had the lowest pass rates ([40]). Although there is concordance between the OA care that patients would like to have and that which health professionals think they should have ([41, 42]), there is good evidence that clinical practice is not consistent with evidence-based recommendations. Reasons and solutions for this gap have been discussed ([39]), but more research is needed to identify interventions that clearly improve quality of care ([43]).

The OA-QI was developed to capture a new perspective on pass rates of QIs for OA: the care recipients' view. OA care may be organized differently from country to country; however, the instrument is not restricted to a specific care setting or user. The instrument has relevance to health care researchers and stakeholders, including patient organizations and health authorities. Possibly, the OA-QI could also be used by health care providers to capture individual patients' perceptions or needs before the consultation. The short format of the OA-QI questionnaire makes it feasible for inclusion in surveys as part of OA care intervention trials, for monitoring the quality of care, or for screening of individual patients' needs. More advanced statistical analyses are planned to see if the instrument can be further refined.

In conclusion, the OA-QI questionnaire has minimal respondent burden. In this application, it showed acceptable validity and moderate test–retest reliability. The low patient self-reported QI pass rates in this study suggest suboptimal quality of OA care, especially for core treatments according to the NICE guidelines, as well as for nonpharmacologic care in general.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Østerås had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Østerås, Garratt, Grotle, Natvig, Kjeken, Hagen.

Acquisition of data. Østerås, Grotle, Hagen.

Analysis and interpretation of data. Østerås, Garratt, Grotle, Natvig, Kjeken, Kvien, Hagen.


The authors would like to thank the participants of the MUST OA study for their time and effort. Assistance was gratefully received from the librarian Hilde Iren Flaatten and the patient research partners, Øyvor Andreassen and Gerd-Jenny Aanerud.