Drs. Gensburger, Ravaud, and Chapurlat have received consultancy fees, speaking fees, and/or honoraria (less than $10,000 each) from Servier Laboratories.
Influence of blinding sequence of radiographs on the reproducibility and sensitivity to change of joint space width measurement in knee osteoarthritis
Version of Record online: 27 JUL 2010
Copyright © 2010 by the American College of Rheumatology
Arthritis Care & Research
Volume 62, Issue 12, pages 1699–1705, December 2010
How to Cite
Gensburger, D., Roux, J.-P., Arlot, M., Sornay-Rendu, E., Ravaud, P. and Chapurlat, R. (2010), Influence of blinding sequence of radiographs on the reproducibility and sensitivity to change of joint space width measurement in knee osteoarthritis. Arthritis Care Res, 62: 1699–1705. doi: 10.1002/acr.20311
- Issue online: 30 NOV 2010
- Accepted manuscript online: 27 JUL 2010 12:00AM EST
- Manuscript Accepted: 19 JUL 2010
- Manuscript Received: 1 FEB 2010
To investigate whether knowledge of the sequence of radiographs affects inter- and intraobserver reproducibility and sensitivity to change for measuring joint space width (JSW) in patients with knee osteoarthritis (OA).
A cohort of 70 postmenopausal women with radiographic knee OA was assessed using knee radiographs acquired in the semiflexed posteroanterior view with a positioning frame and fluoroscopy, at baseline and 48 months later. Paired radiographs were read by 2 independent observers, with baseline landmarks displayed, using 2 methods: unblinded to sequence and blinded to sequence. Intra- and interobserver reproducibility was assessed for JSW measurements at baseline and 4 years later and for the longitudinal difference (joint space narrowing [JSN]), using the intraclass correlation coefficient (ICC) and the Bland-Altman method. Sensitivity to change was assessed using standardized response means (SRMs).
For JSW and JSN, and with both methods, intra- and interobserver ICCs were high (0.90–0.99 for JSW and 0.77–0.89 for JSN). With the Bland-Altman method, the mean difference was close to 0, with no bias for either observer or method. SRMs ranged from 0.38 to 0.48. All results numerically favored reading with knowledge of the time sequence, but the differences between methods were not statistically significant.
Intra- and interobserver reproducibility was high with or without blinding to the radiograph sequence. Reading with knowledge of the time sequence, using baseline landmarks, tended to improve sensitivity to change. Therefore, in longitudinal studies of OA, radiographs can be read unblinded to sequence.
Knee osteoarthritis (OA) is characterized by structural damage to articular cartilage, leading to a reduction in joint space width (JSW) (1). JSW measurements on conventional knee radiographs are useful for monitoring natural disease progression and the effects of OA treatments over time. Guidelines from both the Committee for Medicinal Products for Human Use and the Food and Drug Administration (2, 3) currently recommend joint space narrowing (JSN) measurement as the gold standard imaging end point for assessing disease-modifying drugs in clinical trials. Knee OA progression, assessed by JSN on serial radiographs, is usually reported to be approximately 0.1–0.2 mm/year (4–6). Over the past 15 years, various standardized semiflexed radiographic protocols have been developed (7–11) to improve the accuracy of JSN assessment and sensitivity to change over time in clinical trials. Moreover, automated or semiautomated JSW measurements using digital image analysis software (12, 13) are considered more sensitive and reproducible than manual methods using a ruler or a magnifying lens (14, 15). Reading radiographs in pairs increases the likelihood of detecting structural damage compared with reading each radiograph singly (16, 17).
In most recent OA clinical trials, cartilage loss has been assessed on paired radiographs by readers blinded to time sequence (6, 18–20). One study assessing doxycycline was carried out with the time sequence known (21). Currently, there is no clear consensus, and US and European regulatory guidance documents for clinical trials do not require radiographs to be evaluated either in known chronological order or with the observer blinded to sequence. Reading in known chronological order is close to clinical practice and may be more appropriate for assessing OA progression, but in clinical trials it may be criticized because the observer expects cartilage loss. Nevertheless, if such a bias existed, it should be equal across groups, because in clinical trials the observers are blinded to treatment assignment. Recently, on the basis of studies in rheumatoid arthritis (RA), Felson and Nevitt suggested that serial radiographs of patients with OA could be read in known chronological order to increase sensitivity (22).
To our knowledge, no prior study has compared the reproducibility of the 2 measurement methods (blinded and unblinded to sequence) in knee OA. The aim of our study was to investigate whether knowledge of the time sequence of knee radiographs affects inter- and intraobserver reproducibility and sensitivity to change for measuring JSW and JSN in patients with medial knee OA.
MATERIALS AND METHODS
Participants belonged to the Os des Femmes de Lyon (OFELY) cohort (13, 23), a longitudinal prospective study of the determinants of bone loss (1,039 healthy volunteer women ages 31–89 years recruited between February 1992 and December 1993). Of this cohort, 606 women had knee radiographs at the eighth annual followup, designated “baseline” for this analysis, and again 4 years later. The OFELY cohort study was approved by the local ethics committee, and each participant provided written informed consent before participation.
The present study was restricted to the 81 subjects with medial radiographic OA. Subjects with lateral radiographic OA were excluded. Radiographic OA was defined using a simplified scoring system based on the Altman atlas (13, 24) that assessed each tibiofemoral compartment and each side of the knee for JSN and osteophytes independently on a 4-point scale (range 0–3), and assessed sclerosis on a 2-point scale (range 0–1). Medial radiographic OA was defined as a total score of ≥2 in the medial compartment (e.g., JSN score of 1 and osteophyte score of 1 in the medial compartment).
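The medial OA definition above can be written as a small scoring rule. This is a minimal sketch, assuming that the medial "total score" is the plain sum of the medial-compartment component grades (JSN and osteophytes graded 0–3, sclerosis 0–1); the function names and the (feature, grade) input format are hypothetical, not part of the study software.

```python
# Hypothetical sketch of the medial radiographic OA rule.
# Assumption: the medial total score is the simple sum of the medial
# component grades (JSN 0-3, osteophytes 0-3 per side, sclerosis 0-1).

MAX_GRADE = {"jsn": 3, "osteophyte": 3, "sclerosis": 1}


def medial_total_score(grades):
    """Sum medial-compartment grades after checking each is in range.

    `grades` is a list of (feature, grade) pairs, e.g. one osteophyte
    entry per side of the joint.
    """
    total = 0
    for feature, grade in grades:
        if not 0 <= grade <= MAX_GRADE[feature]:
            raise ValueError(f"{feature} grade {grade} out of range")
        total += grade
    return total


def has_medial_oa(grades, threshold=2):
    """Medial radiographic OA: total medial score >= 2."""
    return medial_total_score(grades) >= threshold


# Example from the text: JSN 1 + osteophyte 1 -> total 2 -> medial OA
print(has_medial_oa([("jsn", 1), ("osteophyte", 1)]))  # True
print(has_medial_oa([("osteophyte", 1)]))              # False
```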
Of these 81 subjects with medial OA, 11 were excluded: 6 because baseline JSW was very close to zero and not measurable, 3 because bone edges were not clear enough, and 2 because of different positioning between baseline and followup. Thus, 70 participants were included in the present study.
The following data were recorded: age, weight, height, and body mass index (BMI), calculated as weight (kg)/height² (m²). Knee pain, stiffness, and function were assessed using the Western Ontario and McMaster Universities (WOMAC) OA index (25), with a maximum theoretical total score of 96 (20 for pain, 8 for stiffness, and 68 for function).
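The two derived clinical variables can be sketched as follows. The WOMAC item counts used here (5 pain, 2 stiffness, and 17 function items, each scored 0–4) are an assumption based on the standard Likert version of the index; they reproduce the subscale maxima of 20, 8, and 68 stated above. Function names are illustrative.

```python
# Sketch of BMI and WOMAC total score.
# Assumption: standard WOMAC Likert version, items scored 0-4,
# with 5 pain, 2 stiffness, and 17 function items.

WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
ITEM_MAX = 4


def bmi(weight_kg, height_m):
    """Body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def womac_total(responses):
    """Sum WOMAC item scores; `responses` maps subscale -> list of 0-4 scores."""
    total = 0
    for subscale, n_items in WOMAC_ITEMS.items():
        items = responses[subscale]
        if len(items) != n_items or any(not 0 <= s <= ITEM_MAX for s in items):
            raise ValueError(f"invalid {subscale} responses")
        total += sum(items)
    return total


# Theoretical maximum: every item scored 4
max_total = womac_total({k: [ITEM_MAX] * n for k, n in WOMAC_ITEMS.items()})
print(max_total)               # 96
print(round(bmi(64, 1.6), 1))  # 25.0
```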
Posteroanterior knee radiographs were taken in semiflexion with a standardized fluoroscopically assisted protocol using the SynaFlex x-ray positioning frame (Synarc) (13, 26). The x-ray beam was centered on the femorotibial joint and tilted by 10 degrees. Medial tibial plateau alignment was controlled by fluoroscopy; to obtain satisfactory alignment, the beam angle was adjusted as in the Lyon schuss view (12). At baseline and 4 years later, radiographs were obtained in the same unit, with the same equipment, and by the same radiographic technicians, who were trained using a written procedure and received refresher training just before the 4-year followup. At the followup visit, the technicians reviewed the baseline radiographs to improve reproducibility.
Measurement of JSW.
A graduated ruler was placed on each film to allow calibration. The radiographs were then digitized with a 190-μm pixel size, according to a standardized procedure, and anonymized. JSW was measured with a dedicated software procedure (Morpho-Expert; Explora Nova) (13), as detailed in Figure 1. This software was part of a computer system for recording and reading radiographs (Application de Lecture Centralisée des Radios de Genoux; Clin Data Management) that allowed paired radiographs to be measured. The lines generated while measuring the first radiograph served as landmarks and were recorded; when the second radiograph was measured, the first radiograph was displayed with its landmarks.
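The role of the graduated ruler can be illustrated with a simple pixel-to-millimetre conversion. This is a hypothetical sketch, not the Morpho-Expert procedure: it assumes a single linear scale factor derived from a known ruler length, and the ruler and pixel values below are illustrative rather than study data.

```python
# Hypothetical sketch: converting a distance measured in pixels into
# millimetres using the graduated ruler placed on each film.
# Values in the example are illustrative only.


def mm_per_pixel(ruler_length_mm, ruler_length_px):
    """Linear scale factor derived from the ruler visible on the digitized film."""
    return ruler_length_mm / ruler_length_px


def jsw_mm(jsw_px, scale_mm_per_px):
    """Convert a joint space width measured in pixels to millimetres."""
    return jsw_px * scale_mm_per_px


# If a 10-mm ruler segment spans 50 px, the scale is 0.2 mm/px,
# so a joint space of 20 px corresponds to 4.0 mm.
scale = mm_per_pixel(10.0, 50.0)
print(jsw_mm(20, scale))  # 4.0
```

Calibrating each film against its own ruler, rather than relying only on the nominal 190-μm pixel size, presumably compensates for differences in radiographic magnification between films.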
Radiographs were measured by 2 experienced observers: a senior rheumatologist (DG) and an engineer (J-PR) who had been intensively trained in knee radiograph measurement. One week before the study began, both observers completed a joint training session on 30 knee radiographs not included in the study, to harmonize each reading step.
Course of the reproducibility study.
Radiographs, anonymized for patient identity, were read in pairs, with the baseline landmarks displayed during the followup measurement, using 2 different methods: unblinded to sequence and blinded to sequence. Both observers read each radiograph twice with each method. Readings were conducted regularly, in one 2-hour session per day over 4 weeks.
With 70 paired radiographs, a power calculation performed before the study began showed that we could reject an intraclass correlation coefficient (ICC) of 0.7 with a Type I error of 5% and a power of 80%, assuming an expected ICC of 0.8.
For each method, measurement, and observer, JSW (at baseline and 4 years later) and JSN (4-year followup JSW minus baseline JSW, so that narrowing is negative) were described by the mean and SD. The same descriptive statistics were computed for JSW and JSN averaged over the 2 observers.
For each method, intraobserver reproducibility (for each observer and for the mean of the 2 observers) and interobserver reproducibility (for each measurement) were assessed using the ICC (27). For each ICC, the 95% confidence interval (95% CI) was estimated using the bias-corrected accelerated bootstrap method (28): 1,000 bootstrap samples were drawn from the study sample of 70 radiographs to provide an empirical distribution of the parameter of interest, and the 2.5% and 97.5% percentiles of this distribution were used to estimate the lower and upper limits of the 95% CI. The same approach was used to obtain a 95% CI for the difference in ICC between methods.
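The ICC and its bootstrap confidence interval can be sketched in a few lines of NumPy. The ICC form is not specified beyond reference 27, so this sketch assumes the Shrout-Fleiss two-way random-effects, absolute-agreement, single-rating ICC(2,1). The bias-corrected accelerated (BCa) interval is implemented by hand, resampling subjects with replacement and estimating the acceleration from jackknife values; it is an illustration, not the SAS code used in the study, and the simulated data are not study data.

```python
import numpy as np
from statistics import NormalDist


def icc_2_1(y):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single rating. `y` has shape (n_subjects, k_ratings)."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)  # per-subject means
    col_means = y.mean(axis=0)  # per-rating (reader or session) means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = y - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bca_interval(y, stat, n_boot=1000, level=0.95, seed=0):
    """Bias-corrected accelerated bootstrap CI, resampling subjects (rows)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    rng = np.random.default_rng(seed)
    theta = stat(y)
    boot = np.array([stat(y[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    nd = NormalDist()
    # Bias correction: share of bootstrap estimates below the observed value
    p_below = np.clip(np.mean(boot < theta), 1 / n_boot, 1 - 1 / n_boot)
    z0 = nd.inv_cdf(p_below)
    # Acceleration: skewness of leave-one-subject-out (jackknife) estimates
    jack = np.array([stat(np.delete(y, i, axis=0)) for i in range(n)])
    d = jack.mean() - jack
    a = np.sum(d ** 3) / (6 * np.sum(d ** 2) ** 1.5)

    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot, [adjusted(alpha), adjusted(1 - alpha)])
    return theta, lo, hi


# Illustrative simulated data (not study data): 70 knees, 2 readings each.
rng = np.random.default_rng(1)
true_jsw = rng.normal(4.0, 0.95, size=70)
readings = true_jsw[:, None] + rng.normal(0, 0.2, size=(70, 2))
icc, lo, hi = bca_interval(readings, icc_2_1)
print(f"ICC = {icc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because ICC(2,1) measures absolute agreement, a systematic offset between two readers lowers it even when their readings are perfectly correlated; a consistency ICC would ignore such an offset.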
We also assessed the agreement between repeated measurements for each observer and between observers using the Bland-Altman method (29). The bias was defined as the mean of the differences between repeated measurements; it should be close to 0 if there is no systematic difference between the measurements of a single observer or between those of 2 different observers. Limits of agreement were defined as the bias ± 1.96 SDs of the differences and should contain 95% of the differences.
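The Bland-Altman quantities reduce to a mean and an SD of paired differences. A minimal sketch, assuming the sample SD (ddof=1) is used; the final lines check one row of Table 3 from its rounded summary values.

```python
import numpy as np


def bland_altman(m1, m2, z=1.96):
    """Bias and 95% limits of agreement for paired measurements.

    Bias is the mean of (m1 - m2); limits are bias +/- z * SD of the
    differences (sample SD with ddof=1 is assumed here).
    """
    diff = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, sd, (bias - z * sd, bias + z * sd)


# Reproducing one row of Table 3 from its rounded summary values:
# bias 0.04 mm, SD 0.35 mm -> limits of about -0.65 to 0.73 mm.
bias, sd = 0.04, 0.35
print(round(bias - 1.96 * sd, 2), round(bias + 1.96 * sd, 2))  # -0.65 0.73
```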
A nonparametric Wilcoxon signed rank test was also used to compare the intraobserver (measurement 1 vs. 2) and interobserver (observer 1 vs. 2) differences between methods. Because 12 comparisons were performed, we applied a Bonferroni correction to the significance threshold (P = 0.05/12 ≈ 0.004).
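A sketch of the paired method comparison with the Bonferroni-corrected threshold, using SciPy's `scipy.stats.wilcoxon`. The study does not state whether signed or absolute differences were compared; this sketch assumes per-knee absolute differences, and the simulated inputs are illustrative only.

```python
import numpy as np
from scipy.stats import wilcoxon

N_COMPARISONS = 12
ALPHA = 0.05 / N_COMPARISONS  # Bonferroni-corrected threshold, about 0.004


def compare_methods(diff_unblinded, diff_blinded):
    """Wilcoxon signed rank test on per-knee absolute differences
    (e.g., measurement 1 minus measurement 2) from the two reading methods.
    Using absolute differences is an assumption of this sketch."""
    a = np.abs(np.asarray(diff_unblinded, dtype=float))
    b = np.abs(np.asarray(diff_blinded, dtype=float))
    res = wilcoxon(a, b)
    return res.pvalue, res.pvalue < ALPHA


# Illustrative simulated differences (not study data)
rng = np.random.default_rng(2)
p, significant = compare_methods(rng.normal(0, 0.35, 70), rng.normal(0, 0.42, 70))
print(f"P = {p:.3f}, significant at {ALPHA:.4f}: {significant}")
```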
Measurement error arises from several sources, including radiographic acquisition, the measurement technique, and the reading method. The smallest detectable difference (SDD) was calculated according to the Bland-Altman approach, i.e., as 2 × the SD of the differences between the 2 repeated measurements, for each observer and each method.
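Following the definition above (SDD = 2 × SD of the repeated-measurement differences), the intraobserver SDDs in Table 3 can be recomputed from the reported SDs of the differences:

```python
def sdd(sd_of_differences):
    """Smallest detectable difference = 2 x SD of the repeated-measure differences."""
    return 2 * sd_of_differences


# Intraobserver SDs of the differences reported in Table 3 (mm)
for label, sd in [("unblinded, observers 1 and 2", 0.35),
                  ("blinded, observer 1", 0.37),
                  ("blinded, observer 2", 0.42)]:
    print(f"{label}: SDD = {sdd(sd):.2f} mm")
# unblinded, observers 1 and 2: SDD = 0.70 mm
# blinded, observer 1: SDD = 0.74 mm
# blinded, observer 2: SDD = 0.84 mm
```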
To assess the sensitivity of each method to disease progression, the standardized response mean (SRM; the change in JSW between baseline and 4 years divided by its SD) was calculated, with its 95% CI obtained using the bias-corrected accelerated bootstrap method (28). In addition, the SRMs of the 2 reading procedures were compared to assess the impact of better sensitivity to change on the sample size of clinical trials in knee OA (32). Because the required sample size varies inversely with the square of the SRM, the ratio of the estimated sample sizes equals the square of the ratio of the SRMs (16, 17). All statistical analyses were performed using SAS software, version 9.1.3 (SAS Institute).
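The SRM and the sample-size comparison can be sketched as follows. The sketch takes the absolute value of the mean change (the SRMs in Table 1 are reported as positive although JSN is negative) and assumes the sample SD (ddof=1); the sample-size ratio relies on the premise stated above that required n scales with 1/SRM². Function names are illustrative.

```python
import numpy as np


def jsn(baseline_jsw, followup_jsw):
    """Per-knee change in JSW (followup minus baseline); negative = narrowing."""
    return np.asarray(followup_jsw, dtype=float) - np.asarray(baseline_jsw, dtype=float)


def srm(change):
    """Standardized response mean: |mean change| / SD of change (ddof=1 assumed)."""
    change = np.asarray(change, dtype=float)
    return abs(change.mean()) / change.std(ddof=1)


def sample_size_ratio(srm_reference, srm_alternative):
    """Ratio n_alternative / n_reference, since required n scales with 1 / SRM**2."""
    return (srm_reference / srm_alternative) ** 2


# Table 1, observer 1, measurement 1 (unblinded): JSN -0.25 +/- 0.58 mm
print(round(0.25 / 0.58, 2))  # 0.43
```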
The baseline characteristics of the 70 included women were within the range usually found in OA clinical studies, with a mean ± SD age of 69.8 ± 7.2 years, a mean ± SD weight of 67.1 ± 14.7 kg, a mean ± SD height of 158 ± 5.4 cm, and a mean ± SD BMI of 26.3 ± 3.4 kg/m2. These women experienced mild knee OA symptoms, as shown by a mean ± SD total WOMAC score of 13.3 ± 14.3. JSW measurements, JSN, and SRMs are summarized in Table 1.
|Observer|Unblinded method, measurement 1|Unblinded method, measurement 2|Blinded method, measurement 1|Blinded method, measurement 2|
|---|---|---|---|---|
|Baseline JSW, mean ± SD mm| | | | |
|1|3.96 ± 0.94|3.97 ± 0.93|4.01 ± 0.97|3.97 ± 0.91|
|2|3.77 ± 0.95|3.73 ± 0.96|3.70 ± 0.96|3.74 ± 0.90|
|4-year JSW, mean ± SD mm| | | | |
|1|3.71 ± 1.11|3.68 ± 1.10|3.78 ± 1.12|3.72 ± 1.11|
|2|3.48 ± 1.06|3.45 ± 1.05|3.41 ± 1.03|3.49 ± 1.11|
|JSN, mean ± SD mm| | | | |
|1|−0.25 ± 0.58|−0.29 ± 0.60|−0.23 ± 0.61|−0.25 ± 0.60|
|2|−0.29 ± 0.62|−0.28 ± 0.61|−0.29 ± 0.61|−0.26 ± 0.63|
|SRM (95% CI)‡| | | | |
|1|0.43 (0.19–0.66)|0.48 (0.25–0.69)|0.38 (0.18–0.58)|0.41 (0.15–0.62)|
|2|0.46 (0.25–0.64)|0.46 (0.25–0.65)|0.48 (0.22–0.66)|0.41 (0.17–0.65)|
ICCs for each observer and method are shown in Table 2. Intra- and interobserver reproducibility was high for JSW (ICCs 0.90–0.97) and for JSN (ICCs 0.77–0.84). ICC values were slightly higher with the unblinded method than with the blinded method, but the bootstrap comparison showed no statistically significant difference in ICCs between methods.
|ICC (95% CI)|Unblinded JSW, baseline|Unblinded JSW, 4 years|Blinded JSW, baseline|Blinded JSW, 4 years|Unblinded JSN†|Blinded JSN†|
|---|---|---|---|---|---|---|
|Intraobserver, observer 1|0.97 (0.95–0.98)|0.97 (0.95–0.98)|0.95 (0.93–0.97)|0.97 (0.96–0.98)|0.82 (0.70–0.90)|0.81 (0.70–0.89)|
|Intraobserver, observer 2|0.96 (0.94–0.98)|0.97 (0.95–0.98)|0.94 (0.87–0.96)|0.97 (0.95–0.98)|0.84 (0.72–0.92)|0.77 (0.61–0.88)|
|Interobserver, measurement 1|0.93 (0.90–0.96)|0.93 (0.89–0.96)|0.90 (0.86–0.94)|0.92 (0.88–0.94)|0.79 (0.67–0.88)|0.78 (0.60–0.87)|
|Interobserver, measurement 2|0.93 (0.89–0.96)|0.95 (0.93–0.96)|0.92 (0.88–0.95)|0.94 (0.91–0.96)|0.84 (0.75–0.91)|0.81 (0.68–0.89)|
Bland and Altman plot results.
The results of the Bland-Altman analysis of intra- and interobserver reproducibility of JSN measurements are summarized in Table 3. The mean differences were very small, close to zero, indicating no systematic difference (bias) between the 2 measurements or between the 2 observers. The limits of agreement were also narrow for both methods, indicating high reproducibility. For intraobserver reproducibility of observer 2, for example, the mean ± SD difference was −0.01 ± 0.35 mm with the unblinded method and −0.03 ± 0.42 mm with the blinded method, with limits of agreement of −0.69 to +0.67 mm and −0.85 to +0.80 mm, respectively. The mean intraobserver (measurement 1 vs. 2) and interobserver (observer 1 vs. 2) differences in JSN did not differ significantly between the methods.
|Reproducibility|Unblinded: mean difference, mean ± SD mm|Unblinded: limits of agreement, mm|Unblinded: SDD, mm†|Blinded: mean difference, mean ± SD mm|Blinded: limits of agreement, mm|Blinded: SDD, mm†|
|---|---|---|---|---|---|---|
|Intraobserver, observer 1|0.04 ± 0.35|−0.65 to 0.73|0.70|0.01 ± 0.37|−0.71 to 0.74|0.74|
|Intraobserver, observer 2|−0.01 ± 0.35|−0.69 to 0.67|0.70|−0.03 ± 0.42|−0.85 to 0.80|0.84|
|Interobserver, measurement 1|0.04 ± 0.39|−0.72 to 0.80|n/a|0.06 ± 0.40|−0.73 to 0.85|n/a|
|Interobserver, measurement 2|−0.01 ± 0.34|−0.68 to 0.66|n/a|0.01 ± 0.38|−0.74 to 0.77|n/a|
SDDs are shown in Table 3. They ranged from 0.70 mm with the unblinded method to 0.84 mm with the blinded method. For the mean of the 2 observers, the SDD was 0.55 mm with each method. A lower SDD allows smaller cartilage loss to be detected over time.
Sensitivity to change.
The sensitivity to change of JSN for each observer, measurement, and method, expressed as the SRM, is reported in Table 1. The larger the change relative to its variability, the higher the SRM, and a higher SRM indicates greater sensitivity to knee OA progression. For the mean of the 2 observers, the SRMs were 0.47 and 0.45 for measurement 1 and 0.49 and 0.43 for measurement 2 with the unblinded and blinded methods, respectively. For the mean of the 2 observers, the ratio of estimated sample sizes for the blinded versus the unblinded method was 1.09 for measurement 1 and 1.30 for measurement 2. Therefore, between 9% and 30% more patients would be needed with the blinded method than with the unblinded method. Bootstrap analysis showed no statistically significant difference in SRMs between the methods.
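Using the rounded SRMs for the mean of the 2 observers, the sample-size ratios follow from the inverse-square relation between required sample size and SRM:

```latex
\frac{n_{\text{blinded}}}{n_{\text{unblinded}}}
  = \left(\frac{\mathrm{SRM}_{\text{unblinded}}}{\mathrm{SRM}_{\text{blinded}}}\right)^{2},
\qquad
\left(\frac{0.47}{0.45}\right)^{2} \approx 1.09 \ \text{(measurement 1)},
\qquad
\left(\frac{0.49}{0.43}\right)^{2} \approx 1.30 \ \text{(measurement 2)}
```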
To our knowledge, this is the first study to investigate whether measurement of knee radiographs with known chronological order impacts reproducibility and sensitivity to change for JSW and JSN assessment using a standardized semiflexed radiographic protocol and a semiautomated digitized measurement.
In this longitudinal study of 70 paired radiographs of medial knee OA, we showed that both methods (blinded and unblinded to time sequence) were highly reproducible, with a slight trend in favor of knowledge of time sequence. Sensitivity to change was also assessed with the SRM; the values (0.38–0.48) were similar to those in previously published data (33–35), and the same numerical trend in favor of the unblinded method was observed. All results were confirmed using the mean of both observers' measurements. These results are in accordance with previous recommendations that combining 2 experienced observers is superior to assessment by a single observer (36).
Our results are consistent with those of Botha-Scheepers et al (37), who studied 20 patients with OA at multiple sites, including the knee, using qualitative assessment of JSN and osteophytes (graded 0–3) without any quantitative measurement. Radiographs were read in pairs using 2 methods, blinded and unblinded to time sequence, and the reading procedures were compared with regard to sensitivity to change. The SRMs for JSN and osteophyte progression scores in the hands, hips, knees, and spine were higher with the unblinded method than with the blinded method; that is, reading in chronological order tended to be more sensitive to change. In a study with a similar design, Auleley et al (16) compared several radiographic reading procedures in 104 patients and evaluated their impact on the sample sizes required in longitudinal studies of hip OA. They reported that knowledge of time sequence could be particularly useful when hip OA progression is substantial. Similar findings have been widely reported in RA and ankylosing spondylitis (38–41). In clinical trials for osteoporosis, vertebral fractures on serial radiographs should be assessed only with knowledge of time sequence (42).
Our study had several strengths. First, it included a relatively large number of patients (n = 70), in line with the sample size of a similar study in knee OA by Botha-Scheepers et al (37). Second, radiographs were obtained using a standardized, fluoroscopically assisted protocol in a single center by the same technicians, and JSW was measured with a validated semiautomated method (13). Interobserver reproducibility was high and close to intraobserver reproducibility, which might be explained by the intensive joint training of the observers. In addition, the study followed a precise and rigorous protocol, with the same reading sessions for both observers over a limited period.
This study also had limitations. Subjects were drawn from a cohort of volunteer women whose knee OA was defined only radiographically, with few clinical symptoms, as shown by WOMAC scores of ∼13. This population probably differs from, and is less severely affected than, the populations of OA clinical trials, which may explain the slow progression of JSN in the current study: the 4-year cartilage loss ranged from 0.25 to 0.28 mm (0.06–0.07 mm per year), at the low end of recently published JSN values (5). Moreover, these results may not generalize to other populations, such as men or subjects with lateral OA, or to multicenter studies using several imaging centers. Another limitation is the lack of a gold standard for JSN measurement. Therefore, despite our efforts to achieve the best precision in evaluating JSN, we could not compare the accuracy of the methods or determine which is more accurate; the comparison was possible only through observer reproducibility and sensitivity to change (SRM). Nevertheless, JSN was very similar with both measurement methods (blinded and unblinded to time sequence) and did not differ significantly between them.
Reading with knowledge of time sequence is consistent with usual clinical practice, in which the observer always takes the baseline radiograph as a reference and assesses change between images. This is also in agreement with Buckland-Wright and colleagues, who suggested that knowledge of time sequence is particularly recommended in randomized clinical trials assessing a treatment effect on cartilage degradation, whereas blinded reading is preferable when determining the incidence and rate of progression of the disease (43). More recently, Felson and Nevitt concluded, based on several studies in RA, that knowledge of time sequence could help determine whether a treatment affects imaging outcomes and that serial images from trials and longitudinal studies of patients with OA should be read with knowledge of time sequence (22).
In conclusion, intra- and interobserver reproducibility was high with or without blinding of the radiograph sequence. Knowledge of time sequence tended to improve the sensitivity of detection of cartilage loss over time, without any bias. These results imply that, in OA randomized clinical trials, knee radiographs can be read unblinded to time sequence.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Gensburger had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Gensburger, Roux, Arlot, Ravaud, Chapurlat.
Acquisition of data. Gensburger, Roux, Arlot, Sornay-Rendu.
Analysis and interpretation of data. Gensburger, Roux, Arlot, Ravaud, Chapurlat.
- 2. Committee for Medicinal Products for Human Use Draft Guideline. Guideline on clinical investigation of medicinal products used in the treatment of osteoarthritis. London: European Medicines Agency; 2009.
- 3. FDA Draft Guidance. Clinical development programs for drugs, devices and biological products intended for the treatment of OA. Washington (DC): Federal Register; 1999. p. 64–135.
- 6. The effect of glucosamine and/or chondroitin sulfate on the progression of knee osteoarthritis: a report from the Glucosamine/Chondroitin Arthritis Intervention Trial. Arthritis Rheum 2008; 58: 3183–91.
- 20. Long-term effects of chondroitins 4 and 6 sulfate on knee osteoarthritis: the Study on Osteoarthritis Progression Prevention, a two-year, randomized, double-blind, placebo-controlled trial. Arthritis Rheum 2009; 60: 524–33.
- 28. An introduction to the bootstrap. New York: Chapman and Hall; 1993.
- 34. Selection of knee radiographs for trials of structure-modifying drugs in patients with knee osteoarthritis: a prospective, longitudinal study of Lyon schuss knee radiographs with the definition of adequate alignment of the medial tibial plateau. Arthritis Rheum 2005; 52: 1411–7.