Computer-based methods to measure radiographic joint space width (JSW) have the potential to improve the longitudinal assessment of rheumatoid arthritis (RA). The purpose of this report was to measure the long-term patient repositioning reproducibility of software-measured radiographic JSW.
Patients underwent baseline and followup hand radiography examinations with a followup time of ≤3 years. To eliminate any JSW change due to real disease progression, the evaluation was performed on “unaffected” joints, defined as having JSW and erosion Sharp scores of 0 at both baseline and followup. The root mean square SD (RMSSD) and coefficient of variation (CV) were used as the reproducibility metrics.
The RMSSD was 0.14 mm (CV 10.5%) for all joints, 0.18 mm (CV 10.9%) for the metacarpophalangeal (MCP) joints, and 0.08 mm (CV 8.3%) for the proximal interphalangeal (PIP) joints. The distribution of JSW change was asymmetric, suggesting that narrowing due to RA progression occurred for several joints. A second analysis was performed, excluding joints where the loss of JSW was greater than 3 SDs. For this analysis, the RMSSD was 0.10 mm (CV 7.5%) for all joints, 0.12 mm (CV 7.3%) for the MCP joints, and 0.07 mm (CV 7.1%) for the PIP joints.
Repositioning reproducibility is very good, but it is likely to be the dominant source of measurement error compared to reader and software reproducibility. Additionally, further evidence is given that a software method can detect changes in some joints for which the Sharp score is insensitive.
Radiography is used routinely to monitor progression in common and potentially disabling diseases such as rheumatoid arthritis (RA) and osteoarthritis (1, 2). Radiographic change is considered the “gold standard” to assess disease progression in RA and is a common outcome measure for clinical trials (3).
There are two main structural changes from RA visible on conventional radiographs: 1) increase in erosion size and number and 2) loss of joint space width (JSW). Erosions, i.e., cavities created in the bone near the joint, are seen as radiolucent or dark regions or discontinuities in the bone margins. JSW is an indirect measure of cartilage loss, and can be appreciated on radiographs by a decrease in the distance between the projected margins of a joint.
Research requires reproducible and quantitative surrogate outcome measures; however, radiographic assessment using traditional scoring methods such as the Sharp (4) and Larsen and Thoen (5) systems is subjective and based on a qualitative assessment of the joints. The available scoring methods do not attempt to provide a true measure of the size of the radiographic structures; rather, a score is given on an ordinal scale that is based on a comparison to representative examples.
Image analysis software can be used to quantify these structural changes on a continuous scale and has been shown to be more responsive to change than semiquantitative scoring (6). Computerized methods also provide automated archiving of scores and integrate directly with digital imaging modalities. On the other hand, software is rarely 100% reliable, and a quality assurance step is generally needed to ensure that the structures of interest are accurately identified. This quality assurance step necessarily introduces a degree of measurement error associated with the reader and the correction software.
Several computer-based methods to measure radiographic JSW have been developed by different laboratories (7–11). These methods have the potential to improve the longitudinal assessment of RA by providing an objective and continuous outcome measure with enhanced reliability and sensitivity to change. This report provides a new validation study of a previously developed semiautomated software application (8) to measure radiographic JSW of the proximal interphalangeal (PIP) and metacarpophalangeal (MCP) joints on digitized hand radiographs.
Longitudinal change in radiographic JSW can be caused by 3 factors: actual disease progression and 2 different sources of measurement error. The measurement error can be due to either reader or software variability, or to what is generally referred to as patient repositioning between radiographic acquisitions. The repositioning error captures the effect of joint positioning, including rotation, flexion, extension, abduction, or adduction of each joint. The radiographic technique, the technological equipment, the beam geometry, and varying operators (different technologists) also influence the repositioning error, and our analysis examines the effect of all of them.
Using our software tool, we have previously reported the reader reproducibility (12) and the sensitivity to longitudinal change (6) of our technique. The goal of the current study was to quantify the second source of error: the change in JSW due to patient repositioning. We measured the repositioning error for individual joints in a cohort of patients with RA who were not part of a clinical trial and had no standardized positioning procedure; the repeat radiographs were acquired at variable intervals, but not on the same day. This will lead to a better understanding of the different sources of measurement error and, through a comparison to JSW change due to RA, can help inform power calculations for clinical studies.
SUBJECTS AND METHODS
The subjects were selected from a set of 129 patients with RA from the National Data Bank for Rheumatic Diseases with baseline and followup bilateral hand radiography examinations. The radiographs were digitized with a pixel spacing of 0.1 mm and a 12-bit gray scale using a Lumisys L75 laser film digitizer. The characteristics of the patients and images are described in more detail in separate publications (6, 13).
JSW was measured with the semiautomated software method (14, 15), which we briefly summarize here. The software first automatically determines the locations of the PIP and MCP joints on each hand radiographic image and then creates a cropped image of each joint for subsequent joint delineation (Figure 1). Two anatomic landmarks are automatically placed tangent to the bone margins of the distal portion of the joint to define a measurement region and length (L). The joint margins are delineated in a region that is centered in the joint and covers a distance of 0.3 × L; JSW is calculated as the average distance between the distal and proximal margins in this region. In practice, the computerized joint delineation is not perfect; therefore, a graphical user interface was created to allow readers to check and correct errors. An extensive set of image processing features was implemented to add objectivity and improve reader precision.
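As a simplified illustration of the measurement step described above (a sketch, not the authors' implementation; the margin samples and the window selection by sample index are assumptions), the JSW can be computed as the average distance between the delineated margins over the central portion of the measurement region:

```python
def mean_jsw(distal_y, proximal_y):
    """Average distal-to-proximal margin distance over the central 0.3 * L
    portion of the measurement region (approximated here as the central
    ~30% of equally spaced margin samples)."""
    n = len(distal_y)
    lo = int(round(0.35 * n))   # window spans roughly 35%..65% of the samples
    hi = int(round(0.65 * n))
    gaps = [p - d for d, p in zip(distal_y[lo:hi], proximal_y[lo:hi])]
    return sum(gaps) / len(gaps)

# Hypothetical margin heights in mm, sampled every 0.1 mm across the joint
distal = [0.0] * 21                                        # flat distal margin
proximal = [1.5 + 0.01 * abs(i - 10) for i in range(21)]   # slightly curved
print(round(mean_jsw(distal, proximal), 3))  # ~1.517 mm
```

Averaging over a central window, rather than taking a single minimum distance, makes the measurement less sensitive to local delineation errors at any one pixel.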
The average JSW for the MCP and PIP joints on digits 2–5 was measured using the automated method by 4 readers (GN, Pd, AF, JD) with extensive experience using the software tool. The images had previously been evaluated using the Sharp scoring system on hard copy films, with both visits viewed simultaneously but blinded to the time point order.
Using the software technique, the images were assessed with the readers fully blinded to the time point and patient identification. To eliminate JSW change due to real disease progression, we examined only “unaffected” joints, defined as having a JSW and erosion Sharp score of 0 at both baseline and followup. Additionally, we selected only patients with a time difference between the two radiographs of less than or equal to 3 years. These selection criteria allowed inclusion of a total of 437 (222 MCP and 215 PIP) joints from 37 subjects for this study. Table 1 describes the patient characteristics.
Table 1. Description of the patient characteristics
Total no. of subjects included
Mean followup time between first and second radiograph, years
Mean disease duration at first radiographic assessment, years
Mean age at baseline, years
Ethnic origin, no. (%)
The long-term reproducibility was quantified for each joint using the coefficient of variation (CV) and the root mean square SD (RMSSD), defined as:

RMSSD = √[ Σ (JSWfu − JSWbl)² / (2N) ]

where JSWbl = the baseline JSW value, JSWfu = the followup JSW value, and N = the total number of measurements. The CV is defined as the ratio of the RMSSD to the average JSW.
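The two reproducibility metrics can be sketched as follows (the JSW values are hypothetical; the factor of 2N reflects the conventional definition of the SD of paired measurements, in which each pair contributes |difference|/√2):

```python
import math

def rmssd(baseline, followup):
    """Root mean square SD of paired baseline/followup JSW measurements.

    For a pair of measurements, the SD is |difference| / sqrt(2), so the
    RMSSD over N joints is sqrt(sum(d_i ** 2) / (2 * N)).
    """
    n = len(baseline)
    return math.sqrt(sum((f - b) ** 2 for b, f in zip(baseline, followup)) / (2 * n))

def cv_percent(baseline, followup):
    """Coefficient of variation: RMSSD as a percentage of the mean JSW."""
    mean_jsw = (sum(baseline) + sum(followup)) / (2 * len(baseline))
    return 100.0 * rmssd(baseline, followup) / mean_jsw

# Hypothetical JSW values in mm for 4 unaffected joints
bl = [1.60, 1.45, 1.80, 1.30]
fu = [1.55, 1.50, 1.70, 1.32]
print(round(rmssd(bl, fu), 3))       # ~0.044 mm
print(round(cv_percent(bl, fu), 1))  # ~2.9%
```

Expressing the error as a CV normalizes for joint size, which is useful because MCP joints are on average wider than PIP joints.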
Table 2 provides the RMSSD and CV results for all of the joints, and Figure 2 is a graph of the change in JSW versus the time difference between baseline and followup examinations. Figure 3 shows histograms of the change in JSW distribution for followup times less than 20 months and greater than or equal to 20 months.
RMSSD = root mean square SD; CV = coefficient of variation; JSW = joint space width; MCP = metacarpophalangeal; PIP = proximal interphalangeal.
The asymmetric distribution of change in JSW in Figures 2 and 3 offers evidence for narrowing due to progressing RA for some joints with a Sharp score of 0 at both visits. In an attempt to account for this effect, we performed a second analysis based on the assumption that any joint with a JSW loss greater than 3 SDs in magnitude cannot be considered “unaffected.” Table 3 summarizes the results after such joints were excluded from the analysis. In Table 4, we repeat the identical analysis using a single measure of JSW from each subject, which was the average of all of the joints that satisfy the Sharp score criterion. Averaging JSW over all of the joints for a single patient appeared to increase the effect of the true progressors on the reproducibility measurement.
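The exclusion step in this second analysis can be sketched as follows (the change values are hypothetical, and we assume the SD is estimated from the change distribution itself):

```python
import math

def exclude_progressors(changes):
    """Drop joints whose JSW loss exceeds 3 SDs of the change distribution,
    treating them as likely true progressors rather than measurement noise."""
    n = len(changes)
    mean = sum(changes) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in changes) / (n - 1))
    return [c for c in changes if c >= -3.0 * sd]

# Hypothetical JSW changes in mm; -0.60 mm likely reflects true narrowing
changes = [0.02, -0.05, 0.04, -0.03, 0.01, 0.03, -0.02, 0.00, 0.05, -0.04, -0.60]
kept = exclude_progressors(changes)
print(len(changes) - len(kept))  # → 1 joint excluded
```

Because the cutoff is one-sided (only losses are excluded), symmetric measurement noise is retained while probable disease-related narrowing is removed.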
Table 3. RMSSD and CV results for the second analysis*
Average baseline JSW, mm
Average JSW change, mm
Joints with change in JSW less than −3 × SD were excluded. See Table 2 for abbreviations.
Table 4. RMSSD and CV results for the average JSW calculated individually for each patient*
In Table 2, the measurement variability, expressed as the RMSSD, was higher for the MCP joints (0.18 mm) than for the PIP joints (0.08 mm), although the difference was less pronounced using the CV as a metric (10.9% versus 8.3%). For the second analysis (Table 3), the difference between the MCP and PIP joint reproducibility was less substantial (0.12 mm versus 0.07 mm).
Based on the RMSSD, the MCP joint variability is approximately double that of the PIP joint, while the results are similar for the CV. We can compare these results to a study that used a subset of the same radiographs and measured the reader and method reproducibility using duplicate readings of the same films (12); in that study, 4 different readers read the same set of digitized hand radiographs, and the intrareader and interreader reproducibility were 0.04 mm and 0.03 mm, respectively, substantially better than the long-term reproducibility measured here. We can also compare to a cross-sectional study of very early RA patients (less than 1 year since disease onset) that used the same software (16) and reported an average MCP JSW of 2.00 mm (men) and 1.63 mm (women) and an average PIP JSW of 1.64 mm (men) and 1.31 mm (women). For a comparison to longitudinal change, in the study by Finckh et al (6), the median decrease in JSW over 4 years was 0.16 mm. Taken together, these data suggest that repositioning is likely to be the dominant source of error with our digital JSW assessment method, unless steps are taken to reduce this source of uncertainty.
Reduction of the repositioning error might be achieved by standardizing positioning procedures. In a clinical trial, emphasis could be placed on training the radiography technologists and maintaining consistent standards throughout the study. For knee radiography, the use of a positioning frame has helped to standardize the joint orientation over multiple visits (17); a similar approach might be used for hand radiographs.
This study also provides further evidence that a software method is able to detect changes in some joints for which the Sharp score is insensitive. There was joint narrowing apparent for some joints, even when the Sharp score was 0 at baseline and followup, implying that software measures can detect more subtle changes than is possible with qualitative scoring. Figures 2 and 3 also suggest that these undetected progressors are less common for the PIP joint and for a shorter followup time.
There are several limitations to our study. The main drawback of our analysis is that we measured the reproducibility using subjects with known disease progression and defined "unaffected" joints as having a Sharp score of 0 at both baseline and followup. As was evident in the plots, this assumption was not entirely correct. In addition, our data were collected in a rheumatology clinic, and the radiographs were acquired without the standardized approach that would be used in a formal clinical trial. Eliminating these effects would likely improve the reproducibility; therefore, our results may represent an upper limit on the repositioning error. Our joint-by-joint analysis also means that a variable number of individual joints from each patient were analyzed.
In conclusion, the long-term hand repositioning error is relatively low but is likely to be the dominant measurement error compared to the reader and software reproducibility. The study also provides further evidence to support digital assessment of JSW, since the computerized method can detect JSW progression in joints that show no change with conventional scoring methods.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Duryea had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Neumann, dePablo, Wolfe, Duryea.
Acquisition of data. dePablo, Finckh, Wolfe.
Analysis and interpretation of data. Neumann, dePablo, Finckh, Chibnik, Duryea.