Computerized measurement of changes in joint space width (JSW) on serial radiographs of the knee in the semiflexed, anteroposterior (SF-AP) view has recently been used as a primary outcome measure in clinical trials of disease-modifying osteoarthritis drugs (DMOADs). Because it uses fluoroscopy to achieve reproducible alignment of the medial tibial plateau with the x-ray beam, the SF-AP radiographic protocol affords greater sensitivity in the detection of joint space narrowing (JSN) than conventional radiographic positioning techniques. However, the utility of the SF-AP view is compromised by variation in x-ray penetration between examinations, which may confound the correction of automated measurements of JSW for the radiographic magnification inherent in an AP view of the knee. A recent DMOAD trial using the SF-AP protocol showed an improbable increase in JSW of ≥0.50 mm (i.e., greater than the measurement error) in 16% of knees. The present report analyzes this problem; the study aim was to demonstrate that replacing automated estimates of JSW with precise manual measurements can markedly reduce the error attributable to correction for radiographic magnification.
SF-AP radiographs were obtained at baseline and at 16 months and 30 months thereafter from subjects enrolled in a 6-center DMOAD trial. For each examination, a 6.35-mm steel ball was affixed to the skin over the head of the fibula to permit estimation of the percentage of radiographic magnification (%Mag) and correction of JSW measurements. Measurements of the minimum interbone distance (IBD) in the medial tibiofemoral compartment and the %Mag were obtained by an automated method (edge detection) and manually. Combinations of automated and manual measurements of the IBD and %Mag in estimates of magnification-corrected JSW were compared with respect to their reproducibility, agreement, and sensitivity to JSN.
With fully automated measurements, variations in x-ray penetration in analog radiographs and edge enhancement in digital radiographs caused the computer to “see” a metal ball whose diameter was artifactually reduced, thereby inflating the magnification-corrected JSW. Manual measurement of the IBD and %Mag largely eliminated these problems and reduced the frequency of knees exhibiting an increase in JSW ≥0.50 mm from 16% to 2%. In 14 of the 15 knees in which a significant increase in JSW was noted with the manual method, the increase could be explained by the development of significant lateral compartment narrowing during the study or by poor alignment of the medial plateau.
Although automated and manual methods of JSW measurement of the knee in the SF-AP view possess comparable intrareader reproducibility, the manual method is less susceptible to technical factors that affect the correction of raw JSW estimates for radiographic magnification. Until we can identify practical, effective solutions to these technical problems, use of any radiographic protocol involving AP imaging of the knee in a DMOAD trial must be viewed with caution.
The development of protocols for standardized knee radiography has been regarded as an important advance that, when combined with automated measurement of joint space width (JSW) in digitized radiographs, permits more reproducible estimates of articular cartilage thickness in knee osteoarthritis (OA) than is possible with conventional radiographic methods (1). The semiflexed, anteroposterior (SF-AP) view of the knee (2, 3) has been used in several clinical trials of purported disease-modifying OA drugs (DMOADs; e.g., doxycycline, risedronate). This protocol uses fluoroscopy to guide positioning (flexion and rotation) of the knee and is superior to nonfluoroscopic standardization protocols (4, 5) in achieving reliable parallel alignment of the medial tibial plateau with the central x-ray beam in serial examinations (6). However, with the SF-AP view, in which the knee is not in contact with the x-ray cassette (Figure 1), quantitative estimates of JSW require correction for radiographic magnification, which, if uncorrected, may inflate estimates of the true value of JSW by as much as 35% (3).
We have previously presented data (7) indicating that longitudinal variations in x-ray penetration can alter, by as much as 20%, the projected diameter of the spherical marker that is used as the basis for correcting automated measurements of JSW in the SF-AP view for radiographic magnification. Changes in JSW (both positive and negative) due to x-ray penetration–related variations in the size of the marker, rather than to disease progression, confound the detection of joint space narrowing (JSN) in the SF-AP view and diminish the confidence (i.e., statistical power) with which a clinical DMOAD trial of reasonable size and duration can be expected to detect a true structure-modifying effect. Moreover, many clinical radiology departments have converted from conventional analog to digital radiographic imaging. The effects of this trend on our ability to correct automated measurements of JSW from the SF-AP view for radiographic magnification are unknown.
We have recently completed a randomized, placebo-controlled clinical trial (RCT) of a DMOAD in which structure modification was evaluated in serial SF-AP radiographs. In the course of this trial, we encountered unanticipated problems with the correction of automated estimates of JSW for magnification, both in the analog and in the digital radiographs. These problems were highlighted by an inordinately large percentage of serial pairs of SF-AP radiographs (16%) in which automated measurements of JSW in analog and digital images suggested a paradoxical thickening of the articular cartilage beyond the limits of measurement error.
In the present report, we describe our experience with the correction of estimates of JSW for radiographic magnification, and we examine the extent to which substitution of automated estimates of JSW in serial SF-AP views with precise manual measurements may increase the power of a DMOAD trial to detect a difference between the active treatment group and placebo group with respect to the rate of JSN.
PATIENTS AND METHODS
Subjects for this trial (n = 431) were recruited from 6 clinical centers for a placebo-controlled RCT of a DMOAD. All subjects had unilateral knee OA by the Kellgren and Lawrence criteria (OA severity grade 2 or 3 for the index knee and grade 0 or 1 for the contralateral knee, in a standing AP view) (8). SF-AP knee radiographs were obtained at baseline and at 16 months and 30 months thereafter (Figure 1). As directed by the protocol for the SF-AP view (2), a magnification marker (6.35-mm steel ball encased in methyl methacrylate) was affixed to the skin over the head of the fibula for each examination. The minimum interbone distance (IBD) in the medial tibiofemoral compartment and the degree of radiographic magnification (%Mag) were measured by 2 methods (automated and manual), in each case by investigators who were blinded to treatment group.
First, automated measurements from digitized images were obtained by a research associate (computer operator) in the laboratory of 1 of the investigators (KAB), using xJSW software (9). This software uses a semiautomated (operator-assisted) edge-detection subroutine to define the margins of the femur and the medial tibial plateau in the medial compartment, and then fits circles between the bony margins (edges) to identify the circle with the smallest diameter (i.e., the minimum IBD, expressed in pixels). Another semiautomated subroutine estimates the %Mag reflected in the size of the circular projection of the magnification marker. After the operator draws a square region of interest (ROI) of known dimensions (e.g., 80 × 80 pixels) around the image of the marker, the software counts the number of unexposed pixels within the ROI (i.e., pixels that appear white against the black background). This count is expressed as a percentage of all pixels within the square. Based on the assumption that this percentage represents the area of the circle relative to that of the ROI, the diameter of the circle (expressed in pixels) is determined arithmetically. Given the known diameter of the marker (6.35 mm), the %Mag for each radiograph is expressed as the ratio of the 2 diameters (in pixels/mm). The minimum IBD is then divided by the corresponding value of %Mag, to yield a magnification-corrected estimate of JSW.
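A minimal Python sketch of the marker-based correction described above follows. It is not the xJSW code: the function names, the 0–1 intensity scale, the white-pixel threshold of 0.9, and the synthetic 80 × 80 ROI are assumptions made for illustration. It counts white pixels, treats their share of the square ROI as the area of a circle, solves for the diameter, converts that diameter to a scale in pixels/mm using the 6.35-mm marker, and divides the minimum IBD by that scale.

```python
import numpy as np

MARKER_DIAMETER_MM = 6.35  # steel-ball magnification marker

def marker_diameter_px(roi, white_threshold=0.9):
    """Hypothetical helper: projected marker diameter (pixels) from a square ROI.

    The share of 'unexposed' (white) pixels in the ROI is taken as the area of
    a circle, and area = pi * d**2 / 4 is solved for d.
    """
    roi = np.asarray(roi, dtype=float)
    white_fraction = np.count_nonzero(roi >= white_threshold) / roi.size
    circle_area_px = white_fraction * roi.size  # area of the circle in pixels
    return float(np.sqrt(4.0 * circle_area_px / np.pi))

def corrected_jsw_mm(min_ibd_px, roi):
    """Divide the minimum IBD (pixels) by the marker scale (pixels/mm)."""
    scale_px_per_mm = marker_diameter_px(roi) / MARKER_DIAMETER_MM
    return min_ibd_px / scale_px_per_mm

# Synthetic 80 x 80 ROI containing a white disk 40 pixels in diameter.
yy, xx = np.mgrid[0:80, 0:80]
roi = (np.hypot(xx - 39.5, yy - 39.5) <= 20).astype(float)
print(round(marker_diameter_px(roi), 1))      # about 40.1
print(round(corrected_jsw_mm(25.2, roi), 2))  # about 3.99 (mm)
```

Because the diameter is inferred from a pixel count, anything that removes white pixels from the marker image lowers the scale and raises the corrected JSW.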
The manual measurements of minimum IBD and %Mag were then obtained directly from the radiographs, in accordance with the method of Lequesne (10). Manual measurements were performed by an investigator (SAM) who was blinded to the results of the automated measurements. The minimum IBD and the diameter of the projected marker were gauged with a screw-adjustable divider and transferred to a blank sheet of paper as pin pricks. A magnifying lens fitted with a 1-cm graticule (±0.2 mm) was then used to measure the distance between each pair of pin pricks. Finally, the manual measurement of the IBD was divided by the ratio of the measured diameter of the marker to its known diameter (%Mag), providing a magnification-corrected estimate of JSW.
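The manual correction reduces to a proportion between the known and projected marker diameters. The sketch below is a minimal illustration; the function names and the readings (a 7.62-mm projected marker and a 4.8-mm IBD on film) are hypothetical, and %Mag is expressed here as a percentage for readability.

```python
MARKER_DIAMETER_MM = 6.35  # known diameter of the steel ball

def percent_magnification(marker_on_film_mm, true_marker_mm=MARKER_DIAMETER_MM):
    """Radiographic magnification (%) implied by the projected marker."""
    return (marker_on_film_mm / true_marker_mm - 1.0) * 100.0

def manual_jsw_mm(ibd_on_film_mm, marker_on_film_mm, true_marker_mm=MARKER_DIAMETER_MM):
    """Scale the film-measured IBD by the ratio of true to projected marker diameter."""
    return ibd_on_film_mm * true_marker_mm / marker_on_film_mm

# Hypothetical divider/graticule readings.
print(round(percent_magnification(7.62), 1))  # 20.0
print(round(manual_jsw_mm(4.8, 7.62), 2))     # 4.0
```

With these readings, a ±0.2-mm error in reading the marker diameter shifts the corrected JSW by about 0.1 mm.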
To establish intrarater (and, for the manual method, interrater) reproducibility, a second set of automated measurements of the IBD and %Mag was obtained in a random sample of 30 digitized images, with the operator blinded to the results of the first set of analyses. Similarly, 2 investigators (ML and SAM) each generated 2 sets of manual measurements of the IBD and %Mag from a separate random sample of 30 radiographs, with each rater blinded to his own initial measurements as well as to those of the other rater. Automated and manual methods of measurement were then compared with respect to the reproducibility (intraclass correlation coefficient [ICC]) of the estimates of IBD, %Mag, and magnification-corrected JSW, and with respect to agreement between methods (Pearson correlation).
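The report does not state which ICC form was computed. As a hedged illustration, the sketch below assumes the two-way, consistency, single-measure form (ICC(3,1) in the Shrout and Fleiss notation), computed from repeated readings arranged as images × sessions, and uses NumPy's `corrcoef` for the Pearson correlation between methods. The function names and the small example array are hypothetical.

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): two-way, consistency, single measure (assumed form).

    ratings: array of shape (n_images, k_sessions), e.g. first and second
    readings of the same radiographs by one rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between images
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between sessions
    ss_total = ((x - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols              # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

def pearson_r(a, b):
    """Pearson correlation between paired measurements by two methods."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical repeated IBD readings (mm) of four radiographs.
readings = [[1.0, 1.5], [2.0, 1.8], [3.0, 3.4], [4.0, 3.9]]
print(round(icc_3_1(readings), 3))
```

Because ICC(3,1) measures consistency, a constant offset between sessions does not lower it; an absolute-agreement form such as ICC(2,1) would penalize such an offset.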
To evaluate the discrete contributions of the measurements of IBD and %Mag to the sensitivity with which serial estimates of magnification-corrected JSW reflect the thinning of articular cartilage (i.e., JSN), 3 approaches to estimating JSW in the SF-AP view were compared: fully automated, fully manual, and hybrid (i.e., automated estimate of IBD corrected manually for magnification). These approaches were evaluated according to 2 criteria.
First, we compared the automated, manual, and hybrid approaches with respect to the frequency with which the 30-month estimate of JSW exceeded the baseline value by ≥0.50 mm. The value of 0.50 mm represents the upper limit of measurement error (the 95% confidence interval) for JSW estimates in repeated SF-AP radiographs obtained over an interval of 7–10 days (11). All knee radiographs obtained in the trial (both knees of subjects in both treatment groups) were evaluated in this manner.
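The first criterion amounts to a simple classification of each knee. The sketch below, with hypothetical function names and made-up JSW values (not trial data), flags knees whose 30-month JSW exceeds baseline by at least the 0.50-mm error limit and reports the fraction flagged, overall or by clinical center; the same functions could be applied to automated, hybrid, or manual estimates.

```python
import numpy as np

ERROR_LIMIT_MM = 0.50  # upper limit of test-retest measurement error (ref 11)

def spurious_widening_rate(jsw_baseline_mm, jsw_30mo_mm, limit=ERROR_LIMIT_MM):
    """Fraction of knees whose 30-month JSW exceeds baseline by >= limit."""
    change = np.asarray(jsw_30mo_mm, float) - np.asarray(jsw_baseline_mm, float)
    return float(np.mean(change >= limit))

def rate_by_center(centers, jsw_baseline_mm, jsw_30mo_mm):
    """Same rate, computed separately for each clinical center label."""
    centers = np.asarray(centers)
    base = np.asarray(jsw_baseline_mm, float)
    follow = np.asarray(jsw_30mo_mm, float)
    return {c: spurious_widening_rate(base[centers == c], follow[centers == c])
            for c in np.unique(centers)}

# Hypothetical knees from two centers.
print(spurious_widening_rate([4.0, 3.5, 5.0, 2.8], [3.6, 4.2, 4.9, 2.5]))  # 0.25
```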
Second, based on data from the placebo group of the present RCT, we estimated sample size requirements for a hypothetical 30-month DMOAD trial designed to detect a 30% decrease in the rate of JSN in the active treatment group, in comparison with the placebo group, with 80% statistical power and α = 0.05. The mean and SD values of the 30-month JSN in the placebo group derived from automated, hybrid, and manual approaches were the bases for alternative sample-size projections.
RESULTS
The intrarater reproducibility of measurements derived from automated and manual methods is shown in Table 1. The intraclass correlation coefficients (ICCs) for the repeated automated measurements of the IBD and %Mag were 0.993 and 0.968, respectively. When combined to produce a magnification-corrected estimate of JSW, the result also possessed remarkable intrarater reproducibility (ICC 0.991). For the repeated manual measurements conducted by each of the 2 raters, intrarater reproducibility was comparable with that of the automated measurements, with ICCs of 0.973 and 0.980 for IBD, 0.989 and 0.998 for %Mag, and 0.986 and 0.996 for JSW. Moreover, the interrater reproducibility for manual measurements of the IBD was only slightly lower (ICC 0.953) than the corresponding estimates of intrarater reproducibility (Table 1). In contrast, interrater agreement for manual measurements of the diameter of the ball (the %Mag) was much lower (ICC 0.785) than that for the IBD. However, because the SF-AP radiographic technique results in radiographic magnification of the knee within a relatively narrow range (10–35%) (3), this lesser agreement regarding the degree of magnification correction did not appreciably reduce the interrater reproducibility of estimates of JSW (ICC 0.956).
Table 1. Intrarater reproducibility of automated and manual measurements of joint space narrowing in the knees of patients with osteoarthritis*
* Rows: automated; manual (reader 1); manual (reader 2). IBD = interbone distance; %Mag = percentage of radiographic magnification; JSW = joint space width, corrected for radiographic magnification.
Correlations between the automated and manual measurements of the IBD and %Mag in all radiographs from the trial are presented in Table 2. In radiographs from each of the 6 clinical centers, the correlation between automated and manual measurements of the IBD was very strong (r = 0.88–0.93; P < 0.0001 for all). However, the %Mag estimates by the 2 methods were only moderately correlated (r = 0.40–0.47) in 5 of the 6 clinical centers (Table 2).
Table 2. Pearson correlations between automated and manual measurements among the clinical centers
The percentage of index and contralateral knees in which the automated measurements over 30 months indicated an increase in medial JSW beyond the margin of measurement error (i.e., ≥0.50 mm) (11) varied from 7% to 41% across the 6 clinical centers, and this increase was seen in 16% of knee radiographs overall (Table 3). Replacing the automated estimate of %Mag with the manual estimate (i.e., the hybrid approach) only moderately reduced the frequency of this type of measurement error in estimates of JSN (to 6–23% across the 6 clinical centers and 10% overall). The greatest decrease in the frequency of knees showing significant widening of the joint space (from 41% to 16%) occurred in the clinical center with the highest percentage of such errors in the fully automated data. Notably, digital radiographic images, rather than analog images, made up a far greater proportion of the SF-AP radiographs from this center than from the other 5 centers. Furthermore, a temporal trend was observed in the digital radiographs from this center: in contrast to the radiographs acquired at baseline, the digital images acquired at 16 months and at 30 months were processed with edge enhancement, an adjustment of the digital image that highlights bony margins and the perimeter of the magnification marker but decreases the intensity of the interior of the marker (Figure 2).
Table 3. Changes in magnification-corrected joint space width (JSW) in the index and contralateral knees, by method of measurement, in a 30-month, randomized clinical trial of doxycycline versus placebo in knee osteoarthritis
In the fully manual measurements of both the IBD and the %Mag, the frequency with which JSW values increased beyond the limits of measurement error was reduced to ≤5% in each of the clinical centers, and to only 2% overall (Table 3). Notably, of the 15 knees in which manual measurements indicated an increase in JSW ≥0.50 mm, the results in 5 were found to have been due to progression of JSN in the lateral compartment that could not have been anticipated at baseline, and the results in 9 were attributable to longitudinal changes in knee flexion or rotation that occurred despite the positioning standards of the SF-AP protocol.
The effect of measurement-related error variation in JSN under these 3 approaches on the sample-size requirements of a hypothetical DMOAD trial can be extrapolated from the placebo-group data of the present RCT. In the fully automated, serial measurements of JSW, the mean ± SD JSN over 30 months was 0.30 ± 0.92 mm (Table 3). Based on these data, if a hypothetical DMOAD trial were designed to have 80% power to detect, with a P value <0.05, a rate of JSN in the active treatment group that was 30% slower than that in the placebo group, then 1,642 subjects per treatment group would be required. If automated estimates of the %Mag were replaced with manual measurements of the diameter of the magnification marker, neither the hybrid measurements of the 30-month JSN in the placebo group (mean ± SD 0.35 ± 0.99 mm) nor the sample-size requirement (1,397 subjects/group) for the hypothetical DMOAD trial would change appreciably. However, because the fully manual estimates of JSN were relatively free of instances of inexplicable widening of the medial joint space, they yielded a larger mean value and a smaller SD (mean ± SD 0.45 ± 0.70 mm) than did the automated or hybrid approaches (Table 3). Accordingly, with fully manual measurements of JSN, the sample size needed for 80% power to detect a true 30% DMOAD effect would decrease to 424 subjects per treatment group.
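The formula behind these projections is not stated. As a sketch under stated assumptions, the usual normal-approximation formula for comparing two independent means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ², with Δ = 0.30 × the mean placebo JSN, plus a small-sample correction of z²₁₋α/₂/4 (our assumption, not the authors' stated method), returns 1,642, 1,397, and 424 subjects per group for the automated, hybrid, and manual placebo data. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mean_jsn_mm, sd_jsn_mm, effect=0.30, alpha=0.05, power=0.80):
    """Subjects per arm to detect a fractional slowing `effect` of mean JSN.

    Two-sided normal approximation for two independent means, plus the
    z**2 / 4 small-sample correction (both assumptions, not from the report).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    delta_mm = effect * mean_jsn_mm  # absolute treatment difference in JSN
    n = 2 * (z_alpha + z_beta) ** 2 * sd_jsn_mm ** 2 / delta_mm ** 2
    n += z_alpha ** 2 / 4
    return ceil(n)

placebo = {"automated": (0.30, 0.92), "hybrid": (0.35, 0.99), "manual": (0.45, 0.70)}
for method, (mean_mm, sd_mm) in placebo.items():
    print(method, n_per_group(mean_mm, sd_mm))
```

Because n scales with (SD/mean)², the roughly halved SD-to-mean ratio of the manual data cuts the requirement by about a factor of 4.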
DISCUSSION
Clinical trials of DMOADs, hindered until recently by the absence of reliable radiographic methods by which to document structure modification, have been facilitated greatly by the development of protocols for standardizing the position of the knee in serial radiographic examinations (1). The SF-AP view was the first such protocol to be described (2). The primary advantage of the SF-AP view, compared with conventional radiographic methods, was the reproducibility with which the knee could be positioned and repositioned under fluoroscopy to achieve parallel alignment of the medial tibial plateau and central x-ray beam (superimposition ±1 mm of the anterior and posterior margins of the plateau).
In a DMOAD trial, the benefit of reliable radioanatomic positioning of the knee is remarkable precision (i.e., reproducibility in repeated examinations) of estimates of the minimum JSW in the SF-AP view after the measurement of the minimum IBD is corrected for radiographic magnification (3). In theory, increased precision in serial measurements of JSW should permit increased sensitivity to JSN; however, longitudinal studies of the radiographic progression of JSN in knee OA using highly standardized radiographic methods have yet to be reported in the literature (1). The choice of automated measurements of JSW at the outset of this trial was predicated on the fact that the original demonstration of the reproducibility of the SF-AP view was based on data derived from digital image analysis of JSW (3). However, the conclusions of that study and those of subsequent reports have been inconsistent with respect to whether semiautomated measurement software is superior to manual methods with respect to reproducibility of JSW estimates and sensitivity to JSN (3, 12, 13).
The data in the present study came from one of the first clinical DMOAD trials to use the SF-AP view. This trial was designed to have ≥80% power to detect a 30% decrease in the rate of JSN in the active treatment group, relative to the placebo group, provided that loss to followup was less than 15% and that the mean and SD JSN values (a composite of biologic variability and measurement error) in the placebo group accrued in roughly equal proportions over 30 months, as suggested by the literature (14, 15). At the completion of the trial, the former condition regarding minimal loss to followup was achieved (16). However, with automated measurements, the SD value for JSN in the placebo group at 30 months was 3-fold larger than the mean JSN value (Table 3), and the power to detect a true DMOAD effect was only 17%. In contrast, with fully manual measurements, the increase in mean JSN and concurrent decrease in between-subject variability of JSN in the placebo group restored much of the intended power of the present study (i.e., from 17% to 53%). It should be recognized that suboptimal power does not preclude detection of a significant DMOAD effect (17), although it does increase the risk of failing to detect a true DMOAD effect when one exists (Type II error).
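The loss of power can be approximated in the same way. The sketch below assumes a two-sided two-sample z-test on 30-month JSN and, because the number of evaluable subjects per arm is not given in this section, a hypothetical 215 per arm (roughly half of 431). Under these assumptions it gives a power of about 0.17 with the automated placebo data and about 0.52 with the manual data, close to the 17% and 53% reported; the exact values depend on the per-arm sample size actually analyzed. The function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(mean_jsn_mm, sd_jsn_mm, n_per_group, effect=0.30, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on 30-month JSN.

    Normal approximation; the negligible opposite rejection tail is ignored.
    """
    z = NormalDist()
    delta_mm = effect * mean_jsn_mm                  # assumed true difference
    se_mm = sd_jsn_mm * (2.0 / n_per_group) ** 0.5   # SE of the difference in means
    return z.cdf(delta_mm / se_mm - z.inv_cdf(1 - alpha / 2))

# Hypothetical: about 215 evaluable subjects per arm.
for method, (mean_mm, sd_mm) in {"automated": (0.30, 0.92), "manual": (0.45, 0.70)}.items():
    print(method, round(approx_power(mean_mm, sd_mm, 215), 2))
```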
We became aware of the potential for excessive error variation in JSN estimates when automated JSW measurements from interim (16-month) analog radiographs of many of the early enrollees in the trial indicated frequent increases in JSW beyond the limits of measurement error (≥0.50 mm). Visual inspection of these radiographs rarely confirmed an appreciable increase in joint space (due, for example, to a longitudinal change in alignment of the medial tibial plateau or to progression of lateral JSN). In almost all cases, the increase in JSW obtained with semiautomated measurement software was not observed in manual measurements of the IBD and %Mag. The evaluation of serial radiographs led to the observation that the worst cases of a false increase in JSW occurred in pairs of images in which the followup radiograph was notably darker than the baseline member of the pair. This prompted an investigation into whether uncontrolled, longitudinal variations in x-ray penetration compromised the capacity of the magnification-correction subroutine of the measurement software to estimate the %Mag. We found that overpenetration irretrievably “burned away” the margin of the radiographic image of the marker, causing a spurious reduction in size as measured by the computer software, and resulting in underestimation of the %Mag and corresponding overestimation of the JSW (Figure 2); indeed, variations in exposure altered the automated estimate of %Mag by as much as 20% (7).
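Because the corrected JSW is the IBD divided by a scale derived from the marker's apparent diameter, an underestimate of that diameter by a fraction r inflates the corrected JSW by a factor of 1/(1 − r). The sketch below uses a hypothetical true JSW of 4.0 mm and hypothetical function names: a 20% underestimate turns 4.0 mm into 5.0 mm, and an underestimate of only about 11% already produces a spurious gain of 0.50 mm.

```python
def apparent_jsw_mm(true_jsw_mm, diameter_underestimate):
    """Corrected JSW when the marker's apparent diameter is too small by the
    given fraction r (e.g., 0.20 = 20%): JSW_apparent = JSW_true / (1 - r)."""
    return true_jsw_mm / (1.0 - diameter_underestimate)

def underestimate_for_spurious_gain(true_jsw_mm, gain_mm=0.50):
    """Fraction r at which the apparent gain in JSW reaches gain_mm.
    Solves JSW / (1 - r) - JSW = gain for r."""
    return gain_mm / (true_jsw_mm + gain_mm)

print(round(apparent_jsw_mm(4.0, 0.20), 2))              # 5.0
print(round(underestimate_for_spurious_gain(4.0), 3))    # 0.111
```

By the same relation, a knee with a hypothetical JSW of 2.0 mm would need a 20% underestimate to cross the 0.50-mm limit.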
We also showed that adjustment of the %Mag for the optical density of the analog radiograph in the black area surrounding the marker (a surrogate for x-ray penetration) could counteract this confounder to a significant degree (7). Other investigators also have recognized this problem in analog radiographs and developed a remedy based on adjustment of the sensitivity of the film digitizer (Beary J: personal communication). However, solutions to the limited problem of magnification correction in analog radiographs became moot when, during the course of our clinical trial, the radiology departments in 5 of our 6 clinical centers converted from analog to digital radiography.
The advent of digital radiography posed 2 additional problems for measurement of JSN in this trial. First, in 24% of subjects, JSW was measured in an analog radiograph at baseline and in a digital radiograph in 1 or both followup examinations. In the remainder of subjects, all 3 radiographs were either analog (60%) or digital (16%). Second, within the subset of digital radiographs, unanticipated variations were noted with respect to whether the printed images were processed with or without edge enhancement. As noted above, edge enhancement of the digital image of the magnification marker alters its appearance, decreasing the intensity of the interior of the circular projection (Figure 2). Because the subroutine of the measurement software counts only unexposed (white) pixels within the surrounding square ROI as representing the magnification marker, dimmed interior pixels are not counted, resulting in an underestimation of the %Mag and overestimation of the JSW—a result similar to that seen in overpenetrated analog radiographs. This problem occurred almost exclusively in digital radiographs from a single clinical center, and coincided with the turnover of radiology technologists who, in the absence of specific instructions to the contrary, used or did not use edge enhancement according to their individual professional practices.
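The edge-enhancement artifact can be illustrated with a deliberately crude model. In the sketch below, the function names, the 40-pixel marker, the 3-pixel bright rim, the 0.6 interior intensity, and the 0.9 white threshold are all assumptions; real edge enhancement is nonlinear and vendor-specific. Only the rim survives the white-pixel threshold, so the count-based diameter falls from about 40.1 to about 21.2 pixels, which would inflate the magnification-corrected JSW by a factor of roughly 1.9.

```python
import numpy as np

def synthetic_marker_roi(size=80, diameter_px=40.0, rim_px=None, interior_level=0.6):
    """Hypothetical marker image on a 0-1 scale: white disk on black background.

    If rim_px is given, pixels more than rim_px inside the edge are dimmed to
    interior_level, a crude stand-in for edge enhancement (assumption).
    """
    yy, xx = np.mgrid[0:size, 0:size]
    center = (size - 1) / 2.0
    r = np.hypot(xx - center, yy - center)
    radius = diameter_px / 2.0
    roi = np.where(r <= radius, 1.0, 0.0)
    if rim_px is not None:
        roi[r <= radius - rim_px] = interior_level
    return roi

def diameter_from_white_pixels(roi, white_threshold=0.9):
    """Count-based diameter: only pixels at or above the threshold count as marker."""
    n_white = np.count_nonzero(roi >= white_threshold)
    return float(np.sqrt(4.0 * n_white / np.pi))

plain = diameter_from_white_pixels(synthetic_marker_roi())
enhanced = diameter_from_white_pixels(synthetic_marker_roi(rim_px=3))
print(round(plain, 1), round(enhanced, 1))  # about 40.1 and 21.2 pixels
print(round(plain / enhanced, 1))           # factor by which corrected JSW is inflated
```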
In general clinical practice, a moderate degree of irreversible edge enhancement is applied routinely to the raw digital image prior to reproduction on film or electronic display. Unfortunately, many image processing techniques are nonlinear and the processing algorithms are proprietary, making it virtually impossible to reverse the processing in order to make measurements from the original “raw” untransformed image. Although it is now possible to save digital copies of the original untransformed image data, this capability did not exist at the beginning of our RCT, when analog radiographs were routine. In addition, to maintain consistency of study procedures, all images analyzed in this RCT, regardless of acquisition method (i.e., analog or digital), were digitized from filmed images prior to determination of the JSW.
Although the many sources of magnification-related error in measurements of JSW might each have been remedied to an acceptable degree by a specific approach (e.g., manual measurement of the %Mag in edge-enhanced digital radiographs, adjustment of the %Mag in analog radiographs for optical density), it was essential that all measurements of JSW in this RCT be obtained by a single, reproducible method. Accordingly, we obtained a parallel set of manual measurements of JSW in accordance with the method of Lequesne (10) and determined that automated and manual measurements of JSW in SF-AP radiographic views possess comparably high intrarater reproducibility, and that manual measurements also possess high interrater reproducibility (Table 1). Whereas the 2 methods produced highly correlated measurements of the minimum IBD, estimates of the %Mag by the 2 methods were only moderately correlated (Table 2). Magnification correction of the automated measurement of IBD with manual estimates of the %Mag (the hybrid approach) yielded only a small improvement in the overall sensitivity of serial JSW measurements to JSN, compared with the fully automated approach (Table 3). The main advantage of manual measurement of the %Mag was to negate the effect of edge enhancement in digital radiographs. Indeed, unlike overpenetration of analog radiographs, edge enhancement actually facilitated manual measurement of the projected diameter of the marker.
However, the greatest reduction in error variation in estimates of JSN was achieved by use of a fully manual method, which yielded an SD value for the 30-month JSN that was only 55% higher than the mean JSN value (i.e., half the magnitude of that obtained with fully automated measurements). This suggests that automated measurement of the minimum IBD, while affording remarkable precision in an individual radiograph, may be subject to other sources of error in evaluating serial radiographs of the same knee. It is possible that an automated subroutine for identifying the location in serial radiographs of the minimum IBD is more susceptible than expert human judgment to subtle alterations of the bony margins of the joint that are unrelated to alignment (e.g., encroachment of a marginal osteophyte into the joint space or a change in the sharpness of the dense cortical line representing the surface of the medial tibial plateau).
Based on the above analyses, we concluded that the evaluation of a DMOAD effect with the SF-AP radiographic view should be performed with fully manual measurements of JSW obtained by the method of Lequesne (10). Indeed, until methods are developed to negate the problems that we have encountered in correction of quantitative estimates of JSW for radiographic magnification, use of any radiographic protocol for AP imaging of the knee in future DMOAD trials must be viewed with caution. Several alternative standardization protocols for posteroanterior positioning of the knee have been described (4, 5, 18). However, only the Lyon schuss protocol (18) uses fluoroscopy to achieve parallel radioanatomic alignment of the medial tibial plateau and central x-ray beam, with a frequency in serial examinations comparable with that seen with the SF-AP view. Parallel radioanatomic alignment of the medial tibial plateau is the only element of positioning that is consistently related to sensitivity in detection of JSN in knee OA (15, 19). Consistent with this, the Lyon schuss view has been shown to be more sensitive to JSN than the conventional standing AP view (20).