Patients were recruited to participate in a natural history study of symptomatic knee OA, the Boston Osteoarthritis of the Knee Study (conducted from 1997 to 2001). The recruitment for this study has been described in detail elsewhere (9). Briefly, participants were recruited from 2 prospective studies, 1 of men and 1 of women. Potential participants were asked 2 questions: “Do you have pain, aching, or stiffness in one or both knees on most days?” and “Has a doctor ever told you that you have knee arthritis?” For patients who answered yes to both questions, we conducted a followup interview in which we asked about other types of arthritis that could cause knee symptoms. If no other forms of arthritis were identified, then the individual was eligible for recruitment. A series of knee radiographs (posteroanterior, lateral, and skyline) was obtained from each patient to determine whether radiographic OA was present. If patients had a definite osteophyte on any view of the symptomatic knee, they were eligible for the study. Because they had frequent knee symptoms and radiographically defined OA, all patients met the American College of Rheumatology criteria for symptomatic knee OA (10).
The study included a baseline examination and followup examinations at 15 and 30 months. At baseline, patients who did not have contraindications to MRI underwent an MRI of the more symptomatic knee. MRIs of the same knee were also performed at the 15- and 30-month followup visits. At the baseline assessment, patients were weighed, with shoes off, on a balance-beam scale, and height was assessed. The Institutional Review Boards of Boston University Medical Center and the Veterans Administration Boston Health Care System approved the examinations.
All studies were performed with a Signa 1.5T MRI system (General Electric, Milwaukee, WI) using a phased-array, 2-element surface-receiver knee coil. A positioning device was used to ensure uniformity among patients and, over time, in individual patients. Coronal, sagittal, and axial images were obtained. Fat-suppressed fast spin-echo (FSE) proton density and T2-weighted images (repetition time 2,200 msec, echo time 20/80 msec, slice thickness 3 mm, 1-mm interslice gap, 1 excitation, field of view 11–12 cm, matrix 256 × 128 pixels) were acquired.
Meniscal degeneration and cartilage morphologic changes were assessed using the semiquantitative, multifeature Whole-Organ MRI Score (WORMS), which is applicable for use with conventional MRI techniques (11), and readers were blinded to these scores of MRI progression. A total of 3 readers scored all MRIs, as previously described (12). The majority of longitudinal MRIs (86%) were read jointly by a trained musculoskeletal radiologist (AG) and a musculoskeletal researcher. One reader (DJH), who was trained in the WORMS method by one of the musculoskeletal radiology readers (AG), scored the remainder of the subjects' MRIs. Thirty MRIs of the knee were reread during the course of the study to ascertain the intra- and interobserver reliability of the scoring of cartilage morphologic changes. Ten films were reread during the course of the study to ascertain the reliability of the scoring of other features, including features of the meniscus.
Tibiofemoral cartilage on MRI was scored on all 5 plates (central and posterior femur, and anterior, central, and posterior tibia) in both the medial and the lateral tibiofemoral joints. The anterior femur was not included in this analysis because this is part of the patellofemoral joint. Plates of the tibiofemoral joint were read using the coronal and sagittal fat-suppressed T2-weighted FSE images on a 7-point scale: 0 = normal thickness and signal; 1 = normal thickness but increased signal on T2-weighted images; 2 = partial-thickness focal defect of <1 cm in greatest width; 3 = multiple areas of partial-thickness (grade 2) defects intermixed with areas of normal thickness, or a grade 2 defect wider than 1 cm but <75% of the region; 4 = diffuse (≥75% of the region) partial-thickness loss; 5 = multiple areas of full-thickness loss wider than 1 cm but <75% of the region; 6 = diffuse (≥75% of the region) full-thickness loss.
In the WORMS system, a score of 1 does not represent a morphologic abnormality but rather a change in signal in cartilage with otherwise normal morphologic features. Scores of 2 and 3 represent similar types of abnormality of the cartilage, focal defects without overall thinning. Scores of 1 and 2 in this population were exceedingly unusual. Therefore, to create a consistent and logical scale for evaluation of cartilage morphologic change and a fair comparison with radiographic JSN changes, we collapsed the WORMS cartilage scores to a 0–4 scale, in which the original scores of 0 and 1 were collapsed to 0, the original scores of 2 and 3 were collapsed to 1, and the original scores of 4, 5, and 6 were considered 2, 3, and 4, respectively (12). The intraclass correlation coefficient (ICC) for agreement of cartilage readings among the readers ranged from 0.72 to 0.97. We defined a lesion as occurring in either the medial or the lateral compartment if it was present in the femur or the tibia of that compartment. Although we conducted analyses using this collapsed WORMS cartilage scale, analyses using the original scale yielded similar results.
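The collapsing rule described above can be written as a simple lookup. The following is a minimal sketch, not the authors' code; the names `WORMS_TO_COLLAPSED` and `collapse_worms` are hypothetical and introduced only for illustration.

```python
# Illustrative mapping from the original WORMS cartilage scale (0-6)
# to the collapsed 0-4 scale described in the text.
# Names are hypothetical; this is not the study's own code.
WORMS_TO_COLLAPSED = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}


def collapse_worms(score: int) -> int:
    """Return the collapsed (0-4) score for an original WORMS score (0-6)."""
    if score not in WORMS_TO_COLLAPSED:
        raise ValueError(f"WORMS cartilage score must be 0-6, got {score!r}")
    return WORMS_TO_COLLAPSED[score]


if __name__ == "__main__":
    for original in range(7):
        print(original, "->", collapse_worms(original))
```

Under this mapping, signal-only change (original score 1) is treated as normal, and the two focal-defect grades (2 and 3) share a single collapsed grade.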
The anterior horn, body segment, and posterior horn of each of the medial and lateral menisci were graded from 0 to 4 based on both the sagittal and the coronal images: 0 = intact; 1 = minor radial tear or parrot-beak tear; 2 = nondisplaced tear; 3 = displaced tear or partial resection; 4 = complete maceration/destruction or complete resection. This global variable scoring of meniscal integrity incorporates all elements of meniscal disease, and necessarily factors in meniscal position. Herein we regarded this as a global meniscal score, recognizing that abnormal meniscal position represents only one aspect of the score. For the purposes of this analysis, we have described this variable as meniscal degeneration. The interobserver agreement (ICC values) for reading of meniscal degeneration ranged from 0.72 to 0.97.
Using the coronal MR images and EFilm workstation software, we measured the following meniscal position variables to the nearest millimeter, in both the medial and the lateral compartments: subluxation, meniscal height, and meniscal covering and uncovering of the tibial plateau (see Figure 1). These measures were determined on the mid-coronal slice, where the medial tibial spine was of maximal volume. The point of reference for subluxation was the tibial plateau osteochondral junction at the joint margin (excluding osteophytes). Covering was measured from this point to the central edge of the meniscus, and uncovering was measured from this point to the tibial spine, both in the medial and in the lateral tibiofemoral compartments. The proportion of tibial coverage was defined as (meniscal covering/[meniscal covering + meniscal uncovering]).
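As an illustration of the proportion of tibial coverage defined above, the calculation can be sketched as follows. The function name and the example measurements are assumptions made for illustration and are not values from the study.

```python
def tibial_coverage_proportion(covering_mm: float, uncovering_mm: float) -> float:
    """Proportion of tibial coverage: covering / (covering + uncovering).

    Both inputs are distances in millimeters measured on the mid-coronal
    slice, as described in the text. The function name is hypothetical.
    """
    if covering_mm < 0 or uncovering_mm < 0:
        raise ValueError("measurements must be non-negative")
    total = covering_mm + uncovering_mm
    if total == 0:
        raise ValueError("covering and uncovering cannot both be zero")
    return covering_mm / total


if __name__ == "__main__":
    # Hypothetical measurements: 12 mm covered, 18 mm uncovered.
    print(tibial_coverage_proportion(12.0, 18.0))
```

Because the measure is a ratio of two distances in the same unit, it is unitless and lies between 0 (no coverage) and 1 (complete coverage).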
Figure 1. Meniscal position measurements obtained on coronal magnetic resonance imaging (MRI). Measurements were made on the mid-coronal slice where the medial tibial spine was of maximal volume. The point of reference for subluxation was the tibial plateau osteochondral junction at the joint margin (excluding osteophytes). Covering was measured from this point to the central edge of the meniscus, and uncovering was measured from this point to the tibial spine (S) for the medial (M) and the lateral (L) tibiofemoral compartments.
In the sagittal plane, anterior subluxation of the medial and lateral menisci was assessed on 1 sagittal slice for each compartment. The slice selected for the medial compartment was located in the area where the semimembranosus tendon was most clearly visible, and the slice selected for the lateral compartment was located at the point at which the fibular head was of maximal volume. A meniscus that was completely macerated or destroyed (as defined above) did not generate a measure of subluxation. Thus, knees in which the meniscal WORMS value was equal to 4 were not included in the analyses of subluxation. Interobserver reliability (ICC values) for readings of meniscal position ranged from 0.86 to 0.93.
For these analyses, the predictors (meniscal degeneration, meniscal position, and cartilage morphologic changes) were read at baseline and 30 months. In addition, the dependent variable, JSN, was determined at both time points.
Weight-bearing posteroanterior radiographs were obtained at 0, 15, and 30 months, using the protocol of Buckland-Wright (13). The beam was aligned relative to the center of the knee using fluoroscopic positioning, and the knee was flexed so that the anterior and posterior lips of the medial tibial plateau were superimposed. The feet were rotated until the tibial spines were centered in the notch, and outlines of foot rotation were then made on foot maps so that the foot rotation would be the same for subsequent films. Fluoroscopic positioning has been shown to produce a more accurate assessment of the joint space compared with nonfluoroscopic acquisition, and to improve reproducibility of the joint space assessment (3). A reader unfamiliar with the MRI findings read all of the radiographs, paired and unblinded to sequence (9).
For evaluation of disease progression, we focused on the width of the joint space in the medial and lateral compartments, since this has been found to correlate with cartilage thickness (6). Films were read by using the Osteoarthritis Research Society International Atlas (14), in which each medial and lateral tibiofemoral joint space was graded from 0 (normal) to 3 (bone on bone). The intraobserver agreement (kappa value) for reading change in JSN was 0.81 (P < 0.001).
All analyses were performed in a compartment-specific manner. The dependent variable was JSN on plain radiographs (possible grade range 0–3). For the predictors, we used the summary score for cartilage in 5 plates (anterior tibia, central tibia, posterior tibia, central femur, and posterior femur; possible range 0–20), the summary score for meniscal degeneration (anterior, body, and posterior; possible range 0–12), and the meniscal subluxation measures by compartment. We described the distribution of each predictor (cartilage score on MRI, semiquantitative score of meniscal morphologic abnormalities, and meniscal subluxation measures) according to radiographic JSN outcome categories. Meniscal coverage was not included in this analysis, because there was no prior knowledge about what is normal and what is abnormal.
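The compartment-level summary scores described above are sums of per-region scores. A minimal sketch follows, assuming each region is scored 0–4 (the collapsed cartilage scale and the meniscal scale); the region labels, function name, and example scores are hypothetical.

```python
# Regions contributing to each summary score, per compartment.
CARTILAGE_PLATES = (
    "anterior tibia", "central tibia", "posterior tibia",
    "central femur", "posterior femur",
)
MENISCAL_SEGMENTS = ("anterior horn", "body", "posterior horn")


def summary_score(scores: dict, regions: tuple, max_per_region: int = 4) -> int:
    """Sum per-region scores after checking completeness and range."""
    missing = [r for r in regions if r not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total = 0
    for region in regions:
        value = scores[region]
        if not 0 <= value <= max_per_region:
            raise ValueError(f"{region}: score {value} outside 0-{max_per_region}")
        total += value
    return total


if __name__ == "__main__":
    # Hypothetical medial-compartment cartilage scores (collapsed 0-4 scale).
    cartilage = {
        "anterior tibia": 0, "central tibia": 2, "posterior tibia": 1,
        "central femur": 3, "posterior femur": 0,
    }
    print(summary_score(cartilage, CARTILAGE_PLATES))  # 5 plates, range 0-20
```

With 5 plates scored 0–4 the cartilage summary ranges from 0 to 20, and with 3 segments scored 0–4 the meniscal summary ranges from 0 to 12, matching the ranges given in the text.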
We conducted a cross-sectional analysis using a multivariate logistic regression model to estimate the relative contribution of meniscal factors and cartilage morphologic abnormalities to JSN (dependent variable), while adjusting for age, sex, and body mass index (BMI). We used the same approach for change in JSN (dependent variable) and change in predictor variables.
To ensure that we could estimate the relative contributions of each of the meniscal and hyaline cartilage measures to joint space width and loss, we used forced inclusion methods, starting with a model containing age, sex, and BMI and then, one at a time, adding the predictor variables (summary cartilage score on MRI, summary semiquantitative score of meniscal morphologic abnormalities, and meniscal subluxation measures). We also performed regression modeling with initial inclusion of all of the predictor variables, i.e., age, sex, BMI, summary cartilage score on MRI, summary semiquantitative score of meniscal morphologic abnormalities, and meniscal subluxation measures (hereafter referred to as the full model).
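The sequence of models described above (a base model, the base model plus each predictor in turn, and the full model) can be enumerated as below. This sketch only lists the covariate sets and does not fit any model; the variable names are hypothetical, and the subluxation measures are represented by a single placeholder name.

```python
# Hypothetical variable names; the study's actual dataset columns are not given.
BASE_COVARIATES = ["age", "sex", "bmi"]
PREDICTORS = ["cartilage_score", "meniscal_degeneration", "meniscal_subluxation"]


def build_model_specs(base, predictors):
    """Return a dict of model label -> list of covariates to force into the model."""
    specs = {"base": list(base)}
    for predictor in predictors:
        # Base covariates plus exactly one predictor.
        specs[f"base + {predictor}"] = list(base) + [predictor]
    # Full model: base covariates plus all predictors.
    specs["full"] = list(base) + list(predictors)
    return specs


if __name__ == "__main__":
    for label, covariates in build_model_specs(BASE_COVARIATES, PREDICTORS).items():
        print(f"{label}: JSN ~ {' + '.join(covariates)}")
```

Each covariate list would then be passed to a logistic regression routine of the analyst's choosing, fitted separately for each compartment.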
We compared Akaike's information criterion (AIC) and the c statistic (area under the receiver operating characteristic curve [AUC]) generated from the full model with the values generated from the model containing only age, sex, BMI, and each predictor, to assess the relative contribution of each factor to the model. The AIC statistic adds twice the number of predictors and outcome levels to the −2 log likelihood to penalize models that are overly complex, thus arriving at a less biased assessment of the ability of a model to predict the outcome. When comparing models on the basis of the AIC, a lower value indicates a more desirable model (15). The c statistic is the AUC for a model and is a rank-based measure of how well a model discriminates between outcomes. It ranges from 0.5, when the model's predictions are no better than chance, to 1.0, when the model always assigns higher probabilities to correct cases than to incorrect cases.
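The two comparison statistics can be illustrated with a short sketch. It assumes a binary outcome for the c statistic (a simplification, since JSN is graded 0–3) and takes the number of estimated parameters as a single count; the function names and example values are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * log likelihood + 2 * (number of estimated parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def c_statistic(y_true, y_prob) -> float:
    """Rank-based c statistic (AUC) for a binary outcome coded 0/1.

    Over all (event, non-event) pairs, counts the fraction in which the
    event received the higher predicted probability; ties count as 0.5.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(aic(log_likelihood=-100.0, n_params=5))
    print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this formulation, a model with a lower AIC is preferred, and a c statistic closer to 1.0 indicates better discrimination between outcomes.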