To compare the level of agreement between quantitative and qualitative methods for determining patellofemoral relationships on images of the patellofemoral joint (PFJ) obtained using kinematic magnetic resonance (MR) imaging, given the controversy regarding the use of quantitative vs. qualitative criteria to interpret such images.
Materials and Methods
One hundred twenty mid-patellar axial plane images obtained using kinematic MR imaging from fifteen subjects were randomly selected for analysis. MR images represented various knee flexion angles ranging from 0 to 60 degrees. Quantitative analysis (bisect offset and patellar tilt angle) was performed by two examiners using a computer-assisted software program. Based on data from previously published literature, MR images were characterized as demonstrating normal, medial, or lateral patellar subluxation, and/or normal, medial, or lateral tilt. Using similar categories, two different examiners experienced in reading MR images of the PFJ then applied qualitative criteria to the same images.
The agreement between the quantitative and qualitative assessments of horizontal patellar displacement and patellar tilt ranged from poor to moderate (Kappa coefficients from 0.27 to 0.45). Quantitative and qualitative techniques demonstrated acceptable intra- and inter-observer reliability.
THE USE OF STANDARD radiography for diagnostic evaluation of the patellofemoral joint (PFJ) has many recognized limitations and, as such, has questionable clinical value (1). In 1988, a kinematic magnetic resonance imaging (MRI) procedure was developed to provide diagnostic information about patellar alignment during the initial increments of joint flexion, when subtle position-related abnormalities are most apparent (2). Since then, reports have indicated that kinematic MRI of the PFJ is a sensitive and useful technique for assessment and characterization of aberrant positions of the patella (1, 3, 4).
Images obtained from kinematic MRI examinations may be interpreted using either qualitative or quantitative techniques (3–6). Qualitative assessments have been used in the vast majority of clinical reports of kinematic MRI of the PFJ (3, 5, 7–9). Quantitative interpretation, however, is the preferred method for research purposes and typically relies on established PFJ indices such as the Merchant angle, lateral patellofemoral angle, and bisect offset (5, 7, 10). The preference for quantitative measures in experimental studies is understandable: objective criteria minimize the experimenter bias associated with qualitative assessment and provide numerical values for statistical analyses. Objective measurements are rarely used clinically, however, because they are time consuming, difficult to apply in cases of patella alta or patellofemoral dysplasia, and often difficult to interpret (5).
Although both qualitative and quantitative techniques are used to assess kinematic MR images of the PFJ, it is not known whether the two methods yield similar interpretations. In fact, there is evidence suggesting that the two methods may give different results. For example, using qualitative criteria applied to images obtained with kinematic MRI, Shellock, et al (6) reported that 76% of the PFJs examined showed correction or improvement of patellar subluxation after application of a realignment brace. Using the same realignment brace and kinematic MRI protocol, a recent study by Powers, et al (11) found no difference in patellar tracking in subjects with patellofemoral pain when objective measurements were employed. Although the results of these two studies may be related to differences in patient populations, it is also possible that the conflicting findings resulted from the contrasting methods used to assess the images (i.e., quantitative vs. qualitative criteria).
Given the potential discrepancy between qualitative assessment and quantitative measurement of PFJ relationships, the purpose of this investigation was to compare the level of agreement between these two methods. A secondary purpose of this study was to establish the inter- and intra-observer reliability of qualitative and quantitative assessment of PFJ alignment.
MATERIALS AND METHODS
One hundred twenty mid-patellar, axial plane MR images of the PFJ were randomly selected from a pool of 546 images. Using previously described kinematic techniques (6, 11), data were obtained from 15 subjects (eight male and seven female, mean age 26.1 ± 8.1 years). Nine of these subjects had a diagnosis of patellofemoral pain and six subjects were pain-free. The selected images represented various knee flexion angles (0° to 60°) and were obtained during either weight-bearing knee flexion and extension (i.e., squatting against body weight) or non-weight-bearing knee extension (i.e., extending the knee against gravity in the seated position) (12).
A 0.5-T vertically open MR system was used, allowing images to be acquired in the standing or seated positions (5). A fast multiplanar spoiled gradient-echo (FMSPGR) pulse sequence was used to acquire axial images of the PFJ. Imaging parameters were: repetition time (TR) 10.3 msec, echo time (TE) 2.7 msec, 40° flip angle, 35 × 18 cm field of view (FOV), 256 × 128 matrix size, number of excitations (NEX) 2, and 12 mm slice thickness. Fifteen images were acquired in 20 seconds (approximately 1.3 seconds per image).
Quantitative analysis of the selected images was performed by two researchers experienced in measuring PFJ relationships. A custom macro written for NIH Image software (National Institutes of Health, Bethesda, MD) was used for all measurements.
Medial and lateral patellar displacement was assessed using the bisect offset measurement described by Brossmann, et al (7). A line was drawn connecting the posterior femoral condyles, and a second line was projected anteriorly, perpendicular to it, through the deepest point (apex) of the trochlear groove. This line intersected the patellar width line, which connected the widest points of the patella (Fig. 1) (7). When the trochlear groove was flattened, the perpendicular line was instead projected anteriorly from the midpoint of the posterior femoral condylar line (13). The bisect offset represented the portion of the patella lying lateral to this midline and was expressed as a proportion of total patellar width.
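A minimal sketch of the bisect offset geometry described above, assuming the landmarks (medial and lateral posterior condyles, medial and lateral patellar edges, and the trochlear apex) have already been digitized as (x, y) pixel coordinates. The function name and its inputs are hypothetical; the study used a custom NIH Image macro whose code is not given here.

```python
import math

def bisect_offset(med_condyle, lat_condyle, med_patella, lat_patella, trochlear_apex=None):
    """Hypothetical helper: bisect offset as a proportion of patellar width.

    All points are (x, y) tuples. The reference line runs anteriorly,
    perpendicular to the posterior condylar line, through the trochlear apex.
    If the groove is flat (trochlear_apex is None), the midpoint of the
    condylar line is used instead.
    """
    # Unit vector along the posterior condylar line, pointing medial -> lateral
    ux = lat_condyle[0] - med_condyle[0]
    uy = lat_condyle[1] - med_condyle[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm

    if trochlear_apex is None:
        trochlear_apex = ((med_condyle[0] + lat_condyle[0]) / 2,
                          (med_condyle[1] + lat_condyle[1]) / 2)

    def along(p, q):
        # Signed distance from p to q measured along the condylar direction
        return (q[0] - p[0]) * ux + (q[1] - p[1]) * uy

    # Projection onto the condylar direction scales lengths on the patellar
    # width line uniformly, so this ratio equals (patellar width lateral to the
    # perpendicular line) / (total patellar width).
    return along(trochlear_apex, lat_patella) / along(med_patella, lat_patella)

# Hypothetical landmarks: patella centred over the groove, then shifted laterally
print(bisect_offset((0, 0), (10, 0), (2, 8), (8, 8), (5, 3)))
print(bisect_offset((0, 0), (10, 0), (3, 8), (9, 8), (5, 3)))
```

A value above 1.0 would mean the whole patella lies lateral to the reference line; the sketch does not guard against degenerate inputs such as coincident condyle points.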
Medial/lateral patellar tilt was measured using a modification of the technique described by Powers, et al (13). The patellar tilt angle was defined as the angle formed by the line joining the widest points of the patella and the line joining the posterior femoral condyles (Fig. 2) (13). All tilt measurements were reported in degrees.
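The tilt angle can be sketched in the same way as the signed angle between the patellar width line and the posterior condylar line. The sign convention is an assumption chosen to match the text (positive = lateral tilt, negative = medial tilt): here a positive value means the lateral patellar edge lies closer to the condylar line than the medial edge. The anterior direction is inferred from the side of the condylar line on which the patella lies, so the sketch does not depend on whether the image y-axis points up or down. The function name is hypothetical.

```python
import math

def patellar_tilt_deg(med_condyle, lat_condyle, med_patella, lat_patella):
    """Hypothetical helper: signed angle (degrees) between the patellar width
    line and the posterior condylar line; positive is assumed to be lateral tilt."""
    # Unit vector along the condylar line (medial -> lateral)
    ux = lat_condyle[0] - med_condyle[0]
    uy = lat_condyle[1] - med_condyle[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm

    # Perpendicular unit vector, flipped if needed so it points toward the patella
    nx, ny = -uy, ux
    mid = ((med_patella[0] + lat_patella[0]) / 2, (med_patella[1] + lat_patella[1]) / 2)
    if (mid[0] - med_condyle[0]) * nx + (mid[1] - med_condyle[1]) * ny < 0:
        nx, ny = -nx, -ny

    dx = lat_patella[0] - med_patella[0]
    dy = lat_patella[1] - med_patella[1]
    along = dx * ux + dy * uy  # medial -> lateral component of the patellar line
    rise = dx * nx + dy * ny   # anterior component (negative if lateral edge is more posterior)
    return math.degrees(math.atan2(-rise, along))

# Hypothetical landmarks: lateral patellar edge 1 pixel closer to the condylar line
print(patellar_tilt_deg((0, 0), (10, 0), (2, 9), (8, 8)))
```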
Qualitative assessment of the selected images was made by two investigators, each with over 10 years of clinical and research experience in reading axial plane MR images obtained using kinematic methods. These two examiners used subjective criteria previously described by Shellock, et al (5, 6, 14), and were asked to characterize each image with respect to horizontal patellar displacement (medial, normal, or lateral) and patellar tilt (medial, normal, or lateral). Briefly, normal patellar alignment was qualitatively defined as present when the median ridge of the patella was positioned in the femoral trochlear groove, without transverse displacement of the medial or lateral facets. Medial or lateral subluxation of the patella was considered present when the median ridge of the patella was displaced medially or laterally relative to the femoral trochlea and the medial or lateral facet was displaced relative to its respective anterior femoral condyle. Medial or lateral tilt of the patella was considered present when the anterior aspect of the patella was oriented medially or laterally. To establish intra- and inter-observer reliability, both qualitative assessments and quantitative measurements were made on 50 images on two different days approximately one week apart. All investigators were blinded to previous measurements and to identifying information on the MR images. In all cases, images were presented in random order.
Comparing continuous numeric data (quantitative measurements) with categorical data (qualitative assessments) required grouping the numeric data into discrete categories. These categories were based on previously published data obtained from healthy individuals using techniques similar to those employed in the current study (13).
With respect to medial/lateral displacement (Fig. 1), an image was considered “normal” if the bisect offset value fell between 0.44 and 0.64. Based on the work of Powers, et al (13), this range represents the mean bisect offset of pain-free individuals (0.54) ± 2 SD (SD = 0.05) and, assuming a normal distribution, should encompass approximately 95% of the healthy population. Therefore, a bisect offset value greater than 0.64 was considered indicative of lateral patellar displacement, and a value less than 0.44 was considered indicative of medial patellar displacement.
With respect to medial/lateral patellar tilt (Fig. 2), an image was considered “normal” if the patellar tilt angle fell between −2.5 degrees (negative values indicating medial tilt) and 13.5 degrees. This range represents the mean patellar tilt angle of pain-free individuals (5.5 degrees) ± 2 SD (SD = 4.0 degrees) (13), and should encompass approximately 95% of this healthy population (15). Therefore, a patellar tilt angle greater than 13.5 degrees was considered indicative of lateral tilt, and a patellar tilt angle less than −2.5 degrees was considered indicative of medial tilt.
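The categorization rule above can be sketched as follows. The cutoffs are derived as mean ± 2 SD from the pain-free values quoted in the text (bisect offset 0.54, SD 0.05; tilt 5.5 degrees, SD 4.0 degrees), and a value lying exactly on a cutoff is counted as normal, consistent with "between". The function and constant names are hypothetical.

```python
def normal_range(mean, sd, k=2.0):
    """Return (low, high) = mean -/+ k*SD, rounded to suppress float noise."""
    return round(mean - k * sd, 6), round(mean + k * sd, 6)

def categorize(value, low, high):
    """Map a continuous measurement onto the three qualitative categories."""
    if value < low:
        return "medial"
    if value > high:
        return "lateral"
    return "normal"  # cutoff values themselves are treated as normal

OFFSET_RANGE = normal_range(0.54, 0.05)  # bisect offset cutoffs, 0.44 to 0.64
TILT_RANGE = normal_range(5.5, 4.0)      # tilt cutoffs in degrees, -2.5 to 13.5

print(categorize(0.70, *OFFSET_RANGE))
print(categorize(13.4, *TILT_RANGE))
```

The second call illustrates the "borderline" problem discussed later: a tilt of 13.4 degrees is classed as normal even though it sits at the edge of the range.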
Intraclass correlation coefficients, ICC(2,1), were used to determine intra- and inter-observer reliability for the quantitative measurements (continuous numeric data). Kappa coefficients were used to determine intra- and inter-observer reliability for the qualitative assessments (categorical data).
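As a rough sketch of the reliability statistic named above, the function below computes the two-way random-effects, single-measure, absolute-agreement intraclass correlation conventionally labeled ICC(2,1), using ICC(2,1) = (BMS − EMS) / (BMS + (k − 1)·EMS + k·(JMS − EMS)/n), where BMS, JMS, and EMS are the between-image, between-rater, and residual mean squares. It is a plain-Python illustration; the exact SPSS options used in the study are not stated, so equivalence with them is an assumption.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per image and one column per
    rater (or per session, for intra-observer reliability)."""
    n = len(ratings)     # number of images
    k = len(ratings[0])  # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-images mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical bisect offsets for 4 images measured in two sessions
print(icc_2_1([[0.50, 0.52], [0.61, 0.60], [0.45, 0.47], [0.70, 0.69]]))
```

Pairing one examiner's two sessions gives an intra-observer ICC; pairing the two examiners' values for the same images gives an inter-observer ICC.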
Once the quantitative data were categorized, the Kappa coefficient was used to assess the level of agreement between the qualitative and quantitative assessments of patellar alignment. Separate analyses were performed for horizontal patellar displacement and patellar tilt, using SPSS statistical software (SPSS Inc., Chicago, Illinois).
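The agreement analysis can be sketched as an unweighted Cohen's kappa computed from paired category labels for the same images, together with a medial/normal/lateral cross-tabulation of the kind used for a comparison matrix. The function names and the example labels are hypothetical and are not study data.

```python
from collections import Counter

CATEGORIES = ("medial", "normal", "lateral")

def agreement_matrix(labels_a, labels_b, categories=CATEGORIES):
    """Rows: categories from method A; columns: categories from method B."""
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(a, b)] for b in categories] for a in categories]

def cohen_kappa(labels_a, labels_b, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two sets of labels on the same images."""
    matrix = agreement_matrix(labels_a, labels_b, categories)
    n = len(labels_a)
    k = len(categories)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from each method's marginal proportions
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical labels for five images (categorized quantitative vs. qualitative)
quantitative = ["normal", "normal", "lateral", "normal", "medial"]
qualitative = ["lateral", "normal", "lateral", "normal", "medial"]
print(agreement_matrix(quantitative, qualitative))
print(cohen_kappa(quantitative, qualitative))
```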
The intra-observer reliability of the two examiners performing the quantitative measurements was excellent for both horizontal patellar displacement (ICCs: 0.92 and 0.99; Table 1) and patellar tilt (ICCs: 0.99 and 0.96; Table 2). In addition, there was excellent agreement (inter-observer reliability) between the two quantitative examiners for both horizontal patellar displacement (ICC = 0.90; Table 1) and patellar tilt (ICC = 0.90; Table 2).
Table 1. Intraclass Correlation Coefficients and Kappa Coefficients for Medial/Lateral Patellar Displacement
Quantitative examiner 1
Quantitative examiner 2
Qualitative examiner 3
Qualitative examiner 4
Intertester reliability (quantitative).
Intertester reliability (qualitative).
Agreement between quantitative and qualitative.
ICC = intraclass correlation coefficient; K = Kappa coefficient.
The intra-observer reliability of the two examiners performing the qualitative measures ranged from substantial to excellent for both horizontal patellar displacement (Kappa coefficients: 0.67 and 0.80; Table 1) and patellar tilt (Kappa coefficients: 0.80 and 0.94; Table 2). The inter-observer reliability of the qualitative measures was moderate for horizontal patellar displacement (Kappa = 0.54; Table 1) and patellar tilt (Kappa = 0.51; Table 2).
The level of agreement between qualitative and quantitative assessment techniques was poor to moderate for horizontal patellar displacement (Kappas ranging from 0.28 to 0.45; Table 1). Similarly, the level of agreement between qualitative and quantitative assessment techniques was poor for patellar tilt (Kappas ranging from 0.27 to 0.31; Table 2).
The primary finding of this study is the poor agreement between quantitative and qualitative methods in characterizing PFJ relationships. This finding suggests that care must be taken when comparing data obtained from these two methods. One likely explanation for this discrepancy is the difference in landmarks used by the two techniques. Quantitative measurements are typically made by identifying specific osseous landmarks, whereas qualitative assessments are made by visually describing the relationship of the articulating joint surfaces relative to one another.
Another possible explanation for the poor agreement between the qualitative and quantitative methods is that quantitative measurements may be more precise than qualitative assessments. For example, a patellar tilt angle just under the 13.5-degree cutoff would fall into the normal quantitative category but could easily be judged as lateral tilt based on qualitative criteria (Fig. 3). This suggests that although discrepancies may not occur when evaluating images with obvious malalignment, disagreement is more likely to occur when evaluating “borderline” images.
Most of the disagreements between the two methods occurred when qualitative assessment categorized horizontal patellar displacement as lateral while quantitative measurement categorized the patella as normal (Table 3). The same trend was evident for patellar tilt. The fact that many of the disagreements involved “borderline” images illustrates the difficulty in comparing data from these two methods (Fig. 2). Expanding the “normal” range would have improved agreement between the two assessment techniques; however, a value lying 2 SD above the population mean is typically considered to be outside the normal range (15).
Table 3. Example Measurement Comparison Matrix Between Quantitative and Qualitative Examiners
The secondary finding of this study was that intra-observer reliability was acceptable for both qualitative assessments and quantitative measurements, with the intra-observer reliability of the quantitative measurements being slightly greater than that of the qualitative assessments. The higher intra-observer reliability of the quantitative measurements may be attributed to the precise landmark criteria and to the fact that measurements were made using a computer-aided system. Inter-observer reliability was also higher for quantitative measurements. It stands to reason that quantitative measurements would be reliable as long as the precise anatomic criteria for each measurement are well understood by all examiners.
The moderate inter-observer reliability found with the qualitative examiners was somewhat unexpected; however, this result should be interpreted with caution. The qualitative examiners agreed on more than 70% of the images when categorizing horizontal patellar displacement, and on 80% of the images when categorizing patellar tilt. These percentages were similar to the percentage of agreement between the quantitative examiners, whose measurements produced ICC values of 0.90. The inconsistency between the reliability statistic and the percentage of agreement can be explained by the computational formula of the Kappa coefficient, which corrects the observed proportion of agreement for the proportion of agreement expected by chance alone. In this study, the qualitative examiners categorized the majority of patellae as normal or laterally displaced, and as normal or laterally tilted. Because the ratings were concentrated in two of the three categories, the expected chance agreement between the qualitative examiners was relatively high, which reduced the magnitude of the Kappa value. A sample with a more even distribution of medial, normal, and lateral patellar displacement and tilt would likely have reduced the proportion of chance agreement and therefore yielded higher Kappa values for the same percentage of agreement. Hence, the relatively low Kappa values for the qualitative examiners, compared with the ICC values for the quantitative examiners, do not by themselves establish that the qualitative examiners agreed less often.
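To illustrate the point about chance agreement, the sketch below computes Cohen's kappa directly from two 3 × 3 agreement matrices (rows: examiner A; columns: examiner B; categories ordered medial, normal, lateral). Both matrices are invented for illustration and are not data from this study. Each has 80% observed agreement, but one concentrates the ratings in the normal and lateral categories, as described above, while the other spreads them evenly.

```python
def kappa_from_matrix(matrix):
    """Return (observed agreement, chance agreement, Cohen's kappa) from a
    square agreement matrix (rows: examiner A, columns: examiner B)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if the examiners rated independently using their own marginals
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_observed, p_chance, (p_observed - p_chance) / (1 - p_chance)

# Hypothetical counts, 100 images each
skewed = [[0, 0, 0],
          [0, 50, 10],
          [0, 10, 30]]
balanced = [[27, 3, 3],
            [3, 27, 4],
            [3, 4, 26]]

for name, m in (("skewed", skewed), ("balanced", balanced)):
    po, pe, kappa = kappa_from_matrix(m)
    print(f"{name}: observed={po:.2f} chance={pe:.2f} kappa={kappa:.2f}")
```

With these invented counts, the skewed matrix gives a chance agreement of 0.52 and a kappa of about 0.58, while the balanced matrix gives a chance agreement of about 0.33 and a kappa of about 0.70, despite identical percentage agreement.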
Based on the results of this comparison study, care must be taken in deciding which technique (qualitative or quantitative) should be used when assessing PFJ relationships. To make such a decision, both methods would have to be compared with a known “gold standard,” on the assumption that whichever method more closely approximated the “gold standard” would likely provide the best estimate of true patellofemoral alignment. To date, no such “gold standard” exists. However, an argument could be made that assessing PFJ relationships using osseous landmarks (as is commonly done with quantitative methods) may not be valid, because cartilaginous surfaces, not bony surfaces, interact at the joint. This premise is highlighted in a recent study by Staubli, et al (16), who found that patellofemoral relationships based on osseous landmarks do not coincide with patellofemoral relationships defined by cartilaginous surfaces. Unfortunately, articular cartilage is not readily visualized using the fast gradient-echo pulse sequences typically used for kinematic MRI of the PFJ, so cartilage-based indices cannot currently be applied with kinematic imaging techniques. Although most (if not all) quantitative methods to assess patellofemoral relationships use osseous landmarks, it is possible that quantitative indices based on cartilaginous interfaces would show a higher degree of agreement with the qualitative assessment methods used in the current study. Further research is necessary to test this hypothesis.
In conclusion, although quantitative and qualitative methods demonstrated moderate to excellent intra- and inter-observer reliability, there was poor agreement between qualitative and quantitative methods. The most common area of disagreement occurred when the patella was characterized as being laterally displaced or tilted based on qualitative assessment, as opposed to being characterized as “normal” based on quantitative measurements. While quantitative and qualitative methods are likely used in different circumstances (i.e., research vs. clinical), these results indicate that care must be taken when comparing data obtained from these two methods.