Algorithm-Based Qualitative and Semiquantitative Identification of Prevalent Vertebral Fracture: Agreement Between Different Readers, Imaging Modalities, and Diagnostic Approaches
We compared SQ and ABQ diagnosis of VF imaged by radiography and X-ray absorptiometry. Mild ABQ VF had stronger associations with osteoporosis than mild SQ VF. Interobserver agreement (radiographic diagnosis) was better for ABQ.
Introduction: Vertebral fracture (VF) assessment from images acquired by X-ray absorptiometry (VFA) is often based on a semiquantitative approach (SQ); prevalent VF is identified if vertebral height appears reduced by >20%. Algorithm-based qualitative definition of osteoporotic VF (ABQ) requires evidence of endplate depression, and there is no threshold for reduction in vertebral height. The aims of this study were to (1) compare the prevalence of VFs; (2) compare the characteristics of women with and without VFs; (3) compare interobserver agreement; and (4) compare agreement between methods and imaging modalities for ABQ and SQ definitions of VFs.
Materials and Methods: Spine radiographs and absorptiometry images for 203 elderly women were assessed using ABQ (readers ABQ-1 and ABQ-2). These readings were compared with SQ assessments (readers SQ-1 and SQ-2) of the same images performed in a previous study. Agreement between readers and methods was assessed by kappa (κ) statistics.
Results: The prevalence of VF was 15–18% (radiography) and 12–24% (VFA) for ABQ and SQ, respectively. Women with ABQ or SQ fractures were older and had lower BMD than those without fracture (p < 0.01). Mild ABQ (but not SQ) VF was associated with low BMD. κ scores for interobserver agreement for radiography and VFA, respectively, were as follows: ABQ, κ = 0.74 (95% CI, 0.60, 0.87) and 0.65 (95% CI, 0.48, 0.81); SQ, κ = 0.53 (95% CI, 0.46, 0.60) and 0.51 (95% CI, 0.44, 0.58). For agreement between ABQ-1 and SQ-1, κ = 0.55 (95% CI, 0.39, 0.72) for radiography and 0.41 (95% CI, 0.25, 0.58) for VFA.
Conclusions: The prevalence of radiographic VF identified by ABQ and SQ was similar, but on VFA was 50% higher for SQ. Mild ABQ VF was associated with low BMD. Interobserver agreement for radiographic diagnosis was significantly better for ABQ than for SQ. Agreement between ABQ and SQ was moderate.
Prevalent vertebral fracture (VF) predicts future osteoporotic fractures independently of BMD.(1) Accurate identification of patients with prevalent VF is important for the effective targeting of therapy to reduce their fracture risk. VFs may be diagnosed from spine radiographs or from vertebral images acquired by X-ray absorptiometry. The latter, which is now referred to as densitometric VF assessment (VFA),(2) substantially reduces the effective radiation dose incurred by conventional radiography.(3) Several studies based on various different definitions of VF(4–11) have been performed to determine the comparability of densitometric imaging and conventional radiography. Current guidelines for VFA(2) recommend visual estimation of apparent reduction in vertebral height ≥20% for diagnosis of a prevalent VF, as described in the Genant semiquantitative (SQ) approach.(12) Fractures identified thus may be confirmed quantitatively using the scan analysis software. Schousboe and DeBold(8) performed VFA using this approach and compared it to SQ diagnosis of spine radiographs obtained in the same study population. They observed moderate agreement between readers and imaging modalities for the SQ assessment of VF.
A modified approach to visual diagnosis of VFs known as the algorithm-based qualitative method (ABQ) systematically excludes nonosteoporotic deformities.(13) Using this approach, osteoporotic VF is identified when there is the appearance of fracture at the vertebral endplate, and there is no minimum threshold for apparent reduction in vertebral height. There are currently no published data on interobserver agreement for the identification of VFs using this method, either for VFA or for radiographic diagnosis, and the ABQ and SQ methods have not been compared for the assessment of vertebral images acquired by X-ray absorptiometry. We therefore compared the ABQ and SQ methods for the assessment of radiographs and absorptiometry scan images previously acquired in the population studied by Schousboe and DeBold.(8) The aims of this study were to (1) compare the prevalence of VF; (2) compare the characteristics of women with and without VFs; (3) compare interobserver agreement; and (4) compare agreement between methods and imaging modalities for the ABQ and SQ definitions of VF.
MATERIALS AND METHODS
We evaluated spinal radiographs and vertebral images acquired by X-ray absorptiometry for postmenopausal women referred for bone densitometry at a large multispecialty group medical practice in suburban Minneapolis, MN, USA. Semiquantitative assessment of these diagnostic images has been reported previously.(12) Study entry was initially offered to all women ≥65 yr of age who were referred for bone densitometry. After the first 100 patients were enrolled, entry was restricted to women ≥65 yr of age with BMD T-score ≤ −1.0 at the total hip, femoral neck, or lumbar spine, in an attempt to increase the proportion of study enrollees with one or more VFs. Women with scoliosis (∼17% of the population) were not excluded from this study. In total, 205 women were enrolled into the study during 2003. Women for whom both spinal radiographs and absorptiometric images were available were included in this analysis (n = 203). The mean age of these women was 74 yr (range, 65–93 yr). Approval for the study was granted by the Park Nicollet Institute Institutional Review Board, and all study participants gave informed consent. The study was funded by grants from Hologic (Bedford, MA, USA) and the Park Nicollet Institute.
Radiographic diagnosis was considered the gold standard for this analysis. Radiographs of the thoracic and lumbar spine were obtained in the lateral projection only, with patients lying in the left lateral decubitus position. Thoracic radiographs were centered on vertebra T8 and lumbar radiographs on vertebra L3, with a tube to film distance of 40 in. Spinal radiographs were identified by the patient's unique study number. The radiographs were not digitized. The absorptiometric images for VFA were acquired on the same day, using one of two Hologic Delphi W densitometers or a Delphi C densitometer (Hologic) and the Hologic Instant Vertebral Assessment (IVA) application. Single-energy scans of the thoraco-lumbar spine for VFA were obtained in the postero-anterior (PA) and lateral projection (lateral decubitus position).
Identification of prevalent VF
Readers ABQ-1 (GJ) and ABQ-2 (LF) independently assessed the spinal radiographs and absorptiometric images (VFA) for evidence of prevalent VF using the ABQ method.(13) They assessed the absorptiometric images first and were blinded to the results of these assessments when they subsequently read the spine radiographs. The readers were also blinded to the results of the original study analysis in which the same diagnostic images had previously been independently assessed by readers SQ-1 (JTS) and SQ-2 (CRD) using the SQ approach.(12) The ABQ readings of spine radiographs and absorptiometric images by readers ABQ-1 and ABQ-2 were performed ≥1 wk apart. Spine radiographs were viewed on a standard illuminated viewing box, and VFA was performed from a computer screen equipped with image analysis software (Hologic physician viewer).
SQ assessment of VF
The SQ assessments of VF performed in the original study have been described previously.(8) Vertebrae T4 through L4 were evaluated by readers SQ-1 and SQ-2 according to the Genant SQ method.(12) Vertebrae were judged to be normal (grade 0) or fractured (grades 1–3) according to visual estimation of apparent reduction in the anterior, middle, or posterior vertebral heights. Fractures were graded 1, 2, or 3 (mild, moderate, or severe) when vertebral height appeared reduced by ∼20–25%, >25% to <40%, or ≥40%, respectively. Fractures were also classified by type (wedge, biconcave, or crush). For posterior (crush) fractures, the height reduction criteria for the adjacent vertebrae both above and below the vertebra in question had to be met. Apparent reductions in vertebral height ≥20% identified visually by VFA were confirmed quantitatively using the scan analysis software.
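The SQ grading thresholds described above can be expressed as a simple mapping. The following Python sketch is illustrative only: it assumes the reader's visual estimate has already been expressed as a numeric percent reduction in vertebral height, and the function name sq_grade is hypothetical rather than part of any software used in the study.

```python
def sq_grade(height_reduction_pct: float) -> int:
    """Map an estimated percent height reduction to an SQ grade (0-3).

    Thresholds follow the text: <20% normal, ~20-25% mild,
    >25% to <40% moderate, >=40% severe.
    """
    if height_reduction_pct < 0:
        raise ValueError("height reduction cannot be negative")
    if height_reduction_pct < 20:
        return 0  # normal
    if height_reduction_pct <= 25:
        return 1  # mild
    if height_reduction_pct < 40:
        return 2  # moderate
    return 3      # severe


# Example: a vertebra judged to have lost about 30% of its anterior height
print(sq_grade(30))
```

In practice the estimate is visual and approximate, so values close to a threshold would be a matter of reader judgment rather than a sharp cut-off.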
ABQ assessment of VF
Vertebrae T4 through L4 were evaluated by readers ABQ-1 and ABQ-2 using the ABQ method. This method has been described in a previous report.(13) Each vertebra was classified to one of the following categories: (1) osteoporotic fracture; (2) non-osteoporotic short vertebral height; (3) normal; (4) uncertain (possible osteoporotic fracture, but uncertain because of atypical appearances or poor image quality); or (5) unable to evaluate (poor image quality or not imaged). Osteoporotic VF was identified when there was typical osteoporotic depression of the central vertebral endplate (concave fracture), with or without fracture of the vertebral ring apophysis or vertebral body cortex (wedge or crush fracture). For each osteoporotic fracture identified, the severity, type of fracture (concave, wedge, or crush), and affected endplate(s) were recorded. The severity of fracture identified by ABQ was determined by visual estimation of the apparent reduction in vertebral height as follows: approximately ≤25%, grade 1 (mild); >25% to <40%, grade 2 (moderate); ≥40%, grade 3 (severe). This approach is similar to the Genant SQ grading scale,(12) except that there is no minimum threshold for reduction in vertebral height for ABQ definition of a prevalent fracture, whereas using the SQ approach, fracture is identified when vertebral height appears reduced by at least 20%.
Apparent reduction in vertebral height without endplate depression was categorized by ABQ as non-osteoporotic short vertebral height (SVH) due to other causes. This was assessed qualitatively, taking into account the variation commonly seen within and between vertebrae and in different regions of the spine; in other words, SVH was identified when one or more heights appeared shorter than expected. A threshold for SVH was not applied, but any height that is approximately <15% lower than expected is unlikely to be discernible by the naked eye. Each vertebra with SVH was categorized as follows: (1) normal or developmental variation; (2) degenerative change or Scheuermann's disease (with or without degenerative change); (3) scoliosis or kyphosis; (4) non-osteoporotic (traumatic or pathologic) fracture; or (5) metabolic bone disease other than osteoporosis. The affected height(s) (anterior, middle, posterior, or all three) was also recorded for each vertebra identified with SVH. Minor endplate deformities such as small Schmorl's nodes or osteophytes were not classified as SVH but were noted as supplementary information. Women with evidence of both osteoporotic VF and non-osteoporotic SVH were included in the VF group only. The main features of the ABQ method compared with the Genant SQ method (as previously described)(12) are summarized in Table 1.
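The ABQ decision logic described above can be sketched as follows. This Python sketch is a simplification under stated assumptions: the reader's qualitative judgments (endplate depression present, height appears short) are supplied as booleans, the "uncertain" and "unable to evaluate" categories are omitted, and the function name abq_classify is hypothetical.

```python
from typing import Optional, Tuple


def abq_classify(endplate_depression: bool,
                 appears_short: bool,
                 height_reduction_pct: float = 0.0) -> Tuple[str, Optional[int]]:
    """Return (category, grade) for one vertebra.

    An osteoporotic fracture requires endplate depression; there is no
    minimum height-reduction threshold. Grade is assigned only to fractures.
    """
    if endplate_depression:
        if height_reduction_pct >= 40:
            grade = 3  # severe
        elif height_reduction_pct > 25:
            grade = 2  # moderate
        else:
            grade = 1  # mild, including reductions below 20%
        return ("osteoporotic fracture", grade)
    if appears_short:
        # Short height without endplate depression: non-osteoporotic SVH
        return ("non-osteoporotic short vertebral height", None)
    return ("normal", None)


# A 10% reduction with endplate depression is a mild ABQ fracture,
# although it would fall below the 20% threshold used by SQ.
print(abq_classify(True, False, 10))
# A 30% reduction without endplate depression is classified as SVH by ABQ.
print(abq_classify(False, True, 30))
```

The two example calls illustrate the cases in which the methods can disagree: a fracture below the SQ height threshold, and a short vertebra without endplate depression.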
Table 1. ABQ and SQ Assessment of Osteoporotic VFs
BMD was measured by DXA in all study subjects using one of the two Hologic Delphi W densitometers or the Delphi C densitometer (Hologic). At the time of the study, the three densitometers were cross-calibrated for clinical purposes. The three devices measured BMD (using the same spine phantom) to within 0.01 g/cm².
The prevalence of VF for radiography and VFA, respectively, was calculated as the percent of the 203 women identified with at least one VF and as the percent fractured vertebrae of the 2639 that could potentially be evaluated. Prevalence for ABQ was calculated from assessments made by ABQ-1 and was compared with prevalence for SQ based on assessments previously performed by SQ-1 in the same study population.(8)
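The two denominators used for prevalence follow directly from the study design: 203 women, each with 13 vertebral levels from T4 through L4, give 2639 vertebrae. The short Python sketch below illustrates the arithmetic; the fracture count in the example is hypothetical and is not taken from the study results.

```python
N_WOMEN = 203
# T4 through L4: nine thoracic levels (T4-T12) plus four lumbar levels (L1-L4)
VERTEBRAE_PER_WOMAN = 9 + 4

total_vertebrae = N_WOMEN * VERTEBRAE_PER_WOMAN  # 2639, as stated in the text


def prevalence_pct(n_with_fracture: int, n_total: int) -> float:
    """Prevalence as a percentage of the units evaluated (women or vertebrae)."""
    return 100.0 * n_with_fracture / n_total


# Hypothetical example: 30 of the 203 women with at least one VF
print(round(prevalence_pct(30, N_WOMEN), 1))
```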
The two-sample t-test was used to compare mean values for age, height, weight, BMD (measured at the lumbar spine and total hip), and BMD T-scores in women with and without VF identified by ABQ and SQ, respectively. Age- and weight-adjusted expected values for BMD were derived by performing regression of age and weight on BMD. The regression equations were used to calculate expected values for BMD, and Z-scores were calculated as the (observed minus expected BMD) divided by the study population SD. The one-sample t-test was used to analyze mean BMD Z-scores in women with VF to determine whether they differed significantly from the population mean (zero). These analyses were performed separately for radiography and VFA using Statgraphics version 5.0 (Manugistics, Rockville, MD, USA); p < 0.05 was considered statistically significant.
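The Z-score calculation described above can be sketched in Python as follows. This is not the Statgraphics analysis used in the study: the function names, the sample values, and the interpretation of "study population SD" as the SD of observed BMD are all assumptions made for illustration. The sketch returns the one-sample t statistic only; the p value would be obtained from the t distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def fit_ols(rows, y):
    """Least-squares coefficients for y ~ intercept + predictors (normal equations)."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def bmd_z_scores(age, weight, bmd):
    """Z = (observed - expected BMD) / population SD, expected from age and weight."""
    b0, b_age, b_wt = fit_ols(list(zip(age, weight)), bmd)
    expected = [b0 + b_age * a + b_wt * w for a, w in zip(age, weight)]
    sd = statistics.stdev(bmd)  # assumption: SD of observed BMD in the population
    return [(o - e) / sd for o, e in zip(bmd, expected)]


def one_sample_t(values, mu=0.0):
    """t statistic for testing whether the mean of values differs from mu."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical data for six women (age in yr, weight in kg, BMD in g/cm^2)
age = [66, 70, 74, 78, 82, 86]
weight = [70, 62, 68, 59, 64, 55]
bmd = [0.95, 0.88, 0.86, 0.78, 0.80, 0.70]
z = bmd_z_scores(age, weight, bmd)
```

Because the regression includes an intercept, the Z-scores of the whole population average zero; the test of interest is whether the subgroup with VF has a mean Z-score below zero.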
Agreement between readers and diagnostic approaches for the identification of prevalent VF was analyzed using κ statistics. Interobserver agreement for ABQ was tested by comparing the numbers of women and the numbers of vertebrae with concordant and discordant ABQ diagnosis of prevalent VF by ABQ-1 and ABQ-2. The interobserver agreement for ABQ was compared with that previously reported for SQ assessment of VF performed by SQ-1 and SQ-2 in the same study population.(8) Agreement between ABQ and SQ for the identification of prevalent VF was calculated by comparing the numbers of women and the numbers of vertebrae with concordant and discordant diagnosis of prevalent VF by readers ABQ-1 and SQ-1, respectively. These analyses were performed separately for radiography and VFA using MedCalc (B-9030; MedCalc, Mariakerke, Belgium). Mean κ scores were classified according to the system described by Altman.(14) Mean κ scores for interobserver agreement were considered significantly different if there was no overlap in the 95% CI for κ.
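The κ analysis described above can be illustrated with a short Python sketch for a 2 × 2 table of concordant and discordant diagnoses. The standard error used here is a simple large-sample approximation and is an assumption; it is not necessarily the formula implemented in MedCalc. The function names and the example counts are hypothetical. The labels follow the commonly cited Altman bands (≤0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good).

```python
import math


def cohen_kappa(a, b, c, d, z=1.96):
    """Kappa and approximate 95% CI for a 2x2 agreement table.

    a = both readers positive, b = reader 1 only,
    c = reader 2 only, d = both readers negative.
    """
    n = a + b + c + d
    po = (a + d) / n                                        # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))     # simple approximation
    return kappa, kappa - z * se, kappa + z * se


def altman_label(kappa):
    """Classify kappa using the Altman bands."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


def intervals_overlap(ci1, ci2):
    """True if two (lower, upper) intervals share more than an endpoint."""
    return ci1[0] < ci2[1] and ci2[0] < ci1[1]


# Hypothetical table: 20 concordant positive, 5 + 5 discordant, 70 concordant negative
k, lo, hi = cohen_kappa(20, 5, 5, 70)
print(round(k, 2), altman_label(k))

# Reported 95% CIs for radiography: ABQ (0.60, 0.87) and SQ (0.46, 0.60)
print(intervals_overlap((0.60, 0.87), (0.46, 0.60)))
```

With the rounded limits reported for radiography, the two intervals meet only at 0.60, which this sketch treats as no overlap, in keeping with the criterion stated above.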
Prevalence of VF identified using the ABQ and SQ methods
The prevalence of women with VF was broadly similar for ABQ-1 and SQ-1 when identified from spine radiographs and was ∼50% higher for SQ-1 than for ABQ-1 when identified on VFA (Table 2).
Table 2. Prevalence of Osteoporotic VF Identified by the ABQ and SQ Methods
Characteristics of women with and without VF identified using the ABQ and SQ methods
Women with osteoporotic VFs were significantly older and had significantly lower BMD than women without fracture (Table 3). This was true for women identified either by ABQ-1 or SQ-1 and for both imaging modalities. The mean BMD Z-scores in women with VF were also significantly lower than zero (one-sample t-test, p < 0.01) for all comparisons except for total hip BMD in women with fracture identified on VFA by SQ-1: in these women, the mean Z-score may have been marginally lower than zero (p = 0.06).
Table 3. Characteristics of Women With and Without Prevalent VF Identified by the ABQ and SQ Methods
For women with mild VF only, the mean Z-score for lumbar spine BMD was significantly lower than zero in women identified by ABQ-1, both for radiographic diagnosis (mean, −0.489; 95% CI, −0.789, −0.180; p = 0.003) and on VFA (mean, −0.608; 95% CI, −0.911, −0.305; p = 0.001). For women with mild fracture identified by SQ-1, the p values were 0.102 for radiographic diagnosis (mean, −0.446; 95% CI, −0.988, 0.097) and 0.165 for VFA (mean, −0.262; 95% CI, −0.637, 0.114). The total hip Z-scores in women with mild VF identified by either method (ABQ-1 or SQ-1) were not significantly lower than zero.
When the mean BMD Z-scores for women identified with VF by one method alone (ABQ-1 or SQ-1) were compared with those with fracture identified by both readers (concordant diagnosis), they were similar for all comparisons (radiography and VFA) except for total hip BMD in women with fracture identified on VFA by SQ-1 alone; in these women, the mean Z-score was significantly higher than in women with concordant diagnosis of fracture.
Interobserver agreement for the identification of prevalent VF using the ABQ and SQ methods
There was good agreement (assessed by κ statistics) between ABQ-1 and ABQ-2. The κ scores did not differ significantly for diagnosis by radiography or VFA or for the detection of mild versus moderate or severe fractures (Table 4). In contrast, the agreement between readers SQ-1 and SQ-2 had been significantly improved when the analysis was restricted to women with moderate or severe fractures.(8) The κ score for interobserver agreement for ABQ radiographic diagnosis was significantly higher than for SQ, with no overlap of the 95% CIs (Table 4).
Table 4. Agreement Between Readers and Methods for the Identification of Women With Prevalent VF
Reader ABQ-1 identified nine women (radiography) and four women (VFA) who were not identified by ABQ-2 (Table 5). Women with discordant ABQ diagnosis had mainly mild thoracic fractures or deformities. Disagreement between ABQ-1 and ABQ-2 was not influenced by moderate or severe disc space osteoarthritis (Fisher's exact test for comparison of proportions, p = 0.48 and 0.06 for radiography and VFA, respectively), as was originally reported for the SQ readings.(8) The exclusion of 14 patients with moderate to severe scoliosis who had been identified in the original analysis(8) yielded only minimal differences in the mean κ scores (0.03 and 0.05 for VFA and radiography, respectively) for agreement between ABQ-1 and ABQ-2 (data not shown).
Table 5. Reasons for Discordant Identification of Women With Prevalent VF
Agreement between ABQ and SQ assessment for the identification of prevalent VF
The κ scores for agreement between ABQ-1 and SQ-1 were moderate and were significantly better for radiographic identification of women with moderate or severe (grade 2 or 3) fractures (Table 4). Most of the women with fractures identified by SQ-1 only had non-osteoporotic SVH or uncertain fracture caused by oblique projection, scoliosis, or poor image resolution according to ABQ-1 (Table 5). All women with VF identified by ABQ-1 alone (radiography and VFA) had mild fractures only, and these were mainly in the thoracic region; for two of these women, ABQ-1 reported the apparent reduction in vertebral height identified on VFA to be <20%. The exclusion of patients with moderate or severe scoliosis had little effect on the κ scores for agreement between ABQ-1 and SQ-1, with minimal increases of 0.03 and 0.04 for radiography and VFA, respectively.
Agreement between radiography and densitometric VFA
The agreement between radiography and VFA assessed by κ statistics was good when assessed by ABQ-1 and moderate (fracture grades 1–3) to good (fracture grades 2–3) when assessed by SQ-1 (Table 6). The severity of fracture had little influence on the sensitivity and specificity of VFA performed by ABQ-1, but for SQ-1, the sensitivity of VFA was somewhat better for the identification of women with moderate to severe fracture compared with those with any grade of fracture (Table 6).
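Because radiographic diagnosis was taken as the gold standard, the sensitivity and specificity of VFA follow from a 2 × 2 cross-tabulation of VFA against radiography. The Python sketch below shows the calculation; the counts are hypothetical (chosen only to sum to 203 women) and are not the values reported in Table 6.

```python
def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int):
    """Sensitivity and specificity of VFA against radiography.

    tp = VF on both VFA and radiography, fn = VF on radiography only,
    fp = VF on VFA only, tn = no VF on either modality.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for 203 women
sens, spec = sensitivity_specificity(tp=24, fn=6, fp=4, tn=169)
print(round(sens, 2), round(spec, 2))
```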
Table 6. Agreement Between Densitometric and Radiographic VFA for the Identification of Women With Prevalent VF
The approach currently recommended for VFA(2) is the Genant SQ method.(12) The ABQ approach may be a more accurate method of assessing prevalent VF on VFA and could conceivably reduce the false-positive rate and produce a more accurate evaluation of a patient's future fracture risk: this is an important consideration in assessing the clinical profile of the procedure. We applied the ABQ method to identify prevalent VFs in elderly women and compared this to the SQ assessments previously performed in the same study population. This is the first report to compare these two visual diagnostic approaches for the assessment of vertebral images acquired by both radiography and X-ray absorptiometry (VFA). Fewer women were identified with VF by ABQ than by SQ, and the difference was more marked for VFA than for radiographic diagnosis. Women with VF identified by ABQ or SQ were significantly older and had significantly lower BMD compared with women without fracture identified by the same method. The associations between low BMD and mild VF were stronger for women with fractures identified by ABQ than by SQ. Interobserver agreement for ABQ was in the range classified as good,(14) and for radiographic diagnosis, it was significantly better than for the SQ assessments performed in the original study.(8) Agreement between ABQ and SQ for the identification of prevalent VF was moderate; agreement between VFA and radiography was good for ABQ and moderate to good for SQ.
The ABQ method was developed in an attempt to reduce the false-positive rate and the subjectivity associated with traditional qualitative diagnosis: this is achieved by incorporating specific criteria to identify osteoporotic fractures and to exclude non-osteoporotic deformities. We speculated that the qualitative nature of the ABQ method might adversely affect the interobserver agreement for ABQ (particularly for VFA). In fact, there was good agreement between readers ABQ-1 and ABQ-2, which seems comparable to that previously reported for SQ assessment in a population with a high prevalence of VF (which produces higher κ scores).(12) For radiographic diagnosis, interobserver agreement for ABQ was also significantly better than between readers SQ-1 and SQ-2.(8) The two SQ readers, however, were not radiologists, whereas the ABQ readers were a radiologist (ABQ-1) and a nonradiologist (ABQ-2) with experience of evaluating VF; hence, the different levels of experience might account for some of the apparent differences in interobserver agreement for the two methods. Quantitative and SQ approaches are generally considered more objective than qualitative definition of VF, but reproducibility does not necessarily equate with accuracy; prevalent morphometric fractures, for example, are less strongly associated with low BMD(15) compared with those differentially classified by an expert radiologist. Some qualitative assessment of the vertebral endplate is also recommended when applying the Genant SQ method,(12,16) and if this is not applied, the accuracy of the method to detect prevalent osteoporotic fracture (particularly in the hands of an inexperienced observer) may be reduced. This may be particularly relevant for VFA, with its increasing availability, particularly when the images are assessed by nonradiologists.
Interobserver agreement for SQ was significantly improved for the detection of moderate or severe VFs (and most of the discordant diagnoses between ABQ-1 and SQ-1 were also mild fractures or deformities), whereas this had relatively little impact on the agreement between ABQ readers. This is unusual, because concordance between readers is usually better when mild fractures are excluded,(4,6) but it may be because greater attention is paid to the appearance of the endplate using ABQ than to vertebral height per se. The clinical significance of mild prevalent fracture remains controversial,(17–20) but in this study population, we did observe low BMD at the lumbar spine in women with mild ABQ but not SQ fractures.
The percent disagreement for the identification of women with VF by readers ABQ-1 and ABQ-2 was low (6% or less). If we consider the diagnosis by the more experienced reader (ABQ-1) as the gold standard, <1% of all vertebrae evaluated were misclassified by ABQ-2 using the same method and for either imaging modality. These vertebrae were often reported difficult to evaluate because of poor image contrast, oblique projection, or scoliosis. Differential classification of non-osteoporotic SVH in women with VFs was less often a source of disagreement between ABQ readers than between ABQ-1 and SQ-1, and this probably reflects the systematic approach used in ABQ to exclude non-osteoporotic deformities.
This analysis has several strengths. It is the first comparison of interobserver agreement for ABQ and SQ assessment of VF. The assessments were performed in the same study population, which rules out population differences. We did not exclude vertebral levels that were not evaluated in both assessments under comparison, and we analyzed agreement between readers without excluding patients with moderate or severe scoliosis as has been done in previous analyses.(8,21) Our results, therefore, may be more applicable to clinical practice. Our analyses of interobserver agreement may underestimate the agreement that might be achieved between experienced radiologists trained in the application of the respective methods, but the results suggest that a nonradiologist can apply the ABQ method at least as effectively as a nonradiologist can apply the SQ method. The absence of consensus on the optimal approach to the detection of prevalent VF means that we cannot be certain which deformities truly represent osteoporotic fracture. In subsequent studies, we plan to determine how well the ABQ method predicts future fracture and to evaluate interobserver agreement for ABQ definition of incident VFs.
We conclude that the prevalence of VF identified by ABQ and SQ was similar for radiographic diagnosis, but for VFA it was 50% higher for SQ; mild vertebral fractures identified by ABQ, but not by SQ, were associated with low BMD; interobserver agreement for radiographic diagnosis of prevalent VF was significantly better for the ABQ compared with the SQ method; and agreement between ABQ and SQ was moderate.
The authors thank Kevin Wilson, PhD (Hologic) for comments. This research was funded by Hologic, Bedford, MA, USA, and the Park Nicollet Institute, Minneapolis, MN, USA. LF is supported by a research fellowship from the Medical Research Council, UK.