Experienced and inexperienced observers individually evaluated the assessability of 50 radiographs (25 dogs) and determined the hip status (dysplasia/nondysplasia and final scoring according to Fédération Cynologique Internationale [FCI] criteria). A radiographic technical quality assessment was performed in a separate reading session. Interobserver agreement in determining dysplasia/nondysplasia and FCI-scoring did not increase significantly with increasing radiographic quality, irrespective of whether the observers were experienced. There was a significant agreement between the technical quality assessment and assessability (P<0.0005). Despite the effort to objectify radiographic quality and to present high-quality radiographs to observers, interobserver agreement on dysplasia/nondysplasia and final scoring remains low, even in the experienced group. Although increased radiographic quality narrows the range of scoring, the range remains unacceptably high.
Introduction
Progress in decreasing the incidence of canine hip dysplasia (CHD) remains slow1,2 despite the extensive use of various protocols that rely on the subjective hip-extended method to screen breeding dogs. One reason for this slow progress may be the inability to define a dysplastic dog, or to classify a dog as A, B, C, D, or E according to the FCI-scoring system, in an objective, repeatable, and correct way. A correct CHD diagnosis is pivotal to making sound recommendations for breeding programs that aim to reduce the frequency of CHD and may prove useful in the clinical management and treatment of CHD.3–9,2,10–14
The CHD diagnosis is characterized by low interobserver agreement for final scoring, which could be due to improper positioning of the pelvis and/or femurs, or to insufficient radiographic contrast and/or detail. The importance of radiographic quality in the diagnosis of CHD has been described.15–20
Experienced observers are aware of the importance of radiographic quality in the diagnosis of CHD and are familiar with quality criteria. Radiographic quality assessment and FCI-scoring are performed in the same reading session. If a radiograph is considered assessable, the observer implicitly indicates that the radiograph contains all the information needed to make a correct evaluation of the hip status.
Both interobserver agreement on assessability and on final scoring remain low despite improved subjective radiographic quality (assessable or not).12,13 Therefore, the National Committee for Inherited Skeletal Disorders in Belgium (NCISD) introduced a detailed and morphometric-based technical quality assessment to standardize the evaluation of radiographic quality and thus to improve interobserver agreement. This technical quality assessment is done in a separate reading session.
We are not aware of the existence of a technical quality assessment of hip-extended radiographs. Our aim was to describe the effect of a technical quality assessment on interobserver agreement in the diagnosis of CHD.
Materials and Methods
Thirty observers, recruited from the universities of Bern, Ghent, Giessen, Utrecht, Zurich, and from private practices, who are members of the Flanders Orthopaedic Working Group, took part in the study as described previously.12,13 First, observers determined whether they would accept the radiograph for evaluation (radiograph is assessable or not). Second, they indicated whether each coxofemoral joint was dysplastic or not. Third, they provided a final score for each joint according to FCI-regulations (A, B, C, D, or E),15 irrespective of whether the radiograph had been deemed assessable or not.
Observers were classified as either experienced (9) or inexperienced (21). The NCISD database was searched for dogs having more than one radiograph submitted where there were obvious positional differences. The selection was done irrespective of the official FCI-score. However, dogs with obvious degenerative joint disease or coxofemoral luxation were excluded. Furthermore, dogs with radiographic signs of malformation of the pelvis or lumbosacral joint were not included. Fifty radiographs from 25 dogs were selected.
Observers evaluated the radiographs individually, unaware that two radiographs were present for every dog. The radiographic quality of all 50 radiographs was assessed by one individual (F.C.) in a separate reading session according to the NCISD scoring system (technical quality assessment). The technical quality assessment has been routinely implemented in the NCISD evaluation procedure for radiographs of client-owned dogs since August 2002. The authors are not aware of a similar system being used in other countries. The technical quality assessment is a method to quantify the FCI, OFA, and BVA/KC guidelines for reaching the perfect position of the dog. The cut-off values of the technical quality assessment were obtained empirically.
For the technical quality assessment, all radiographs were digitized using a* digital camera and stored on a computer. Linear measurements were performed on these digitized images with a software program.† The measurements represent the metrical position of the iliac wings (width at comparable regions), femurs (deviation from the body axis), and patellae (positioning related to the fabellae), compared with the ideal position as described by the FCI-regulations,15 which are still used today (Table 1). Although Digimizer® can be used for nonlinear measurements, comparison of the surfaces of the pelvic cavities is extremely time consuming and therefore not applicable in routine screening practice. Pelvic symmetry along the body axis was measured by determining the ratio of the widths of both iliac wings (ratio=1 if 100% symmetry). Additionally, pelvic symmetry was judged visually by the symmetry of the obturator foramina, acetabula, and femoral head coverage, comparing the two pelvic halves on either side of a line drawn through the long axis of the body. Pelvic rotation along the transverse axis was not judged. Parallelism of both femora was determined by measuring the angle of deviation between the long axis of each femur and the long axis of the body. The position of the patella was evaluated by drawing a line through the femoral long axis and a line through the top of both fabellae: if both lines divided the patella in half, the femur was considered sufficiently extended (Fig. 1).
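The two linear measurements described above can be illustrated with a minimal Python sketch. It assumes landmarks are available as pixel coordinates from the digitized image. The function names are hypothetical, and so is the choice to express the iliac wing ratio as wider over narrower; neither is taken from the NCISD worksheet or from Digimizer.

```python
import math


def iliac_wing_ratio(left_width, right_width):
    """Ratio of the wider to the narrower iliac wing (1.0 = perfect symmetry).

    Assumption: the ratio is taken as wider/narrower so that it is always >= 1.
    """
    if left_width <= 0 or right_width <= 0:
        raise ValueError("widths must be positive")
    return max(left_width, right_width) / min(left_width, right_width)


def axis_deviation_deg(axis_a, axis_b):
    """Acute angle in degrees between two lines.

    Each line is given as ((x1, y1), (x2, y2)); point order does not matter.
    """
    def direction(line):
        (x1, y1), (x2, y2) = line
        return math.atan2(y2 - y1, x2 - x1)

    # Lines have no orientation, so fold the difference into [0, pi/2].
    diff = abs(direction(axis_a) - direction(axis_b)) % math.pi
    return math.degrees(min(diff, math.pi - diff))


# Hypothetical landmark coordinates, for illustration only.
body_axis = ((100.0, 0.0), (100.0, 400.0))
left_femur = ((60.0, 200.0), (50.0, 400.0))
print(iliac_wing_ratio(52.0, 50.0))
print(axis_deviation_deg(left_femur, body_axis))
```

Folding the angle into the range 0–90° makes the result independent of the order in which the two points on each axis were marked.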
Table 1. Fédération Cynologique Internationale (FCI) Protocol as Presented at the FCI-Workshop on Hip Dysplasia in Copenhagen on March 18, 2006
|Procedure for HD screening: the following rules (that focus on positioning and radiographic quality) are to be followed:|
|1. The minimum size of the X-ray film must be such as to totally include the pelvis and if possible both knees.|
|2. The technical quality of the radiographs has to be such as to allow an accurate screening procedure of the hip joint. The margo acetabularis must be clearly visualized. The positioning of the dog must ensure that the pelvis is symmetrical and not rotated along neither the long nor the short axis. Both ossa femoris must be parallel to each other and to the sagittal plane. The knees must be pronated so that the patellae are projected in sulcus intercondylaris on femur and held in a position close to the table. The patellae should be in contact with the line through the top of the fabellae.|
Figure 1. A, pelvic symmetry by metrical width of the iliac wings. B, the long axis of the body. C, femurs positioned parallel to the long axis of the spine. D, patellas symmetrically centered over the femurs and intersecting a line connecting the proximal margins of corresponding fabellae.
Subscores were created on the original radiographs for the contrast/exposure of the image, as determined by the visibility (from invisible to complete visibility) of the dorsal acetabular edge, physis of the femoral head, subchondral acetabular bone, and trabecular bone structure. Additional subscores were introduced for motion blur and artifacts of the film. In an Excel worksheet (Fig. 2), the results of the technical control were automatically processed to obtain a final score for the technical quality of the radiograph. In this system, a radiograph is considered technically acceptable if the overall score is at least 60% and if the subscores for symmetry of the pelvis, positioning of the femurs and patellae, motion blur, and unclean film are at least 50%. Additionally, the film is only accepted if the subscores for contrast and exposure of the film are at least 60%. This implies that a radiograph will be rejected for final scoring by the NCISD committee if the overall quality is <60%. Rejection will also occur in the following instances: inadequate pelvic symmetry (left–right ratio >1.12); a difference of >5° between the long axis of the femurs and the body axis; the patella touching the cortex of the femur; the patella lying above or below the line through the top of both fabellae; and less than half of a structure such as the dorsal acetabular edge, trabecular bone structure, physis of the femoral head, or subchondral acetabular bone being visible.
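The acceptance rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the NCISD worksheet itself: the field names are hypothetical, femur and patella positioning are treated as two separate subscores, contrast and exposure are likewise treated as two subscores, and only the two numeric rejection criteria (ratio and angle) are modeled.

```python
from dataclasses import dataclass


@dataclass
class TechnicalScores:
    """Scores in percent; all field names are hypothetical."""
    overall: float
    pelvis_symmetry: float
    femur_position: float
    patella_position: float
    motion_blur: float
    unclean_film: float
    contrast: float
    exposure: float
    iliac_ratio: float          # left-right iliac wing width ratio
    femur_deviation_deg: float  # largest femur-to-body-axis angle


def rejection_reasons(s):
    """Return a list of reasons for rejection; an empty list means accepted."""
    reasons = []
    if s.overall < 60:
        reasons.append("overall score below 60%")
    positional = {
        "pelvis symmetry": s.pelvis_symmetry,
        "femur position": s.femur_position,
        "patella position": s.patella_position,
        "motion blur": s.motion_blur,
        "unclean film": s.unclean_film,
    }
    for name, value in positional.items():
        if value < 50:
            reasons.append(f"{name} subscore below 50%")
    for name, value in {"contrast": s.contrast, "exposure": s.exposure}.items():
        if value < 60:
            reasons.append(f"{name} subscore below 60%")
    if s.iliac_ratio > 1.12:
        reasons.append("iliac wing ratio above 1.12")
    if s.femur_deviation_deg > 5:
        reasons.append("femur deviates more than 5 degrees from body axis")
    return reasons


# Hypothetical radiograph: good overall score, but nonparallel femurs.
example = TechnicalScores(72, 80, 40, 70, 90, 90, 75, 75, 1.05, 7.0)
print(rejection_reasons(example))
```

Returning the reasons rather than a single boolean shows how a radiograph with an overall score above 60% can still be rejected on one subscore or measurement.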
The assessability of a radiograph was defined as the percentage of observers that scored the radiograph as assessable. Interobserver agreement was calculated as the percentage of observers that made the same diagnosis: dysplasia or no dysplasia. An agreement score for FCI was derived in the same way. All evaluations on 50 radiographs by 30 observers were used in the statistical analysis.
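As a minimal sketch of these definitions, the Python below computes assessability and an agreement score for one radiograph. It assumes that "the percentage of observers that made the same diagnosis" means the share of observers who chose the most frequent category. That reading is an interpretation, and the function names and observer votes are hypothetical.

```python
from collections import Counter


def assessability(votes):
    """Percentage of observers who judged the radiograph assessable.

    `votes` is a sequence of booleans, one per observer.
    """
    return 100.0 * sum(votes) / len(votes)


def agreement(labels):
    """Percentage of observers who gave the most frequent label.

    Works for binary diagnoses ("dysplastic"/"nondysplastic") and for
    FCI grades ("A" to "E") alike.
    """
    counts = Counter(labels)
    top_count = counts.most_common(1)[0][1]
    return 100.0 * top_count / len(labels)


# Hypothetical votes from nine observers on one radiograph.
accepted = [True, True, True, False, True, True, False, True, False]
fci = ["B", "B", "C", "A", "B", "C", "B", "B", "D"]
diagnosis = ["dysplastic" if g in "CDE" else "nondysplastic" for g in fci]

print(assessability(accepted))
print(agreement(diagnosis))
print(agreement(fci))
```

Under this definition the FCI agreement score can never exceed the dysplasia/nondysplasia agreement score when diagnoses follow the grades, because the five grades subdivide the two diagnostic categories.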
The effect of radiographic technical quality on the interobserver agreement score (dysplasia/nondysplasia and FCI-scores) was tested with a mixed model that included dog as a random effect and radiographic quality as a continuous fixed effect. Agreement between the subjective quality control (assessability) done by the observers and the radiographic technical quality assessment was tested with a generalized linear model, with the binary technical quality assessment as the response variable and the assessability of the radiographs as the covariate.
Furthermore, the experienced group was compared with the inexperienced group with respect to the agreement between assessability and technical quality assessment. This comparison used a generalized linear mixed model with the binary agreement between assessability and technical quality assessment (0=different, 1=same) as the response variable, experience (experienced vs. inexperienced) as the covariate, and radiograph as a random effect. Level of significance was 0.001.
Results
Interobserver agreement for experienced observers on assessability was low (68%).13 Seventeen radiographs of 14 dogs passed the technical quality assessment, which would imply that new radiographs would be requested for 11 dogs. The average total score was 78.4% (range 70.5–84.2%) for the accepted radiographs and 57.3% (range 32.6–73.7%) for the unacceptable radiographs. Radiographs with an overall score of >60% can still be rejected for the reasons mentioned above, which explains the overlap in scores between unacceptable and acceptable radiographs. The final scoring performed by the experienced group on the technically approved radiographs is listed in Tables 2 and 3. The effect of the technical quality assessment was a narrowing of the range of final scoring in dog 1, and a narrowing of the range together with polarization toward dysplasia in dogs 3 and 7, but these effects were not significant.
Table 2. Evaluations of Experienced Observers on Technically Accepted Radiographs
|Experienced observers||% Gr:A||% Gr:B||% Gr:C||% Gr:D||% Gr:E||Sum A+B||% Dyspl||Sum C+D+E|
Table 3. Results According to Experienced Observers (nine) on the Technically Accepted Radiographs
|Dog||% Dyspl||% Gr A+B||% Gr C|
After technical quality assessment, the range of scoring decreases by 1 or 2 grades in seven dogs for experienced observers and in six dogs for inexperienced observers, but not significantly. Unfortunately, overall interobserver agreement in determining dysplasia/nondysplasia and FCI-scoring did not increase significantly with the increasing radiographic quality, irrespective of observer experience (P<0.0001).
There was a significant relationship between technical quality assessment and assessability (P<0.0005), with the odds ratio of obtaining a positive technical quality assessment equal to 1.1067 (95% CI: 1.046–1.172) for each percentage increase in assessability, which implies that increased subjective quality of the radiograph correlates with increased technical quality.
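Because an odds ratio acts multiplicatively, the reported per-unit value compounds over larger differences in assessability. The Python sketch below illustrates this using only the figures reported above. It assumes that "each percentage increase" means one percentage point and that the effect is linear on the log-odds scale, as in a standard logistic model; the function name is hypothetical.

```python
# Reported values: odds ratio per percentage point of assessability and 95% CI.
OR_PER_POINT = 1.1067
CI_LOW, CI_HIGH = 1.046, 1.172


def odds_multiplier(points):
    """Multiplier on the odds of passing the technical quality assessment.

    Returns (lower, estimate, upper) for a difference of `points` percentage
    points in assessability, assuming linearity on the log-odds scale.
    """
    return (CI_LOW ** points, OR_PER_POINT ** points, CI_HIGH ** points)


for points in (1, 5, 10):
    low, est, high = odds_multiplier(points)
    print(f"{points:>2} points: x{est:.2f} (95% CI x{low:.2f} to x{high:.2f})")
```

Under these assumptions, a radiograph that 10 percentage points more observers deemed assessable would have roughly 2.8 times the odds of passing the technical quality assessment. The multiplier applies to odds, not to the probability of passing.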
Agreement between technical quality assessment and assessability was 70% in the inexperienced group and 62% in the experienced group. This difference is not significant (P=0.173).
Figure 3 is an example of two radiographs of the same dog (dog 1). During technical quality control, radiograph 3A was rejected because the femurs were not parallel. Nevertheless, five experienced observers accepted radiograph 3A and scored the dog A–C, while eight experienced observers accepted radiograph 3B and scored the same dog in the range of B–D. The most important trait for rejection of a radiograph after technical quality assessment is nonparallelism of the femurs, followed by the lack of exposure/contrast (Table 4).
Figure 3. Radiographs of dog 1. (A) Was rejected after technical quality assessment because of nonparallelism of femurs while (B) was accepted. Five of nine experienced observers accepted (A) and eight accepted (B). Range of scores among the nine experienced observers was A–D for this dog.
Table 4. Reason for Rejecting Radiographs Using the Technical Quality Assessment
|Symmetry of the pelvis (1)||0|
|Parallelism of the femora (2)||9|
|Positioning of the patellae (3)||0|
|Contrast and exposure time (4)||4|
|2 and 4 only||2|
|Total and 2 only||2|
|Total and 3 only||2|
|Total, 1, and 2 only||3|
|Total, 1, and 3 only||1|
|Total, 1, and 4 only||1|
|Total, 2, and 3 only||1|
|Total, 2, and 4 only||1|
|Total, 3, and 4 only||1|
|Total, 1, 2, and 3 only||1|
|Total, 1, 2, and 4 only||1|
|Total, 2, 3, and 4 only||1|
|Total, 1, 2, 3, and 4||3|
|Total of technically refused radiographs||33|
|Total of evaluated radiographs||50|
Discussion
Although technical quality assessment decreased the range of final scoring in seven dogs and six dogs (experienced and inexperienced observers, respectively), overall interobserver agreement on dysplasia/nondysplasia and final scoring does not significantly increase with increasing radiographic quality as measured by a technical quality control. Surprisingly, there was inconsistency between the final score and the diagnosis of dysplasia/nondysplasia in some observers (Table 3). This could be explained if some observers considered some grade B dogs as dysplastic, despite the FCI-regulations that define B as nondysplastic. High-quality radiographs decrease the range of final scoring and cause a polarization toward dysplasia/nondysplasia in some dogs. Nevertheless, even with adequate radiographic quality, the range in final scoring remains unacceptably high, because the same dog can receive a score that permits unrestricted breeding or a score that would exclude it from the breeding population. Only dogs 3 and 7 are considered undoubtedly dysplastic after technical quality assessment in the experienced group. Introducing a technical quality assessment in a separate reading session is therefore not sufficient to improve interobserver agreement. The range of scoring is affected more by the experts' individual scoring method than by film quality.
There is a significant relationship between the assessability of the radiograph and the evaluation after a technical quality assessment. Surprisingly, the relationship between assessability and technical quality assessment is weaker in the experienced group than in the inexperienced group. A possible explanation is that experienced observers automatically combine the quality assessment of the radiograph with scoring in a single step, whereas inexperienced observers tend to separate the quality assessment and scoring. Experienced observers may also feel more confident in their image interpretation.
Both radiographs of dog 1 (Fig. 3) are examples of how a flaw might affect final scoring. The nonparallel femora might obscure laxity and therefore incorrect up-grading might occur (Fig. 3A). The parallel femora and correct positioning of the patellae have been obtained by rotation of the pelvis over its transverse axis (Fig. 3B). In practice, radiograph 3B will be accepted for both assessability and technical quality assessment and scored. Which evaluation for this dog (B, C, or D) is correct remains obscure because a gold standard to evaluate CHD radiographically on a hip-extended radiograph does not exist.
The NCISD technical quality assessment method is not as objective as it should be. Nevertheless, 60 points are entirely based on (25) or helped by (35) linear measurements or marks on the digitized radiographs. Additional nonlinear measurements can be implemented, if necessary. Visual control of exposure and contrast will always remain subjective. If one is able to measure the Norberg angle or femoral head coverage, one can assume that exposure and contrast are at least acceptable. Despite its drawbacks, the technical quality assessment is a more structured approach to assessing the quality of a radiograph. Because the cut-off values in the technical quality assessment system have not been validated, the system needs to be investigated further. The technical quality assessment is designed to reach the most perfect radiographic quality, which is difficult, if not impossible, to obtain. Therefore, radiographs with a quality value of more than 60% after technical quality assessment were regarded as acceptable for reading. Eliminating obviously osteoarthritic and subluxated dogs from the data introduces some bias, leading to lower interobserver agreement. However, in reality, radiographs of obviously affected dogs are often withheld from official screening. The results of this study apply only to dogs without anatomic malformations of the pelvis or lumbosacral joint, because such malformations render correct positioning impossible. Interobserver agreement on dogs with these malformations was not investigated.
In conclusion, interobserver agreement in diagnosing CHD and in providing final scoring does not significantly improve with improved image quality, as determined by a technical quality assessment. Seemingly, increasing radiographic quality alone is insufficient to improve interobserver agreement to an acceptable level. This may indicate that the FCI-classification system for scoring hips is inadequately defined or is used in a subjective manner according to the experts' individual scoring method, which leads to different evaluations.