Repeated hemarthrosis in hemophilia causes arthropathy with pain and dysfunction. The Hemophilia Joint Health Score (HJHS) was developed to be more sensitive for detecting arthropathy than the World Federation of Hemophilia (WFH) physical examination scale, especially for children and those using factor prophylaxis. The HJHS has been shown to be highly reliable. We compared its validity and sensitivity to the WFH scale.
We studied 226 boys with mild, moderate, and severe hemophilia at 5 centers. The HJHS was scored by trained physiotherapists. Study physicians at each site blindly determined individual and total joint scores using a series of visual analog scales.
The mean age was 10.8 years. Sixty-eight percent were severe (93% of whom were treated with prophylaxis), 15% were moderate (24% treated with prophylaxis), and 17% were mild (3% treated with prophylaxis). The HJHS correlated moderately with the physician total joint score (rs = 0.42, P < 0.0001) and with overall arthropathy impact (rs = 0.42, P < 0.0001). The HJHS was 97% more efficient than the WFH at differentiating severe from mild and moderate hemophilia. The HJHS was 74% more efficient than the WFH at differentiating subjects treated with prophylaxis from those treated on demand. We identified items on the HJHS that may be redundant or rarely endorsed and could be removed from future versions.
Both the HJHS and WFH showed evidence of strong construct validity. The HJHS is somewhat more sensitive for mild arthropathy; its use should be considered for studies of children receiving prophylaxis.
Recurrent joint bleeding in persons with hemophilia (especially severe hemophilia, as defined by a circulating clotting factor activity of ≤0.01 IU/ml) is known to lead to joint damage with pain, loss of range of motion, loss of function, and long-term physical and psychosocial impairments (1, 2). Almost 80% of hemarthroses in hemophilia occur in the elbows, knees, and ankles (3). A key factor in preserving these joints and improving overall outcomes is to reduce the frequency and severity of hemarthroses (4).
A strong body of evidence supports the use of prophylactic clotting factor replacement, which is the regular infusion of clotting factor to maintain clotting activity so that spontaneous bleeding does not occur. Prophylaxis reduces the number and severity of hemarthroses in hemophilia, ultimately resulting in less joint damage (5–7). Accordingly, several groups recommend treating severe hemophilia with prophylaxis, where available.
The management and study of prophylaxis, with its associated benefits to overall joint health, call for measures that are capable of detecting early and subtle changes in joint health and function (8). The current World Federation of Hemophilia (WFH) evaluation system (9) comprises 4 parameters: pain, bleeding, physical examination, and radiographic examination. Although this system is the most widely used joint assessment instrument (8), one of its shortcomings is its lack of established reliability, validity, and sensitivity to change (10). The WFH physical examination scale may be inadequate for the evaluation of younger children and for adults with mild hemophilic arthropathy, and thus may not be adequate for evaluating the results of prophylaxis.
The shortcomings of the WFH scale are due to its original use in evaluating severe joint destruction in the preprophylaxis era. For example, the WFH scale defines swelling as either present or absent without allowing for severity of swelling; the scale scores axial alignment deviations that are part of normal development in healthy young children (11, 12) as abnormal; loss of range of motion in the knees of 10% or less is considered normal, which reduces sensitivity to early damage; the scale does not define muscle atrophy despite normal differences in muscle circumference that occur during growth (13); and gait is not evaluated on the WFH scale, yet gait analysis is a very sensitive tool to detect subtle abnormalities in lower extremity joint function (14).
In an attempt to address these shortcomings, we modified the WFH system (15, 16) and developed the Hemophilia Joint Health Score (HJHS) (17). The HJHS is an 11-item scoring tool for assessing joint impairment in children ages 4–18 years (Supplementary Appendix A, available in the online version of this article at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2151-4658). Using the World Health Organization's International Classification of Functioning, Disability and Health, the HJHS is a clinical measure of joint structure and function (18). As such, it is hoped to complement other structural examinations, such as radiography and magnetic resonance imaging (19). We have shown that the HJHS has high inter- and intrarater reliability (20).
The purpose of this study was to determine the convergent and discriminant construct validity of the HJHS in comparison to the WFH joint assessment tool. Specifically, we hypothesized that: 1) the HJHS total score would have a moderate correlation with a physician global score of joint damage in children with hemophilia, 2) the HJHS individual joint scores would have a moderate correlation with physician global scores of individual joint damage, 3) the HJHS would have a weak to moderate correlation with disability (as determined by the modified Childhood Health Assessment Questionnaire [C-HAQ]) in children with hemophilia, 4) the HJHS would discriminate children with severe hemophilia (clotting factor level ≤0.01 IU/ml) from those with mild hemophilia (clotting factor level ≥0.05 and <0.3 IU/ml) and moderate hemophilia (clotting factor level >0.01 and <0.05 IU/ml), 5) the HJHS would discriminate children with hemophilia who had been treated with primary or secondary prophylaxis (21) from those only ever treated with on-demand therapy, and 6) the HJHS would have a weak to moderate correlation with the lifelong number of hemarthroses as reported by the subject/family. In addition, we hypothesized that there would be redundant or rarely endorsed items that could be removed from the HJHS.
SUBJECTS AND METHODS
This study was approved by the Research Ethics Board at The Hospital for Sick Children and by the Research Ethics Boards of the additional participating sites (Karolinska University Hospital; University Medical Center Utrecht, Wilhelmina Children's Hospital; University of Colorado Denver; and Centre Hospitalier Universitaire Sainte-Justine). All of the subjects or their parents provided written consent; young subjects provided verbal assent, as appropriate. The participants were sequentially approached and were studied at a single study visit.
Boys ages 4–16 years who had been diagnosed with hemophilia A or B (factor VIII or IX <0.3 IU/ml) of any severity (defined by clotting factor activity so that ≤0.01 IU/ml is “severe,” >0.01 and <0.05 IU/ml is “moderate,” and ≥0.05 and <0.3 IU/ml is “mild”) were eligible to participate in this prospective study. Enrollment was targeted to achieve a 3:1 ratio of children with severe hemophilia to children with mild or moderate hemophilia. Subjects could have been receiving any treatment for their hemophilia. Primary and secondary prophylaxis were defined according to published consensus definitions (21). We specifically excluded boys who were within 2 weeks of an acute bleed; boys with an uncontrolled high-titer inhibitor (antibodies directed against exogenous clotting factor that may impair the action of infused clotting factor) for whom, in the judgment of the treating hematologist, participation may have posed a high risk of joint bleeding; and boys with comorbid illnesses such as juvenile arthritis, muscular dystrophy, neurologic illness/cognitive impairment, or other illnesses that may independently affect the HJHS score.
At each subject's study visit, a number of measures were applied by the study physiotherapist and by the physician investigator at each site, each blinded to the other's findings.
The study physiotherapist at each site examined the subject and scored the HJHS (version 1.0) (20). The HJHS is a structured physical examination score evaluating the 6 joints most affected by hemophilia (so-called “index joints”); specifically, the elbows, knees, and ankles. The score for each joint is the sum of the individual item scores (see Supplementary Appendix A, available in the online version of this article at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2151-4658), and the overall total score (range 0–148) is the sum of the 6 joint scores. A normal evaluation in all 6 joints gives a score of 0–6. A score of 148 suggests the worst possible damage/impairment in all 6 joints.
From the physiotherapist examination worksheets, the WFH physical examination score was calculated.
Subjects or parents, as appropriate, completed the modified C-HAQ. The C-HAQ is a 30-item, self-report measure of physical disability designed for children of all ages. It was originally designed and validated for use in juvenile arthritis (22), but has subsequently been used and validated in many other arthropathic or myopathic conditions. A new (modified) version of the C-HAQ has been developed to enhance the ability to discriminate between children with musculoskeletal disability and controls. The modified C-HAQ has proved to be more sensitive than the C-HAQ in some situations, and does not have a ceiling effect (23). Possible scores range from −2 (worst) to 2 (best).
Physician global score of joint health.
The physician investigator at each site examined the subject, blind to the physiotherapy assessment, and scored the global joint health score. The global joint health scale was developed for this study in order to quantify the physicians' assessments and to be used as a comparator with the HJHS and WFH scales. It consists of a series of six 10-cm visual analog scales (VAS), one for each index joint (ankles, elbows, and knees), anchored by “no complaints, no findings” and “continuous pain, severe limitations, worst damage,” and one VAS for the overall assessment of arthropathy impact, anchored by “no impact on life” and “most severe impact on life.” Each joint yielded an individual score, and the average of the 6 joints yielded a total physician score. The overall physician arthropathy impact VAS yielded its own score.
The subjects were categorized, based on clotting factor activity, into 3 groups of hemophilia severity: mild (factor ≥0.05 IU/ml and <0.3 IU/ml), moderate (factor >0.01 IU/ml and <0.05 IU/ml), and severe (factor ≤0.01 IU/ml).
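The severity grouping above can be expressed as a small lookup. The following is a minimal sketch in Python, not the study's own code; the function name and the "not eligible" label for values at or above 0.3 IU/ml are assumptions introduced here for illustration.

```python
def classify_severity(factor_iu_per_ml: float) -> str:
    """Map clotting factor activity (IU/ml) to the severity groups used in the text."""
    if factor_iu_per_ml < 0:
        raise ValueError("factor activity cannot be negative")
    if factor_iu_per_ml <= 0.01:
        return "severe"      # factor <= 0.01 IU/ml
    if factor_iu_per_ml < 0.05:
        return "moderate"    # factor > 0.01 and < 0.05 IU/ml
    if factor_iu_per_ml < 0.3:
        return "mild"        # factor >= 0.05 and < 0.3 IU/ml
    # Assumption: values at or above 0.3 IU/ml fall outside the study's
    # eligibility range, so they are labeled rather than classified.
    return "not eligible"
```

Note that the boundaries are asymmetric: 0.01 IU/ml itself is severe, whereas 0.05 IU/ml itself is mild.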
Estimated lifetime hemarthroses.
Parents and subjects were asked to estimate the total lifetime number of hemarthroses affecting each of the 6 index joints.
Each site's physical therapist had been involved in the development and reliability testing of the HJHS and demonstrated good knowledge and competence in administration of the examination and scoring of the HJHS. A training video and updated instructional manual were available for reference at each site. The study physicians were senior hematologists, highly experienced in hemophilia care, who participated in the development of the physician global score of joint health. The score comprised the subjective opinion of the treating physician; the score was administered without standardization.
To establish construct validity, a number of a priori hypotheses were established. Fulfillment of these hypotheses indicates good construct validity.
Correlation values of 0.6–0.79 were considered to indicate strong correlation, whereas values of 0.4–0.59 were considered moderate and values of 0.2–0.39 were considered weak (24).
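As an illustration of these bands, the sketch below labels a correlation coefficient in Python. It rests on assumptions that the text does not state: the sign of the coefficient is ignored, the lower bound of each band is treated as a cutoff (so that a value such as 0.595 is still labeled), and values outside the three named bands are returned as "unclassified" because the text gives no label for them.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the bands quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # assumption: only the size of the correlation matters
    if magnitude >= 0.8:
        return "unclassified (>= 0.8)"  # no label given in the text
    if magnitude >= 0.6:
        return "strong"                 # 0.6-0.79
    if magnitude >= 0.4:
        return "moderate"               # 0.4-0.59
    if magnitude >= 0.2:
        return "weak"                   # 0.2-0.39
    return "unclassified (< 0.2)"       # no label given in the text
```

Under this scheme, the correlation of 0.42 reported in the abstract between the HJHS and the physician total joint score falls in the moderate band.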
Construct validity was considered to have been established if the correlation (Spearman's rho [rs]) between the HJHS and the physician joint health score was moderate for the individual joints, and for the total HJHS score compared to the physician total score and the physician arthropathy impact score. Because the physician global score of joint health had not been independently validated, we did not attempt to measure criterion validity. In addition, it was expected that the HJHS would be significantly higher in children with severe hemophilia than in those with mild or moderate disease (as assessed by the Kruskal-Wallis nonparametric analysis of variance), indicating discriminant construct validity, and that the HJHS would be correlated weakly to moderately with the number of hemarthroses (joint by joint, and total score when compared to bleeding in all 6 joints). Finally, we expected a weak to moderate correlation with the modified C-HAQ. Similar analyses were carried out with the WFH score.
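For readers unfamiliar with Spearman's rho, the following sketch shows one common way to compute it: rank both variables (giving tied values their average rank) and take the Pearson correlation of the ranks. It is an illustration using only the Python standard library; the function names are hypothetical, and it is not the software used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)
```

Because joint scores in a mostly well-preserved cohort contain many zeros, tie handling matters; the average-rank approach shown here is the usual convention.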
In order to see if we could reduce the items of the HJHS by removing those items that contributed poorly to the assessment of the construct of joint impairment, we used exploratory factor analysis using varimax rotation. In addition, we examined the internal reliability using Cronbach's alpha and the item-total correlations. Redundant items were examined by correlations using rs. Finally, items getting non-zero scores in fewer than 15% of subjects were considered rarely endorsed.
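Two of the item-reduction checks described above are simple enough to sketch directly. The Python below is an illustration under stated assumptions rather than the study's analysis code: scores are assumed to be arranged as one row per subject and one column per item, the function names are hypothetical, and Cronbach's alpha is computed with the standard formula using sample variances.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of per-subject item scores (k items per row)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across subjects.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of each subject's summed score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def rarely_endorsed(item_scores, item_names, threshold=0.15):
    """Names of items scored non-zero in fewer than `threshold` of subjects."""
    n = len(item_scores)
    rare = []
    for j, name in enumerate(item_names):
        nonzero = sum(1 for row in item_scores if row[j] != 0)
        if nonzero / n < threshold:
            rare.append(name)
    return rare
```

The default threshold of 0.15 mirrors the 15% cutoff stated in the text; an item that is non-zero in exactly 15% of subjects is not flagged.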
We calculated that a sample size of 200 would allow us, with more than 95% power at an alpha level of 0.05, to show that an observed correlation of 0.6 (strong) is statistically different from a correlation of less than 0.4 (weak) (25). This sample size would also allow us to show (at a power of 80%) a difference in the HJHS scores between children with mild/moderate hemophilia and children with severe hemophilia of 0.4 to 0.5 SD (small to moderate effect size) (26). To allow for dropout/missing values, we aimed to enroll 225 subjects.
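The power statement for the correlation can be checked approximately. The sketch below uses the Fisher z transformation, which is an assumption made here; the text cites a reference (25) for the calculation but does not describe the method, so this is a plausibility check rather than a reproduction.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_to_distinguish_correlations(r_observed, r_null, n, alpha=0.05, two_sided=True):
    """Approximate power to show r_observed differs from r_null (Fisher z method)."""
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    z_difference = atanh(r_observed) - atanh(r_null)
    standard_error = 1 / sqrt(n - 3)
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    critical_value = std_normal.inv_cdf(1 - tail)
    # Probability that the test statistic exceeds the critical value
    # (the negligible opposite tail is ignored in the two-sided case).
    return std_normal.cdf(z_difference / standard_error - critical_value)


power = power_to_distinguish_correlations(0.6, 0.4, 200)
print(f"approximate power: {power:.3f}")
```

With n = 200, an observed correlation of 0.6, and a comparison value of 0.4, this approximation yields a power above 95% for both a one-sided and a two-sided test at alpha = 0.05, which is consistent with the statement in the text.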
Two hundred twenty-six boys with hemophilia were enrolled. The study cohort demographics are listed in Table 1. Total HJHS and WFH scores were highly skewed, and indicated that most boys had relatively preserved joint function (Figure 1). The mean ± SD modified C-HAQ score was 0.18 ± 0.47 (range −0.92 to 2.0), indicating good overall physical function.
Table 1. Demographic characteristics of the study cohort* (columns: Severe [n = 153], Moderate [n = 34], Mild [n = 39], Total [n = 226]; table body not recovered)
* Values are the number (percentage) unless otherwise indicated.
Refers to bleeding into index joints (elbows, knees, or ankles).
The mean ± SD 6-joint total physician score of joint health was 2.19 ± 3.51 (range 0–21.2), suggesting mild to moderate impairment. Similarly, the mean ± SD overall impact of arthropathy, also scored by the study physicians, was 0.73 ± 1.29 (range 0–7.97), suggesting a mild arthropathy impact on average.
Moderate correlations were seen between the HJHS and WFH total scores and the total physician score of joint health, the overall impact of arthropathy, and total estimated lifetime hemarthroses (Table 2). No correlation was seen with the modified C-HAQ.
Table 2. Spearman's correlations between the total HJHS and WFH scores and other measures of joint health and impact* (rows: HJHS total score, WFH total score; recovered columns: Total physician joint health, Overall arthropathy impact; values not recovered)
* HJHS = Hemophilia Joint Health Score; WFH = World Federation of Hemophilia physical examination joint scale; C-HAQ = Childhood Health Assessment Questionnaire.
Table 3 lists the Spearman's correlation coefficients between the HJHS and WFH individual joint scores and the estimated bleeding and global joint health as scored by the study physicians by joint. At the individual joint level, correlations were moderate to weak.
Table 3. Spearman's correlations between the joint-specific HJHS and WFH scores and other measures of joint health and impact* (rows: HJHS and patient/parent estimated lifetime hemarthroses into index joints; HJHS and physician score of global joint health; WFH and patient/parent estimated lifetime hemarthroses into index joints; WFH and physician score of global joint health; values not recovered)
* HJHS = Hemophilia Joint Health Score; WFH = World Federation of Hemophilia physical examination joint scale.
Both the HJHS and WFH total scores differentiated severe from moderate and mild subjects (despite the fact that most severe patients were receiving prophylactic factor therapy). The median HJHS score for severe hemophilia subjects was 6 (interquartile range [IQR] 11), for moderate hemophilia was 4 (IQR 8), and for mild hemophilia was 3 (IQR 8). The median WFH joint score for severe hemophilia subjects was 6 (IQR 6), for moderate hemophilia was 5.5 (IQR 6), and for mild hemophilia was 4 (IQR 6). The HJHS (Kruskal-Wallis T = 11.42, P = 0.003) was approximately 97% more efficient at differentiating subjects by severity level than the WFH (Kruskal-Wallis T = 5.80, P = 0.06).
Both the HJHS and WFH score differentiated those subjects who were being treated or who had been treated with prophylaxis from those who were never treated with prophylaxis. As all of the subjects were included in this analysis, we expected those treated with prophylaxis (as this subgroup comprised mostly patients with severe hemophilia) to have higher joint scores; mild and moderate patients and those who rarely bleed are not usually treated with prophylaxis. The median HJHS score for subjects treated with prophylaxis was 6 (IQR 11), and for those never treated with prophylaxis was 3 (IQR 8). The median WFH score for subjects treated with prophylaxis was 6 (IQR 6), and for those never treated with prophylaxis was 4 (IQR 6). Likewise, the HJHS (Kruskal-Wallis T = 12.73, P = 0.0003) was approximately 74% more efficient at differentiating these subject groups than the WFH (Kruskal-Wallis T = 7.32, P = 0.007).
When only subjects with severe hemophilia were considered, the HJHS was 63% more efficient at differentiating patients treated with primary prophylaxis (median HJHS 5.0) from those treated with secondary prophylaxis (median 9.0) and from those treated on demand (median 11.5; Kruskal-Wallis T = 19.5, P = 0.00006) than the WFH (median score for primary prophylaxis 6.0, median score for secondary prophylaxis 7.0, median score for on demand 7.0; Kruskal-Wallis T = 12.0, P = 0.003).
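The text does not state how the "more efficient" percentages were derived. One reading that matches the reported figures closely is the ratio of the two Kruskal-Wallis statistics minus one; the sketch below applies that reading to the statistics quoted above. The formula is an assumption made here, and the function name is hypothetical.

```python
def relative_efficiency_gain(stat_a: float, stat_b: float) -> float:
    """Percentage by which test statistic A exceeds test statistic B."""
    if stat_b <= 0:
        raise ValueError("the comparison statistic must be positive")
    return (stat_a / stat_b - 1) * 100


# Kruskal-Wallis statistics as reported in the text: (HJHS, WFH).
comparisons = {
    "severity level": (11.42, 5.80),
    "ever vs. never treated with prophylaxis": (12.73, 7.32),
    "treatment regimen, severe hemophilia only": (19.5, 12.0),
}

for label, (hjhs_t, wfh_t) in comparisons.items():
    gain = relative_efficiency_gain(hjhs_t, wfh_t)
    print(f"{label}: HJHS about {gain:.1f}% more efficient")
```

Under this assumption the three comparisons give about 96.9%, 73.9%, and 62.5%, against reported values of 97%, 74%, and 63%; the small difference in the last figure would be expected if the published statistics were rounded before printing.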
Most items had non-zero scores for at least 20% of the subjects. Three items, however, were rarely endorsed. Joint pain was scored as 0 (normal) in 200 subjects (88.5%). Likewise, instability was scored as 0 in 223 subjects (98.7%), and axial alignment was scored as 0 in 225 subjects (99.6%).
Most items in the HJHS had a moderate correlation with each other. Two items showed a high degree of redundancy; global gait and joint gait score were very highly correlated (rs = 0.99).
Overall, the HJHS items had a high degree of internal reliability (Cronbach's α = 0.82). Item-total correlations, however, showed that 3 items, i.e., axial alignment, instability, and joint pain, were poorly correlated with the rest of the score (Table 4). This suggests that these items are less well related to the construct of joint impairment. When these 3 items were removed, the reliability was marginally increased (Cronbach's α = 0.84).
The loadings from an exploratory factor analysis are listed in Table 5. A 4-factor model explained 52% of the variance, and was sufficient to explain the data (test of fit χ2 [24df] = 31.9, P = 0.13). Three items had a high, or very high, uniqueness coefficient (axial alignment = 0.99, instability = 0.71, joint pain = 0.60). The first factor loaded on global gait, joint gait scores, and strength, and might represent the concept of “functional impairment.” The second factor loaded on crepitus, duration of swelling, swelling, and muscle atrophy; it might represent the concept of “synovitis.” The third factor loaded on extension loss, flexion loss, and instability; it might represent the concept of “joint damage.” The final factor accounted for 6% of the variance and loaded only on joint pain, and therefore might represent the concept of “arthralgia.”
In a cohort of 226 boys with hemophilia from 5 academic hemophilia treatment centers, we found that the HJHS is a valid tool with high internal reliability. Moreover, the HJHS was significantly more sensitive than the older WFH joint scale when differentiating groups of patients mostly treated with prophylaxis.
Our study cohort, while typical of children treated with prophylaxis in the developed world, is likely very different from groups of children treated with on-demand therapy in the past or in the developing world. This was a reasonable group to study since the HJHS was designed specifically to be more sensitive for mild disease in intensively treated boys (19, 20, 27). However, we are unable to judge from our study if the HJHS offers any advantages for developing countries or for adult patients with more severe joint damage.
The HJHS and the WFH physical examination score showed excellent convergent construct validity because all of the hypothesized correlations held true, with the exception of the correlation with the modified C-HAQ. Perhaps our hypothesis of a weak to moderate correlation between joint impairment and activity limitation (as measured by the modified C-HAQ) (18) was incorrect. However, given the very normal modified C-HAQ scores in our sample, it is more likely that the correlations were poor because the modified C-HAQ is not sensitive enough to measure activity limitation in individuals with hemophilia treated mostly with prophylaxis; the impairment level of the study children, if any, was below the level of test detection. For example, the modified C-HAQ is highly weighted toward upper extremity tasks, whereas in persons with hemophilia, the lower extremity joints are most often affected. Correlations between the HJHS and WFH scores and the other measures were, in general, higher for the ankle joints. This is likely because the ankle joints were more often involved in our subjects, had higher average rates of bleeding, and had higher examination scores (data not shown). It is possible that in populations with more joint disease, knees and elbows will also have stronger correlations between the HJHS and WFH scores and other measures of joint bleeding and severity.
Moreover, the HJHS and the WFH score showed excellent construct validity in their ability to discriminate known groups, as both tools could clearly separate children with severe hemophilia from mild to moderate hemophilia and children receiving prophylaxis from those receiving on-demand therapy. The goal of prophylactic therapy is to change the bleeding phenotype of severe hemophilia to be more like moderate hemophilia; the ability of the HJHS and WFH scores to discriminate subjects suggests that these scores can be useful as outcome measures in clinical trials of prophylactic treatment strategies.
We had anticipated that the HJHS would be more sensitive to mild joint changes, and in fact, the HJHS was significantly more sensitive (as demonstrated by its superior discriminatory abilities in our cohort of mostly mildly affected subjects). The implications of this finding are that clinical studies that use the HJHS as an end point will require fewer subjects (smaller sample size) and joint impairment for individual patients will be recognized sooner.
Not all items seem to be necessary for future use of the HJHS. The HJHS measures gait globally and separately by providing a score for each joint thought to be contributing to altered gait. The global and joint-specific gait items are so highly correlated that only one item should be scored. Because attribution of gait alterations to a single joint is difficult, future versions of the HJHS should likely score only global gait. Three items were rarely endorsed and contributed little to the overall score: axial alignment, joint instability, and joint pain. These items, while clearly abnormal in severely damaged hemophilic joints, are perhaps not of great importance for mild to moderate hemophilia and for those treated with primary prophylaxis. If used for adult populations or in other settings where more severe joint disease is expected, these items may be more important. Alternatively, the way these items were scored may have been too insensitive for the level of joint disease seen in our sample. Future versions of the HJHS should consider dropping these items or finding better ways of scoring them; in fact, our group has produced version 2.1 based on our findings (see Supplementary Appendix B, available in the online version of this article at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2151-4658).
Pain, in particular, is one of the most important morbidities of hemophilic arthropathy that intensive treatment seeks to prevent; the findings of the current study suggest that pain on physical examination may not be a predictable finding early in the course of hemophilic arthropathy. It is likely that subjective pain, measured by questioning the patient, is a more important indicator of early joint disease.
Our findings must be considered in the light of possible study limitations. Our study therapists were highly experienced and involved in the development of the HJHS. It is possible that others would not be able to replicate our findings; however, the HJHS has now been taught in workshops worldwide and appears to be easily used by physiotherapists with even limited experience in hemophilia. Also, our study physicians were not given any specific instruction or training for scoring global joint health. This may have led to imprecision; it is possible that both the HJHS and WFH scores would demonstrate even better validity than we have shown if we had trained our physician assessors. Finally, our results are only applicable to subjects studied in countries where there are widely available factor concentrates, and where the majority with severe hemophilia are treated with prophylaxis. It remains to be seen whether the HJHS offers advantages in developing countries or for adults with existing joint damage.
In conclusion, we have demonstrated excellent construct validity of both the HJHS and the older WFH physical examination joint score. Future versions of the HJHS may be improved by dropping or modifying certain items. The HJHS offers advantages of efficiency over the WFH joint score, and should be considered for use in clinical practice and for clinical studies.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Feldman had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Feldman, Funk, Bergstrom, Hilliard, van der Net, Engelbert, Petrini, van den Berg, Manco-Johnson.
Acquisition of data. Feldman, Funk, Bergstrom, Zourikian, Hilliard, van der Net, Engelbert, Petrini, van den Berg, Manco-Johnson, Rivard, Abad, Blanchette.
Analysis and interpretation of data. Feldman, Hilliard, van den Berg, Abad.
ROLE OF THE STUDY SPONSOR
The industry sponsors of the International Prophylaxis Study Group (IPSG; Bayer HealthCare LLC, Baxter BioScience, Wyeth Pharmaceuticals, CSL Behring, and Novo Nordisk) were not involved with writing of the manuscript. The content of the manuscript is that of the authors and is endorsed by the Steering Committee of the IPSG. Publication of this manuscript was not contingent on the approval of these sponsors.
We would like to express our thanks to the International Prophylaxis Study Group Steering Committee for their continuous scientific encouragement and advice: Drs. Victor S. Blanchette (Chair), Louis M. Aledort (Co-Chair), Rolf Ljung (Co-Chair), Brian M. Feldman, Georges E. Rivard, Marilyn Manco-Johnson, Pia Petrini, Marijke van den Berg, Wolfgang Schramm, and Alessandro Gringeri.