Systemic sclerosis (SSc; scleroderma) is a multisystem disease characterized by cutaneous and visceral fibrosis. Skin disease is both a disabling feature of SSc and a predictor of visceral involvement and increased mortality (1, 2); improvement in skin disease correlates with improved survival (3). After an initial period of induration, the dermis becomes infiltrated with collagen and becomes both harder and thicker. Additionally, subdermal connective tissue sclerosis leads to dermal tethering and limited skin mobility (4).
Skin involvement in SSc is currently measured semiquantitatively using the modified Rodnan skin score (MRSS), a summation of physical examination ratings over 17 skin sites (1, 5). Skin scores from the forearm correlate with the weight of skin punch biopsy specimens (5). Limitations of MRSS include the potential for observer bias; intra- and interobserver variability of 12% and 25%, respectively (6, 7); the need for investigator training (8); varying degrees of examiner experience (8); and uncertainty about the sensitivity of MRSS to change over time (9). Furthermore, although the skin score ostensibly measures skin thickness, it is not clear that this technique differentiates skin thickness from either skin hardness or skin tethering. Despite these limitations, MRSS is the current gold standard measure of skin disease for use in clinical trials in SSc (9–12) and is, at present, the only fully validated skin outcome measure for clinical trials (13). The development of more objective, precise, and sensitive measures of skin disease would allow for smaller sample sizes and enhanced detection of effective therapies in clinical trials.
Durometers are digital, hand-held, spring-loaded devices that measure hardness by applying an indentation load on surfaces (14). Measurements can be performed in seconds without risk or discomfort. Skin hardness may be affected by skin thickness as well as skin density, elasticity, and edema. Prior studies in SSc have found a correlation between clinical skin scores and durometry readings (14, 15). Other studies have found that normal skin hardness as measured by a durometer is relatively constant between the ages of 15 and 65 years (15) and does not correlate with body weight (14). Durometry may be a useful method for the objective measurement of skin involvement in SSc. The current study examined the reliability, construct validity, and sensitivity to change in skin disease of durometry compared with MRSS.
PATIENTS AND METHODS
Durometer measurements were made in the context of 3 separate experiments, as described below, with patients who fulfilled the American College of Rheumatology (formerly the American Rheumatism Association) criteria for SSc (16). Patients and controls were recruited from the Boston University Scleroderma Program, and no subjects were included in more than one experimental cohort. Controls had no history of skin disease and were not being treated with systemic or local glucocorticoids. Experienced physicians trained in performing MRSS at an expert SSc center examined subjects in the Boston University General Clinical Research Center. The institutional review board of the Boston University Medical Center approved all experimental protocols followed in this study.
A Latin square experiment was performed to determine the intra- and interobserver reliability of durometer measurements (17), and to provide convergent construct validity by correlating these measurements with MRSS, the conventional measurement of this skin disease. Five physicians experienced in MRSS technique examined 5 patients with diffuse cutaneous SSc (3 women, 2 men, age range 28–64 years) (Table 1) and 1 healthy control (1 man, age 28 years) in a 1-day exercise. Four durometer measurements of 9 predetermined skin sites were obtained (body areas 3–7) (Table 2). MRSS measured skin thickness semiquantitatively (0–3 ordinal scale) over 17 sites. Physicians repeated their measurements on the initial 2 of the 5 patients at the end of the exercise.
Table 1. Clinical data of study patients with systemic sclerosis in the 3 study cohorts
| ||Latin square cohort||Longitudinal cohort||Ultrasound cohort|
|Age, median (range) years||52 (28–64)||52 (33–68)||51 (32–70)|
|Skin disease, limited:diffuse||0:5||2:11||10:20|
|Disease duration, median (range) months||35 (12–50)||18 (3–76)||30 (6–336)|
Table 2. Landmark sites of skin measurements in study cohorts
|Body area||Landmark site||Latin square cohort||Longitudinal cohort||Ultrasound cohort|
|1. Fingers||Dorsum of mid 3rd proximal phalanx|| || ||X|
|2. Hands||Dorsal aspect of hands; between 2nd and 3rd metacarpals, 2 cm proximal to metacarpal-phalangeal joints|| || ||X|
|3. Forearms||Dorsal aspect of forearms; midway between the elbow and wrist||X||X||X|
|4. Upper arms||Dorsal aspect of upper arms; midway between the acromion and elbow||X||X||X|
|5. Thighs||Dorsal aspects of the thigh; midway between the anterior superior iliac crest and the superior pole of the patella||X||X|| |
|6. Legs||Ventral aspect of legs; 5 cm distal to the popliteal fossa||X||X|| |
|7. Abdomen||5 cm right of the umbilicus||X||X|| |
A longitudinal cohort comprised 13 patients with SSc (8 women, 5 men; 11 with diffuse and 2 with limited skin disease; median duration of disease 18 months [range 3–76 months]; age range 33–68 years) (Table 1) and 5 controls (3 women, 2 men; age range 26–46 years). These subjects were assessed by repeated durometry and skin scoring over the same skin sites as the Latin square group (body areas 3–7) (Table 2) 3–12 months apart to determine sensitivity to change.
An ultrasound cohort of 30 patients with SSc (25 women, 5 men; 20 with diffuse and 10 with limited skin disease; age range 32–70 years) (Table 1) and 12 healthy controls (9 women, 3 men; age range 27–60 years) underwent 1-time measurements of skin hardness by durometry, skin thickness by ultrasonography (Titan, 10-MHz linear array transducer; SonoSite, Bothell, WA), and clinical skin score over the upper extremities (body areas 1–4) (Table 2) to provide further construct validity for durometry.
Skin hardness was measured using a hand-held digital durometer (Rex Gauge type OO; Rex Gauge, Buffalo Grove, IL) with a continuous scale to 1 decimal point. Measurements were made at predetermined landmark sites at the fingers, hands, forearms, upper arms, abdomen, thighs, and legs (Table 2). Measurements were made with the underlying muscles relaxed and the skin in a horizontal plane. Four consecutive durometry readings were taken at the same site and the results were averaged. Readings were obtained with 2 durometers. Rubber test blocks (Rex Gauge) with hardness similar to normal skin were used to ensure consistent calibration of the durometers through the duration of the study. Durometer measurements are expressed in standardized durometer units (DU).
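The scoring procedure described above (4 consecutive readings averaged per site, then summed over sites) can be illustrated with a minimal sketch. The site labels are hypothetical, and the assumption that the 9 sites are the bilateral forearms, upper arms, thighs, and legs plus the abdomen is inferred from Table 2 rather than stated explicitly in the text.

```python
from statistics import mean

# Hypothetical labels; assumes 4 bilateral body areas plus the abdomen (9 sites).
SITES = [
    "forearm_L", "forearm_R", "upper_arm_L", "upper_arm_R",
    "thigh_L", "thigh_R", "leg_L", "leg_R", "abdomen",
]


def site_hardness(readings):
    """Average the 4 consecutive durometer readings (in DU) taken at one site."""
    if len(readings) != 4:
        raise ValueError("expected 4 consecutive readings per site")
    return mean(readings)


def nine_site_total(readings_by_site):
    """Sum the per-site averages over the 9 landmark sites."""
    missing = [s for s in SITES if s not in readings_by_site]
    if missing:
        raise ValueError(f"missing sites: {missing}")
    return sum(site_hardness(readings_by_site[s]) for s in SITES)


# Illustrative values only, not study data.
example = {site: [20.0, 21.0, 19.0, 20.0] for site in SITES}
total_du = nine_site_total(example)
```

Rejecting incomplete input, rather than summing whatever sites are present, keeps totals comparable between visits.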
In the Latin square experiment, inter- and intraobserver reliabilities of durometer and skin score measurements were assessed using intraclass correlation coefficients (ICCs), and correlations were made between 9-site durometer scoring (summation score of 9 body sites including forearms, upper arms, abdomen, thighs, and legs) and 9-site MRSS scoring (summation of same 9 sites as 9-site durometry). In the longitudinal cohort, 17-site MRSS (summation score of 17 body sites including fingers, hands, forearms, upper arms, face, chest, abdomen, thighs, legs, and feet) was correlated with 9-site durometer scoring. In the ultrasound cohort, associations among durometry, ultrasound, and clinical measurements were assessed by Spearman's correlation. Interdurometer correlation of 9-site skin hardness measurements was assessed to determine interdurometer reliability.
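For readers who wish to reproduce this type of analysis, the following is a minimal sketch of the 2 statistics named above. The text does not state which form of ICC was used; the one-way random-effects form, ICC(1,1), is shown purely as an assumption, and all function names are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1); ratings[i][j] is rating j of subject i."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 ratings")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    # Undefined (division by zero) if every rating is identical.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Ranking with tie averaging matters here because ordinal skin scores (0–3) produce many tied values.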
Reliability of durometry testing was assessed in the Latin square experiment. Comparison of intra- and interobserver reliability between durometry and clinical skin score is presented in Table 3. Overall, intraobserver ICC for durometry was higher than for clinical skin score (0.97 versus 0.85). Intraobserver reliability of durometry was high in all measured body sites (range 0.86–0.94). Intraobserver reliability of clinical skin scoring was high in the legs (0.97) and forearms (0.79), but was only moderate in the upper arms (0.68), abdomen (0.67), and thighs (0.60). Interobserver reliability for durometry scoring and clinical skin scoring was similar (ICC 0.75 versus 0.73). Interobserver reliability of durometry was good for all body areas (0.61–0.85), but interobserver reliability for skin scoring was poor in the abdomen (0.08), feet (0.09), and the fingers (0.27) and was moderate in the legs (0.51).
Table 3. Comparison of reliability between durometry and clinical skin scoring*
| ||Intraobserver ICC||Interobserver ICC||Correlation between durometer and MRSS|
Over the 30 months of the longitudinal study, test block values ranged from 14 DU to 22 DU for one durometer and 17 DU to 27 DU for the other (mean ± SD 18 ± 2 versus 20 ± 2). There was no significant drift in test block values over the course of the study (r = 0.089). One durometer produced measurements consistently higher than the other durometer by an average of 11%; however, there was a high correlation between the 2 durometers' readings (r = 0.84, P = 0.0013) and a high correlation between changes in durometer measurements over time (r = 0.93, P = 0.0002).
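The 2 instrument checks described above can be sketched as follows. The text does not specify how the 11% average difference was calculated, so the mean of per-reading percent differences is assumed; the tolerance for flagging a test block reading is likewise a hypothetical parameter, not a manufacturer or study value.

```python
from statistics import mean


def mean_percent_difference(device_a, device_b):
    """Average of (b - a) / a * 100 over paired readings of the same target."""
    if len(device_a) != len(device_b) or not device_a:
        raise ValueError("need equal-length, non-empty paired readings")
    return mean((b - a) / a * 100 for a, b in zip(device_a, device_b))


def flag_block_readings(block_readings, reference_du, tolerance_du):
    """Return indices of test block readings farther than tolerance_du from the reference."""
    return [
        i for i, reading in enumerate(block_readings)
        if abs(reading - reference_du) > tolerance_du
    ]


# Illustrative values only, not study data.
offset_pct = mean_percent_difference([20.0, 40.0], [22.0, 44.0])
suspect = flag_block_readings([18.0, 19.0, 25.0], reference_du=18.0, tolerance_du=4.0)
```

A constant percentage offset between instruments does not affect correlations, but it does affect absolute values.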
Measurements from the initial visits in the longitudinal and ultrasound cohorts were combined to establish the ranges of durometer measurements at skin sites in healthy controls and patients with SSc (Figure 1). Generally, proximal skin regions were softer than distal skin regions. Skin hardness for each measured site ranged from 0 to 41 DU for normal controls, 4 to 39 DU for “uninvolved” skin in patients with SSc, and 12 to 70 DU for involved skin in patients with SSc. Mean ± SD skin site hardness increased from 23 ± 7 DU for uninvolved skin to 33 ± 9 DU for skin score 1, 44 ± 9 DU for skin score 2, and 48 ± 10 DU for skin score 3.
Figure 1. Durometer measurements of skin hardness at various body sites in healthy controls and patients with systemic sclerosis; data are subdivided by clinical skin score. While skin became progressively harder with increasing clinical skin scores, for each clinical skin score there was a wide range of skin hardness measurements. Solid diamonds = hand; solid squares = forearm; solid triangles = upper arm; open diamonds = leg; open squares = thigh; open circles = abdomen; solid bars = mean.
When controlled for skin score, skin from patients with diffuse cutaneous disease was not significantly different from that of patients with limited cutaneous disease in terms of either thickness or hardness. Disease duration did not significantly correlate with either total clinical skin score or total durometer-measured skin hardness. Between men and women with SSc, there were no significant differences in average clinical skin score, skin thickness by ultrasound, or skin hardness by durometry.
There was a wide range of durometer scores among patients with the same clinical skin score at each body site (Figure 1). For example, among patients with forearm skin scores of 2, durometer scores ranged from 31 DU to 58 DU. Thus, a skin site could become harder without necessarily progressing to the next skin score rating. To assess for a floor effect of durometry, skin hardness in healthy controls was compared with measurements of clinically uninvolved skin (skin score 0) in patients with SSc. The uninvolved skin was harder than control skin: mean ± SD 23 ± 7 DU (n = 136) versus 19 ± 6 DU (n = 123; P < 0.0001).
In the longitudinal cohort data, 9-site total durometry scores among healthy controls ranged from 147 DU to 212 DU over the course of the study, whereas in patients with SSc scores ranged from 159 DU to 385 DU. Only those patients with a total MRSS-17 skin score <5 fell within the normal durometry range.
Construct validity was evaluated by correlating durometry scores with clinical skin scores in the 3 study cohorts. In the Latin square exercise, correlation between total (9-site) durometer and total (17-site) skin score was 0.44 (P = 0.03). Site-specific correlation was best at the forearm (r = 0.56, P < 0.01) and worst at the abdomen (r = 0.20, P = 0.34) (Table 3). The poor correlation at the abdomen may be a result of poor clinical skin scoring characteristics with an interobserver ICC of only 0.077. In the longitudinal cohort, there was a strong correlation between the 17-site total clinical skin score and the 9-site total durometry score (r = 0.81, n = 13, P = 0.0008). In the ultrasound cohort, correlations between durometer and skin scores were also high in the hands, forearms, and upper arms, but not in the fingers (Table 4). Average skin hardness increased with increasing skin scores for all regions studied other than fingers (Figure 2).
Table 4. Correlations among ultrasound-measured skin thickness, durometry-measured skin hardness, and clinical skin score at 4 skin sites in the ultrasound cohort
| ||Ultrasound/durometer||Ultrasound/skin score||Durometer/skin score|
Figure 2. Comparison between clinical skin scores and mean durometer scores (hardness) by body site in the ultrasound cohort. There were no patients with systemic sclerosis with a finger skin score of 0.
Comparison of durometry with ultrasound-measured skin thickness further assessed construct validity. Correlation between ultrasound-measured skin thickness and durometry-measured skin hardness was high in the hands, forearms, and upper arms (r = 0.40–0.63, P = 0.001) but was low in the fingers (r = 0.18, P = not significant) (Table 4, Figure 3). The correlation of durometry with skin scoring and ultrasound with skin scoring was, in general, greater than the correlation of durometry with ultrasound-measured skin thickness (Table 4). Average ultrasound-measured skin thickness increased with skin scores for all regions other than fingers.
Figure 3. Correlation between durometry and ultrasound-measured skin thickness at 4 skin sites. There was a high correlation between measurements of skin hardness with durometry and skin thickness with ultrasound in the hands, forearms, and upper arms, but not in the fingers.
The sensitivity to change of durometry was evaluated by serial measurements over 3–12 months in the longitudinal cohort. During the observation period, clinical 17-site skin score increased by ≥5 points in 3 patients, decreased by ≥5 points in 4 patients, and did not significantly change in the remaining 6 patients. There was a strong correlation between the change in 17-site skin score and the change in 9-site durometry score over time (r = 0.77, n = 13, P = 0.002).
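The change analysis above can be illustrated with a minimal sketch. The 5-point threshold comes from the text; the category labels and function names are hypothetical, and because the text does not state which correlation coefficient was used for the change scores, Pearson's r is assumed.

```python
from statistics import mean


def classify_mrss_change(baseline, followup, threshold=5):
    """Label a patient by change in 17-site MRSS between 2 visits."""
    delta = followup - baseline
    if delta >= threshold:
        return "worsened"   # skin score increased by >= threshold points
    if delta <= -threshold:
        return "improved"   # skin score decreased by >= threshold points
    return "stable"


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def change_correlation(mrss_pairs, durometry_pairs):
    """Correlate per-patient changes (followup - baseline) in the 2 measures."""
    d_mrss = [followup - baseline for baseline, followup in mrss_pairs]
    d_duro = [followup - baseline for baseline, followup in durometry_pairs]
    return pearson_r(d_mrss, d_duro)


# Illustrative (baseline, followup) pairs only, not study data.
mrss = [(20, 26), (20, 13), (10, 10)]
durometry = [(200.0, 230.0), (250.0, 215.0), (180.0, 180.0)]
r_change = change_correlation(mrss, durometry)
```

Correlating change scores rather than raw totals tests whether the 2 measures move together within patients, which is the property of interest for an outcome measure.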
The current study demonstrated that a hand-held durometer is a valid measure of skin disease in patients with SSc, may offer advantages over MRSS, and should be considered for use in future clinical trials. A new, validated outcome measure for use in clinical trials of SSc would be welcomed by investigators who recognize the need for more reliable assessment instruments (13). The potential increased precision gained through use of a durometer compared with MRSS may help decrease the number of patients needed to detect therapeutic efficacy, an important factor because clinical testing of new therapeutic agents for skin disease in SSc is limited by the rarity of the disease. Our data also indicate that investigators can attain an interobserver reliability similar to that of skin scoring after only 10 minutes of instruction; therefore, a greater number of clinicians can more easily be involved in the assessment of skin disease, simplifying the logistics of clinical trials. By comparison, the validity and reproducibility of skin scoring drop considerably when performed by untrained practitioners (8). Additionally, measurement of skin hardness by durometry may help improve the evaluation of therapeutic agents by providing greater intraobserver reliability than physical examination skin scoring. Furthermore, durometry provides an objective measurement that is less susceptible to examiner bias, thus making nonblinded trials more useful. Durometers are portable instruments that fit in a coat pocket, and measurements take only a few seconds each and are completely painless; these features increase both investigators' and subjects' willingness to accept durometry as an outcome measurement.
Because MRSS uses only a 4-point ordinal scale (0–3) at each site, smaller but perceptible changes in skin disease may not be detected by MRSS in some patients with SSc. In contrast, durometers measure hardness on a continuous scale at each site, allowing for detection of smaller changes. Our data suggest that durometry offers greater dynamic range in skin disease assessment compared with MRSS. For each level of clinical skin score, we found a wide range of durometer readings, and mean readings increased with increasing skin scores. The greater dynamic range of durometry may also help counter the ceiling and floor effects of skin scoring for each skin site. It has previously been demonstrated that changes in skin properties in patients with SSc, such as abnormal endothelial activation and procollagen production, occur prior to clinical detection (18). The current study demonstrates higher durometer scores in clinically uninvolved skin compared with healthy control skin, further suggesting that earlier detection of skin involvement can be achieved with durometry compared with MRSS. In addition, durometry is sensitive to alterations in skin hardness. Change in skin hardness correlates with change in skin score, suggesting applicability to detection of effective therapies in clinical trials.
Because changes in skin hardness and thickness occur in tandem, the correlation between durometer measurements and both clinical skin score and ultrasound-measured skin thickness is not surprising and helps provide construct validity for durometry as an outcomes measurement. However, the correlations among durometry, clinical skin score, and ultrasound are not expected to be perfect because these 3 methods mostly measure different skin properties. Because durometers are designed to measure hardness, they may not directly provide information about other aberrant skin properties in SSc such as thickness or skin tethering. Therefore, durometry should be considered for use in conjunction with other outcome measurements suited for monitoring other skin properties such as clinical skin scoring and skin ultrasound.
Several logistic issues are important to consider when using durometers to measure skin disease in patients with SSc. First, not all body areas are ideal for durometry. Underlying bone confounds durometer measurements over bony prominences, as noted by other investigators (14). Consistent with this, we did not find a correlation between durometry and other skin measures at the fingers. However, we also found a low interobserver reliability of clinical skin scoring for the fingers, and skin score in the fingers rarely changes during the course of a trial, thus making measurements at this site less valid than those at other areas (19).
Variability in durometer readings may result from not allowing the entire weight of the durometer to bear down on the bevel, not holding the durometer perpendicular to the plane of the skin site, or not positioning the skin horizontally. Additionally, because differences in absolute hardness readings exist among durometers (11% in our study), 2 durometers should possibly be used at any 1 center for longitudinal measurements to avoid losing data in cases of instrument malfunction. Durometers should be checked for malfunction with rubber test blocks regularly, and substantial changes in test block values warrant recalibration of the instrument by the manufacturing company.
There are several strengths of the current study. We confirmed findings in a number of different patient cohorts. Durometry was validated against 2 other measures of skin disease: MRSS (the current standard in skin disease assessment) and ultrasonography (an objective tool for evaluation of skin thickness that has previously been found to be reliable and to correlate with clinical skin involvement [20, 21]). Durometry was validated against skin scoring performed at a scleroderma center by clinicians experienced in the use of MRSS in multicenter clinical trials, and the inter- and intraobserver intraclass correlation coefficients for skin scoring reliability were similar to those previously reported (22, 23).
There are also several limitations of the current study. Our data stem from a single center and should be confirmed by other investigators. The study was not performed in the context of a clinical therapeutic trial; therefore, detection of therapeutic efficacy could not be evaluated. Durometry is currently being assessed in the context of 2 therapeutic trials. The followup time for skin evaluation in the current study was relatively short; therefore, there were few substantial changes in skin disease during the study. However, the time under observation was at least as long as the usual time in clinical therapeutic trials.
Because durometry is highly reproducible, is easily performed, correlates well with the established skin outcome measure MRSS, and provides specific, objective, continuous, and quantitative information about skin hardness, its use should be considered in evaluation of skin disease in clinical trials of SSc as a complementary measure with the MRSS. Further studies will assess the validity and usefulness of durometry skin measurements in therapeutic trials.