
For the management of rheumatoid arthritis (RA), there is general agreement that rheumatoid inflammation should be controlled as soon as possible, as completely as possible, and for as long as possible, while assuring patient safety (1, 2). When accepting that the goal of RA treatment is to reach optimal control of inflammation, it is clear that rheumatoid inflammation should be evaluated continually. Then, the treatment program can be adjusted from perspectives of both benefit and harm (3). The combination of systematic evaluation and clinical guidelines could be a valuable decision support in optimizing the management of RA (4). The effects of such decision support should preferably be studied using a randomized controlled trial (RCT) design. In this article, a proposal is made for the design and analysis of an RCT to evaluate the effects of a clinical decision support system on physician performance and health outcome in the management of RA. The design of a cluster RCT, the choice of the relevant outcome and outcome measure, an approach for statistical analysis, and sample size calculations are emphasized. Solutions to prevent bias in this particular design and ethical considerations are described.

Choice of trial design.

In a classic RCT design, patients from single physicians are randomized to receive either an experimental or a control intervention, while patients and physicians are blind to treatment allocation. A cluster RCT design is appropriate when interventions, like guidelines or decision support, are primarily directed to the physicians instead of the patients. In that case, the physicians cannot be blinded. To avoid contamination, it is the physicians or practices that are randomized, and all their study patients (the cluster) will receive the same intervention (5). Patients within a cluster can then no longer be regarded as independent. The degree of dependency of patients in clusters is indicated by the magnitude of the intracluster correlation coefficient (ICC); an ICC equal to 0 indicates independence. The higher the ICC, the less unique information is contributed by each patient and the more the power of the study is reduced. Therefore, it is necessary to correct for the effect of clustering in sample size calculations and analysis (6, 7).

Choice of the relevant outcome.

Decision support, such as guidelines, aims to persuade physicians to change their practice behavior (5). Therefore, when studying decision support, it is physician performance that is the outcome of interest (5). Concerning the management of RA, a measure of physician performance would thus reflect the adequacy of decisions on medication. However, it is difficult to use treatment decision as an outcome measure because in the management of RA visit frequency is not usually standardized and a large number of treatment options are available. Because guidelines may alter visit frequency, more opportunity for guideline adherence or nonadherence may be present in one of the trial arms, possibly leading to bias. Furthermore, with many treatment options available, judgment of the adequacy of treatment decisions is not straightforward, and single visits may not be judged in isolation. Because the aim of RA management is to control rheumatoid inflammation (1), the level of inflammation or the proportion of patients with adequately controlled inflammation could be used as the primary outcome. However, it must be noted that these are essentially measures of health outcome, and are not direct measures of physician performance. To explain possible changes in outcome, physician performance should be documented. Secondary outcomes that can be considered are toxicity, disability, joint damage, quality of life, satisfaction with care, resource use, and direct costs.

Choice of the primary outcome measure.

Multiple measures for the clinical assessment of rheumatoid inflammation are available, but all are approximating rather than measuring the underlying disease process. The Disease Activity Score in 28 joints (DAS28) is a well-validated measure, calculated from the results of erythrocyte sedimentation rate, swollen joint count, tender joint count, and a general health item (8). The DAS28 provides a single index reflecting the level of disease activity. It is a continuous measure with a Gaussian distribution, which is an advantage for analysis and power calculations. The proportion of patients with adequately controlled disease activity can also be derived, e.g., by using a dichotomy with a cutoff point of DAS28 ≤3.2 (8), but power may be lost by dichotomizing a continuous measure.
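As an illustration of how the index and the dichotomy relate, the following minimal sketch computes a DAS28 and applies the ≤3.2 cutoff. The coefficients are those of the commonly published ESR-based DAS28 formula and are not stated in this article; the function names and the patient values are hypothetical.

```python
import math

LOW_DISEASE_ACTIVITY_CUTOFF = 3.2  # cutoff used in the text


def das28(tender28, swollen28, esr_mm_per_h, general_health_mm):
    """ESR-based DAS28 (assumed standard coefficients, not given in the article).

    tender28, swollen28: joint counts out of 28
    esr_mm_per_h: erythrocyte sedimentation rate in mm/hour (must be > 0)
    general_health_mm: patient general health on a 0-100 mm visual analog scale
    """
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_per_h)
            + 0.014 * general_health_mm)


def is_controlled(score):
    """Dichotomize the continuous score at the low disease activity cutoff."""
    return score <= LOW_DISEASE_ACTIVITY_CUTOFF


# Made-up patient, for illustration only.
score = das28(tender28=4, swollen28=2, esr_mm_per_h=20, general_health_mm=30)
print(round(score, 2), is_controlled(score))
```

The dichotomized value discards how far a patient is from the cutoff, which is the source of the power loss mentioned above.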

Statistical analysis.

The proportion of patients with rheumatoid inflammation under control is a simple combination of patients from 1 rheumatologist into a single summary statistic (5). Then, a 2-sample t-test with or without weighting for cluster size, an adjusted chi-square test, or computation of an odds ratio with adjusted confidence intervals can be carried out (5, 9). Similarly, the mean change in DAS28 per practice can be used as a summary statistic, and the 2-sample t-test with or without weighting for cluster size can be performed (5, 9). The rheumatologist, or practice, was the unit of randomization and is also the unit of analysis in these techniques, with the advantage of being relatively simple.
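A cluster-level analysis of this kind might look like the sketch below, which reduces each rheumatologist's patients to one proportion and compares the two arms with an unweighted pooled-variance 2-sample t statistic. The DAS28 values are invented for illustration, the function names are hypothetical, and weighting for cluster size is deliberately left out.

```python
import math
from statistics import mean, variance


def cluster_proportions(clusters, cutoff=3.2):
    """One summary statistic per rheumatologist: proportion with DAS28 <= cutoff."""
    return [sum(score <= cutoff for score in scores) / len(scores)
            for scores in clusters]


def two_sample_t(a, b):
    """Unweighted pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Invented end-of-trial DAS28 values; each inner list is one rheumatologist.
intervention = [[2.8, 3.0, 4.1, 2.5], [3.1, 3.6, 2.9], [2.2, 3.3, 3.0, 4.5, 2.7]]
control = [[4.2, 3.9, 3.1, 5.0], [4.8, 3.0, 4.4], [5.1, 4.0, 3.6, 2.9, 4.7]]

t, df = two_sample_t(cluster_proportions(intervention),
                     cluster_proportions(control))
print(t, df)
```

Note that the degrees of freedom depend on the number of rheumatologists, not on the number of patients, which is why few clusters give little power.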

Alternatively, for dichotomous (low or high DAS28) as well as for continuous (change in DAS28) outcomes, multilevel analysis (MLA) can be carried out (9). MLA is not performed on the cluster level (rheumatologist or practice) but on the individual patient level, while correcting for the dependency within clusters. MLA is quite a complex technique. However, its advantage is, as with multiple regression, that it is much easier to correct for differences in comparability between trial arms and that, for example, a correction for the level of DAS28 at baseline can be made (9).
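For the continuous outcome, one possible form of such a multilevel model is a random-intercept model, sketched below. The notation is an assumption made for illustration and is not taken from ref. 9: y_ij is the change in DAS28 of patient i treated by rheumatologist j, T_j indicates the trial arm of that rheumatologist, x_ij is the patient's baseline DAS28, u_j is the rheumatologist-level random intercept, and e_ij is the patient-level residual.

```latex
y_{ij} = \beta_0 + \beta_1 T_j + \beta_2 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma_w^2)
```

In this sketch, β1 is the intervention effect corrected for baseline DAS28, and the random intercept u_j carries the dependency of patients within the same cluster.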

Sample size when the outcome is dichotomous.

Ignoring the clustering of the data, the sample size per trial arm was calculated with the usual significance level α = 0.05 and a power (1–β) of 0.90, using $n = 2(Z_{\alpha/2} + Z_{\beta})^2\, p(1-p)/(p_1 - p_0)^2$; see ref. 10 for an introduction. $Z_{\alpha/2}$ is the Z value on the standard normal distribution corresponding with the chance α of finding a difference when in truth none exists; if α = 0.05 (2-sided), then $Z_{\alpha/2}$ = 1.96. $Z_{\beta}$ is the Z value on the standard normal distribution corresponding with the chance β of not finding a difference when in truth one does exist; if β = 0.10, then $Z_{\beta}$ = 1.28. The proportions of patients with disease activity under control at the end of the trial in the experimental group and the control group are denoted as $p_1$ and $p_0$, and $p$ is the pooled proportion of patients with disease activity under control, $(p_1 + p_0)/2$.

A sample resembling the target population was taken from an existing cohort (11); it consisted of 570 RA patients and 50 rheumatologists (a mean of 11.4 RA patients per rheumatologist). The proportion of patients with a DAS28 ≤3.2 was 20%. It was hypothesized that reaching low disease activity (DAS28 ≤ 3.2) in 50% of the RA patients in the experimental group was a manageable target; $p_1$ was therefore defined as 0.50 and $p_0$ as 0.20. Accordingly, the number of patients needed per trial arm was calculated as 53.
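The unadjusted calculation can be reproduced with a few lines of Python. This is a minimal sketch of the formula given above; the function name is hypothetical, and the Z values are taken from the standard normal distribution rather than rounded to 1.96 and 1.28.

```python
from statistics import NormalDist


def n_per_arm_proportions(p1, p0, alpha=0.05, power=0.90):
    """Unadjusted sample size per arm for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for power = 0.90
    p_pooled = (p1 + p0) / 2
    return 2 * (z_alpha + z_beta) ** 2 * p_pooled * (1 - p_pooled) / (p1 - p0) ** 2


n = n_per_arm_proportions(p1=0.50, p0=0.20)
print(round(n))
```

Rounded to the nearest whole patient, this agrees with the 53 patients per arm reported in the text.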

It was already indicated that the clustered nature of data reduces power. Therefore, sample size has to be increased by a factor known as the design effect (DE). The DE is the ratio of sample size with adjustment for clustering to sample size without adjustment for clustering, and can be calculated using $DE = 1 + (m-1)\rho$, where m is the average cluster size and ρ denotes the ICC. The ICC was calculated using the data from our cohort sample (11). An ICC of ρ = 0.13 was obtained, calculated as $\rho = (\sigma_b^2 - \sigma_w^2)/(\sigma_b^2 + (m-1)\sigma_w^2)$, where $\sigma_b^2$ is the between-cluster variance, $\sigma_w^2$ is the within-cluster variance, and m represents the average cluster size (9). The resulting design effect is DE = 1 + (11.4–1) × 0.13 = 2.35. The sample size needed follows from multiplying the unadjusted sample size by the DE: 53 × 2.35 = 125 patients per treatment arm. For our cohort sample, this would mean that 125/11.4 ≈ 11 rheumatologists would be needed. Calculation of the DE assumes equal cluster size. However, cluster sizes (practice sizes) usually are unequal. In that case, power is lost again and the calculated DE is too small (7). A correction should be applied, for which the use of minimum variance weights is recommended (7), using

$$DE = \frac{mM}{\sum_{i=1}^{M} \dfrac{m_i}{1 + (m_i - 1)\rho}}$$

where m is the average cluster size, M is the number of clusters, and mi is the observed cluster size. This increases the design effect to DE = 2.53, leading to a sample size per treatment arm of 134 patients requiring 12 rheumatologists.
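The design effect calculations can be sketched as follows. The equal-size function follows DE = 1 + (m–1)ρ as given in the text. The unequal-size function assumes that the minimum variance weights correction takes the form mM divided by the sum over clusters of m_i/(1 + (m_i–1)ρ); that form is an assumption here, and it reduces to the equal-size DE when all clusters have the same size. The function names and the list of unequal cluster sizes are hypothetical and are not the cohort data.

```python
import math


def design_effect_equal(mean_cluster_size, icc):
    """DE = 1 + (m - 1) * rho, assuming all clusters have the same size."""
    return 1 + (mean_cluster_size - 1) * icc


def design_effect_unequal(cluster_sizes, icc):
    """Assumed minimum variance weights form: m*M / sum(m_i / (1 + (m_i - 1)*rho))."""
    n_clusters = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / n_clusters
    weighted_sum = sum(m_i / (1 + (m_i - 1) * icc) for m_i in cluster_sizes)
    return mean_size * n_clusters / weighted_sum


# Figures reported in the text for the dichotomous outcome.
n_unadjusted, mean_size, icc = 53, 11.4, 0.13
de_equal = design_effect_equal(mean_size, icc)
n_equal = round(n_unadjusted * de_equal)
rheumatologists_equal = math.ceil(n_equal / mean_size)
print(round(de_equal, 2), n_equal, rheumatologists_equal)

# Hypothetical unequal cluster sizes with a mean of 10 patients.
sizes = [2, 5, 10, 13, 20]
de_unequal = design_effect_unequal(sizes, icc)
print(round(de_unequal, 2), round(design_effect_equal(10, icc), 2))
```

With the hypothetical sizes, the unequal-size DE comes out larger than the equal-size DE for the same mean cluster size, which is the direction of the correction described above.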

Sample size when the outcome is continuous.

Again, the sample size needed per trial arm was first calculated ignoring the effect of clustering, with α = 0.05 and 1–β = 0.90, using $n = 2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2/(m_1 - m_0)^2$. The standard deviation for the DAS28 (σ = 1.48) was again estimated from our cohort sample (11). A relevant difference between mean group changes ($m_1 - m_0$) of 1.2 in the DAS28 was chosen (8). Because the mean DAS28 in the cohort sample was 4.4, it can be expected that with a mean improvement of 1.2 in the intervention group, approximately 50% of those patients will have low disease activity (DAS28 ≤ 3.2) (8). Accordingly, the number of patients per treatment arm was calculated as 32. The ICC was again calculated from our cohort sample, according to $\rho = \sigma_b^2/(\sigma_b^2 + \sigma_w^2)$ (12), resulting in an ICC of 0.25. The design effect was calculated as DE = 1 + (11.4–1) × 0.25 = 3.60. The resulting sample size per trial arm is 32 × 3.60 = 115 patients, requiring 11 rheumatologists. The design effect adjusted for unequal cluster size was calculated as DE = 3.79 (7). The sample size then has to be increased to 121 patients per trial arm, again requiring 11 rheumatologists.
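The arithmetic for the continuous outcome can be checked with the sketch below. It uses the formula and the reported values from the text; the function name is hypothetical, and the adjusted design effect of 3.79 is taken as reported rather than recomputed, because the individual cluster sizes are not given.

```python
import math
from statistics import NormalDist


def n_per_arm_means(sd, difference, alpha=0.05, power=0.90):
    """Unadjusted sample size per arm for comparing two means."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * z_sum ** 2 * sd ** 2 / difference ** 2


mean_size = 11.4                       # patients per rheumatologist in the cohort
n_unadjusted = round(n_per_arm_means(sd=1.48, difference=1.2))

de_equal = 1 + (mean_size - 1) * 0.25  # ICC of 0.25 for the continuous DAS28
n_equal = round(n_unadjusted * de_equal)

de_reported_unequal = 3.79             # value reported in the text
n_unequal = round(n_unadjusted * de_reported_unequal)

print(n_unadjusted, n_equal, math.ceil(n_equal / mean_size))
print(n_unequal, math.ceil(n_unequal / mean_size))
```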

Choice of the interventions.

An experimental intervention could be a computerized clinical decision support system (CDSS), allowing for the systematic evaluation of rheumatoid inflammation and the provision of clinical guidelines. A threat for the validity of the study is nonuse of the CDSS by the rheumatologists, which can especially occur when rheumatologists treat only a small number of RA patients. Adherence can be enhanced by raising familiarity and agreement (13). The CDSS must be practical, include feasible measurements, and produce immediate and meaningful results. It is important to include a proficiency-building period for CDSS use and training in the study.

A control intervention ideally should resemble usual care. However, it can be difficult to control for contamination, as it can easily occur by information in literature, by exchange between rheumatologists, or by the informed consent procedure.

Recruitment of rheumatologists and patients.

The study intervention requires from rheumatologists a high degree of participation and willingness to change practice style. The influence of losing clusters (e.g., by recruitment failure or by dropout) on power can be large if the number of clusters is small (e.g., 20) (6). Thus, appropriate recruitment strategy, adequate selection criteria for rheumatologists, and ongoing motivation by the study management are needed. Because rheumatologists will play an important role in patient recruitment, it is important to use clear patient-selection criteria. If randomization is stratified using patient variables (e.g., level of disease activity and disease duration), patient recruitment has to take place before randomization. That may also be helpful in the prevention of selection bias because rheumatologists randomized to usual care might be less motivated to recruit patients or might recruit patients with a different profile (14).


With complex interventions, such as decision support, that are directed to the physicians instead of the patients, it is impossible to keep physicians blinded. It is also difficult to keep patients blinded, but an effort must be made to keep patients naive to outcome expectancy. In concordance with clinical practice, the assessments of disease activity in the CDSS trial arm, such as the DAS28, can be performed by rheumatologists not blinded to study group. For statistical analysis of this kind of RCT, however, only assessments made by independent blinded assessors, e.g., nurse practitioners, can be used. The assessment timeframe should be identical for both trial arms.


Disease-modifying antirheumatic drugs (DMARDs) are the therapy of choice in RA. For the evaluation of the efficacy of DMARDs, a followup duration of 12 months is generally seen as adequate. Clear reductions in inflammatory activity can be expected within 3–6 months for the DMARDs now available (2). The primary outcome of CDSS can therefore be analyzed at 6 or 12 months. When the course of disease activity over time is also of interest, it is advisable to perform blinded outcome assessments at least every 3 months.

Ethical considerations.

In the proposed cluster RCT, treatment options are randomized and patients are individually treated and assessed. It is necessary to obtain informed consent from each individual patient (15). Only in the special case that an intervention cannot be targeted at an individual but only at the cluster as a whole (e.g., special medical education) while outcome is on practice level (e.g., number of adequate referrals), consent may be obtained only from the person responsible for the cluster's wellbeing (e.g., a rheumatologist) (15).


A cluster RCT is the most appropriate design for studying the effects of CDSS in the management of RA. Using the proportion of patients with a DAS28 ≤ 3.2 as outcome, it was estimated that a sample of 268 RA patients and 24 rheumatologists would be needed, which is 2.5 times larger than when the clustering effect is ignored. Multilevel analysis may be a useful approach for statistical analysis.

Generally, when studying decision support, such as guidelines, it is not necessary to study the effect on health (5). If the included guidelines are based on sound evidence, it is already known that the targeted behavior will be beneficial (5). As an example, efficacy of guidelines to improve folate supplementation in addition to methotrexate can be evaluated by simply counting the number of correct prescriptions. In the case of CDSS in RA, physician performance is more difficult to measure. Therefore, the proportion of patients with low disease activity (DAS28 ≤ 3.2) was proposed to approximate physician performance. Because there is no gold standard to measure RA disease activity, other outcomes can be considered, e.g., the proportion of patients in remission, the proportion of responders, change in disease activity, or time-integrated disease activity. Furthermore, depending on the target population, other relevant differences and distributions may be chosen in the formulas for power calculation, leading to other sample size estimations. Also, the magnitude of the ICC may be different for other populations and measures. Most of the time, power is lost when dichotomizing a continuous measure and the estimated sample size will therefore increase. In our example, the final sample sizes differed little between use of the DAS28 as a continuous measure and as a dichotomous one. However, when clustering was ignored initially, the required sample sizes (n = 32 versus n = 53) showed that, indeed, power was lost when dichotomizing the continuous DAS28. The estimated sample sizes became more similar when corrected for the design effect, because the observed ICC was higher for the continuous measure (0.25) than for the dichotomous one (0.13).

In the past, consequences of clustered data have largely been ignored in clinical studies, and as a result, many published studies are underpowered (16). Currently, cluster RCTs are being used more often, especially in public health and general practice research. Practitioners and researchers in rheumatology also should be aware of the consequences of clustering for trial design and analysis. The methodologic considerations described in this article are applicable to similar research objectives in rheumatology.



The authors thank Paco Welsing, MSc, for critical reading of an earlier version of the manuscript.

