Abstract

Objective

To assess the impact, in terms of statistical power and bias of treatment effect, of approaches to dealing with missing data in randomized controlled trials of rheumatoid arthritis with radiographic outcomes.

Methods

We performed a simulation study. The missingness mechanisms we investigated mimicked the process of withdrawal from trials due to lack of efficacy. We compared 3 methods of managing missing data: all available data (case-complete), last observation carried forward (LOCF), and multiple imputation. Data were then analyzed by a classic t-test (comparing the mean absolute change between baseline and final visit) or an F test (estimation of treatment effect with repeated measurements by a linear mixed-effects model).

Results

With a missing data rate close to 15%, the treatment effect was underestimated by 18% when estimated by a linear mixed-effects model with a multiple imputation approach to missing data. This bias was lower than that obtained with the case-complete approach (−25%) or the LOCF approach (−35%). This statistical approach (the combination of multiple imputation and mixed-effects analysis) was moreover associated with a power of 70% (for a 90% nominal level), whereas LOCF was associated with a power of 55% and the case-complete approach with a power of 58%. Analysis with the t-test gave qualitatively equivalent but poorer results, except when multiple imputation was applied.

Conclusion

Our simulation study demonstrated that multiple imputation offered the smallest bias in treatment effect and the highest power. These results can help in planning trials, especially in choosing methods of imputation and data analysis.


INTRODUCTION

Rheumatoid arthritis (RA) is the most common chronic inflammatory joint disease and is responsible for symptomatic manifestations (e.g., functional status, pain) and structural damage (i.e., damage of the articular cartilage and bone) (1). Effective disease-modifying antirheumatic drugs are increasingly available as therapy (2). Assessing such treatments requires the measurement of structural outcomes in randomized controlled trials (RCTs) to demonstrate a retardation of disease progression. Radiographic outcomes are often used as primary end points for assessing structural severity (3–6).

Because retardation of structural damage in RCTs requires observation over time, followup of patients often necessitates intermediate visits, requiring more than 2 sessions of radiography in most trials (7). Specific methods such as linear mixed-effects models, which exploit the richness of the dynamics obtained with such longitudinal, or repeated, measurements, could be applied to estimate the treatment effect (8). Despite repeated measurements, calculating the mean change between baseline visit and end of the study in each group and comparing the mean change using the classic t-test (or Mann-Whitney test for nonparametric comparisons) remains the standard analysis.

The intent-to-treat (ITT) principle is the cornerstone of RCTs (9–11) and is widely recommended to demonstrate the superiority of one treatment over another (12, 13). However, few researchers use this principle in analyzing their data (14, 15), particularly in trials evaluating radiographic outcomes in RA (16). The ITT principle requires that all patients, whether their data are complete or incomplete, be included in the statistical analysis. Approximately two-thirds of RCTs of RA have a missing data rate greater than 10% for radiographic outcomes (16), and researchers must use methods for dealing with missing data to apply the ITT principle.

In trials involving longitudinal measurements of radiographic outcome, missing data can result from a lack of efficacy or adverse events, for example. When data are incomplete, results of the trial can be affected in 2 major ways. First, missing data can result in a bias of treatment effect estimates. For example, patients experiencing greater deterioration in structural damage may be less likely to complete the visits. If missing data are ignored and analyses are based on only the data of patients who are doing well, then the disease progression could be underestimated (17). Second, missing data can result in a loss of statistical power (i.e., the ability of the trial to detect a difference between groups) if data for some patients are excluded from the analysis (17).

Several methods exist to adjust for missing data (18). However, conclusions of trials (i.e., superiority of one treatment over another or not?) and treatment effect may be affected by the method used to handle missing data. Our goal was to compare the impact of different approaches chosen to deal with missing data under a scenario that mimics trials of RA with a radiographic outcome. We performed a simulation study. Such studies, increasingly common in the medical literature, are used to assess the performance of statistical methods in relation to the known truth (19). We compared approaches to handling missing data in terms of statistical power and magnitude of bias introduced by missing data on treatment effect.

MATERIALS AND METHODS

The underlying clinical trial.

We conducted a simulation trial based on a 2-armed RCT resembling RA trials with a control group and an experimental group. The trial had a 2-year duration with 3 time points of measurement, 1 year apart, including baseline (7).

The primary end point was the Sharp/van der Heijde score (20, 21), a semiquantitative radiologic measure recommended as 1 of the 2 possible primary end points in evaluating structural damage (22). This score, ranging from 0 to 448, assesses erosions and joint space narrowing separately in the hands and feet. Thirty-two joints in the hands and 12 in the feet are scored for erosions, with a maximum of 5 erosions per joint in the hands and 10 in the feet. Joint space narrowing was graded from 0 to 4 in 30 joints in the hands and in 12 joints in the feet. The Sharp/van der Heijde score is the sum of the erosion and joint space narrowing scores.

Simulations of longitudinal measurements involved use of a linear mixed-effects model with a random intercept and slope. The individual intercept and slope were simulated for each patient, and radiographic data were simulated with a linear model from these individual parameters. All simulation values were chosen according to published data of the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO) study (23–25). We assumed that the baseline distribution of the radiographic score could be approximated by a log-normal distribution (mean ± SD 45 ± 45). The mean progression can be assumed to be linear (26), although the evolution of data for individual patients shows high variability (27). The slope was simulated by a normal distribution.
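The data-generating model described above can be sketched as follows (the paper's simulations used R; the parameter values below follow the text, while the function name and the omission of residual measurement error are my simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arm(n, slope_mean, slope_sd, visits=(0.0, 1.0, 2.0)):
    """Simulate radiographic scores at each visit for one trial arm.

    Baseline score (random intercept): log-normal with mean 45 and SD 45.
    2-year change (random slope): Normal(slope_mean, slope_sd), applied linearly.
    Residual measurement error is omitted in this simplified sketch.
    """
    # Convert the desired natural-scale mean/SD to log-normal parameters.
    mean, sd = 45.0, 45.0
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    intercept = rng.lognormal(mu, np.sqrt(sigma2), size=n)

    # The text gives mean +/- SD of the change over 2 years; divide by 2
    # to obtain the annual slope of the linear trajectory.
    slope = rng.normal(slope_mean / 2.0, slope_sd / 2.0, size=n)

    t = np.asarray(visits)
    return intercept[:, None] + slope[:, None] * t[None, :]  # shape (n, len(visits))

control = simulate_arm(150, slope_mean=3.0, slope_sd=10.0)
experimental = simulate_arm(150, slope_mean=0.0, slope_sd=5.0)
```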

Simulations were performed under the alternative hypothesis and the null hypothesis. Under the alternative hypothesis, the slope and its standard deviation were assumed to be greater in the control group than in the experimental group (mean ± SD change over 2 years 3 ± 10 versus 0 ± 5), which reflects fewer benefits from treatment (i.e., greater deterioration of structural damage) in the control group. A sample size estimation for a 2-sided test of efficacy (t-test) resulted in a sample size of 150 patients per group to achieve a Type I error of 5% and a power of 90%. Under the null hypothesis, the mean change over 2 years in each group was the same (also with 150 patients per group).
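The 150-per-group figure can be checked with the usual normal-approximation sample-size formula for comparing two means with unequal SDs (a sketch; the paper does not state which exact formula was used):

```python
from scipy import stats

def n_per_group(delta, sd1, sd2, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a 2-sided test of a
    difference in means delta, with group SDs sd1 and sd2."""
    z_a = stats.norm.ppf(1.0 - alpha / 2.0)
    z_b = stats.norm.ppf(power)
    return (z_a + z_b) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2

# Difference of 3 points over 2 years, SDs of 10 and 5, alpha 5%, power 90%:
n = n_per_group(3.0, 10.0, 5.0)  # ~146, consistent with 150 patients per group
```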

Missingness mechanism.

After the complete data sets were simulated, patients' data were deleted according to a predefined missingness mechanism. We considered only missing data with a monotone pattern (i.e., data for a patient up to a certain time). We assumed that all baseline data were observed. Based on our previous literature review (16), we assumed a dropout rate of 15% at 2 years. Patients with disease progression greater than or lower than a defined limit between 2 occasions dropped out of the trial. The probability of dropout was 1) 2.5% if the slope between 2 successive visits was negative (i.e., improvement), 2) 5% if the slope was between 0 and 5 points (i.e., slight deterioration), and 3) 20% if the slope was >5 points (i.e., substantial deterioration) (scenario A). The limit of 5 points was chosen in accordance with published estimations of the minimal clinically important difference and the smallest detectable difference of the Sharp/van der Heijde score, which are very close (∼5 points) (21). In scenario A, the probability of a dropout was arbitrarily chosen to ensure a dropout rate of ∼15% at the final visit. In scenario B, the probability of a missing value was divided by 2 to achieve a dropout rate of ∼7.5%.
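The monotone dropout mechanism of scenario A can be sketched as follows (the dropout probabilities come from the text; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout_prob(change):
    """Dropout probability given the score change since the previous visit
    (scenario A values from the text; halve them for scenario B)."""
    if change < 0:
        return 0.025  # improvement
    if change <= 5:
        return 0.05   # slight deterioration (0 to 5 points)
    return 0.20       # substantial deterioration (> 5 points)

def apply_dropout(scores):
    """Delete data with a monotone pattern: once a patient drops out, all
    later visits are missing. Baseline is always observed."""
    out = scores.astype(float).copy()
    n, k = out.shape
    for i in range(n):
        for j in range(1, k):
            change = out[i, j] - out[i, j - 1]
            if rng.random() < dropout_prob(change):
                out[i, j:] = np.nan  # patient leaves the trial
                break
    return out
```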

Statistical analysis strategies.

The 2-sided t-test was used to test the absolute change between the 2 groups (progression estimated by the change between the baseline and 2-year visit). A linear model with mixed effects for repeated measurements with random intercepts and slopes was also considered. This model takes into account intermediate measurements (not just 2 measurements). With missing data, this model uses all available measurements. Restricted maximum likelihood estimation was performed. In the model, the interaction of group by time of visit (i.e., the difference in slopes) was a fixed effect. An F test on this fixed effect was used to compare slopes.
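The two analysis strategies can be sketched as follows, with arms stored as (patients × visits) arrays containing NaN for missing values (the paper fit its mixed model in SAS; statsmodels' MixedLM is used here as an illustrative equivalent):

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

def ttest_absolute_change(ctrl, trt):
    """Compare mean baseline-to-final change between groups (complete cases only)."""
    d_ctrl = ctrl[:, -1] - ctrl[:, 0]
    d_trt = trt[:, -1] - trt[:, 0]
    return stats.ttest_ind(d_ctrl[~np.isnan(d_ctrl)], d_trt[~np.isnan(d_trt)])

def mixed_model(ctrl, trt, visits=(0.0, 1.0, 2.0)):
    """Fit score ~ time * group with a random intercept and slope per patient,
    using all available (non-missing) measurements, by REML."""
    rows = []
    for group, arm in (("control", ctrl), ("treated", trt)):
        for i, patient in enumerate(arm):
            for t, y in zip(visits, patient):
                if not np.isnan(y):
                    rows.append({"id": f"{group}{i}", "group": group,
                                 "time": t, "score": y})
    df = pd.DataFrame(rows)
    model = smf.mixedlm("score ~ time * group", df, groups="id", re_formula="~time")
    # The fixed-effect interaction 'time:group[T.treated]' is the difference
    # in slopes between groups; its test plays the role of the F test above.
    return model.fit(reml=True)
```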

Methods of managing missing data.

We considered 3 methods of managing missing data. The first 2 methods are traditionally used (16), and the third is promising. Of the first 2 methods, the case-complete analysis ignores the problem of missing data. When considering absolute change, the case-complete analysis affects only patients with complete data (i.e., complete data at baseline and final visits). In a linear mixed-effects model, the analysis refers to all available measurements. The second and most popular method of single imputation is the last observation carried forward (LOCF) method, whereby the last observation is carried forward and used for all missing observations at the remaining time points. The third method is multiple imputation (28, 29). Instead of filling in a single value for each missing value, this technique replaces each missing value of an incomplete data set with a set of plausible values (n = 5 in our study) that represent the uncertainty in the correct value to impute. The data augmentation Markov chain Monte Carlo replacement method is used. Then, each completed data set is analyzed by the analysis of choice, and results of imputed data sets are combined in a single analysis yielding point estimates and standard errors. Methods of managing missing data and data analysis strategies were combined (Table 1) and applied to the 2 scenarios, A and B, of missing data.
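LOCF and multiple imputation can be sketched as follows. LOCF is implemented exactly as described; the multiple-imputation step is a deliberately simplified stand-in for the MCMC data-augmentation method the paper used (each visit is regressed on the previous one among observed pairs, and missing values are drawn with residual noise; a full implementation would also draw the regression parameters from their posterior, and the m completed sets would then be analyzed and pooled):

```python
import numpy as np

rng = np.random.default_rng(3)

def locf(scores):
    """Last observation carried forward: fill each missing visit with the most
    recent observed (or already filled) value for that patient."""
    out = scores.copy()
    for j in range(1, out.shape[1]):
        miss = np.isnan(out[:, j])
        out[miss, j] = out[miss, j - 1]
    return out

def multiple_impute(scores, m=5):
    """Return m completed copies of `scores` (monotone missingness assumed).

    Simplified sketch, not the paper's MCMC data augmentation: regress each
    visit on the previous one using observed pairs, then draw imputations
    from that regression with residual noise to reflect uncertainty.
    """
    completed = []
    for _ in range(m):
        out = scores.copy()
        for j in range(1, out.shape[1]):
            obs = ~np.isnan(scores[:, j])
            x, y = out[obs, j - 1], scores[obs, j]
            b, a = np.polyfit(x, y, 1)           # y ~ a + b * x
            resid_sd = np.std(y - (a + b * x))
            miss = np.isnan(out[:, j])
            out[miss, j] = (a + b * out[miss, j - 1]
                            + rng.normal(0.0, resid_sd, miss.sum()))
        completed.append(out)
    return completed
```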

Table 1. Possible combinations of methods of managing missing data and data analysis strategies

  Approach   Management of missing data         Data analysis strategies     Test
  1          Case-complete analysis             Absolute change              t-test
  2          Case-complete analysis*            Linear mixed-effects model   F test
  3          Last observation carried forward   Absolute change              t-test
  4          Last observation carried forward   Linear mixed-effects model   F test
  5          Multiple imputation                Absolute change              t-test
  6          Multiple imputation                Linear mixed-effects model   F test

  * All available data in the context of a linear mixed-effects model.

Type I error, power, and bias.

The Type I error and power of the t-test and F test under different approaches were computed. To estimate the empirical Type I error of each approach, the entire trial simulation was repeated 1,000 times under the null hypothesis. The empirical Type I error was calculated as the proportion of P values less than 0.05 from testing the null hypothesis of no difference in each simulated trial. Similarly, to estimate the empirical power of each approach, the entire trial simulation was repeated 1,000 times under the alternative hypothesis. The estimated power for each approach was the proportion of these 1,000 simulated trials showing statistically significant results (P < 0.05).
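The Monte Carlo estimate described above amounts to repeating the trial simulation and counting the share of p values below 0.05. A simplified sketch on complete data sets (t-test on 2-year changes, parameter values from the text, no missing data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def one_trial_pvalue(delta, n=150):
    """Simulate 2-year changes in both arms and return the t-test p value.

    delta is the true mean difference in change (0 under the null); SDs of
    10 and 5 and n = 150 per group follow the text.
    """
    ctrl = rng.normal(delta, 10.0, n)
    trt = rng.normal(0.0, 5.0, n)
    return stats.ttest_ind(ctrl, trt).pvalue

def rejection_rate(delta, n_sim=1000):
    """Proportion of simulated trials with p < 0.05."""
    pvals = np.array([one_trial_pvalue(delta) for _ in range(n_sim)])
    return float(np.mean(pvals < 0.05))

type1_error = rejection_rate(0.0)  # empirical Type I error, near 0.05
power = rejection_rate(3.0)        # empirical power, near 0.90 on complete data
```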

Under the alternative hypothesis (i.e., the hypothesis of a treatment effect), estimators of treatment effect were computed. Treatment effect was the difference in the absolute mean change between the 2 groups by t-test analysis or estimation of difference in slopes between the 2 groups by linear mixed-effects analysis. Averaging the estimates derived from the 1,000 simulated trials allowed for estimating the expected mean of the treatment effect. Then comparing this expected mean to the “true” value (used to simulate data) allowed for calculating the bias. Bias was expressed as relative bias (in percentage) and absolute bias (in Sharp/van der Heijde units).
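The bias computation itself is a one-liner over the simulated trials (the estimates in the usage example are hypothetical numbers, chosen only to show the arithmetic):

```python
import numpy as np

def bias_summary(estimates, true_effect):
    """Relative bias (%) and absolute bias (score units) of the mean estimate."""
    expected = float(np.mean(estimates))
    absolute = expected - true_effect
    relative = 100.0 * absolute / true_effect
    return relative, absolute

# Hypothetical estimates clustered below a true effect of 3 points:
rel, ab = bias_summary([2.0, 2.5, 1.5, 2.0], true_effect=3.0)
# mean estimate 2.0 -> absolute bias -1.0 point, relative bias about -33%
```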

Type I error, power, and bias were also computed before the missingness mechanism was applied (i.e., on complete data sets for which an “ideal” analysis could be performed) as a reference and to check the validity of simulations. Simulations of trials involved use of R 2.2.0 (R Foundation for Statistical Computing, Vienna, Austria). Management of missing data and data analysis involved SAS 9.1 (SAS Institute, Cary, NC).

RESULTS

Under the alternative hypothesis, the missingness mechanism provided a missing data rate at 2 years of 15.3% for scenario A and 7.9% for scenario B (Table 2). Proportions of dropout in scenario A for conditions 1, 2, and 3 were 2.5%, 5.0%, and 20.0%, respectively. In scenario B, these proportions also approximated expected values (1.2%, 2.5%, and 10.1%, respectively).

Table 2. Simulated rate of missing data in each arm of the trial at each radiology visit for scenario A, with 15.3% missing data, and scenario B, with 7.9% missing data under the alternative hypothesis

  Scenario/radiology visit   Rate of missing data, %
                             Experimental group   Control group   Global
  A
    1 year                   6.1                  10.5            8.3
    2 years                  11.1                 19.5            15.3
  B
    1 year                   3.1                  5.1             4.1
    2 years                  6.0                  9.9             7.9

For scenario A under the null hypothesis, Type I errors were maintained (i.e., ∼5%) except for the following strategies: the case-complete and LOCF approaches by t-test (i.e., when comparing absolute change), and the LOCF approach by F test (i.e., when comparing slopes by linear mixed-effects model) (Table 3). As expected, the power for complete data sets (i.e., without missing data) was close to 90% whatever the data analysis used (absolute change or linear mixed-effects model). Under this scenario of missing data, whether data were analyzed after imputation (LOCF or multiple imputation) or without it (case-complete), power ranged from 51% (t-test performed on case-complete data without missing-data handling methods) to 74% (t-test on missing data handled by multiple imputation). Whatever the data analysis, the all-available-data and LOCF methods gave equivalent power. With multiple imputation, power was always higher than with the other strategies (Table 3).

Table 3. Results of a simulation study for a dropout rate of 15.3% (scenario A) and 7.9% (scenario B) at 2 years*

                                                        Null hypothesis   Alternative hypothesis
  Scenario/data analysis        Management of           Type I error      Power   Treatment effect:   Treatment effect:
                                missing data                                      relative bias†      absolute bias‡
  A
    Absolute change             Complete data set§      4.9               91.1      0.1                0.00
                                Case-complete           12.6              50.6    −34.0               −1.02
                                LOCF                    8.2               58.1    −35.8               −1.07
                                Multiple imputation     6.2               74.3    −17.2               −0.52
    Linear mixed-effects model  Complete data set§      4.2               88.4      0.4                0.01
                                All available data      5.7               57.9    −25.0               −0.75
                                LOCF                    7.0               55.0    −35.4               −1.06
                                Multiple imputation     4.1               69.5    −17.7               −0.53
  B
    Absolute change             Complete data set§      4.6               90.4     −0.8               −0.02
                                Case-complete           5.9               75.4    −16.8               −0.51
                                LOCF                    5.5               77.8    −18.7               −0.56
                                Multiple imputation     4.4               84.0     −8.6               −0.26
    Linear mixed-effects model  Complete data set§      3.7               87.4     −0.8               −0.02
                                All available data      5.4               72.0    −12.9               −0.38
                                LOCF                    4.7               72.4    −14.4               −0.43
                                Multiple imputation     4.2               81.2     −8.2               −0.25

  * LOCF = last observation carried forward.
  † Expressed as percentage.
  ‡ Expressed in terms of Sharp/van der Heijde units.
  § Before applying missingness mechanism (i.e., results provided are obtained on complete data sets).

As expected, the bias of treatment effect with complete data sets (i.e., without missing data) was very low (i.e., close to 0%) whatever the data analysis (Table 3). For scenario A, with 15.3% missing data at 2 years, the treatment effect estimated by linear mixed-effects model with a multiple imputation approach was underestimated by 17.7% (equivalent to an underestimation of 0.53 points in Sharp/van der Heijde units) (Table 3). This bias was lower than that obtained with all-available-data (−25.0%) or LOCF (−35.4%) methods. When considering absolute change, again, multiple imputation gave the least biased treatment-effect estimates (underestimation of 17.2% versus −34.0% and −35.8% for case-complete and LOCF, respectively).

For scenario B, Type I error was maintained for all strategies under the null hypothesis. Results under the alternative hypothesis with 7.9% missing data at 2 years showed less bias than with scenario A but followed approximately the same pattern (Table 3). Bias was minimal with multiple imputation (−8.6% for absolute change and −8.2% for linear mixed-effects model) and power with multiple imputation was better than with other imputation strategies.

DISCUSSION

This simulation study evaluated the impact of various approaches to dealing with missing data in RCTs of RA with radiographic outcomes. With a rate of missing data of ∼15% (scenario A), our simulation results demonstrated that multiple imputation had better power and less biased estimates of treatment effect, on average, than other approaches, whether the data analysis was absolute change or a linear mixed-effects model. With a linear mixed-effects model, the treatment effect was underestimated by 17.7%. The power was lower than with complete data (70% versus 88%), but the ability to detect a difference was greater than with other methods of managing missing data. With this statistical approach, Type I error was maintained.

The problem of dealing with missing data is tackled extensively in methodologic work involving radiographic end points in RA (30–32). The advice is generally to perform sensitivity analyses (i.e., a set of analyses showing the influence of different methods of handling missing data on the study results [17]) to ensure that the qualitative conclusion of a randomized trial provided by the main analysis is not affected by how missing data are handled. Sensitivity analyses allow for assessing the robustness of the results and are used as an additional support to the main analysis. Recently, 2 sensitivity analyses, evaluating different methods of handling radiographic missing data, were performed to confirm the robustness of radiographic results in published reports of RA trials (24, 33). However, although sensitivity studies should always be included in the statistical analysis plan of an RCT, they do not allow for drawing general conclusions (i.e., conclusions applicable to different trials) regarding the most appropriate method to use in dealing with missing data for the main analysis. Moreover, we thought it also of interest to assess whether the treatment effect is affected by the method used to handle missing data. This situation cannot be assessed in the framework of a sensitivity analysis (because the “true” effect is unknown) but can be studied in the framework of a simulation study. To our knowledge, this is the first time a simulation approach has been used to investigate consequences of methods for dealing with missing data on power and bias of treatment effect in RA. Such an approach has been used for osteoporosis (34), another progressively deteriorating disease. In that study, the authors found no strategy adequate for universal use, particularly with a high missing data rate.

In RA trials involving longitudinal measurements of radiographic outcomes, 2 main sources of missing data are identified: lack of efficacy and adverse events. Unexpected selective dropout (preferentially in one group) because of lack of clinical efficacy may bias the trial results. In general, patients with a worse prognosis (greater disease activity, greater radiographic evidence of disease progression) have a higher probability of premature discontinuation in any clinical trial, and patients completing the entire trial have a more favorable prognosis, either by nature or by treatment (35). In our missingness scenario, we focused particularly on missing data due to lack of efficacy, by assigning a high probability of dropout to patients with deteriorating disease. We did not deal with missing data related to adverse events: such events can cause patients to leave the trial regardless of good or bad treatment results, and because we compared 2 active treatments, we assumed that missing data due to adverse events would have been similar in both groups and would have equally affected treatment results in each group (36). However, patients experiencing adverse events might have more comorbidities than others, causing them to leave the trials.

In trials designed with longitudinal measurements, the use of appropriate statistical methods for repeated measurements is now recommended (24). A linear mixed-effects model does not impute missing values; its estimates are based on all available data, whether a patient's longitudinal data are complete or not. If data are missing completely at random (i.e., when the missingness is not related to the observed or potential outcomes of patients) or missing at random (i.e., when the missingness can be explained entirely by the observed outcomes but not the unobserved ones) (37), then the estimates will be unbiased. However, this is no longer true when the missingness depends on the unobserved data (i.e., when the missing data are not missing at random) (37), a case that cannot be excluded with missing radiographic data (e.g., selective dropout).

With linear mixed-effects analysis, tests theoretically tend to be more powerful than with absolute change analysis. In fact, the linear mixed-effects model takes into account how the disease and treatment affect each patient over time and how the radiographic data of the same patient are correlated. To exploit the richness of all measurements, this strategy could be particularly interesting when the number of intermediate visits is high (38). In this study, we considered only a 2-year trial duration and 1 intermediate visit. Therefore, our results do not totally confirm this improvement in power. However, in trials involving more visits, this method could be promising.

The case-complete approach conflicts with the ITT principle, and its use is decreasing. However, this approach gave us a reference for quantifying the bias in treatment effect introduced by missing data. This bias cannot be neglected (∼34% with a 15% missing data rate).

With the LOCF approach, the missing radiographic value is replaced by the last available value, assuming no change in radiographic score after the patient drops out. In RA trials, this assumption leads to dropouts being considered as showing less radiographic deterioration than completers. This approach is widely criticized and, not surprisingly, introduced the highest bias in our estimates of treatment effect.

As compared with the LOCF approach, multiple imputation has theoretically good statistical properties (e.g., unbiased estimates), provided that data are missing at random. In our study, whether the data analysis strategy was absolute change or a linear mixed-effects model, multiple imputation gave substantially improved values for power and minimized the bias in estimates of treatment effect as compared with other approaches. However, this result could be improved by including selected covariates in the imputation model such as baseline predictors of dropout or correlated variables of the radiographic score (37).

A well-designed study should always consider the problem of minimizing missing data (e.g., ensuring appropriate followup for all randomized patients by scheduling patients for radiography even if they drop out of the study) (39). Statistical methods, however well designed, cannot compensate for high proportions of missing values. In our work, the proportion of missing values was sufficiently low (i.e., 7.9% or 15.3%) to be reasonably addressed with statistical methods. However, statistical approaches dealing with missing data can minimize bias with reasonably low rates of missing data but cannot avoid it completely, particularly with selective dropout.

This study has caveats and limitations. First, these results may not be generalizable to all radiographic scores used in RA. Even if similar results in relative bias could be expected, extrapolation to another radiographic score would require a new simulation study. Second, the random simulations carried out may not reflect all the patterns of missing data that could occur in real situations. Furthermore, we did not explore all methods of dealing with missing data. In particular, Cook has proposed a specific method to deal with data not missing at random, which uses measurements obtained from out-of-study sources to estimate values for missing study data in a random-effects model (40). Similarly, other sophisticated methods, such as selection models or pattern-mixture models, were not considered (41). However, because these models rely on many assumptions, they cannot be used as a main strategy when analyzing the results of an RCT; they can instead be used in the framework of a sensitivity analysis.

In this simulation study, we demonstrated the influence of the choice of analysis and methods for handling missing data when analyzing results of RCTs with radiographic outcomes. Our results, especially those obtained with multiple imputation, can help investigators in planning clinical trials, especially in choosing methods of imputation and data analysis. Regardless of the method of handling missing data chosen for the main analysis, sensitivity analysis is essential to confirm the robustness of the results.

AUTHOR CONTRIBUTIONS

Dr. Baron had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Baron, Ravaud, Samson, Giraudeau.

Acquisition of data. Baron.

Analysis and interpretation of data. Baron, Ravaud, Samson, Giraudeau.

Manuscript preparation. Baron, Ravaud, Samson, Giraudeau.

Statistical analysis. Baron, Giraudeau.

REFERENCES
