INTRODUCTION
Rheumatoid arthritis (RA) is the most common chronic inflammatory joint disease and is responsible for symptomatic manifestations (e.g., functional status, pain) and structural damage (i.e., damage of the articular cartilage and bone) (1). Effective disease-modifying antirheumatic drugs are increasingly available as therapy (2). Assessing such treatments requires the measurement of structural outcomes in randomized controlled trials (RCTs) to demonstrate a retardation of disease progression. Radiographic outcomes are often used as primary end points for assessing structural severity (3–6).
Because retardation of structural damage in RCTs requires observation over time, follow-up of patients often necessitates intermediate visits, requiring more than 2 sessions of radiography in most trials (7). Specific methods such as linear mixed-effects models, which exploit the richness of the dynamics obtained with such longitudinal, or repeated, measurements, could be applied to estimate the treatment effect (8). Despite repeated measurements, calculating the mean change between baseline visit and end of the study in each group and comparing the mean change using the classic t-test (or Mann-Whitney test for nonparametric comparisons) remains the standard analysis.
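As a minimal sketch of this standard analysis, the following simulates per-arm radiographic scores and compares mean absolute change with a t-test (and a Mann-Whitney test); all trial sizes, score distributions, and progression rates are assumed values for illustration, not the parameters of the study described here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100  # hypothetical number of patients per arm

# Simulated radiographic scores in Sharp/van der Heijde-like units (assumed).
baseline_ctrl = rng.normal(20, 5, n)
baseline_exp = rng.normal(20, 5, n)
final_ctrl = baseline_ctrl + 3.0 + rng.normal(0, 1.5, n)  # assumed 2-year progression
final_exp = baseline_exp + 1.5 + rng.normal(0, 1.5, n)    # slower progression on treatment

# Standard analysis: compare mean absolute change between arms with a t-test.
change_ctrl = final_ctrl - baseline_ctrl
change_exp = final_exp - baseline_exp
t_stat, p_value = stats.ttest_ind(change_ctrl, change_exp)

# Nonparametric alternative: Mann-Whitney test on the same change scores.
u_stat, p_mw = stats.mannwhitneyu(change_ctrl, change_exp)
```

Note that this analysis uses only the baseline and final visits; the intermediate radiographs that motivate mixed-effects modeling are simply discarded.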
The intent-to-treat (ITT) principle is the cornerstone of RCTs (9–11) and is widely recommended to demonstrate the superiority of one treatment over another (12, 13). However, few researchers use this principle in analyzing their data (14, 15), particularly in trials evaluating radiographic outcomes in RA (16). The ITT principle requires that all patients, whether their data are complete or incomplete, be included in the statistical analysis. Approximately two-thirds of RCTs of RA have a missing data rate greater than 10% for radiographic outcomes (16), and researchers must use methods for dealing with missing data to apply the ITT principle.
In trials involving longitudinal measurements of radiographic outcome, missing data can result from a lack of efficacy or adverse events, for example. When data are incomplete, results of the trial can be affected in 2 major ways. First, missing data can result in a bias of treatment effect estimates. For example, patients experiencing greater deterioration in structural damage may be less likely to complete the visits. If missing data are ignored and analyses are based on only the data of patients who are doing well, then the disease progression could be underestimated (17). Second, missing data can result in a loss of statistical power (i.e., the ability of the trial to detect a difference between groups) if data for some patients are excluded from the analysis (17).
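The first of these two effects, bias from selective dropout, can be illustrated with a small sketch (the progression distribution and dropout model are assumed for illustration): when the probability of dropping out rises with the true rate of deterioration, the mean progression among completers underestimates the true mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical true 2-year radiographic progression per patient (Sharp-like units).
progression = rng.normal(3.0, 2.0, n)

# Assumed missingness mechanism: the worse the progression, the more likely
# the patient drops out before the final radiograph.
p_dropout = 1.0 / (1.0 + np.exp(-(progression - 3.0)))
completer = rng.random(n) >= p_dropout

# Analyzing completers only underestimates the true mean progression.
true_mean = progression.mean()
observed_mean = progression[completer].mean()
```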
Several methods exist to adjust for missing data (18). However, the conclusions of a trial (i.e., whether one treatment is superior to the other) and the estimated treatment effect may be affected by the method used to handle missing data. Our goal was to compare the impact of different approaches to handling missing data under a scenario that mimics trials of RA with a radiographic outcome. We performed a simulation study. Such studies, increasingly common in the medical literature, are used to assess the performance of statistical methods in relation to the known truth (19). We compared approaches to handling missing data in terms of statistical power and the magnitude of bias introduced by missing data on treatment effect.
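The logic of such a simulation study can be sketched as follows: generate complete trials with a known treatment effect, impose a missingness mechanism, analyze with a candidate method, and compare empirical power and bias against the truth. This is a simplified skeleton, not the study's actual design; the sample sizes, effect, dropout model, and number of replicates are all assumed, and only the case-complete strategy is shown.

```python
import numpy as np
from scipy import stats

def simulate_trial(rng, n=100, effect=1.0, sd=2.0):
    """One trial: per-patient 2-year change scores; the true effect is known."""
    ctrl = rng.normal(3.0, sd, n)           # mean change, control arm
    exp_ = rng.normal(3.0 - effect, sd, n)  # treatment slows progression by `effect`
    # Selective dropout: worse progressors more likely to miss the 2-year visit.
    keep_c = rng.random(n) >= 1 / (1 + np.exp(-(ctrl - 5)))
    keep_e = rng.random(n) >= 1 / (1 + np.exp(-(exp_ - 5)))
    return ctrl, exp_, keep_c, keep_e

rng = np.random.default_rng(42)
n_sims, alpha, true_effect = 500, 0.05, 1.0
hits_complete = hits_cc = 0
est_cc = []
for _ in range(n_sims):
    ctrl, exp_, kc, ke = simulate_trial(rng, effect=true_effect)
    # Complete data: the unobservable truth a simulation gives us access to.
    if stats.ttest_ind(ctrl, exp_).pvalue < alpha:
        hits_complete += 1
    # Case-complete analysis: drop patients missing the 2-year radiograph.
    if stats.ttest_ind(ctrl[kc], exp_[ke]).pvalue < alpha:
        hits_cc += 1
    est_cc.append(ctrl[kc].mean() - exp_[ke].mean())

power_complete = hits_complete / n_sims   # empirical power, complete data
power_cc = hits_cc / n_sims               # empirical power, case-complete
bias_cc = np.mean(est_cc) - true_effect   # bias of the treatment-effect estimate
```

Because the true effect is known by construction, bias can be measured directly, which is exactly what a sensitivity analysis on real data cannot do.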
RESULTS
Under the alternative hypothesis, the missingness mechanism provided a missing data rate at 2 years of 15.3% for scenario A and 7.9% for scenario B (Table 2). Proportions of dropout in scenario A for conditions 1, 2, and 3 were 2.5%, 5.0%, and 20.0%, respectively. In scenario B, these proportions also approximated expected values (1.2%, 2.5%, and 10.1%, respectively).
Table 2. Simulated rate of missing data in each arm of the trial at each radiology visit for scenario A (15.3% missing data) and scenario B (7.9% missing data) under the alternative hypothesis

                             Rate of missing data, %
Scenario/radiology visit     Experimental group   Control group   Global
A
  1 year                      6.1                 10.5             8.3
  2 years                    11.1                 19.5            15.3
B
  1 year                      3.1                  5.1             4.1
  2 years                     6.0                  9.9             7.9
For scenario A under the null hypothesis, Type I errors were maintained (i.e., ∼5%) except for the following strategies: the case-complete and LOCF approaches by t-test (i.e., when comparing absolute change) and the LOCF approach by F test (i.e., when comparing slopes by linear mixed-effects model) (Table 3). As expected, the power for complete data sets (i.e., without missing data) was close to 90% whatever the data analysis used (absolute change or linear mixed-effects model). Under the missing-data scenario, whether the data were analyzed after imputation (LOCF or multiple imputation) or not (case-complete), power ranged from 51% (t-test performed on case-complete data, without missing-data handling) to 74% (t-test on missing data handled by multiple imputation). Whatever the data analysis, the all-available-data and LOCF methods gave equivalent power. With multiple imputation, power was always higher than with the other imputation strategies (Table 3).
Table 3. Results of a simulation study for a dropout rate of 15.3% (scenario A) and 7.9% (scenario B) at 2 years* (Type I error under the null hypothesis; power and treatment-effect bias under the alternative hypothesis)

                                                           Type I           Treatment effect
Scenario/data analysis        Management of missing data   error†   Power†  Relative bias†  Absolute bias‡
A
  Absolute change             Complete data set§            4.9     91.1      0.1            0.00
                              Case-complete                12.6     50.6    −34.0           −1.02
                              LOCF                          8.2     58.1    −35.8           −1.07
                              Multiple imputation           6.2     74.3    −17.2           −0.52
  Linear mixed-effects model  Complete data set§            4.2     88.4      0.4            0.01
                              All available data            5.7     57.9    −25.0           −0.75
                              LOCF                          7.0     55.0    −35.4           −1.06
                              Multiple imputation           4.1     69.5    −17.7           −0.53
B
  Absolute change             Complete data set§            4.6     90.4     −0.8           −0.02
                              Case-complete                 5.9     75.4    −16.8           −0.51
                              LOCF                          5.5     77.8    −18.7           −0.56
                              Multiple imputation           4.4     84.0     −8.6           −0.26
  Linear mixed-effects model  Complete data set§            3.7     87.4     −0.8           −0.02
                              All available data            5.4     72.0    −12.9           −0.38
                              LOCF                          4.7     72.4    −14.4           −0.43
                              Multiple imputation           4.2     81.2     −8.2           −0.25
As expected, the bias of treatment effect with complete data sets (i.e., without missing data) was very low (i.e., close to 0%) whatever the data analysis (Table 3). For scenario A, with 15.3% missing data at 2 years, the treatment effect estimated by linear mixed-effects model with a multiple imputation approach was underestimated by 17.7% (equivalent to an underestimation of 0.53 points in Sharp/van der Heijde units) (Table 3). This bias was lower than that obtained with all-available-data (−25.0%) or LOCF (−35.4%) methods. When considering absolute change, again, multiple imputation gave the most precise treatment-effect estimates (underestimation of 17.2% versus −34.0% and −35.8% for case-complete and LOCF, respectively).
For scenario B, Type I error was maintained for all strategies under the null hypothesis. Results under the alternative hypothesis with 7.9% missing data at 2 years showed less bias than with scenario A but followed approximately the same pattern (Table 3). Bias was minimal with multiple imputation (−8.6% for absolute change and −8.2% for linear mixedeffects model) and power with multiple imputation was better than with other imputation strategies.
DISCUSSION
This simulation study evaluated the impact of various approaches to dealing with missing data in RCTs of RA with radiographic outcomes. With a missing data rate of ∼15% (scenario A), our simulation results demonstrated that multiple imputation had better power and more precise estimates of treatment effect, on average, than the other approaches, whether the data analysis was absolute change or a linear mixed-effects model. With a linear mixed-effects model, treatment effect was underestimated by 17.7%. The power was lower than the nominal power (70% versus 88%), but the ability to detect a difference was superior to that with other methods of managing missing data. With this statistical approach, Type I error was maintained.
The problem of dealing with missing data is tackled extensively in methodologic work involving radiographic end points in RA (30–32). The advice is generally to perform sensitivity analyses (i.e., a set of analyses showing the influence of different methods of handling missing data on the study results [17]) to ensure that the qualitative conclusion of a randomized trial provided by the main analysis is not affected by how missing data are handled. Sensitivity analyses allow for assessing the robustness of the results and serve as additional support for the main analysis. Recently, 2 sensitivity analyses, evaluating different methods of handling radiographic missing data, were performed to confirm the robustness of radiographic results in published reports of RA trials (24, 33). However, although sensitivity studies should always be included in the statistical analysis plan of an RCT, they do not allow for drawing general conclusions (i.e., conclusions applicable to different trials) regarding the most appropriate method to use in dealing with missing data for the main analysis. Moreover, we thought it also of interest to assess whether the treatment effect is affected by the method used to handle missing data. This question cannot be assessed in the framework of a sensitivity analysis (because the "true" effect is unknown) but can be studied in the framework of a simulation study. To our knowledge, this is the first time a simulation approach has been used to investigate the consequences of methods for dealing with missing data on power and bias of treatment effect in RA. Such an approach has been used for osteoporosis (34), another progressively deteriorating disease. In that study, the authors found no strategy adequate for universal use, particularly with a high missing data rate.
In RA trials involving longitudinal measurements of radiographic outcomes, 2 main sources of missing data are identified: lack of efficacy and adverse events. Unexpected selective dropout (preferentially in one group) because of lack of clinical efficacy may bias the trial results. In general, patients with a worse prognosis (greater disease activity, greater radiographic evidence of disease progression) have a higher prior probability of premature discontinuation in any clinical trial, and patients completing the entire trial have a more favorable prognosis, whether by nature or by treatment (35). In our missingness scenario, we focused on missing data due to lack of efficacy by excluding data for patients with deteriorated disease, who have a high probability of dropout. We did not deal with missing data related to adverse events, which can cause patients to leave the trial regardless of good or bad treatment results. Because we compared 2 active treatments, we assumed that missing data due to adverse events would be similar in both groups and would affect treatment results equally in each group (36). However, patients experiencing adverse events might have more comorbidities than others, causing them to leave the trial.
In trials designed with longitudinal measurements, the use of appropriate statistical methods for repeated measurements is now recommended (24). A linear mixed-effects model does not impute missing values; its estimates take into account all available data, whether a patient's longitudinal data are complete or not. If data are missing completely at random (i.e., the missingness is not related to the observed or potential outcomes of patients) or missing at random (i.e., the missingness can be explained entirely by the observed outcomes but not the unobserved outcomes) (37), then the estimates will be unbiased. However, this is no longer true when the missingness depends on the unobserved data (i.e., when the missing data are not missing at random) (37). This case cannot be excluded with missing radiographic data (e.g., selective dropout).
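The three missingness mechanisms can be made concrete with a small sketch; the score distributions and dropout probabilities below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
y1 = rng.normal(20, 5, n)      # observed 1-year radiographic score
y2 = y1 + rng.normal(3, 2, n)  # 2-year score, potentially missing

# MCAR: missingness unrelated to any outcome, observed or unobserved.
mcar = rng.random(n) < 0.15

# MAR: missingness depends only on the observed 1-year score.
mar = rng.random(n) < 1 / (1 + np.exp(-(y1 - 30) / 2))

# MNAR: missingness depends on the unobserved 2-year change itself
# (e.g., selective dropout of patients whose disease progresses fastest).
mnar = rng.random(n) < 1 / (1 + np.exp(-((y2 - y1) - 5)))
```

Under the MNAR mask, the completers' mean progression is lower than the true mean, which is precisely the selective-dropout bias discussed above.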
With linear mixed-effects analysis, tests theoretically tend to be more powerful than with absolute change analysis, because the linear mixed-effects model takes into account how the disease and treatment affect each patient over time and how the radiographic data of the same patient are correlated. By exploiting the richness of all measurements, this strategy could be particularly valuable when the number of intermediate visits is high (38). In this study, we considered only a 2-year trial duration and 1 intermediate visit; therefore, our results do not fully confirm this improvement in power. However, in trials involving more visits, this method could be promising.
The case-complete approach conflicts with the ITT principle, and its use is decreasing. However, this approach gave us a reference for quantifying the bias in treatment effect introduced by missing data. This bias cannot be neglected (∼34% with a 15% missing data rate).
With the LOCF approach, the missing radiographic value is replaced by the last available value, assuming no change in radiographic score after the patient drops out. In RA trials, this assumption leads to considering dropouts as showing less radiographic deterioration than completers. This approach is widely criticized and, not surprisingly, introduced the highest bias in our estimates of treatment effect.
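As an illustration on a hypothetical score matrix, LOCF amounts to carrying each patient's last observed value forward across visits:

```python
import numpy as np

# Rows = patients, columns = visits (baseline, 1 year, 2 years); NaN = missing.
scores = np.array([
    [10.0, 13.0, np.nan],    # dropped out after 1 year
    [12.0, np.nan, np.nan],  # dropped out after baseline
    [ 8.0,  9.0, 11.0],      # completer
])

locf = scores.copy()
for j in range(1, locf.shape[1]):
    missing = np.isnan(locf[:, j])
    locf[missing, j] = locf[missing, j - 1]  # carry last observation forward
```

After filling, the first patient's 2-year score is frozen at 13.0 and the second patient's at 12.0, so both dropouts contribute zero post-dropout progression to the analysis.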
Compared with the LOCF approach, multiple imputation has good theoretical statistical properties (e.g., unbiased estimates), provided that data are missing at random. In our study, whether the data analysis strategy was absolute change or a linear mixed-effects model, multiple imputation substantially improved power and minimized the bias in estimates of treatment effect as compared with other approaches. However, this result could be further improved by including selected covariates in the imputation model, such as baseline predictors of dropout or variables correlated with the radiographic score (37).
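A minimal sketch of regression-based multiple imputation follows; proper MI would also draw the regression coefficients from their posterior rather than reusing point estimates, and all data-generating values here are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 200, 20  # patients, number of imputed data sets

y1 = rng.normal(20, 5, n)                   # observed 1-year score
y2 = 2.0 + 1.0 * y1 + rng.normal(0, 2, n)   # true 2-year score
missing = rng.random(n) < 0.2               # MCAR missingness, for simplicity
obs = ~missing

# Fit a regression of the 2-year score on the 1-year score among completers.
X = np.column_stack([np.ones(obs.sum()), y1[obs]])
beta, *_ = np.linalg.lstsq(X, y2[obs], rcond=None)
sigma = np.std(y2[obs] - X @ beta, ddof=2)  # residual standard deviation

# Impute m times with random residual noise, analyze each, then pool the
# point estimates (Rubin's rule for the pooled estimate is their mean).
estimates = []
for _ in range(m):
    y2_imp = y2.copy()
    y2_imp[missing] = (beta[0] + beta[1] * y1[missing]
                       + rng.normal(0, sigma, missing.sum()))
    estimates.append(y2_imp.mean())  # the per-imputation analysis
pooled_mean = float(np.mean(estimates))
```

The random noise added to each imputation is what distinguishes multiple imputation from single deterministic imputation: it propagates the uncertainty about the missing values into the pooled inference.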
A well-designed study should always consider how to minimize missing data (e.g., ensuring appropriate follow-up for all randomized patients by scheduling them for radiography even if they drop out of the study) (39). Statistical methods, however well designed, cannot compensate for high proportions of missing values. In our work, the proportion of missing values was sufficiently low (7.9% or 15.3%) to be reasonably addressed with statistical methods. However, statistical approaches dealing with missing data can minimize bias at reasonably low rates of missing data but cannot avoid it completely, particularly with selective dropout.
This study has caveats and limitations. First, these results may not be generalizable to all radiographic scores used in RA. Even if similar results in relative bias could be expected, extrapolation to another radiographic score would require a new simulation study. Second, the random simulations carried out may not reflect all the patterns of missing data that could occur in real situations. Furthermore, we did not explore all methods of dealing with missing data. In particular, Cook has proposed a specific method to deal with data not missing at random, which uses measurements obtained from out-of-study sources to estimate values for missing study data in a random-effects model (40). Likewise, other sophisticated methods, such as selection models or pattern-mixture models, were not considered (41). However, because these models rely on many assumptions, they cannot be used as the main strategy when analyzing the results of an RCT; they can instead be used in the framework of a sensitivity analysis.
In this simulation study, we demonstrated the influence of the choice of analysis and methods for handling missing data when analyzing results of RCTs with radiographic outcomes. Our results, especially those obtained with multiple imputation, can help investigators in planning clinical trials, especially in choosing methods of imputation and data analysis. Regardless of the method of handling missing data chosen for the main analysis, sensitivity analysis is essential to confirm the robustness of the results.
AUTHOR CONTRIBUTIONS
Dr. Baron had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Baron, Ravaud, Samson, Giraudeau.
Acquisition of data. Baron.
Analysis and interpretation of data. Baron, Ravaud, Samson, Giraudeau.
Manuscript preparation. Baron, Ravaud, Samson, Giraudeau.
Statistical analysis. Baron, Giraudeau.