Keywords: addictions treatment; controlled clinical trials; data processing; randomized clinical trials; RCTs; statistical analysis


ABSTRACT

Purpose  This is the third paper in a series that reviews strategies for optimizing the validity and utility of randomized clinical trials (RCTs) in addictions treatment research. Whereas the two previous papers focused on design and implementation, here we address issues pertaining to data processing and statistical analysis.

Scope  Recommendations for enhancing data quality and utility are offered in sections on data coding and entry; and data format, structure and management. We discuss the need for preliminary data analyses that examine statistical power; patterns of attrition; between-group equivalence; and treatment integrity and discriminability. We discuss tests of treatment efficacy, as well as ancillary analyses aimed at explicating treatment processes.

Conclusions  Safeguards are necessary to protect data quality, and advance planning is needed to ensure that data formats are compatible with statistical objectives. In addition to treatment efficacy, statistical analyses should evaluate study internal and external validity, and investigate the change mechanisms that underlie treatment effects.


INTRODUCTION

This is the third and final article in a series of three papers that describe strategies for enhancing the validity and utility of randomized clinical trials (RCTs) in addictions treatment research. As noted in the first two reviews, there is a general consensus that the RCT is the research design of choice for evaluating treatment efficacy, primarily because internal validity is maximized. Ideally, random assignment to study conditions succeeds in producing the between-group equivalence that allows investigators to attribute differential outcomes solely to the experimental manipulation of treatment.

The first two papers in this series [1,2] focused on the design and implementation of addictions RCTs. The first paper addressed issues with respect to treatment and research design, whereas the second focused on participant samples and assessment methods. In this third paper, we shift our attention to data processing and statistical analysis. In part because of the attention paid to design and implementation issues, data processing is often a neglected domain in papers that deal with treatment research methodology. Here we offer recommendations for enhancing data quality and utility in sections dealing with data coding and entry; and data format, structure and management. With respect to statistical analyses of the data, we describe the sequencing of analyses that should be performed to evaluate the internal and external validity of RCTs; to test treatment efficacy; and to examine treatment processes and mechanisms of action. As in our earlier reviews, we rely heavily on the adult alcohol dependence treatment literature, and we address many issues only briefly. However, throughout we refer readers to numerous sources that provide more in-depth coverage of issues that arise in the analysis of RCT data.


DATA PROCESSING

Data quality is an important, if often neglected, research concern, and procedures that facilitate the accurate coding and entry of information into appropriately formatted data files are essential for reducing error and maximizing the likelihood that tests of treatment efficacy produce valid results. In the two sections that follow, we offer strategies for improving data quality as clients' responses are coded and entered into electronic files, and we underscore the importance of attention to data formatting, structure and management for facilitating the conduct of analyses that address study aims.

Data coding and entry

Flawed data coding and entry procedures can systematically bias results or introduce random variability that reduces statistical power to detect treatment effects. Data quality can be maximized by attention to assessment procedures, as well as to data processing. If feasible, researchers should consider the use of data collection procedures that minimize or detect errors in the recording or coding of responses (e.g. computerized or computer-assisted assessments, see [3]; inclusion of validity checks in questionnaires).

To ensure the accuracy of responses, research assistants should be instructed to record answers to open-ended queries verbatim, and the coding of responses should be reviewed independently by data processing staff. Similarly, many assessments of alcohol and other substance use require that study participants, or research assistants, convert reported quantities of consumption into metrics that are consistent across respondents, such as ‘standard drinks’ of alcohol. Exact quantification may not be required for many research purposes; however, for cases where precision is desired, interviewers should simply record responses as accurately as possible and perform standard drink conversion calculations at a later time.
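The arithmetic behind such deferred conversions is simple enough to script. The following sketch (not from the original paper; the 14 g United States standard-drink definition and the beverage values are illustrative assumptions) converts a verbatim-recorded volume and strength into standard drinks:

```python
# Illustrative standard-drink conversion, applied after verbatim responses
# have been recorded. The 14 g ethanol US standard-drink definition is
# assumed; beverage strengths below are typical values, not study-specific.
GRAMS_PER_STD_DRINK = 14.0
ETHANOL_G_PER_ML = 0.789  # density of ethanol

def standard_drinks(volume_ml, abv):
    """Convert a reported volume (ml) and alcohol-by-volume to standard drinks."""
    grams = volume_ml * abv * ETHANOL_G_PER_ML
    return grams / GRAMS_PER_STD_DRINK

# A 355 ml (12 oz) beer at 5% ABV works out to about one standard drink
beer = standard_drinks(355, 0.05)
```

Performing the calculation in code, rather than asking interviewers to convert on the fly, keeps the raw response and the derived metric separately auditable.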

After assessments are completed, it is important to minimize errors during data entry. Training and monitoring of data processing staff are essential for ensuring accurate entry. Additionally, training and supervision communicate the importance of the data processing function to research staff, who often view this task as mundane and tedious. Accurate data entry can be facilitated by double-entry software that allows the specification of potential or allowable responses and that will not accept illegal values. Programming might also be used to identify improbable, but possible, values so that their accuracy can be determined. In the absence of an automated system for such alerts, data cleaning should always include an examination of response frequencies so as to identify and verify illegal, improbable or statistically rare entries.

Data accuracy can also be enhanced by examining answers to related questions in different instruments for logical inconsistencies in response. In addition to revealing possible errors in data entry or coding, such cross-checks may identify problems with assessment procedures that interfere with participants' ability to respond consistently; similarly, they can assist staff in identifying participants who have difficulty completing assessments or who fail to respond honestly. Finally, periodic audits should be used to check random subsets of data records for consistency between paper records of responses and electronic data.
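As a concrete illustration, range checks and cross-instrument consistency checks of the kind described above might be scripted as follows (a hypothetical sketch: the variable names, cut-offs and instruments are invented for illustration):

```python
import pandas as pd

# Hypothetical cleaning pass: flag illegal/improbable values and logically
# inconsistent responses across instruments before analysis begins.
records = pd.DataFrame({
    "subject_id":        [101, 102, 103, 104],
    "drinks_per_week":   [14, 250, 0, 21],   # 250 exceeds the allowable range
    "any_drinking_tlfb": [1, 1, 1, 0],       # timeline follow-back item
    "abstinent_quest":   [0, 0, 1, 1],       # separate questionnaire item
})

# Range check: values outside a pre-specified allowable window
illegal = records[(records["drinks_per_week"] < 0) |
                  (records["drinks_per_week"] > 150)]

# Cross-check: TLFB reports drinking but the questionnaire claims abstinence
inconsistent = records[(records["any_drinking_tlfb"] == 1) &
                       (records["abstinent_quest"] == 1)]
```

Flagged cases would then be verified against the paper record or queried with assessment staff, not silently corrected.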

Data format, structure and management

To facilitate the conduct of treatment efficacy tests and other critical statistical analyses of RCT data, issues regarding the structure and management of data sets require attention during the study's planning stage. That is, as specific hypotheses are stated explicitly and design and method decisions are made, the data structure required to meet design requirements and test hypotheses should also be determined. These issues are especially pertinent to investigations that use the daily estimation procedures that we have recommended previously to assess substance use and other variables and to studies that employ electronic recording technologies in data collection.

Daily estimation procedures generate data that can be structured in a variety of different ways and used in different types of analyses (e.g. survival analysis and hazards models; latent growth curve modeling). However, different analytical procedures and different software packages often require that input data matrices be organized in different ways, and they may also have different conventions for dealing with missing values. To avoid time-consuming and potentially expensive manipulations of data format, investigators should consider the optimal, and most flexible, approach to data structure, given particular study aims. In brief, initial data formatting should be both design- and hypothesis-driven.

One important issue in structuring daily data records concerns the treatment of time. For example, certain hypotheses may require that participants' data values are linked to calendar days (e.g. [4,5]), whereas other hypotheses and methods (e.g. survival analysis) will require that a common event (e.g. first day of treatment) be used as a start point for data sequences (e.g. [6]). In either event, it is important to include interview and treatment dates in the data set to facilitate the manipulation of data records and the investigation of causal chains that depend on the temporal ordering of measurement.
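To illustrate the two time conventions, the following hypothetical sketch re-anchors calendar-dated daily records to each participant's first treatment day and reshapes them into a participant-by-day layout for trajectory analyses (all identifiers and dates are invented):

```python
import pandas as pd

# Daily drinking records keyed to calendar dates, re-anchored so that
# day 0 is each participant's first treatment day. Values are illustrative.
daily = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2005-03-01", "2005-03-02", "2005-03-03",
                            "2005-03-05", "2005-03-06"]),
    "drinks": [4, 0, 2, 6, 1],
})
tx_start = pd.Series(pd.to_datetime(["2005-03-02", "2005-03-05"]),
                     index=[1, 2], name="tx_start")  # first treatment day

# Event-anchored time: days relative to each participant's treatment start
daily["study_day"] = (daily["date"] - daily["subject_id"].map(tx_start)).dt.days

# Wide (one row per participant) layout, as many trajectory routines expect
wide = daily.pivot(index="subject_id", columns="study_day", values="drinks")
```

Because the calendar dates are retained alongside the derived `study_day`, either convention remains available without re-collecting or re-keying the data.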

As noted in our previous review, there are currently numerous technologies in use for the collection of data in addictions RCTs. In addition to personal interviews, these include web-based assessment, computer-assisted telephone interviewing and personal digital assistants (PDAs) or other hand-held devices that participants use to record ‘in-the-moment’ responses. Because responses are entered directly by the participant, these methods reduce errors associated with research staff data entry. However, the use of these technologies requires attention both to factors that influence self-report veracity (e.g. [3]) and to the transportability and merging of individual data files. In the absence of hard copy records, potential data loss is also a concern, requiring that some form of backup or data upload occur with relative frequency.

A final data management issue concerns the computation of summary scores and indices based on the item-level responses of participants. Scoring routines can often contain errors that can influence results. Thus, we recommend that an independent review of all scoring routines be undertaken to ensure their accuracy, with particular attention paid to item recodes and the handling of missing values (see [7,8] for further consideration of data processing and management issues).


STATISTICAL ANALYSIS

The application of statistical techniques is not independent of research design and other methodological considerations, and a number of issues that have been addressed previously resurface in a discussion of data analysis. Because the use of specific statistical procedures will vary across different research designs and outcome measures, our intent here is not to prescribe specific approaches. Rather, we focus primarily on general recommendations regarding the sequencing of analyses that address trial validity, treatment efficacy and mechanisms of action, and we refer interested readers to relevant resources that provide more detailed information.

Evaluating trial validity

Initial data analyses should be directed at establishing that the conditions that permit a valid test of treatment efficacy have been met. Such ‘boundary conditions’ define the limits of causal inference and the generalizability of results, as well as the basic requirements for subsequent analyses. In the sections that follow, we discuss the assessment of observed statistical power; the examination of patterns of attrition and between-group equivalence; and the analysis of treatment integrity and discriminability. Other, often neglected, tasks that should be undertaken during this preliminary stage of data analysis include the evaluation of baseline and outcome measures to ensure adequate reliability and the examination of factor structure using confirmatory analyses to test measurement models for constructs of interest.

Statistical power calculations

Statistical power derives from sample size, effect size and alpha. Greater statistical power increases the probability of detecting a true treatment effect; at the same time, very high power can render statistically significant even effects that are too small to be clinically meaningful. These concerns represent the balance between Type 1 (falsely rejecting the null hypothesis) and Type 2 (failing to reject a false null hypothesis) errors. Historically, researchers have been concerned with minimizing Type 1 error by adjusting alpha for multiple tests of significance (e.g. Bonferroni adjustment; see [9]; see also [10] for a critique; see [11] regarding determination of alpha). However, many trials with negative results have been found to be under-powered [12,13] and, hence, may have led to fallacious conclusions as a consequence of Type 2 error. Thus, it is important to recalculate power estimates following completion of the trial to adjust for attrition and to consider the possibility of Type 2 error when evaluating results. For conventional statistical approaches (e.g. ANOVA), there is a variety of resources to assist in estimating power (e.g. [14]), and several software packages are available as adjuncts to larger programs or in stand-alone versions (see [15] for one review). Power calculations for less traditional statistical approaches are more complicated [16] and may require the application of other statistical techniques (e.g. Monte Carlo estimation; [17]).
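As a minimal illustration of recalculating power after attrition, the sketch below approximates the power of a two-sided, two-sample comparison of means (a normal approximation to the t-test, not a method prescribed by the paper; the effect size and sample sizes are hypothetical):

```python
from scipy.stats import norm

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation) for standardized effect size d (Cohen's d)."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5   # approximate noncentrality
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Power as planned with n = 100 per arm, then recomputed after 30% attrition
planned = two_sample_power(0.4, 100)
observed = two_sample_power(0.4, 70)
```

A trial planned at roughly 80% power can drop well below that threshold after realistic attrition, which is exactly why the post-trial recalculation matters when interpreting a null result.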

Examining patterns of attrition

Potential research participants can attrite for a variety of reasons as they progress through the various phases of an RCT (recruitment, enrollment, assessment, treatment, follow-up). Some individuals may be excluded due to stringent enrollment criteria; eligible recruits may decline to enroll, and participants may withdraw during baseline assessment, treatment or follow-up [18]. For the consumer of RCT research, the implications of these different forms of attrition for study validity can often be difficult to ascertain, as many reports of addictions RCTs do not include the range of comparisons or other information on attrition that are needed to judge study quality [13,19].

Investigators should compare the characteristics of those in the recruited sample with available treatment population data (e.g. demographic variables, severity of substance involvement, treatment history, comorbid diagnoses). Additional comparisons should contrast enrollees with individuals who were excluded from the RCT and with those who refused to enroll. Within the participant sample, clients who completed treatment and follow-up evaluations should be compared with those who were lost to attrition. Analyses should compare not only those who attrite with completers in an omnibus fashion (i.e. regardless of treatment assignment; an external validity issue), but also investigate differential rates of attrition as a function of treatment assignment (an internal validity issue). We caution that the failure to find significant differences does not guarantee validity; the statistical power for these comparison analyses tends to be low. Further, depending on the timing of attrition, existing data with which to evaluate such differences may be available on a limited number of measured variables, and these may be unrelated to treatment response (see [18] regarding attrition and study validity).
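A test of differential attrition by assigned condition can be as simple as a contingency-table analysis; the sketch below uses hypothetical completion counts:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: follow-up completers vs. dropouts by assigned arm
#                 completed  dropped
table = [[80, 20],   # treatment A
         [65, 35]]   # treatment B
chi2, p, dof, expected = chi2_contingency(table)
differential_attrition = p < 0.05
```

As the surrounding text cautions, a non-significant result here does not establish equivalence, since these comparisons are typically under-powered.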

Evaluating between-group equivalence

Because random assignment may not succeed in producing equivalent groups, pretreatment differences between experimental conditions cannot be assumed and must be examined (see [19] regarding selection bias in RCTs). If needed, a variety of approaches can be used to control statistically or adjust for pre-existing between-group differences, ranging from more basic approaches such as regression and statistical matching to more complex strategies, such as the use of propensity analysis (see [20,21]). Propensity scoring has been used to create equivalent groups in observational or quasi-experimental studies, but can also be applied in RCTs [22]. Propensity analysis allows the computation of a single variable reflecting multiple covariates; the propensity score reflects a conditional probability of being assigned to a treatment and allows the creation of groups of equal propensity that can then be evaluated for treatment effects (e.g. [21,23]).
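The following sketch illustrates the logic of propensity scoring on simulated data: a logistic model predicts treatment assignment from baseline covariates, and the fitted probabilities are binned into strata. A plain gradient-ascent fit stands in for a packaged logistic-regression routine; all values and names are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample in which assignment probability depends on a covariate
n = 200
X = rng.normal(size=(n, 2))                              # two baseline covariates
assigned = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # depends on covariate 1

# Fit a logistic regression by gradient ascent on the log-likelihood
Xd = np.hstack([np.ones((n, 1)), X])                     # add intercept column
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xd @ beta))
    beta += 0.1 * Xd.T @ (assigned - p) / n

# Propensity score: conditional probability of treatment assignment
propensity = 1 / (1 + np.exp(-Xd @ beta))
strata = np.minimum((propensity * 5).astype(int), 4)     # equal-width probability bins
```

Within each stratum, participants with similar assignment probabilities can then be contrasted on outcome, approximating the between-group equivalence that randomization was meant to deliver.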

When random assignment does create equivalent groups, treatment non-adherence can still adversely affect the evaluation of outcomes in RCTs. In such cases, intent-to-treat (ITT) analysis, which includes all randomized participants irrespective of treatment exposure, is one approach that can be used to estimate treatment effects [24]. Because ITT analyses include individuals who did not receive treatment (and thus would not be expected to show a treatment effect), this approach tends to underestimate intervention effects [25]; on the other hand, per-protocol approaches, which include only clients who received their assigned treatment, may overestimate effects.

Alternative approaches that model treatment effects for compliers (e.g. complier average causal effect; CACE, e.g. [26]) have been proposed to estimate the effect of treatment when it is actually received. CACE estimation allows for the retention of the original randomization as opposed to the self-selection inherent in per-protocol analyses. This approach assumes that, given successful random assignment, both treatment assignees and control participants share the same probability of non-adherence and that an offer of, or assignment to, treatment has no effect on attrition [27]. Based on this assumption, the non-adherence effect that might have been observed in control participants, had compliance been a factor in that condition, can be estimated, permitting the comparison of compliant treatment participants with ‘compliant’ (i.e. adjusted for non-compliance) control participants. These three methods for dealing with non-compliance were compared in a trial of screening to prevent colorectal cancer; ITT produced the highest estimates of relative risk (i.e. weakest treatment effect); per-protocol analyses generated the lowest (i.e. strongest treatment effect) and CACE estimates fell between the two extremes [27].
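The relationships among the three estimators can be seen in a toy simulation with one-sided non-compliance, in which compliers also happen to have better prognosis (all numbers are invented; under the assumptions stated above, the CACE reduces to the ITT effect divided by the compliance proportion):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-sided-noncompliance trial: compliers improve by 1.0 if treated and
# have a 0.5 better prognosis regardless of arm. All values are illustrative.
n = 10000
complier = rng.binomial(1, 0.6, n)        # 60% would take treatment if offered
assigned = rng.binomial(1, 0.5, n)        # random assignment
received = assigned * complier            # controls cannot access treatment
outcome = 1.0 * received + 0.5 * complier + rng.normal(0, 1, n)

itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
compliance = received[assigned == 1].mean()
cace = itt / compliance                   # instrumental-variable-style estimate

# Per-protocol contrast (treated vs. all untreated) ignores self-selection
per_protocol = outcome[received == 1].mean() - outcome[received == 0].mean()
```

In this simulation the ITT estimate is the most conservative, the per-protocol contrast overstates the effect because compliers are a self-selected, better-prognosis group, and the CACE recovers the true effect for compliers, mirroring the ordering reported in the screening trial cited above.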

Assessing treatment integrity and discriminability

Valid assertions that a given treatment caused change in observed outcomes require that the interventions be delivered as intended. For pharmacological trials, this is evaluated by examining evidence that participants received the prescribed medication, whether in active drug or in placebo conditions. Further, it is important to show that adjunctive treatment delivery and participant compliance were similar across conditions.

For behavioral interventions, the evaluation of treatment integrity and discriminability can be a more difficult undertaking. As described previously in this series, it should begin with the specification of essential treatment elements. Once the putative active ingredients are delineated, investigators must ensure both that each treatment was delivered with integrity and that alternative treatments were discriminable (e.g. [28]). It is also necessary to demonstrate that factors such as therapist skill, manual adherence and other performance indicators were similar across conditions, and that treatment groups did not differ in participant compliance [29].

Requisite data analyses necessitate the development of a rating system (e.g. the Yale Adherence and Competence Scale; YACS; [30]) that can be used by trained independent observers (blind to treatment condition and study hypotheses) to code randomly selected audio- or videotaped segments of treatment sessions in terms of relevant therapeutic categories and dimensions. Briefly, treatment integrity and discriminability are demonstrated by analyses showing that study treatments: (i) are rated highly on those dimensions that define the respective therapies; (ii) differ significantly (in the predicted direction) on relevant dimensions of difference; and (iii) are similar in terms of non-specific treatment factors such as therapist skill and client engagement (see [28] for an example of this type of evaluation).

Testing treatment efficacy

RCTs are often valued because they are assumed to provide straightforward tests of treatment efficacy (e.g. [31]); however, the evaluation of efficacy is often more complex in practice than in theory. The choice of statistical tests of treatment effects must be informed by the nature of the hypotheses under study, and the data collected must be in a format that is amenable to the tests of specific hypotheses. If single end-point outcomes are of interest (e.g. depression scores at 6 months post-treatment), then simple mean comparisons of scores at that time-point may provide a sufficient test. On the other hand, outcomes in addictions RCTs have been conceptualized increasingly as developmental, reflecting changes in behavioral patterns over time (e.g. drinks per drinking day or heavy drinking occasions). Outcome data to evaluate such changes are collected at multiple time-points, and trajectory analyses (e.g. linear growth modeling or growth mixture modeling) may be more appropriate. Regardless of statistical approach, the valid use of any method relies on certain assumptions that must be tested.

Assumptions regarding data distributions

Most statistical techniques, including those that are used widely to test treatment efficacy, assume that data are distributed normally. However, substance use outcome measures are typically non-normal, most often exhibiting positive skew. In these cases, a variety of transformations may be applied, depending on the distributional abnormality (see [32] for recommendations on matching transformations to different distributional abnormalities). Frequently, however, transformations do not ameliorate the problem completely. Alternative analytical techniques that do not assume a normal distribution, or that are robust to violations of the normality assumption, may need to be considered. Even when transformations succeed in normalizing the data distribution, the use of transformed data metrics can complicate the interpretation of results [33].
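A minimal illustration of the issue, using simulated consumption-like data and a log transformation with an offset to accommodate zeros (the distributional parameters are invented):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)

# Positively skewed, consumption-like data; a log(1 + x) transformation
# substantially reduces the skew. Values are simulated, not study data.
drinks = rng.lognormal(mean=1.0, sigma=0.8, size=500)
raw_skew = skew(drinks)
log_skew = skew(np.log1p(drinks))
```

The price of the improved distribution, as noted above, is that effects are now expressed on a log scale, which complicates clinical interpretation.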

Most analytical methods make additional assumptions that also should be examined. For example, most commonly used procedures assume that residuals (deviations between observed and predicted values) occur randomly, and an analysis of residuals can provide an important aid for interpreting overall model fit (e.g. [34]). An additional issue is the possibility of floor or ceiling effects. For example, it may be difficult to demonstrate significant reductions in drinking if clients report low baseline levels of consumption [35].

Multi-level analysis

In addictions RCTs, existing units, rather than the individuals residing within those units, are often allocated randomly to experimental conditions. In prevention studies, for example, schools, or classrooms within schools, are often assigned randomly to receive alternate interventions. Such groupings, however, do not occur at random; individuals within pre-existing structures tend to be more similar to each other than they are to those in other groups, or to a random sample of individuals from the population of interest. A similar problem arises with group therapy; even if individuals (rather than groups) are randomized to treatment condition, group dynamics are likely to vary across groups, a situation that can grow more complicated if the size and composition of units change over time. Such hierarchical structure is ubiquitous in RCTs and must be considered in the analysis of treatment efficacy. When such grouping is inherent in the design of an RCT, participants represent only one level within a hierarchical organization, and individual change may not be the appropriate unit of analysis. In such cases, investigators should consider hierarchical approaches to analysis (e.g. hierarchical linear modeling; [36]), which account for the nesting in such designs and allow for the examination of complex multi-level interactions.
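The degree of within-group dependence that motivates a multi-level analysis can be summarized by the intraclass correlation (ICC). The sketch below estimates ICC(1) from one-way ANOVA mean squares on simulated nested data (group counts and variance components are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated therapy-group nesting: each client's outcome shares a
# group-level component, so within-group observations are correlated.
k, m = 50, 10                                   # 50 groups of 10 clients
group_effect = rng.normal(0, 1, k)              # between-group variance = 1
y = group_effect[:, None] + rng.normal(0, 2, (k, m))  # within-group variance = 4

grand = y.mean()
msb = m * ((y.mean(axis=1) - grand) ** 2).sum() / (k - 1)      # between MS
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (k * (m - 1))  # within MS
icc = (msb - msw) / (msb + (m - 1) * msw)       # ICC(1) estimate
```

Even a modest ICC, ignored, understates standard errors for group-delivered treatments; hierarchical models address this by modeling the group-level variance explicitly.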

Dealing with missing data

In addition to threatening the internal and external validity of RCTs, participant non-compliance poses problems for the statistical analysis of treatment outcomes. Many clinical trials involve multiple follow-up assessments over time and, consequently, are susceptible to periodic, as well as complete, non-compliance. Participants may be lost completely to follow-up at some point, or they may miss one or more follow-up evaluations and subsequently be relocated. Regardless of the particulars, these participation patterns can result in missing data values at various points in the data sequence. Many statistical techniques (e.g. repeated-measures ANOVA) apply a list-wise deletion standard, deleting all cases with any missing data. This can lead to substantial reductions in sample size when a large number of participants are missing small amounts of data, and to decreased efficiency and potential bias. Historically, a variety of techniques have been used for estimating missing data for individual cases (e.g. last observation carried forward, imputation).

Multiple imputation allows for the simultaneous estimation of several missing variables based on the available data and permits investigators to compare models with missing and imputed values to determine empirically the effect of missingness and the relative efficiency of the two approaches (e.g. [37]). Recently developed analytical tools such as latent growth curve modeling maximize data retention through estimation techniques (e.g. full information maximum likelihood; FIML) that use all the data available from each participant to compute a likelihood estimate that is then summed across all participants [38]. Statistical methods for dealing with missing values rely on complex assumptions about the nature of the missing data that vary by technique and must be considered in decisions regarding the handling of missing data (see [39,40] for reviews of the most widely used approaches).
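For contrast with the model-based approaches discussed above, the sketch below applies listwise deletion and last observation carried forward (LOCF) to a small hypothetical follow-up series; it is illustrative only, and the wave labels and values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical follow-up series (percent days abstinent at four waves)
# with intermittent missingness in every case.
waves = pd.DataFrame({
    "w1": [80.0, 60.0, 90.0],
    "w2": [70.0, np.nan, 85.0],
    "w3": [np.nan, 50.0, 80.0],
    "w4": [65.0, 45.0, np.nan],
})

listwise = waves.dropna()          # listwise deletion: only complete cases
locf = waves.ffill(axis=1)         # last observation carried forward
```

Here listwise deletion discards the entire sample because every participant misses one wave, while LOCF retains all cases at the cost of strong, often implausible, assumptions about stability of the outcome, which is precisely why the assumption structure of any missing-data method must be examined.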


Covariates

Regardless of whether between-group equivalence is achieved via randomization, there will be within-group variability on client attributes (and other factors) that may be related to treatment response. Controlling for these variables in tests of treatment efficacy is likely to enhance the ability to detect significant intervention effects. In particular, investigators should consider using pretreatment levels of the outcome measure as a covariate, even when baseline differences are not observed [41]. It is also advisable to examine other variables that might be expected logically to influence outcomes (e.g. significant life events, participation in non-study treatments). In addition, because multiple therapists are likely to be involved in the delivery of any one treatment, therapist effects should be evaluated systematically, especially for behavioral interventions (e.g. [42]; see [43]).
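The benefit of adjusting for the baseline level of the outcome can be seen in a small simulated example (an ANCOVA-style regression; all values and names are invented):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated trial: outcome depends strongly on baseline severity, and the
# treatment lowers the outcome by 3 units. Values are illustrative only.
n = 300
baseline = rng.normal(20, 5, n)                      # baseline drinks/week
tx = rng.binomial(1, 0.5, n).astype(float)
outcome = 0.8 * baseline - 3.0 * tx + rng.normal(0, 3, n)

def coef(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
unadjusted = coef(np.column_stack([ones, tx]), outcome)[1]
adjusted = coef(np.column_stack([ones, tx, baseline]), outcome)[1]
```

Both models target the same treatment effect, but the covariate-adjusted model removes the baseline-related residual variance, yielding a markedly more precise estimate of the contrast.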

Sensitivity analyses

Once data have been analyzed, it becomes crucial to evaluate the robustness of the resulting outcome models. Variations with respect to methodological decisions that have been previously reviewed (e.g. primary outcome measures, analytical technique, approach to handling missing data) may affect related model parameter estimates and thereby influence results. The influence of these choices can be assessed by undertaking sensitivity analyses that evaluate the robustness of outcome models in the face of various parameter modifications (e.g. [44–46]). For example, alternative outcome measures (e.g. time-to-first-drink versus percentage of days abstinent) and statistical procedures (e.g. conventional ANOVA versus survival analysis) can be used to test treatment efficacy, and analyses can be performed using different subsets of participants (e.g. those with complete data versus the entire sample with imputed values for missing data; per protocol analyses versus ITT).
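A sensitivity analysis of this kind can be organized as a simple loop over analytic choices. In the hypothetical sketch below, the treatment contrast is re-estimated under an alternative outcome definition and an alternative handling of missing outcomes (all data are simulated, and the labels are invented):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical trial: percent days abstinent (PDA) in two arms of 200,
# with a fixed 20% of outcomes missing, balanced across arms.
n = 400
tx = np.repeat([0, 1], n // 2)
pda = np.clip(rng.normal(50 + 10 * tx, 15), 0, 100)
observed = np.tile([True, True, True, True, False], n // 5)

def contrast(y, mask):
    """Treatment-minus-control difference in means among masked cases."""
    return y[mask & (tx == 1)].mean() - y[mask & (tx == 0)].mean()

estimates = {
    "pda_completers": contrast(pda, observed),
    "pda_mean_imputed": contrast(
        np.where(observed, pda, pda[observed].mean()), np.ones(n, bool)),
    "abstinent_most_days": contrast((pda > 50).astype(float), observed),
}
```

If the substantive conclusion survives across such variations, confidence in the outcome model increases; here, for instance, grand-mean imputation with balanced missingness shrinks the contrast by exactly the observed fraction, a discrepancy a sensitivity table would make visible.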

Treatment process: how, why and for whom treatment works

If feasible (e.g. statistical power is adequate), ancillary analyses should be conducted to examine the change mechanisms that were hypothesized to underlie treatment effects. However, investigators often fail to fully specify the causal pathways that are purported to produce changes in substance use behavior [13]. In the absence of theoretically based hypotheses, investigators should undertake a disaggregation strategy and perform moderator analyses aimed at identifying the characteristics of clients who do and do not benefit from the interventions tested (see [47]).

When plausible mechanisms of action or influence have been articulated, data analyses may involve relatively simple tests of mediation, using conventional statistical models. Perhaps the most commonly used is the Baron & Kenny approach [48]. To demonstrate statistical mediation of a significant treatment effect, an intervening variable must be related significantly (in the predicted direction) to both the independent (treatment) and the dependent (outcome) variables; further, the significant relationship between treatment and outcome must be attenuated with the inclusion of the intervening variable in the model.
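These steps can be illustrated with ordinary least-squares regressions on simulated data in which the treatment effect is fully mediated (the variable names and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated fully mediated effect: treatment -> coping skills (mediator)
# -> drinking outcome, with no direct treatment path.
n = 1000
tx = rng.binomial(1, 0.5, n).astype(float)
skills = 0.8 * tx + rng.normal(0, 1, n)            # path a
drinking = -0.6 * skills + rng.normal(0, 1, n)     # path b; no direct path

def ols_slope(y, *xs):
    """Coefficient of the first predictor in a least-squares fit."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

c_total = ols_slope(drinking, tx)           # total treatment effect
a_path = ols_slope(skills, tx)              # treatment -> mediator
c_prime = ols_slope(drinking, tx, skills)   # direct effect with mediator in model
```

The pattern required by the Baron & Kenny logic appears here: a significant total effect, a significant treatment-to-mediator path, and a direct effect that is attenuated (in this fully mediated simulation, essentially eliminated) once the mediator enters the model.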

MacKinnon and colleagues have tested a range of other approaches to analyzing mediation using Monte Carlo simulations (e.g. [49]), and they provide useful suggestions regarding the analysis of putative causal relationships. More recently, MacKinnon et al. [50] have reviewed experimental design features that might also facilitate the evaluation of mediation in studies using random assignment and have suggested the use of double random assignment. This approach involves a series of randomized experiments that test the various paths in the mediational model (in essence, randomly assigning participants to levels of the putative mediator). Statistical mediation tests are necessary for establishing mechanisms of action; however, as described in our first review paper, they are not sufficient, and other criteria should also be evaluated [33,51–54].

Given the inherent complexity of treatment, causal chains will often include multiple pathways. For example, researchers may believe that an intervention succeeds in reducing alcohol use because it teaches participants skills to manage depression. Even if the treatment is found to be efficacious in decreasing consumption, the underlying model may be suspect if clients in different experimental groups do not show differential skill acquisition or decrements in depressive moods, or if changes in affect are unrelated to changes in skill acquisition or changes in drinking behavior. In addition to client variables, complex causal chains may involve therapist characteristics and multiple therapeutic elements, including context variables. Because they involve complex statistical interactions, these multi-component models can be difficult to test, even with the application of sophisticated analytical procedures (e.g. structural equation modeling [55]) [33] (see, however, [56] regarding mediated moderators and moderated mediators; [57] regarding tests of mediation using growth curve analysis; [50] for a recent review of tests of mediation). Finally, we note that causal chain analyses can be informative, even in cases where treatment efficacy is not demonstrated, by pinpointing which relationships in the model sequence appear valid and which are problematic (see [54]).

In addition to testing putative causal chains, a variety of other ancillary analyses can be conducted to further understanding of the processes that underlie treatment effects. For example, based on conceptual and empirical criteria, analyses may be directed at identifying subtypes of participants who respond differentially to treatment [41]. Additionally, it might be possible to examine the influence of non-specific factors that may be independent of the treatments under study, or interact with therapy content. The potential significance of such factors (including client characteristics such as motivation to change and self-efficacy, as well as therapy process variables) is recognized increasingly (e.g. [58,59]). If sufficient data are available, the influence of therapist variables (e.g. gender, recovery status) on outcomes can be explored, both independently and in terms of client congruity (e.g. [42,60]; see also [51,61]) (see [41] regarding additional considerations in statistical analyses of addictions RCTs).


Conclusions

Whereas our previous two papers on enhancing the validity and utility of addictions RCTs focused on research design and implementation, this final paper has addressed the practical tasks involved in data processing, as well as the multi-stage process that should be undertaken in the statistical analysis of RCT data. We have described procedures for ensuring data accuracy as client responses are coded, and as they are entered into computer files. Similarly, we have discussed how the utility of data sets can be improved by considering the format, structure and management of data files. Regarding statistical analysis, we have described the preliminary analyses that should be undertaken to address issues concerning the internal and external validity of RCTs; tests of treatment efficacy; and ancillary analyses designed to further understanding of treatment processes and mechanisms of change.

Across all three of our reviews, we have highlighted some of the more salient issues involved in planning and implementing an addictions treatment trial and in analyzing the resulting data. As we have tried to illustrate, the many methodological decisions made during the planning and conduct of the RCT cannot be made in isolation; rather, they require the consideration of other methodological factors, including data processing procedures and analytical approaches. For example, the types of treatments that are evaluated should influence the selection of primary outcome measures which, in turn, will dictate assessment method, data structure and statistical approach.

In our earlier papers, we underscored the importance of advance planning and pilot testing to minimize potential threats to validity; at the same time we encouraged investigators to collect data that could be used in later analyses to evaluate these threats. Here, we have shown that there are often statistical techniques that can be applied to ameliorate potential threats to validity, such as differential attrition across experimental conditions, that may arise during the conduct of the trial. Nevertheless, it should be obvious that it is preferable to deal with these potential issues in advance of, and during, the conduct of the trial, rather than subsequently as statistical analyses are performed.
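Before any statistical remedy can be chosen, the threat itself has to be detected. Differential attrition, for instance, can be screened with a simple contingency-table test of dropout by condition; the counts below are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical attrition counts: rows are study conditions, columns are
# (completed follow-up, lost to follow-up) -- all numbers are invented
attrition = np.array([[80, 20],
                      [65, 35]])

chi2, p, dof, expected = chi2_contingency(attrition)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```

A significant result here would motivate the missing-data techniques noted above, such as multiple imputation (e.g. [40]), and should temper causal interpretation of between-group differences at follow-up.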

It has not been possible to provide definitive solutions to all of the dilemmas that researchers face in the design, implementation and analysis of addictions RCTs, because decisions vary depending on the investigator's purpose and the research context. Nevertheless, there are many areas in which the dictates of methodological rigor are clear. We have emphasized the importance of enhancing participant compliance with both treatment and research protocols in order to maintain between-group equivalence. We also have underscored the importance of standardizing and monitoring treatment delivery to ensure that critical therapeutic elements are present and that confounding factors are eliminated. Finally, we have highlighted the importance of advance planning to ensure that appropriate control groups are included in research designs; that statistical power is adequate; and that assessment batteries are sufficiently comprehensive to address study aims.

Although we have primarily reviewed methods for maximizing the internal validity of RCTs in addictions treatment research, we have also encouraged investigators to consider strategies for enhancing external validity. In our view, internal and external validity, like reliability and validity of measurement, are not absolute qualities of research, but matters of degree. Specific methodological choices may diminish or enhance one or the other type of validity, or shift the balance between the two. Internal validity is the hallmark of the RCT, and it is essential for demonstrating treatment efficacy; however, external validity increases the likelihood that efficacious treatment approaches will be adopted in clinical settings, where their effectiveness can be tested further in real-world patient populations and treatment facilities.

In addition to optimizing validity, we have made two suggestions which, in our view, can enhance the utility of clinical trials in addictions. First, we recommend the use of daily estimation procedures for outcome assessment. These instruments produce data that are compatible with a conceptualization of outcomes as temporal processes, a perspective that is consistent with current views of addiction as a chronic condition (e.g. [62,63]). Secondly, we encourage researchers to examine the mechanisms of action that are hypothesized to produce behavior change. Recognizing the potential for increasing assessment burden, we suggest that investigators outline putative causal chains a priori and limit assessment batteries to measures that can reasonably be thought to mediate or moderate treatment effects. Analyses directed at how, why and for whom treatment works will enhance the transportability of research findings into clinical practice, as well as increase our understanding of the etiology and course of addictions and the process of recovery.


Acknowledgements

Preparation of this paper was supported in part by the National Institute on Alcohol Abuse and Alcoholism Grants R01 AA008333 and R01 AA011925; Mark S. Goldman, Principal Investigator.


References
  • 1
    Del Boca F. K., Darkes J. Enhancing the validity and utility of randomized clinical trials in addictions treatment research: I. Treatment implementation and research design. Addiction 2007; doi:10.1111/j.1360-0443.2007.01862.x.
  • 2
    Del Boca F. K., Darkes J. Enhancing the validity and utility of randomized clinical trials in addictions treatment research: II. Participant samples and assessment methods. Addiction 2007; doi:10.1111/j.1360-0443.2007.01863.x.
  • 3
    Del Boca F. K., Darkes J. The validity of self-reports of alcohol consumption: state of the science and challenges for research. Addiction 2003; 98 (Suppl. 2): 1–12.
  • 4
    Del Boca F. K., Darkes J., Greenbaum P. E., Goldman M. S. Up close and personal: temporal variability in the drinking of individual college students during their first year. J Consult Clin Psychol 2004; 72: 155–64.
  • 5
    Greenbaum P. E., Del Boca F. K., Darkes J., Wang C.-P., Goldman M. S. Variation in the drinking trajectories of freshman college students. J Consult Clin Psychol 2005; 73: 229–38.
  • 6
    Project MATCH Research Group. Matching alcoholism treatments to client heterogeneity: treatment main effects and matching effects on drinking during treatment. J Stud Alcohol 1998; 59: 631–9.
  • 7
    Pinol A., Bergel E., Chaisiri K., Diaz E., Gandeh M. Managing data for a randomized controlled trial: experience from the WHO Antenatal Care Trial. Paediatr Perinat Epidemiol 1998; 12 (Suppl. 2): 142–55.
  • 8
    McRee B. Project MATCH—a case study: the role of a coordinating center in facilitating research compliance in a multisite clinical trial. In: Zweben A., Barrett D., Carty K., McRee B., Morse P., Rice C., editors. Strategies for Facilitating Protocol Compliance in Alcoholism Treatment Research. Project MATCH Monograph Series, Vol. 7. Rockville, MD: National Institute on Alcohol Abuse and Alcoholism; 1998, p. 93–110.
  • 9
    Westfall P. H., Johnson W. O., Utts J. M. A Bayesian perspective on the Bonferroni adjustment. Biometrika 1997; 84: 419–27.
  • 10
    Perneger T. V. What's wrong with Bonferroni adjustments. BMJ 1998; 316: 1236–8.
  • 11
    Berger V. W. On the generation and ownership of alpha in medical studies. Control Clin Trials 2004; 25: 613–19.
  • 12
    Moher D., Dulberg C. S., Wells G. A. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994; 272: 122–4.
  • 13
    Moyer A., Finney J. W., Swearingen C. E. Methodological characteristics and quality of alcohol treatment outcome studies, 1970–98: an expanded evaluation. Addiction 2002; 97: 252–63.
  • 14
    Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum; 1988.
  • 15
    Thomas L., Krebs C. J. A review of statistical power analysis software. Bull Ecol Soc Am 1997; 78: 126–39.
  • 16
    Fan X. Power of latent growth modeling for detecting group differences in linear growth trajectory parameters. Struct Equat Model 2003; 10: 380–400.
  • 17
    Richardson D. B. Power calculations for survival analyses via Monte Carlo estimation. Am J Ind Med 2003; 44: 532–9.
  • 18
    Howard K. I., Cox W. M., Saunders S. M. Attrition in substance abuse comparative treatment research: the illusion of randomization. In: Onken L. S., Blaine J. D., editors. Psychotherapy and Counseling in the Treatment of Drug Abuse. NIDA Monograph no. 104. Washington, DC: U.S. Department of Health and Human Services; 1990, p. 66–79.
  • 19
    Berger V. W., Weinstein S. Ensuring the comparability of control groups: is randomization enough? Control Clin Trials 2004; 25: 515–24.
  • 20
    D'Agostino R. B. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998; 17: 2265–81.
  • 21
    Foster E. M. Propensity score matching: an illustrative analysis of dose–response. Med Care 2003; 41: 1183–92.
  • 22
    Rosenbaum P. R., Rubin D. B. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
  • 23
    Weitzman S., Lapane K. L., Toledano A. Y., Hume A. L., Mor V. Principles in modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf 2004; 13: 841–53.
  • 24
    Unnebrink K., Windeler J. Intention-to-treat: methods for dealing with missing values in clinical trials of progressively deteriorating diseases. Stat Med 2001; 20: 3931–46.
  • 25
    Sheiner L. B., Rubin D. B. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther 1995; 57: 6–15.
  • 26
    Jo B. Estimation of intervention effects with noncompliance: alternative model specifications. J Educ Behav Stat 2002; 27: 385–409.
  • 27
    Hewitt C. E., Torgerson D. J., Miles J. V. Is there another way to take account of noncompliance in randomized controlled trials? Can Med Assoc J 2006; 175: 347–8.
  • 28
    Carroll K. M., Connors G. J., Cooney N., DiClemente C. C., Donovan D. M., Kadden R. M. et al. Internal validity of Project MATCH: discriminability and integrity. J Consult Clin Psychol 1998; 66: 290–303.
  • 29
    Kazdin A. E. Nonspecific treatment factors in psychotherapy outcome research. J Consult Clin Psychol 1979; 47: 846–51.
  • 30
    Carroll K. M., Ball S. A., Fenton L., Frankforter T. L., Nich C., Nuro K. F. et al. A general system for evaluating therapist adherence and competence in psychotherapy research in the addictions. Drug Alcohol Depend 2000; 57: 225–38.
  • 31
    DeAngelis T. Shaping evidence-based practice. APA Mon Psychol 2005; 36: 26–31.
  • 32
    Singer J. D., Willett J. B. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press; 2003.
  • 33
    Wirtz P. W., Longabaugh R. Methodological assessment and critique. In: Longabaugh R., Wirtz P. W., editors. Project MATCH Hypotheses: Results and Causal Chain Analyses. Project MATCH Monograph Series, Vol. 8. Rockville, MD: National Institute on Alcohol Abuse and Alcoholism; 2001, p. 296–304.
  • 34
    Browne M. W., MacCallum R. C., Kim C.-T., Anderson B. L., Glaser R. When fit indices and residuals are incompatible. Psychol Methods 2002; 7: 403–21.
  • 35
    Fogg L., Gross D. Threats to validity in randomized clinical trials. Res Nurs Health 2000; 23: 79–87.
  • 36
    Bryk A. S., Raudenbush S. W. Hierarchical Linear Models. Newbury Park, CA: Sage Publications; 1992.
  • 37
    Moore L., Lavoie A., LeSage N., Liberman M., Sampalis J. S., Bergeron E. et al. Multiple imputation of the Glasgow Coma Score. J Trauma 2005; 59: 698–704.
  • 38
    Llabre M. M., Spitzer S., Siegel S., Saab P. G., Schneiderman N. Applying latent growth curve modeling to the investigation of individual differences in cardiovascular recovery from stress. Psychosom Med 2004; 66: 29–41.
  • 39
    Houck P. R., Mazumdar S., Koru-Sengul T., Tang G., Mulsant B. H., Pollock B. G. et al. Estimating treatment effects from longitudinal clinical trial data with missing values: comparative analyses using different methods. Psychiat Res 2004; 129: 209–15.
  • 40
    Schafer J. L., Graham J. W. Missing data: our view of the state of the art. Psychol Methods 2002; 7: 147–77.
  • 41
    Borkovec T. D. Between-group therapy outcome research: design and methodology. In: Onken L. S., Blaine J. D., Boren J. J., editors. Behavioral Treatments for Drug Abuse and Dependence. NIDA Monograph no. 137. Rockville, MD: National Institute on Drug Abuse; 1993, p. 249–89.
  • 42
    Project MATCH Research Group. Therapist effects in three treatments for alcohol problems. Psychother Res 1998; 8: 455–74.
  • 43
    Crits-Christoph P., Beebe K. L., Connolly M. B. Therapist effects in the treatment of drug dependence: implications for conducting comparative treatment studies. In: Onken L. S., Blaine J. D., Boren J. J., editors. Behavioral Treatments for Drug Abuse and Dependence. NIDA Monograph no. 137. Rockville, MD: National Institute on Drug Abuse; 1993, p. 39–49.
  • 44
    Fitzmaurice G. M. Methods for handling dropouts in longitudinal clinical trials. Stat Neerl 2003; 57: 75–99.
  • 45
    Leamer E. E. Sensitivity analyses would help. Am Econ Rev 1985; 75: 308–13.
  • 46
    Rosenbaum P. R., Rubin D. B. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc 1983; 45: 212–18.
  • 47
    Howard K. I., Krause M. S., Lyons J. S. When clinical trials fail: a guide to disaggregation. In: Onken L. S., Blaine J. D., Boren J. J., editors. Behavioral Treatments for Drug Abuse and Dependence. NIDA Monograph no. 137. Rockville, MD: National Institute on Drug Abuse; 1993, p. 291–302.
  • 48
    Baron R. M., Kenny D. A. The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986; 51: 1173–82.
  • 49
    MacKinnon D. P., Lockwood C. M., Hoffman J. M., West S. G., Sheets V. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods 2002; 7: 83–104.
  • 50
    MacKinnon D. P., Fairchild A. J., Fritz M. S. Mediation analysis. Ann Rev Psychol 2007; 58: 593–614.
  • 51
    Doss B. D. Changing the way we study change in psychotherapy. Clin Psychol Sci Pract 2004; 11: 368–86.
  • 52
    Kazdin A. E., Nock M. K. Delineating mechanisms of change in child and adolescent therapy: methodological issues and research recommendations. J Child Psychol Psychiatry 2003; 44: 1116–29.
  • 53
    Longabaugh R., Donovan D. M., Karno M. P., McCrady B. S., Morgenstern J., Tonigan J. S. Active ingredients: how and why evidence-based alcohol behavioral treatment interventions work. Alcohol Clin Exp Res 2005; 29: 235–47.
  • 54
    Longabaugh R., Wirtz P. W. Causal chain analysis. In: Longabaugh R., Wirtz P. W., editors. Project MATCH Hypotheses: Results and Causal Chain Analyses. Project MATCH Monograph Series, Vol. 8. Rockville, MD: National Institute on Alcohol Abuse and Alcoholism; 2001, p. 18–28.
  • 55
    Hoyle R. H. Structural Equation Modeling: Concepts, Issues, and Applications. Thousand Oaks, CA: Sage Publications; 1995.
  • 56
    Muller D., Judd C. M., Yzerbyt V. Y. When moderation is mediated and mediation is moderated. J Pers Soc Psychol 2005; 89: 852–63.
  • 57
    Cheong J. W., MacKinnon D. P., Khoo S. T. Investigation of mediational processes using parallel process latent growth curve modeling. Struct Equat Model 2003; 10: 238–62.
  • 58
    Litt M. D., Cooney N. L., Kabela E., Kadden R. M. Coping skills and treatment outcomes in cognitive–behavioral therapy and interactional group therapy for alcoholism. J Consult Clin Psychol 2003; 71: 118–28.
  • 59
    Moos R. H. Addictive disorders in context: principles and puzzles of effective treatment and recovery. Psychol Addict Behav 2003; 17: 3–12.
  • 60
    DiClemente C. C., Carroll K. M., Miller W. R., Connors G. J., Donovan D. M. A look inside treatment: therapist effects, the therapeutic alliance, and the process of intentional behavior change. In: Babor T. F., Del Boca F. K., editors. Treatment Matching in Alcoholism. Cambridge: Cambridge University Press; 2003, p. 166–83.
  • 61
    Beutler L. E. Methodology: what are the design issues involved in the defined research priorities? In: Onken L. S., Blaine J. D., Boren J. J., editors. Behavioral Treatments for Drug Abuse and Dependence. NIDA Research Monograph no. 137. Rockville, MD: National Institute on Drug Abuse; 1993, p. 105–18.
  • 62
    Dennis M. L., Scott C. K., Funk R. An experimental evaluation of recovery management checkups (RMC) for people with chronic substance use disorders. Eval Program Plann 2003; 26: 339–52.
  • 63
    McLellan A. T., McKay J. R., Forman R., Cacciola J., Kemp J. Reconsidering the evaluation of addiction treatment: from retrospective follow-up to concurrent recovery monitoring. Addiction 2005; 100: 447–58.