This article is the second of three papers that review strategies for enhancing the validity and utility of randomized clinical trials (RCTs) in addictions treatment research. The RCT is generally regarded as the method of choice for evaluating treatment efficacy because it maximizes internal validity, allowing investigators to eliminate alternative explanations (other than treatment) for observed differences in outcome between experimental groups (for other perspectives, see [1,2]).
The first paper in this series  focused on two fundamental components of RCTs, treatment implementation and research design, and underscored the importance of advance planning; treatment integrity and discriminability; standardization of treatment delivery; clinician training and supervision; client adherence; appropriate comparison groups; and maintenance of between-group equivalence across study conditions. Although emphasis was placed on potential threats to the internal validity of RCTs, researchers were also encouraged to consider methods for enhancing external validity, the ability to generalize study results to real-world treatment populations and clinical settings. Finally, it was suggested that the utility of addictions RCTs for advancing theory and improving clinical practice can be enhanced by investigating the mechanisms of change that underlie treatment effects.
In this second paper, we address topics related to participant samples and assessment methods. There is a variety of ways in which investigator decisions regarding research samples can compromise both internal and external validity, and these are discussed in sections that consider the importance of defining study populations by means of appropriate eligibility criteria; informed consent; sample size determination to optimize statistical power; recruitment and study enrollment procedures; sample retention throughout the various phases of the investigation; and automated systems for tracking participants as they progress through the trial. Considerations in the development of assessment batteries parallel many of the issues reviewed in relation to treatment implementation and research design. These are addressed in sections on eligibility screening and baseline assessment; treatment-related variables; treatment outcome measures; frequency of follow-up evaluation; and assessment process. A final section on pilot testing highlights the importance of conducting a feasibility study prior to commencing the actual RCT in order to refine recruitment and assessment procedures and to assess the competence of clinical and research staff.
As in our initial review paper, we rely heavily on the adult alcohol dependence treatment literature in our discussion of methodological issues. However, we also attempt to reference relevant resources from other areas of study. Although the same basic principles are applicable across most addictions RCTs, specific decisions with respect to participant samples and assessment methods will vary depending on the target population (e.g. indigent cocaine abusers versus heavy drinking college students) and the measurement requirements specific to the aims of particular intervention trials.
RCT study populations are defined in terms of specific eligibility criteria that are intended to specify the probable beneficiaries of treatment and to reduce within-sample variability, thereby increasing the statistical power to detect treatment effects. Addictions RCTs usually require that participants meet standardized diagnostic criteria (DSM-IV or ICD-10) for current substance abuse or dependence , and individuals with other drug dependencies or comorbid psychopathology (particularly psychoses) are usually excluded. To avoid confounding with investigational treatments, participants may be required to refrain from utilizing non-study services. Pharmacotherapy trials usually have more stringent criteria and are likely to exclude pregnant women and individuals with physical conditions that preclude medication use.
Eligibility criteria are also used to address factors that may interfere with study participation (e.g. reading deficits and language difficulties) or affect treatment adherence and/or research compliance (e.g. residential mobility). In pharmacotherapy trials, investigators often establish a ‘run-in period’ during which all eligible participants receive placebo; those who show poor medication compliance are excluded . Ethical concerns may also constrain enrollment (e.g. preventive interventions are not appropriate for severely dependent substance users). Finally, over-reliance on particular recruitment sources may limit sample diversity (e.g. Veterans' Administration hospitals where patients are disproportionately male).
Given restrictions on eligibility, it is not surprising that RCTs are frequently criticized for using samples that are insufficiently representative of treatment populations [1,5]. The empirical research on this issue is equivocal ; some studies show good correspondence between research samples and treatment populations (e.g. ), whereas others do not (see ). Although other differences may be present (especially in terms of socio-economic status and race/ethnicity ), recent research suggests that treatment-seeking individuals excluded from RCTs are not likely to be afflicted more severely than actual research participants .
To address the needs of underserved populations and improve the generalizability of research, funding agencies often require that grant submissions deal explicitly with the inclusion of women and racial/ethnic minorities . Although requiring additional resources, it is often possible to expand recruitment opportunities and enroll a relatively broad range of participants (e.g. employ bilingual staff, offer free or inexpensive child care, provide access to transportation). In addition to enhancing external validity, client diversity increases the likelihood that research findings will influence clinical practice. Further, sample variability enables investigators to conduct analyses of client variables that may mediate or moderate treatment effects [11,12] (see  regarding eligibility criteria and recruitment; see  regarding sample size and heterogeneity in alcohol dependence treatment studies).
Ethical concerns demand that RCT enrollees provide written informed consent prior to participation. Human subjects review boards (e.g. Institutional Review Boards, or IRBs) require that consent forms describe study aims; research design (including information regarding random assignment); assessment procedures; confidentiality safeguards; potential risks and benefits of participation; compensation; and alternative, as well as investigational, treatments. Enrollees must also be told that they are free to withdraw at any time without penalty. As noted in our first paper in this series, the nature of the consent process can affect client recruitment and, ultimately, undermine the internal and external validity of the trial. Evidence indicates that many participants fail to understand fully or retain the information presented (e.g. ), and it has been suggested that studies include a systematic evaluation of the quality of the informed consent process . We suggest further that potential participants be queried regarding the extent to which randomization and study treatment options influence their decisions to enroll in RCTs (see [16,17] regarding informed consent procedures).
Sample size and statistical power
Insufficient statistical power is a major shortcoming of many addictions RCTs . To ensure adequate power, investigators should determine, prior to the start of the trial, the sample size needed to achieve study aims. Power calculations should consider clinical, as well as statistical, significance. Attention should also be paid to expected attrition rates and subgroup analyses, including analyses of women and ethnic/racial minorities and of client characteristics that may mediate or moderate treatment effects. These analyses involve an examination of statistical interactions which, compared with treatment main effects, require larger sample sizes to attain adequate levels of power. Complicated designs, particularly those that will involve the application of state-of-the-art statistical approaches (e.g. structural equation modeling, latent growth curve modeling), can require very large samples . (The computation of statistical power estimates is discussed in the third paper in this series.)
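To illustrate how an a priori sample size estimate responds to the expected effect and to attrition, the following is a minimal sketch using a standard textbook normal approximation for a two-group comparison of means. It is not the procedure described in the third paper in this series; the function names, the effect sizes and the 20% attrition figure are illustrative assumptions only.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm for a two-sided comparison
    of two group means (normal approximation; effect_size is Cohen's d)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def inflate_for_attrition(n, expected_attrition):
    """Number to enroll so that about n participants remain after dropout."""
    if not 0 <= expected_attrition < 1:
        raise ValueError("expected_attrition must be in [0, 1)")
    return math.ceil(n / (1 - expected_attrition))


# Illustrative (hypothetical) planning values, not taken from any study.
n_medium = n_per_group(0.5)    # moderate standardized difference
n_small = n_per_group(0.25)    # half that difference
n_enroll = inflate_for_attrition(n_medium, 0.20)
print(n_medium, n_small, n_enroll)
```

Under this approximation, halving the standardized difference roughly quadruples the required sample, which is one way to see why interaction and subgroup analyses, which typically involve smaller effects, demand considerably larger samples than tests of treatment main effects.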
Recruitment and study enrollment
Once eligibility requirements and sample size are determined, specific enrollment goals should be set and recruitment strategies developed. To ensure that clinical and research staffing are adequate as the investigation progresses, it is advisable to develop a plan to manage flow into and through the overlapping phases of the trial (screening, assessment, treatment, follow-up) [13,20]. The recruitment of women and ethnic minorities necessitates that particular attention be paid to factors that limit the participation of these groups (e.g. child-care needs; access to transportation) (e.g. ).
Most treatment studies rely on referral networks and advertisements for recruitment. Referral networks tend to provide steadier rates of enrollment over time; however, reliance on any single recruitment method tends to limit generalizability. As noted in our first review article, in order to ensure that recruitment goals are met, it is often necessary to engage treatment providers within a referral network, particularly those who may be wary of research. In-service training seminars and ongoing communication regarding the progress made by referred participants can often overcome the resistance or mistrust of network practitioners (for detailed descriptions of recruitment strategies and enrollment procedures, see [13,20,22]).
After the study is under way, participant attrition, especially differential dropout by condition, becomes a factor that can seriously threaten both internal and external validity. Efforts must be made to assess all enrollees at follow-up evaluations, including those who do not comply with treatment. Studies comparing results for easily assessed clients (the first 70% of clients interviewed at follow-up) with samples that include clients who were more difficult to contact (the next 20% of clients interviewed at follow-up) have shown significant biases for reports of alcohol, cocaine, and opioid use, biases which were not eliminated by the use of covariates . Historically, attrition from alcohol dependence treatment studies has been higher for RCTs than for observational investigations . More recently, attention has focused on reducing attrition, and retention rates in many studies now exceed 90% (e.g. ).
Various strategies can be used to enhance retention rates in addictions RCTs. These include financial compensation for the time and effort involved in completing assessments, as well as other incentives, such as logo tee-shirts and mugs, that serve to ‘brand’ the project and increase its salience in participants' lives. Attention to the practical needs of clients (e.g. evening appointments for those who are employed; beverages and snacks) is also important. Remembrances such as birthday cards allow investigators to maintain contact with participants and, at the same time, communicate personal interest in their wellbeing.
Another effective sample retention strategy involves the use of ‘locators’. At each assessment session, participants should be asked to provide the names, addresses and telephone numbers of at least four different individuals who would know their whereabouts should changes in residence or other personal data occur (e.g. telephone number, name changes due to marriage). Following each assessment session with the client, locators should be contacted to confirm the accuracy of their own contact information, and they should be instructed to alert the investigators to any changes that occur during the course of the trial (for additional recommendations for improving follow-up rates, see [23,25–28]).
Participant tracking systems
In planning an RCT, information needs should be determined a priori to ensure that requisite data are available for analyzing participant attrition before, during and following treatment. To facilitate data management, a computerized tracking system should be established to record data regarding potential participants, actual enrollees, and reasons for ineligibility or refusal . This same system can be used to monitor progress in achieving enrollment goals, sample characteristics, blood and urine collection, treatment adherence, follow-up completion, collateral informant interviews and other data-gathering activities.
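As a rough illustration of the kind of record such a tracking system might hold, the sketch below stores one entry per person screened and tallies enrollment status and reasons for ineligibility or refusal. The field names, status labels and example entries are hypothetical; an actual system would be considerably more elaborate and would be designed around the needs of the particular trial.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningRecord:
    """One row per person screened (all fields are illustrative)."""
    participant_id: str
    recruitment_source: str
    status: str                   # "enrolled", "ineligible" or "refused"
    reason: Optional[str] = None  # recorded for non-enrollees only


def summarize(records):
    """Tally screening outcomes and the reasons given for non-enrollment."""
    status_counts = Counter(r.status for r in records)
    reason_counts = Counter(
        (r.status, r.reason) for r in records if r.status != "enrolled"
    )
    return status_counts, reason_counts


# Hypothetical screening log.
log = [
    ScreeningRecord("P001", "referral", "enrolled"),
    ScreeningRecord("P002", "advertisement", "ineligible", "comorbid psychosis"),
    ScreeningRecord("P003", "advertisement", "refused", "objected to randomization"),
    ScreeningRecord("P004", "referral", "enrolled"),
]
status_counts, reason_counts = summarize(log)
print(status_counts)
print(reason_counts)
```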
Assembling an assessment battery requires consideration of the broad range of variables that will be measured during the various phases of the trial. Assessments are likely to cover a variety of types of data, including chart reviews; participant self-reports; blood and/or urine test results, breathalyzer readings and information based on physical examinations; treatment provider measures; and collateral informant reports. In terms of participant self-reports, interviews (in-person and telephone) and self-administered questionnaires predominate; however, many additional technologies have become available, including computerized (e.g. ) and web-based (e.g. ) instruments, automated voice response systems (e.g. ), hand-held electronic diaries that prompt participants for responses in real time (e.g. ) and devices aimed at measuring recent (e.g. ) or cumulative (e.g. ) substance use (see  regarding many of these alternatives; see  regarding technologies for assessing daily events). As described below, additional assessments are used to corroborate self-reports, some of which may involve other respondents (e.g. collateral informant interviews [38,39]). Regardless of the type of data gathered, collection mode or respondent role, selected instruments should be standardized and have demonstrated reliability and validity.
The design of any assessment battery should consider the possibility that responses may be contaminated by order effects (e.g. ) or reactivity (e.g. ). Although it may be difficult to eliminate these measurement artifacts completely, their influence can often be assessed or controlled. To minimize order effects, it is generally advisable to administer open-ended instruments early in the assessment sequence, and to randomize or counterbalance the completion order of self-administered questionnaires across participants.
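One simple way to counterbalance completion order is a cyclic rotation, in which each successive participant starts one instrument later in the list. The sketch below is an assumption-laden illustration rather than a recommended protocol: the function name and questionnaire labels are hypothetical, and a cyclic rotation balances the position in which each instrument appears but not which instrument precedes which.

```python
def counterbalanced_order(instruments, participant_index):
    """Rotate the instrument list so that, across every block of
    len(instruments) participants, each instrument occupies each
    position exactly once."""
    if not instruments:
        raise ValueError("instruments must not be empty")
    shift = participant_index % len(instruments)
    return instruments[shift:] + instruments[:shift]


# Hypothetical self-administered questionnaires.
battery = ["questionnaire_A", "questionnaire_B", "questionnaire_C"]
for index in range(3):
    print(index, counterbalanced_order(battery, index))
```

Fully random ordering (for example, shuffling the list separately for each participant) is an alternative when strict balance across positions is not required.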
Reactivity occurs when the act of measurement itself influences the behavior that is observed. Reactivity is most likely to occur when assessment sessions are lengthy or when repeated measurements direct the respondent's attention to problematic behavior patterns. Evidence of reactivity in substance abuse research comes from studies showing a suppression of drinking in self-monitoring studies of alcohol consumption (e.g. ; see also ). Reactivity can affect intervention outcomes, interact with treatment manipulations and mask differential treatment response . The potential for reactivity can be reduced by maintaining a non-judgemental attitude in response to clients' reports, by embedding sensitive items within questionnaires that cover a range of different topics and by decreasing the length and frequency of assessment sessions. Measurement reactivity may be detected by statistical techniques that uncover trending in data sequences (see [41,44,45] regarding assessment issues).
Screening for eligibility and baseline assessment
Eligibility determination for research volunteers involves assessment of inclusion and exclusion criteria and, depending on study requirements, may necessitate several stages including, for example, diagnostic interviews, evaluations of physical health, assessments of reading level and locator confirmation. In addition to ascertaining eligibility, information regarding recruitment source should be recorded at intake; these data can be useful in assessing sample representativeness , and they can assist investigators in evaluating the effectiveness of various recruitment methods. It is also advisable to obtain data regarding treatment population characteristics (e.g. aggregate patient data from recruitment facilities) and, as already suggested, it is essential to collect as much information as possible from individuals who do not qualify or who withdraw from the enrollment process (including reasons for exclusion/withdrawal).
Intake assessment batteries for addictions RCTs generally include baseline measures of substance use, related problems and secondary or alternative outcome measures (e.g. liver function tests); treatment history; diagnoses of comorbid, as well as substance use, disorders; characteristics that may predict outcomes (e.g. dependence severity, motivation for change, social support, psychopathology); and pretreatment levels of variables that may mediate treatment effects (e.g. coping skills for cognitive behavioral therapy) . Client expectancies regarding the efficacy of study treatments should also be measured, as these may affect attrition, as well as treatment outcomes (e.g. ; see also [46,48,49]).
As suggested in our first paper, several different types of variables should be assessed during the trial's treatment phase. Therapist and client session checklists can document if, and when, various topics or strategies are covered during treatment [50,51], and session audio- or videotapes can provide indispensable data regarding treatment content and delivery. Both types of data are important for evaluating clinician performance and for establishing treatment integrity and discriminability.
Client adherence to treatment protocols should be measured; both intra- and extra-session activities (e.g. homework completion, attendance at self-help meetings, involvement in other forms of treatment) should be recorded. Depending on the intervention studied, it may be advisable to assess variables such as significant-other involvement.
In addition to tracking adherence during the treatment period, clinical researchers should consider measuring potential mediating variables and substance use outcomes at multiple time-points during the treatment phase in order to facilitate the investigation of change mechanisms . Similarly, both clients and providers should complete instruments related to treatment process (e.g. therapeutic alliance) (see [51,52]).
In pharmacological RCTs, adverse events should be systematically assessed in compliance with government regulations (see ). If the number of providers is sufficient in studies involving behavioral interventions, therapist attributes should be measured (e.g. demographic characteristics, background and experience, beliefs, expectations, theoretical orientations) [54,55]. This assessment will allow investigators to determine the influence of therapist variables on treatment outcomes and to assess whether these factors are confounded with treatment condition. Finally, regardless of the type of trial, participants should be asked to rate their satisfaction with various aspects of treatment delivery .
Treatment outcome measures
Collectively, outcome variables measured during follow-up evaluations should register changes in the condition studied, enable comparisons with other research, be relevant to intervention goals and be both multi-dimensional and clinically meaningful. Although researchers usually examine a range of outcome variables, a small number of indicators, typically measures of substance use, should be designated a priori as primary outcome measures for tests of treatment efficacy. A number of different variables have been used in addictions research . For treatments of alcohol dependence, frequency and intensity of consumption have served as primary outcomes in several major studies, and some investigators have recommended that these variables, together with drinking consequences, be adopted more widely (e.g. ). Percentage days of heavy drinking (six or more standard drinks for men, four or more for women) was endorsed as the optimal outcome measure at a conference held by the National Institute on Alcohol Abuse and Alcoholism in the United States [59,60].
In addition to comparability, investigators should consider treatment goals and hypothesized mechanisms of action in selecting primary outcome measures. For example, sustained sobriety might be an appropriate outcome variable for abstinence-oriented treatment, whereas reductions in heavy drinking occasions and alcohol-related problems might be more relevant for harm reduction interventions. Similarly, the pharmacological agents, naltrexone and acamprosate, tested in the COMBINE study, a multi-site RCT of alcohol dependence treatment, were expected to achieve their effects via different influences on brain chemistry, and primary outcome variables were chosen to reflect these different mechanisms . Outcomes that are tailored to the predicted effects of treatment can yield significant improvements in statistical power .
To increase flexibility for creating measures that serve multiple goals, we recommend a particular type of assessment, daily estimation techniques, rather than a specific outcome measure. Research evidence indicates that both retrospective (e.g. Timeline Followback ; Form 90 ) and prospective (see ) procedures yield reliable and valid data (see  regarding the relative merits of retrospective and prospective approaches; see also [37,65]). Although daily estimation methods involve time and expense, they have several advantages when compared to questions that ask respondents to report their typical behavior or to summarize their substance use for specific time intervals.
First, daily estimation procedures produce data records that permit computation of many different consumption variables, including time-to-event and event duration measures, as well as quantity, frequency and intensity indices [36,37,60]. Event-driven variables have revealed treatment effects in some studies when other summary outcome measures have not. For example, in the Outpatients arm of Project MATCH no treatment main effects were found for the two primary outcome variables (percentage of days abstinent, drinks per drinking day); however, a significant advantage was observed for Twelve-Step facilitation (versus cognitive-behavioral and motivational enhancement therapies) in the analyses of time-to-first-drink . Secondly, measures derived from daily estimation procedures can be computed over differing time intervals to facilitate comparisons across studies with differing follow-up durations. Thirdly, daily estimation methods produce the type of data that are suitable for statistical procedures that can model all or parts of the continuous record of substance use (e.g. latent growth curve modeling [67,68]). These analytical techniques, which compare treatment groups in terms of model parameters reflecting individual change over time (e.g. slope), are consistent with conceptualizations of outcome as a temporal process, rather than a static end-point (e.g. [2,52,69–71]).
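The first advantage can be made concrete with a small sketch showing how a single daily record yields several of the outcome variables mentioned above. The function name, the ten-day record and the heavy-drinking threshold passed in are hypothetical; the sketch does not reproduce the scoring rules of the Timeline Followback, Form 90 or any other published instrument.

```python
def summarize_daily_record(drinks_per_day, heavy_threshold):
    """Derive summary outcomes from a list of daily standard-drink counts,
    ordered from the first day of the assessment window."""
    if not drinks_per_day:
        raise ValueError("the daily record must contain at least one day")
    n_days = len(drinks_per_day)
    drinking_days = [d for d in drinks_per_day if d > 0]
    heavy_days = [d for d in drinks_per_day if d >= heavy_threshold]
    # Day number (1-based) of the first drink, or None if abstinent throughout.
    first_drink_day = next(
        (i + 1 for i, d in enumerate(drinks_per_day) if d > 0), None
    )
    return {
        "percent_days_abstinent": 100 * (n_days - len(drinking_days)) / n_days,
        "drinks_per_drinking_day": (
            sum(drinking_days) / len(drinking_days) if drinking_days else 0.0
        ),
        "percent_heavy_drinking_days": 100 * len(heavy_days) / n_days,
        "days_to_first_drink": first_drink_day,
    }


# Hypothetical ten-day record for one participant.
record = [0, 0, 3, 0, 6, 0, 0, 2, 0, 0]
summary = summarize_daily_record(record, heavy_threshold=4)
print(summary)
```

Because the summary is computed from the raw daily record, the same data can be re-scored over a different window or with a different threshold without re-interviewing the participant.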
The consensus among investigators is that assessment of intervention effects should be multi-dimensional [4,46,72,73]. Thus, in addition to variables that directly reflect substance involvement, indicators of psychosocial functioning and quality of life should be measured during follow-up sessions . Potential mediators of treatment effects should also be assessed. Additionally, treatment utilization and involvement in self-help groups (e.g. Alcoholics Anonymous) should be measured. Although these variables can be regarded in some studies as outcome variables in and of themselves, differential use of non-study services, especially during the trial's treatment phase, can pose a threat to internal validity by offering an alternative explanation for substance use outcomes.
Clinical relevance is an additional consideration with respect to treatment outcome assessment. Statistically significant mean differences in consumption are not regarded as meaningful in many clinical settings. To address this issue, investigators have often attempted to define a priori criteria that differentiate treatment ‘responders’ from ‘non-responders’. Alternatively, clinical researchers have developed indicators (composite outcomes) that combine consumption status and substance-related problems to produce a set of mutually exclusive outcome categories that are highly descriptive in terms of substance involvement and index outcomes that are progressively more negative (e.g. abstinent, using, using heavily, using heavily with consequences) (e.g. [75,76]; see also ).
Finally, investigators should develop an ‘exit interview’ for clients who withdraw from the trial both to measure the participant's outcome status at termination and to determine the reasons for withdrawal . Although this information may not be attainable for those lost to follow-up, many participants who withdraw are willing to complete a brief interview or questionnaire. In addition to generating information about how treatment and research protocols are perceived, exit interviews can often provide data that can be used in statistical analyses of treatment efficacy. (Methods for dealing with attrition in RCT outcome analyses are discussed in the third paper in this series; see also [57,60–62,72,73,77] regarding outcome measures.)
Frequency of follow-up evaluation
In the first paper in this series we recommended a follow-up period of at least 1 year post-treatment in order to assess the longevity of treatment effects and to compare results across studies. Within this time-frame, investigators must decide how often to measure outcomes. Follow-up evaluations should occur sufficiently often to maintain contact with participants and to capture change reliably but, at the same time, not be reactive or overly burdensome . The time interval between follow-up sessions should be influenced by the type of outcome measures selected and the statistical analysis plan. Some outcome variables (e.g. time-to-event measures) may require monitoring that is sufficiently frequent to permit fine temporal resolution of the occurrence of critical events. Similarly, some statistical procedures may require a minimum number of follow-up points (e.g. growth curve modeling). Power to detect treatment effects is enhanced by increasing the number of assessments. In most studies, quarterly follow-up evaluations (roughly every 3 months) will allow investigators to achieve study objectives without overburdening participants. This interval has been used successfully in previous studies, and the recommended daily estimation methods show good reliability over 90-day time-periods for reports of alcohol  and other substance  use.
To optimize data quality, the process, as well as the content, of assessment requires attention. As with treatment delivery, a manual is needed to standardize assessment procedures, and research assistants require extensive training and ongoing supervision. Training should include role-playing exercises and practice assessments with representatives of the study population, as well as didactic sessions regarding study methods. As the trial progresses, regular staff meetings should be held to review procedures and to discuss issues that arise during the course of conducting assessment sessions. Each research assistant should assess clients in all experimental conditions; to minimize variance associated with the use of different staff (and to build rapport), it may be advisable to assign assessors to clients for the duration of their participation . Client interviews should be audiotaped to facilitate performance monitoring, as well as to provide records of participants' responses (see [22,28,80,81] regarding staff recruitment, training and supervision).
With respect to clients, clinical researchers must attempt to balance assessment needs with sensitivity to participant burden, which can reduce cooperation and response accuracy. Self-reports of substance use and related behaviors are generally reliable and valid when assessment sessions are structured to optimize response veracity [36,82,83]. Many strategies can be used to enhance client cooperation and motivation, including the use of commitment agreements; emphasizing confidentiality; constructing the assessment battery to minimize fatigue, boredom and repetition (e.g. alternating self-administered questionnaires with personal interviews); compensating participants for research time; and collecting data from independent sources to verify participants' self-reports (e.g. biological assays, collateral informants, medical records). In addition to improving the validity of substance use estimates, alternative data sources provide a means for evaluating participant veracity; however, each type of data source has its own limitations, which should be considered in analyses of self-report accuracy [36,83,84]. Although it is often assumed that discrepancies between self-reports and alternative sources of information occur because participants deny or minimize their substance use, research indicates that the opposite pattern (clients report use that is contradicted by other indicators) is quite prevalent (e.g. ).
A final concern during assessment sessions is the respondent's sobriety, which has been shown to influence self-report veracity (e.g. ). Many alcohol treatment trials have failed to ensure that clients were sober during follow-up evaluations . Alcohol consumption can be assessed with a breathalyzer at in-person assessment sessions; for telephone interviews and for other substances, staff should inquire about substance use prior to the interview and be alert to any indications of use.
The feasibility of conducting the trial according to protocol should be evaluated in a pilot study using clients drawn from the same recruitment sources as those who will be used in the actual trial [46,86]. This will enable investigators to determine whether recruitment and retention levels can be maintained across treatment conditions and to adjust enrollment strategies accordingly. Similarly, treatment adherence and the occurrence of adverse consequences can be examined. If problems occur (e.g. assessment sessions are too lengthy), study procedures should be modified. Additionally, pilot tests provide an opportunity for investigators to monitor the performance of clinical and research staff. For instruments that involve personal interviews (e.g. diagnostic assessments), researchers should systematically evaluate inter-rater reliability [80,87]; if reliability is found to be inadequate, additional training should be provided until a performance criterion is achieved (see [28,86,88,89]).
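The text does not prescribe a particular reliability statistic. As one common choice for categorical ratings made by two interviewers, Cohen's kappa corrects observed agreement for the agreement expected by chance. The sketch below is a minimal illustration under that assumption; the function name and the ratings are hypothetical, and the performance criterion to be met would need to be set by the investigators.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who each assigned
    one categorical rating to the same set of cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1 - expected)


# Hypothetical pilot data: 1 = diagnosis present, 0 = absent, ten cases.
interviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
interviewer_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(interviewer_1, interviewer_2)
print(round(kappa, 2))
```

In this illustrative data set the interviewers agree on eight of ten cases, but because half of that agreement would be expected by chance, kappa is noticeably lower than the raw agreement rate.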
SUMMARY AND CONCLUSIONS
In this second paper on enhancing the validity and utility of addictions RCTs, we have focused upon participant samples and assessment methods. We have highlighted the importance of advance planning to ensure that recruitment strategies target a broad range of appropriate clients and that statistical power will be adequate for achieving study aims. We have emphasized the potential threats to internal and external validity posed by participant attrition during all phases of the trial, and we have offered a variety of specific strategies, as well as resources, that can assist investigators in meeting enrollment and retention goals. We have underscored the importance of collecting information about treatment population characteristics, as well as reasons for ineligibility, refusal to participate and withdrawal from the trial, and we have recommended the use of a computerized client tracking system to manage data regarding recruitment, treatment adherence and completion of research activities.
We have argued for comprehensive assessment at baseline and during the trial's treatment phase to facilitate the investigation of change processes and mechanisms of action. At the same time, in order to minimize participant burden, we have emphasized the need to specify hypotheses regarding change mechanisms a priori so that assessment batteries are limited to those variables that can reasonably be thought to mediate or moderate treatment effects. We have suggested that follow-up evaluations be conducted at 3-month intervals, and rather than endorse a particular outcome measure, we have recommended the use of daily estimation procedures for outcome assessment. These instruments permit the computation of a wide range of outcome indicators that can be used both to tailor outcome measures to particular treatments and to compare results across studies. Further, daily estimation procedures produce data that are compatible with a conceptualization of outcomes as temporal processes, a perspective that is consistent with current views of addiction as a chronic condition (e.g. [69,70]). Finally, we have suggested that investigators should complete a pilot study in order to refine procedures and assess the performance of treatment staff and research assistants.