How often are outcomes other than change in substance use measured? A systematic review of outcome measures in contemporary randomised controlled trials.

ISSUES
Recovery is a theoretical construct and empirical object of inquiry. The aim was to review whether outcome measures used in randomised controlled trials of drug treatment reflect a comprehensive conceptualisation of recovery.


APPROACH
Systematic review using the following databases: Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, Database of Abstracts of Reviews of Effects, Web of Science, MEDLINE, Embase and PsycINFO. The search returned 6556 original articles, of which 504 met the following inclusion criteria: randomised controlled trial published in an English-language peer-reviewed journal; sample meeting criteria for drug dependence or drug use disorder; and reporting of non-substance use treatment outcomes. Review protocol registration: PROSPERO (CRD42018090064).


KEY FINDINGS
Of the included studies, 3.8% had a follow up of 2 years or more. Withdrawal/craving was measured in 31.1% of short-term versus 0% of long-term studies, and social functioning in 8% of short-term versus 36.8% of long-term studies. Role functioning (0.9 vs. 26.3%), risk behaviour (15.6 vs. 36.8%) and criminality (3.8 vs. 21.1%) followed a similar pattern. Housing was not examined in short-term studies and only irregularly in long-term studies (2.0%). 'Use of health-care facilities' and clinical psychological/behavioural factors were frequently reported. Physiological or somatic health (15.2 vs. 10.5%), motivation (14.2 vs. 15.8%) and quality of life (7.1 vs. 0%) were less frequently reported.


CONCLUSION
The short follow-up periods and the lack of information on factors relevant to recovery prevent the development of evidence-based approaches to improving these factors. In particular, measures of social and role functioning should be added to reflect an adequate conceptualisation of recovery.


Introduction
There is little consensus on the conceptualisation of long-term recovery in the drug use disorder (DUD) literature. Recovery operationalisations influence treatment research, inform clinical practice and shape how the efficacy or effectiveness of treatments and interventions is judged. These operationalisations therefore need to be valid if we are to understand what is and what is not high-quality care. In severe mental illness, the operationalisation of recovery is more developed than in DUD [1]. Concrete operationalisations have been suggested (e.g. personal and clinical recovery), including functional and social aspects central to recovery in severe mental illness [2][3][4]. While specific factors, such as reduction in criminality, are more prominent in DUD recovery than in recovery from severe mental illness, general core factors, including an increase in community and social functioning, are common to both conditions [5][6][7][8][9]. The same applies to the reduction of core symptoms, for example substance use and severe psychiatric symptoms, which is essential for achieving stable long-term recovery [10][11][12]. In this systematic review, we first propose that conceptualisations of recovery from severe mental illness are applicable to DUD. Second, we systematically review the extent to which outcome measures used in randomised controlled trials (RCTs) of drug treatment reflect a comprehensive understanding of recovery.
Clinical recovery traditionally regards mental illness or DUD as distinct disorders displaying core symptoms, and is achieved when these core symptoms subside below diagnostic thresholds. The criteria for clinical recovery are based on researcher-derived thresholds and predefined objectives, including symptoms and functioning. Clinical recovery also has a temporal criterion intended to indicate the stability of the recovery [4,13,14]. While subject to ongoing debate, a minimum duration of 2 years has been proposed: within 2 years, new habits and behaviours can take hold, a relapse may or may not have occurred, and a drug-free social network has begun to consolidate [15][16][17]. There is more widespread agreement in the DUD literature on symptom criteria for changes in drug use (i.e. from use to abstinence or moderation) [18,19]. However, consensus is lacking regarding criteria for functional and social recovery. Because of the extensive identity changes often considered necessary to handle a drug-free life, or even moderate drug use, some have set a 5-year temporal criterion for DUD recovery [20][21][22][23].
The personal recovery tradition arose as a reaction to researcher-derived recovery criteria. Personal recovery is conceptualised beyond core symptom reduction as: '…a process of restoring a meaningful sense of belonging to one's community and positive sense of identity apart from one's condition while rebuilding a life despite or within the limitations imposed by that condition' [24,25]. Synthesising the research on personal recovery into an empirically based concept, Leamy et al. [26] outlined the Connectedness, Hope & Optimism, Identity, Meaning and Empowerment framework, in which five main long-term processes characterise recovery: (i) connectedness; (ii) hope and optimism; (iii) identity; (iv) meaning in life; and (v) empowerment. Empirical research suggests that these processes are relevant for DUD recovery [6,12,21,22].
The relational recovery tradition critiques the clinical and personal recovery approaches for not incorporating the interpersonal embeddedness of recovery [27]. This framework sees interpersonal contexts as permeating individualistic or subjective concepts like connectedness and self-agency [28], and advocates against conceptualising recovery as separate from the social and relational reality that partly defines the potentialities for each individual. These issues are just as relevant for DUD as for serious mental illness [29,30].
Though there are differences between these three approaches, the perspectives of clinical, personal and relational recovery share common ground [30]. Consistent with empirical findings, symptom reduction is seen as a necessary but insufficient requirement to maintain recovery over time. Although clinical recovery is unique in its definition of a concrete temporal criterion [15,16], recovery is universally described as a non-linear and cumbersome long-term growth process, with the threat of relapse often present. It is also acknowledged that a good outcome sometimes requires a long-term care effort [11-13,31]. Empirical support for these findings is solid and consistent across different clinical disciplines and research traditions [10,17,22,32-35]. On this basis, it is proposed that treatment outcome research in DUD should reflect these findings when assessing recovery.
The aim of this review was to systematically review and identify non-substance use (non-SU) treatment outcome measures used in RCTs on illicit drug use over the last 10 years, and to assess the degree to which they reflect any of the above-mentioned perspectives of recovery. RCTs were chosen because this methodology is generally considered the most valuable for both evaluating treatment efficacy and effectiveness and developing treatment guidelines.

Methods
This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [36] to ensure comprehensive and transparent reporting of procedures and results. The protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO) in March 2018 (registration number: CRD42018090064) (Appendix 1).

Search strategy
Two independent researchers (JB and SN) conducted a search of the literature using the following electronic databases: Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, Database of Abstracts of Reviews of Effects, Web of Science, MEDLINE, Embase and PsycINFO. Variations and combinations of terms targeting five main concepts were used in the search: RCTs, substance abuse, substances, therapeutic approaches and recovery success. Subject headings belonging to the individual databases (e.g. MeSH subject terms) and free-text terms (see Appendix 2 for model search) were also used. The search queries were reviewed by an information scientist. In addition, a hand search was performed using reference lists from reviews and meta-analyses identified in the main search. In cases of doubt, the full-text paper was read to determine eligibility. Papers published between January 2008 and January 2019 were included. The last search was conducted on 11 January 2019.

Eligibility criteria
The included articles met the following criteria:
• Empirical study published in an English-language peer-reviewed journal.
• Study sample meets the criteria for dependence syndrome (International Classification of Diseases, 10th revision) or moderate-severe DUD (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition).
• Randomised controlled trial.

Exclusion criteria
Articles were excluded if the study sample was only or predominantly comprised of individuals with alcohol dependence, or if the study did not include non-substance use outcomes.

Data collection
All potential studies were exported into a reference citation manager (EndNote) before duplicates were removed. Two independent reviewers (authors JB and SN) separately performed title and abstract screening, full-text analysis and selection of non-SU treatment outcome measures. Outcome categories (as presented in Tables 1-3) were developed during 13 consensus meetings (≈60 min each, JB and SN), drawing on existing taxonomies as described below. Disagreements were resolved through discussion until consensus was reached. A third reviewer (JRM) was available to resolve disagreements and provide critical evaluation.

Analytic methods and data extraction procedure
A narrative descriptive synthesis was performed for the included articles, and this qualitative synthesis was used to determine the taxonomy of non-SU outcomes. We used the suggested taxonomies of Dodd et al. […] [73] were used to adapt the categorisation specifically to DUD. Where we could not find normative taxonomies that covered the outcomes satisfactorily, or where we assessed factors as particularly relevant and specific to DUD (e.g. criminality), we used the study authors' outcome operationalisations as a guide for developing categories. The following data extraction procedure was used. First, non-SU treatment outcome measures across different domains (e.g. work, community functioning, social functioning, health behaviour) were identified. Second, the properties of each outcome measure were analysed and categorised based on similarity (e.g. hepatitis C and HIV related to risky sexual behaviour were both organised under the 'Risk behaviour' category in Table 1). Contemporary recovery perspectives address issues of functioning (e.g. community and social), incorporate various perspectives on outcome (e.g. service user and researcher perspectives) and are explicit that a long-term perspective is crucial, particularly with regard to functional recovery [11,12,29,30]. Since research on recovery has grown over the past 10 years, this became a central rationale for the time limitation in our search: to test whether the DUD field had incorporated this shift in focus from symptom relief (typically some measure of change in substance use) to more explicitly addressing functional and social factors as important outcome measures.
For the same reasons, the second part of the synthesis was a pre-planned sub-analysis to identify long-term studies using non-substance use outcomes. Here, the cut-off was set at a follow up of at least 2 years, following Lieberman's criteria for stable recovery [2] and the temporal requirement for recovery suggested in the clinical recovery literature [15,17]. Acknowledging the debate in this area, with some researchers advocating a temporal criterion of up to 5 years [20][21][22][23], our 2-year criterion should primarily be viewed as a practical tool and a minimum threshold for identifying long-term studies. Finally, descriptive statistics were generated to summarise and quantify significant treatment effects across studies.

Search results
The electronic search returned 6556 articles. After duplicates were removed, 4545 articles remained. A hand search of reference lists from reviews and meta-analyses returned a further 21 articles. Full-text evaluation was conducted for 761 articles, of which 504 met the inclusion criteria and were included in the final synthesis. Details of the search results are summarised in Figure 1.
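The counts reported above can be cross-checked by simple subtraction to recover the intermediate stages of a PRISMA-style flow. The sketch below is illustrative only: `prisma_flow` is a hypothetical helper, and it assumes that the 21 hand-searched records entered title/abstract screening alongside the 4545 deduplicated records, which the text does not state explicitly.

```python
# Record counts as reported in the Search results section.
IDENTIFIED = 6556   # records returned by the electronic search
AFTER_DEDUP = 4545  # records remaining after duplicate removal
HAND_SEARCH = 21    # extra records from reference lists of reviews
FULL_TEXT = 761     # records assessed in full text
INCLUDED = 504      # records included in the final synthesis


def prisma_flow(identified, after_dedup, hand_search, full_text, included):
    """Derive intermediate flow counts by subtraction.

    Assumption (not stated in the review): hand-searched records joined the
    title/abstract screening pool together with the deduplicated records.
    """
    screened = after_dedup + hand_search
    return {
        "duplicates_removed": identified - after_dedup,
        "screened": screened,
        "excluded_at_screening": screened - full_text,
        "excluded_at_full_text": full_text - included,
    }


flow = prisma_flow(IDENTIFIED, AFTER_DEDUP, HAND_SEARCH, FULL_TEXT, INCLUDED)
for stage, n in flow.items():
    print(f"{stage}: {n}")
```

Only the duplicate and full-text exclusion counts follow directly from the reported figures; the screening-stage numbers depend on the stated assumption about the hand-searched records.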
Since the number of screened and included articles was extensive, it was necessary to develop superordinate categories (e.g. social functioning). Seven non-SU outcome categories and seven sub-categories were developed.

Non-SU outcome measures
Details of the included non-SU outcomes are summarised in Table 1 (see Appendix 3 for substance use measures used in the included articles). The five most frequently included outcomes were: clinical factors (from the category psychological/behavioural factors) (n = 196); use of health-care facilities (from the category functioning) (n = 179); risk behaviour (n = 104); physiological/clinical (somatic) (n = 103); and withdrawal/cravings (from the category adverse effects) (n = 93). The five least frequently included outcomes were: housing (n = 11); role functioning (from the category functioning) (n = 28); criminality (n = 40); global functioning, mostly community-related functioning (from the category functioning) (n = 51); and quality of life (from the category functioning) (n = 51). In comparison, all studies had at least one DUD measure, which was also almost always reported as an outcome. Substance use outcome measures were spread across 22 different subcategories (e.g. days of drug use last month, substance use problems past 90 days, illicit opiate use).

Follow-up duration
Of the 504 included studies, 42.1% had less than 13 weeks of follow up, 29.6% had 13-26 weeks, 21.8% had 27-52 weeks, 2.8% had 53-103 weeks and 3.8% had at least 2 years of follow up. The longest follow up was 416 weeks (8 years).
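The follow-up bands above, combined with the 2-year long-term cut-off described in the analytic methods, can be expressed as a small classification function. This is a sketch under two assumptions not stated in the review: follow up is recorded in whole weeks, and 2 years corresponds to 104 weeks. `follow_up_band` and `is_long_term` are hypothetical helper names, not part of the review's tooling.

```python
# Assumption: follow-up lengths are integer weeks; 2 years = 2 x 52 weeks.
LONG_TERM_WEEKS = 104


def follow_up_band(weeks: int) -> str:
    """Return the follow-up band used in this review for a trial's follow up."""
    if weeks < 0:
        raise ValueError("follow-up length cannot be negative")
    if weeks < 13:
        return "<13 weeks"
    if weeks <= 26:
        return "13-26 weeks"
    if weeks <= 52:
        return "27-52 weeks"
    if weeks < LONG_TERM_WEEKS:
        return "53-103 weeks"
    return ">=104 weeks"


def is_long_term(weeks: int) -> bool:
    """True if the follow up meets the 2-year minimum temporal criterion."""
    return weeks >= LONG_TERM_WEEKS


# Reported share (%) of the 504 included studies in each band.
REPORTED_SHARE = {
    "<13 weeks": 42.1,
    "13-26 weeks": 29.6,
    "27-52 weeks": 21.8,
    "53-103 weeks": 2.8,
    ">=104 weeks": 3.8,
}

print(follow_up_band(416), is_long_term(416))  # the longest follow up reported
print(round(sum(REPORTED_SHARE.values()), 1))  # rounded shares need not total 100
```

Because each reported share is rounded to one decimal place, the five bands sum to 100.1% rather than exactly 100%.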

Relation between length of follow up and non-SU outcomes included
The most evident differences in non-SU outcome inclusions emerged between studies with less than 13 weeks of follow up and studies with at least 2 years of follow up (see Table 1).

Long-term interventions and reported effects on DUD and non-SU outcomes
Table 2 displays details on studies with follow ups of between 1 and 2 years, and Table 3 presents details on studies with at least 2 years of follow up, including reported treatment effects. Slightly over two-thirds (69.7%) of the studies evaluated what may be termed complex interventions, primarily treatment programs with multiple components or several treatments or treatment elements combined. Conversely, slightly less than one-third (30.3%) of the studies evaluated more narrowly focused interventions, usually single treatments such as cognitive behavioural therapy or targeted HIV-prevention programs. Ten percent of the studies showed a positive effect on DUD outcomes but no effect on non-SU outcomes, whereas 6.7% showed a positive effect on non-SU outcomes but no effect on DUD outcomes. In total, 57.6% of the studies showed a significant positive effect on at least one of the non-SU outcomes examined during the intervention period and/or during follow up. Slightly more than half of the studies (54.6%) had at least one significant positive effect on a DUD outcome, and 42.4% had a significant positive effect on at least one non-SU outcome and at least one DUD outcome, indicating a more general positive recovery effect.

New agendas for contemporary recovery research
This review reveals that only a limited number of RCTs have measured non-SU factors as treatment outcomes over time. Only 19 of the 504 included studies (3.8%) had follow ups of at least 2 years, and of these, 11 (2.2% of all included studies) had follow ups longer than 2 years. Given the suggested minimum temporal criterion of 2 years for recovery, this finding alone suggests that the substance use RCT literature from the past decade reflects the above-mentioned clinical, personal and relational perspectives on recovery only to a very limited degree [1,26,27,31]. Functional and social recovery are prominent in all these perspectives, and they are non-linear and cumbersome processes that usually require more time than achieving abstinence [11-13,31]. The threat of relapse may persist for years after abstinence is achieved [5][6][7][8][9]. Hence, contemporary substance use RCT research may omit important social recovery factors and processes, including loneliness, social alienation and the pursuit of citizenship [2,8,29,30]. When poorly handled, these factors are associated with a poor clinical course and relapse. Conversely, when overcome, they facilitate personal growth, perceived agency and social inclusion, possibly making the hard work of recovery attractive and a realistic life solution over time [12]. Further, given the current evidence base, it is difficult to understand how people strengthen and maintain functional outcomes over time, such as increased school participation or more frequent social meetings [20,21,26]. This requires longitudinal study designs and focused mediation analyses, which are usually beyond the scope of most RCTs. These limitations make it challenging for clinicians to work from an evidence base when tailoring phase-specific DUD treatment strategies for long-term recovery.
In line with contemporary recovery research, the 3.8% of studies with a follow up of at least 2 years are more likely to report general health and recovery effects than studies with shorter follow ups. However, one limitation of these 19 studies is that they typically report the non-SU outcomes of psychological health (typically reduction in depression) and use of health-care facilities (typically treatment retention), but not other non-SU outcomes. Only seven studies (1.4%) reported social functioning outcomes, five (1.0%) role functioning, four (0.8%) criminality, two (0.4%) global functioning and none quality of life. The severely limited number of studies measuring these factors stands in contrast to the fact that they have consistently been associated with good and stable DUD outcomes in the recovery literature [10-12,31,74,75]. Moreover, conclusions that cut across different recovery traditions about what constitutes recovery (for example, a long-term increase in community and social functioning along with reduction in or elimination of substance use [4,26,27]) are largely ignored. Likewise, the increasing proportion of studies using only one non-SU outcome in addition to change in substance use (41.2% in 2008-2013 vs. 55.1% in 2014-2019) represents a step away from the longitudinal, multi-dimensional approach required to investigate long-term recovery.
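The Discussion reports these counts relative to all 504 included studies, whereas the Key findings express long-term figures relative to the 19 studies with at least 2 years of follow up. The sketch below (with a hypothetical helper `pct`) recomputes both denominators from the counts stated in the text; for example, seven studies correspond to 1.4% of all studies but 36.8% of long-term studies, which matches the social functioning figure in the Key findings.

```python
TOTAL_STUDIES = 504     # all included RCTs
LONG_TERM_STUDIES = 19  # RCTs with a follow up of at least 2 years
LONGER_THAN_2Y = 11     # subset with follow ups longer than 2 years

# Long-term studies reporting each non-SU outcome, as stated in the text.
LONG_TERM_COUNTS = {
    "social functioning": 7,
    "role functioning": 5,
    "criminality": 4,
    "global functioning": 2,
    "quality of life": 0,
}


def pct(count: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the review."""
    return round(100 * count / denominator, 1)


print(f"long-term: {pct(LONG_TERM_STUDIES, TOTAL_STUDIES)}% of all studies")
print(f"longer than 2 years: {pct(LONGER_THAN_2Y, TOTAL_STUDIES)}% of all studies")
for outcome, n in LONG_TERM_COUNTS.items():
    print(f"{outcome}: {pct(n, TOTAL_STUDIES)}% of all studies, "
          f"{pct(n, LONG_TERM_STUDIES)}% of long-term studies")
```

Keeping the denominator explicit avoids reading 1.4% (of all studies) and 36.8% (of long-term studies) as conflicting figures.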

Limitations
A strength of the study is that its protocol was publicly available (via PROSPERO) before the review was conducted; this ensured transparency and that the review followed PRISMA guidelines [36]. One limitation is that no advanced statistical tests were used to assess the reliability and validity of the findings reported by the included studies; however, the scope of the paper was to evaluate outcome measures, not treatment efficacy per se. Another limitation is that individual studies were not assessed for key sources of bias (e.g. sample characteristics). In addition, and in line with previous research, some studies were based on small samples, and most instruments were constructed and tested within Anglo-American cultures. This typically increases the risk of reporting bias, suggesting that the included studies represent selective research dissemination. However, it should be emphasised that the aim was to identify outcomes with a high level of use within the field and that the search was conducted across several literature databases. The included studies used samples with somewhat different characteristics (e.g. sex, age and level of symptomatology), which may violate the transitivity assumption and thus raise questions about the validity of direct comparisons across the included studies.

Suggested research directions
To improve the scientific knowledge base on treatment outcomes in DUD, it would be advisable to incorporate functional and social outcomes into longitudinal research designs more consistently. These outcomes are already used by other initiatives, such as the Treatment Episode Data Set discharge data [76]. Empirical studies indicate that future research should focus on detailing the specific effects of social and community functioning in recovery. For example, we need to know more about which treatment interventions bring about sustained improvements in these areas, and which post-treatment factors mediate such improvements. In addition, a more valid temporal criterion, enabling professionals to identify vulnerable phases in recovery more accurately, would be useful for tailoring treatment efforts to expected fluctuations in relapse risk. A broad investigation should also aim to overcome specific limitations inherent in RCT designs, including sensitivity to contextual factors and the comparison of a single, common clinical metric across different study contexts. As suggested by Donovan et al. [77], within-study comparisons may be a more valid alternative for studying complex phenomena such as recovery in DUD. Furthermore, systematic inclusion of service-user perspectives could prove a viable route to this aim [78]. By asking individuals with first-hand experience and those outside the traditional scientific community for input on the research design, the risk of implementing measures with low ecological validity is considerably reduced [79]. In practical terms, a mixed research design, combining exploration, hypothesis development and further large-scale testing (RCTs), could be a feasible solution.