Summary of findings
Description of the condition
During the past few decades, a wide range of therapeutic interventions backed by some randomised evidence have been developed for people with schizophrenia and related psychotic disorders. These interventions include pharmacological treatment with first- and second-generation antipsychotic agents and cognitive behavioural therapy and other psychological interventions, as well as family interventions and psychoeducation, social skills training, vocational rehabilitation and other psychosocial rehabilitation techniques (NICE 2010). Under ordinary circumstances, however, these evidence-based interventions are not easily translated into practice, as access to and use of the evidence base are not straightforward for most healthcare providers in most countries of the world (Grol 2003; Grol 2008). As a consequence, a huge gap between the production of evidence (what is known) and its uptake in practice settings (what is actually done) has been repeatedly highlighted and described in different countries and in different healthcare systems (Sederer 2009).
Description of the intervention
In recent years, to promote the use of evidence, new methodologies for aggregating, synthesising and grading the quality of evidence extracted from systematic reviews have progressively been developed, and approaches for creating clinical practice guidelines based on explicit assessments of the evidence base are commonly employed in several fields of medicine, including schizophrenia and related psychotic disorders (Barbui 2010; Barbui 2011; WHO 2010). It is interesting to note that although the pathway from evidence generation to evidence synthesis and guideline development is highly developed and quite sophisticated, the pathway from evidence-based guidelines to evidence-based practice is much less developed (Fretheim 2006). The key issues are (1) whether guidelines may have any impact on healthcare provider performance and thus on patient outcomes and (2) how implementation should be conducted to maximise benefit (Gagliardi 2012). This is particularly relevant for those involved in producing and delivering evidence-based recommendations, including international organisations such as the World Health Organization (WHO), scientific bodies such as the World Psychiatric Association (WPA) or the American Psychiatric Association (APA) and national institutes such as the UK National Institute for Health and Care Excellence (NICE), and for those with responsibilities in delivering high-quality mental health care, including national and local managers of mental healthcare systems, scientific organisations and even single healthcare professionals.
How the intervention might work
Implementation methods range from simple interventions, such as dissemination of educational materials, to more complex and multifaceted interventions, including tutorial and consultation sessions; treatment algorithms, reminder systems and audit and feedback; and use of psychological theories to overcome obstacles (Grimshaw 2004).
Why it is important to do this review
Current knowledge on how implementation programmes should be developed is very scant. The only systematic review conducted to date—to our knowledge—did not focus on schizophrenia and related disorders and included observational studies in addition to randomised evidence (Weinmann 2007). Additionally, the literature search of this review was last updated in 2006; therefore, it does not include studies published thereafter. Mental health systems that set a commitment to evidence-based practice as a policy priority need to know urgently whether guidelines may have any impact on healthcare provider performance and on patient outcomes, and how implementation plans should be developed to maximise benefit at sustainable costs (Tansella 2009).
The primary objective of this review was to examine the efficacy of guideline implementation strategies in improving process outcomes (performance of healthcare providers) and patient outcomes. We additionally explored which components of different guideline implementation strategies can influence process and patient outcomes.
Criteria for considering studies for this review
Types of studies
We included randomised controlled trials only. When a trial did not explicitly report randomisation but was described as 'double-blind', and the demographic details of each group were similar, we considered the study to be randomised. We excluded quasi-randomised studies, such as those allocating by using alternate days of the week. Studies employing 'cluster randomisation' (such as randomisation by clinician or practice) were included. We included studies published in all languages.
Types of participants
Adults, however defined, with schizophrenia or related severe mental disorders, including schizophreniform disorder, schizoaffective disorder and delusional disorder, were included. We omitted studies in non-adult populations because of the differences in medical decision making for children and adolescents, including the parent/guardian role. As we were interested in making sure that information is relevant to the care of individuals with severe mental disorders in specialist settings, only studies with participants recruited in mental healthcare settings were included.
Types of interventions
We included any active or passive guideline implementation strategy. We defined 'guidelines' as systematically developed statements (or algorithms, flow charts or tables) prepared to assist decisions about appropriate health care for specific clinical circumstances. We defined 'implementation' as any planned process and systematic introduction of guidelines with the aim of giving them a structural place in professional practice. Passive strategies, such as guideline distribution, were included. Interventions were classified according to a taxonomy developed by the Cochrane Effective Practice and Organisation of Care Review Group (EPOC).
The following comparisons were included.
- Guideline implementation strategy versus usual care ('no intervention' control).
- Guideline implementation strategy A versus guideline implementation strategy B.
Types of outcome measures
It was expected that outcomes would differ between studies according to the characteristics and purposes of the guideline under scrutiny. Outcomes were grouped into process outcomes (performance of healthcare providers) and patient outcomes.
Outcomes were grouped into short term (up to six months), medium term (more than six months to one year) and long term (more than one year).
The following process outcome was considered.
1. Practitioner impact.
As defined by each of the studies.
The following patient outcomes were considered.
1. Global state.
1.1 Clinically significant response in global state—as defined by each of the studies.
2. Satisfaction with care.
As defined by each of the studies.
3. Treatment adherence.
As defined by each of the studies.
4. Drug attitude.
As defined by each of the studies.
5. Quality of life.
As defined by each of the studies.
6. 'Summary of findings' table.
We planned to use the GRADE approach to interpret findings and the GRADE Profiler to import data from RevMan 5.1 to create 'Summary of findings' tables (Guyatt 2011). These tables provide outcome-specific information concerning the overall quality of evidence from each included study in the comparison, the magnitude of effect of the interventions examined and the sum of available data on all outcomes.
The following outcomes were to be summarised: practitioner impact, global state, satisfaction with care, treatment adherence, drug attitude and quality of life. For each of these, we would have preferred clear and clinically meaningful binary outcomes.
Search methods for identification of studies
1. Cochrane Schizophrenia Group Trials Register
An electronic search of the register was run using the phrase:
[*guideline* OR ((*Algorithm* OR *disseminat* OR *distribut* OR *health care reform* OR *health plan* OR *health polic* OR *health priorit* OR *health reform* OR *Improving care* OR *improving treatment* OR *knowledge transfer* OR *performance measure* OR *policy making* OR *professional standard* OR *research agenda* OR *research priorit* OR *research program* OR *statement* OR *treatment guid* OR *Treatment protocol*) AND (*assess* OR *evaluat* OR *Impact* OR *implement* OR *validity*) ) in title, abstract, or indexing terms in REFERENCES OR (*guideline* in intervention of STUDY)]
This register is compiled by systematic searches of major databases plus handsearches of relevant journals and conference proceedings (see group module).
The electronic search was developed and run by the Trial Search Co-ordinator of the Cochrane Schizophrenia Group, Samantha Roberts.
Searching other resources
1. Reference lists
We searched all references of articles selected for inclusion for further relevant trials.
Data collection and analysis
Selection of studies
Two review authors (CB and FG) inspected all abstracts of studies identified as described above and selected potentially relevant reports. To ensure reliability, another review author (EA) inspected all abstracts independently. When disagreement occurred, this was resolved by discussion, and when doubt persisted, the full article was acquired for further inspection. The full articles of relevant reports were acquired for reassessment and were carefully inspected for a final decision on inclusion. CB and FG were not blinded to the names of authors, institutions or journals of publication. When difficulties or disputes arose, these were resolved by discussion with a third review author (MK).
Data extraction and management
1. Data extraction
Using a form for data collection, CB and FG extracted data from all included studies. To ensure reliability, MK independently extracted data from these studies. Again, any disagreement was discussed, decisions documented and, if necessary, the authors of studies were contacted for clarification. With any remaining problems AC helped clarify issues and the final decisions were documented.
2. Data management
2.1 Scale-derived data
We included continuous data from rating scales only if: a. the psychometric properties of the measuring instrument had been described in a peer-reviewed journal (Marshall 2000); and b. the measuring instrument was not written or modified by one of the trialists for that particular trial.
2.2 Endpoint versus change data
Both endpoint and change data offer advantages. Change data can remove a component of between-person variability from the analysis; on the other hand, calculating change requires two assessments (baseline and endpoint), which can be difficult to obtain in unstable, hard-to-measure conditions such as schizophrenia. We decided primarily to use endpoint data and used change data only when the former were not available. We planned to combine endpoint and change data in the analysis, as we used mean differences (MD) rather than standardised mean differences (SMD) throughout (Higgins 2011).
2.3 Skewed data
Continuous data on clinical and social outcomes are often not normally distributed. To avoid the pitfall of applying parametric tests to non-parametric data, we aimed to apply the following standards to all data before inclusion: when a scale starts from the finite number zero, the standard deviation (SD), when multiplied by two, should be less than the mean (as otherwise, the mean is unlikely to be an appropriate measure of the centre of the distribution (Altman 1996)); if a scale starts from a positive value (such as the Positive and Negative Syndrome Scale (PANSS), which can have values from 30 to 210), we planned to modify the calculation described above to take the scale starting point into account. In these cases, skew is present if 2 SD > (S - S min), where S is the mean score and S min is the minimum score. Endpoint scores on scales often have finite starting and ending points, and these rules can be applied. When continuous data are presented on a scale that includes the possibility of negative values (such as change data), it is difficult to tell whether data are skewed. We planned to enter skewed data from studies of fewer than 200 participants into additional tables rather than into an analysis. Skewed data pose less of a problem when means are examined if the sample size is large, and we entered such data into the syntheses.
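The skew rule above can be sketched as a small check. This is an illustrative helper (the function name and example values are ours, not the review's), applying the 2 SD > (S - S min) criterion:

```python
def is_skew_suspected(mean: float, sd: float, scale_min: float = 0.0) -> bool:
    """Return True if 2 * SD exceeds (mean - scale minimum), suggesting the
    mean is a poor measure of the centre of the distribution (Altman 1996)."""
    return 2 * sd > (mean - scale_min)

# PANSS total (minimum possible score 30): mean 75, SD 20 -> 40 vs 45, not flagged
print(is_skew_suspected(75, 20, scale_min=30))  # False
# With SD 25 -> 50 vs 45, flagged as potentially skewed
print(is_skew_suspected(75, 25, scale_min=30))  # True
```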
2.4 Common measure
To facilitate comparison between trials, we planned to convert variables that could be reported in different metrics, such as days in hospital (mean days per year, per week or per month), to a common metric (e.g. mean days per month).
2.5 Conversion of continuous to binary
Where possible, we made efforts to convert outcome measures to dichotomous data. This can be done by identifying cut-off points on rating scales and dividing participants accordingly into 'clinically improved' or 'not clinically improved'. It is generally assumed that a 50% reduction in a scale-derived score such as the Brief Psychiatric Rating Scale (BPRS; Overall 1962) or the PANSS (Kay 1986) can be considered a clinically significant response (Leucht 2005a; Leucht 2005b).
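A minimal sketch of this dichotomisation, adjusting the percentage reduction for the scale minimum in the spirit of Leucht 2005a/2005b (the helper names and example scores are hypothetical):

```python
def percent_reduction(baseline: float, endpoint: float, scale_min: float = 0.0) -> float:
    """Percentage reduction from baseline, adjusted for the scale minimum
    (e.g. 30 for the PANSS, whose scores range from 30 to 210)."""
    return 100 * (baseline - endpoint) / (baseline - scale_min)

def is_responder(baseline: float, endpoint: float,
                 scale_min: float = 0.0, threshold: float = 50.0) -> bool:
    """Classify as 'clinically improved' at a 50% reduction cut-off."""
    return percent_reduction(baseline, endpoint, scale_min) >= threshold

# PANSS falling from 90 to 55: 35 / (90 - 30) = 58.3% reduction
print(is_responder(90, 55, scale_min=30))  # True
```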
2.6 Direction of graphs
Where possible, we entered data in such a way that the area to the left of the line of no effect indicates a favourable outcome for implementation strategies.
Assessment of risk of bias in included studies
Review authors CB and FG independently assessed the risk of bias of each trial using the Cochrane Collaboration's 'Risk of bias' tool (Higgins 2011). This set of criteria is based on evidence of associations between overestimation of effect and high risk of bias in domains such as sequence generation, allocation concealment, blinding, incomplete outcome data and selective reporting. If the raters disagreed, the final rating was made by consensus, with the involvement of MK. Where inadequate details of randomisation and other characteristics of trials were provided, we contacted the authors of the studies to obtain further information. We reported non-concurrence in quality assessment, but if disputes arose as to which category a trial was to be allocated, again, resolution was made by discussion with MK.
Measures of treatment effect
1. Binary data
For binary outcomes, we calculated a standard estimation of the risk ratio (RR) and its 95% confidence interval (CI). It has been shown that RR is more intuitive (Boissel 1999) than odds ratios and that odds ratios tend to be interpreted as RR by clinicians (Deeks 2000). For statistically significant results, we used 'Summary of findings' tables to calculate the number needed to treat to provide benefit/to induce harm statistic and its 95% CI.
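As an arithmetic sketch of these measures (standard formulas, not code from the review; function names and example counts are ours): the 95% CI of the RR is obtained on the log scale, and the number needed to treat is the reciprocal of the absolute risk difference.

```python
import math

def risk_ratio(a: int, n1: int, c: int, n2: int):
    """Risk ratio with a 95% CI via the log method:
    SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2)."""
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return (rr,
            math.exp(math.log(rr) - 1.96 * se_log),
            math.exp(math.log(rr) + 1.96 * se_log))

def nnt(a: int, n1: int, c: int, n2: int) -> float:
    """Number needed to treat = 1 / |absolute risk difference|."""
    return 1 / abs(a / n1 - c / n2)

# 30/100 events with the intervention vs 45/100 with control
rr, lo, hi = risk_ratio(30, 100, 45, 100)
print(round(rr, 2), round(nnt(30, 100, 45, 100), 1))  # 0.67 6.7
```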
2. Continuous data
We analysed continuous data using mean differences (MD) (with 95% confidence intervals (CI) ) or standardised mean differences (SMD) (where different measurement scales are used).
Unit of analysis issues
1. Cluster trials
Studies increasingly employ 'cluster randomisation' (such as randomisation by clinician or practice), but analysis and pooling of clustered data pose problems (Barbui 2011a). Such trials are commonly analysed as if randomisation had been performed on the individuals rather than on the clusters. In this case, approximately correct analyses were performed by dividing the binary data (the number of participants and the number experiencing the event) as presented in a report by a 'design effect' (Higgins 2011). This is calculated using the mean number of participants per cluster (m) and the intraclass correlation coefficient (ICC): design effect = 1 + (m - 1) * ICC (Higgins 2011). If the ICC was not reported, it was assumed to be 0.1 (Ukoumunne 1999). For continuous data, only the sample size was reduced; means and standard deviations remained unchanged.
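The design-effect correction can be sketched as follows (illustrative helpers with hypothetical names and example counts; the formula and the assumed ICC of 0.1 are as stated above):

```python
def design_effect(m: float, icc: float = 0.1) -> float:
    """Design effect = 1 + (m - 1) * ICC, with the ICC defaulting to the
    assumed value of 0.1 (Ukoumunne 1999)."""
    return 1 + (m - 1) * icc

def effective_binary(events: int, total: int, m: float, icc: float = 0.1):
    """Divide events and total by the design effect to approximate an
    individually randomised sample for meta-analysis."""
    deff = design_effect(m, icc)
    return round(events / deff), round(total / deff)

# 300 participants in clusters of mean size 15: design effect = 1 + 14 * 0.1 = 2.4
print(effective_binary(120, 300, 15))  # (50, 125)
```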
2. Studies with multiple treatment groups
Where a study involved more than two treatment arms, we presented all relevant treatment arms in the comparisons. For binary data, we planned simply to add and combine the data within the two-by-two table. For continuous data, we planned to combine the data following the guidelines in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). Where additional treatment arms were not relevant, these data were not reproduced.
Dealing with missing data
1. Overall loss of credibility
At some degree of loss to follow-up, data must lose credibility (Xia 2009). Should more than 50% of data be unaccounted for any particular outcome, we did not reproduce these data or use them within analyses. If, however, more than 50% of those in one arm of a study were lost, but the total loss was less than 50%, we marked such data with (*) to indicate that such a result may well be prone to bias.
When binary or continuous outcomes were not reported, we asked the study authors to supply the data.
2. Binary data
In the case where attrition for a binary outcome was between 0% and 50% and where these data were not clearly described, we presented data on a 'once-randomised-always-analyse' basis (an intention-to-treat analysis). Those leaving the study early were considered to have the same rates of negative outcome as those who completed, with the exception of the outcome of death. We planned to undertake a sensitivity analysis to test how prone the primary outcomes are to change when 'completed' data only are compared with the intention-to-treat analysis using the above assumption.
When data on people who leave early were carried forward and included in the efficacy evaluation (Last Observation Carried Forward, LOCF), they were analysed according to the primary studies; when these people were excluded from any assessment in the primary studies, they were considered as having the negative outcome.
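The intention-to-treat assumption described above can be sketched numerically (a hypothetical helper, not the review's software): dropouts are imputed at the completers' negative-outcome rate, leaving the denominator at the number randomised.

```python
def itt_negative_outcome(events: int, completers: int, randomised: int):
    """Intention-to-treat counts under the stated assumption: participants
    leaving early have the same negative-outcome rate as completers."""
    rate = events / completers
    dropouts = randomised - completers
    return events + round(rate * dropouts), randomised

# 40/80 completers with the negative outcome, 100 randomised:
# 20 dropouts imputed at 50% gives 50/100 on an intention-to-treat basis
print(itt_negative_outcome(40, 80, 100))  # (50, 100)
```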
3. Continuous data
3.1 Attrition
In cases where attrition for a continuous outcome was between 0% and 50%, and only data from people who completed the study to that point were reported, we presented and used these data.
3.2 Standard deviations
For continuous outcomes, if SDs were not reported but an exact standard error (SE) and CIs were available for group means, and either the P value or the t value was available for differences in the mean, we calculated them according to the rules described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). When only the SE was reported, we calculated SDs by using the formula SD = SE * Square root (n) (Higgins 2011). The Cochrane Handbook for Systematic Reviews of Interventions presents detailed formulae for estimating SDs from P values, t or F values, CIs, ranges or other statistics. If these formulae did not apply, we calculated the SDs according to a validated imputation method based on the SDs of the other included studies (Furukawa 2006).
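The two most common conversions can be sketched as follows (standard Handbook formulas; the helper names and example values are ours). For a 95% CI of a single group mean, SE = (upper - lower) / 3.92.

```python
import math

def sd_from_se(se: float, n: int) -> float:
    """SD = SE * sqrt(n) (Higgins 2011)."""
    return se * math.sqrt(n)

def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """Recover the SD of a single group mean from its 95% CI:
    SE = (upper - lower) / 3.92, then SD = SE * sqrt(n) (Higgins 2011)."""
    return (upper - lower) / 3.92 * math.sqrt(n)

# SE of 2.0 with n = 25 implies SD = 2.0 * 5 = 10.0
print(sd_from_se(2.0, 25))  # 10.0
```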
Assessment of heterogeneity
1. Clinical and methodological heterogeneity
First, we considered all of the included studies to judge clinical and methodological heterogeneity, paying due attention to any differences in types of implementation strategies and outcome measures. If inspection of the studies revealed considerable heterogeneity of guideline implementation strategies and outcome measures, we planned not to perform formal meta-analyses. Any disagreement was discussed and final decisions were documented.
2. Statistical heterogeneity
2.1 Visual inspection
We visually inspected graphs to investigate the possibility of statistical heterogeneity.
2.2 Employing the I² statistic
We investigated heterogeneity between studies by considering the I² statistic, which estimates the percentage of variation across studies that is due to heterogeneity rather than to chance (Higgins 2011).
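As a numerical sketch of the I² statistic (a standard formula, not code from the review), I² is computed from Cochran's Q and its degrees of freedom, truncated at zero:

```python
def i_squared(q: float, df: int) -> float:
    """I² = 100% * (Q - df) / Q, truncated at zero, where Q is Cochran's
    heterogeneity statistic and df = number of studies - 1 (Higgins 2011)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Q = 20 across 5 studies (df = 4): (20 - 4) / 20 = 80% inconsistency
print(i_squared(20.0, 4))  # 80.0
# Q below df truncates to 0
print(i_squared(3.0, 4))   # 0.0
```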
Assessment of reporting biases
1. Protocol versus full study
Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results (Egger 1997). These are described in section 10.1 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). We tried to locate protocols of included randomised trials. If the protocol was available, we compared the outcomes in the protocol and in the published report. If the protocol was not available, we compared the outcomes listed in the methods section of the trial report with the reported results.
2. Funnel plot
We are aware that funnel plots may be useful in investigating reporting biases but are of limited power to detect small-study effects. We planned not to use funnel plots for outcomes where there are 10 or fewer studies, or where all studies are of similar sizes. In other cases, where funnel plots are possible, we asked for statistical advice in their interpretation.
Data synthesis
As reported above (Assessment of heterogeneity), we calculated summary measures of intervention effect only for studies assessing the impact of similar guideline implementation strategies and using similar outcome measures. If summary measures were calculated, we employed a random-effects model throughout, as it takes into account differences between studies even when there is no statistically significant heterogeneity. The disadvantage of the random-effects model is that it puts added weight onto the smaller studies, that is, those trials that are most vulnerable to bias. The reader can, however, choose to inspect the data using the fixed-effect model.
Subgroup analysis and investigation of heterogeneity
1. Subgroup analysis
No subgroup analyses were planned.
2. Investigation of heterogeneity
If inconsistency was high, this was reported. First, we investigated whether data have been entered correctly. Second, if data were correct, we visually inspected the graph and removed outlying studies to see if homogeneity could be restored. Should this occur with no more than 10% of the data being excluded, we planned to present the data. If not, we did not pool the data but discussed these issues.
Should unanticipated clinical or methodological heterogeneity be obvious, we simply stated hypotheses regarding these for future reviews or versions of this review. We pre-specified no characteristics of studies that may be associated with heterogeneity except the quality of the trial method. Should another characteristic of the studies be highlighted by the investigation of heterogeneity, these post hoc reasons were discussed and the data analysed and presented. However, should no reasons for the heterogeneity be clear, the final data were presented without a meta-analysis. If data were clearly heterogeneous we reasoned that it may be misleading to quote an average value for the intervention effect.
Sensitivity analysis
No sensitivity analyses were planned.
Description of studies
Results of the search
We inspected 882 records provided by the Cochrane Schizophrenia Group search (March 2012) and an additional five records known to us or suggested by reviewers. Of 19 potentially eligible articles, only eight, describing the results of five studies, met inclusion criteria (see Study flow chart, Figure 1).
|Figure 1. Study flow diagram.|
We found five studies for inclusion. Baandrup 2010 carried out a cluster-randomised comparison of a multifaceted intervention aimed at decreasing antipsychotic polypharmacy versus routine care in people with schizophrenia and related psychotic disorders. Prevalence of antipsychotic polypharmacy was assessed at baseline and after one year. Hamann 2006 conducted a cluster-randomised comparison of a shared decision-making intervention (printed decision aid plus planning talk) versus routine care in a sample of 107 inpatients with schizophrenia. Six wards were allocated to the experimental intervention and six to the control condition. Hudson 2008 conducted a cluster-randomised comparison of a multifaceted intervention to promote medication adherence versus basic education in six psychiatric services. A total sample of 349 participants with schizophrenia were enrolled. Osborn 2010 conducted a cluster-randomised comparison of a nurse-led intervention to improve screening for cardiovascular risk factors in people with severe mental illness. Six community mental health teams were randomly assigned to experimental (three teams) or control (three teams) conditions. A total of 121 people participated in outcome interviews. Thompson 2008 conducted a pragmatic cluster-randomised controlled trial in 19 adult psychiatric units (clusters) from the South West of England with the aim of assessing whether a multifaceted intervention was effective in reducing prescribing of antipsychotic polypharmacy.
3. Participants
Participants were adults with schizophrenia (Hudson 2008), schizophrenia and related psychotic disorders (Baandrup 2010; Hamann 2006; Osborn 2010) or a diagnosis of 'severe mental disorders' (Thompson 2008).
4. Study size
Participants numbered 349 (Hudson 2008), 121 (Osborn 2010) and 107 (Hamann 2006). The other two studies carried out two cross-sectional calculations of antipsychotic polypharmacy (at baseline and at follow-up). At follow-up (primary outcome), the number of participants was 216 in the experimental group and 386 in the control group in one study (Baandrup 2010), and 220 in the control group and 260 in the experimental group in the other study (Thompson 2008).
5. Interventions and outcomes
Two studies assessed the efficacy of a multifaceted intervention based on existing guidelines to reduce antipsychotic polypharmacy (Baandrup 2010; Thompson 2008); prevalence of antipsychotic polypharmacy was the primary outcome in these two studies (derived from computerised medical records and from participants' medication charts). In one study, the intervention under scrutiny consisted of written instructions to engage participants in medical decisions, and outcomes included global state, measured by the PANSS; satisfaction with care, measured with the Patient Satisfaction Questionnaire (Fragebogen zur Patientenzufriedenheit, ZUF-8); and drug attitude, measured with the Drug Attitude Inventory (DAI) (Hamann 2006). An enhanced implementation strategy designed to promote guideline-concordant prescribing was studied by Hudson 2008, who employed guideline-concordant prescribing as the primary outcome, measured by the participant's self-report of medication use over the previous 30 days and by medical record abstraction. Osborn 2010 studied a nurse-led intervention aimed at promoting cardiovascular disease screening; the primary outcome of this study was the proportion of people receiving screening, as reported by participants and determined from their general practitioner (GP) notes.
We excluded five studies after careful inspection of the full text: in two, the sample included people with bipolar disorder or unipolar depression; in one, a non-randomised design was employed; in another, the psychometric properties of the scale used to assess outcomes had not been validated, the participant population consisted of mental health service users with no details about diagnosis, and the outcome was a measure of cognitive determinants of implementation behaviour (Michie 2005); in the final study, the focus of the intervention was not the implementation of a guideline (see Characteristics of excluded studies).
We have created a table of suggested future schizophrenia reviews based on studies that we excluded from this review ( Table 1).
Olfson 1998 presented at the 151st Annual Meeting of the American Psychiatric Association the study protocol of a multisite, prospective, controlled study conducted to evaluate an intensive guideline implementation intervention aimed at improving the short-term outcomes of public sector participants with schizophrenia.
In international repositories of trial protocols, we identified four protocols of ongoing randomised trials that might be relevant for this review (see Characteristics of ongoing studies).
Risk of bias in included studies
We used the tool for assessment of bias described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). The overall quality of the studies was generally unclear. For an overall view of risk of bias, see Figure 2 and Figure 3.
|Figure 2. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.|
|Figure 3. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.|
Allocation
Each included study indicated that allocation to treatment was made by random assignment. In three studies, the method used to generate the randomisation sequence was described (Hudson 2008; Osborn 2010; Thompson 2008). Allocation concealment was properly described in two studies (Osborn 2010; Thompson 2008).
Blinding
All five studies adopted an open design. In all studies, this was a consequence of the cluster design, which made it very difficult for study participants and researchers to remain blind to allocation (Baandrup 2010; Hamann 2006; Hudson 2008; Osborn 2010; Thompson 2008). Because of the lack of information, it is very difficult to judge whether the lack of blinding had an impact on the conduct or outcomes of the studies.
Incomplete outcome data
A high attrition rate was reported in only one study, with data available for only 66% of the initial sample at follow-up (Hamann 2006). In the other included studies, a lower attrition rate was reported.
Selective reporting
Few details on selective reporting were provided. We noted that all study measures mentioned in the Methods were included in the analysis and reported in the Results.
Other potential sources of bias
In three of the five cluster trials, as random allocation did not occur at the level of participants, participant groups were not well balanced at baseline in terms of sociodemographic and clinical characteristics (Hamann 2006; Hudson 2008; Thompson 2008). In the other two cluster studies, participant characteristics were comparable at baseline. Although an economic conflict of interest does not seem relevant in these studies, an intellectual conflict of interest cannot be excluded, as authors of the trials were also involved in developing the implementation strategies.
Effects of interventions
See: Summary of findings for the main comparison Active education + Support for implementation compared with Routine care or Passive dissemination for participants with schizophrenia and related psychosis
With the exception of Baandrup 2010 and Thompson 2008, in which comparable outcome measures (antipsychotic polypharmacy) were used, critical appraisal of the included studies revealed substantial heterogeneity in terms of focus of the guideline, target of the intervention, implementation strategy and outcome measures.
1. Practitioner (process) outcomes.
1.1 Practitioner impact.
Of the five included studies, practitioner impact was assessed in three.
1.1.1 Antipsychotic polypharmacy.
Meta-analysis of two studies (Baandrup 2010; Thompson 2008) revealed that a combination of several guideline dissemination and implementation strategies targeting healthcare professionals did not reduce antipsychotic co-prescribing in outpatients with schizophrenia (two studies, n = 1082, RR 1.10, 95% CI 0.99 to 1.23; corrected for cluster design: n = 310, RR 0.97, 95% CI 0.75 to 1.25; Analysis 1.1).
1.1.2 Not screened for cardiovascular risk.
Osborn 2010, in which investigators studied a nurse-led intervention aimed at promoting cardiovascular disease screening, reported a significant effect in terms of proportions of people receiving screening (blood pressure: n = 96, RR 0.07, 95% CI 0.02 to 0.28; cholesterol: n = 103, RR 0.46, 95% CI 0.30 to 0.70; glucose: n = 103, RR 0.53, 95% CI 0.34 to 0.82; BMI: n = 99, RR 0.22, 95% CI 0.08 to 0.60; smoking status: n = 96, RR 0.28, 95% CI 0.12 to 0.64; Framingham score: n = 110, RR 0.69, 95% CI 0.55 to 0.87), although in the analysis corrected for cluster design, the effect was statistically significant for blood pressure and cholesterol only (blood pressure, corrected for cluster design: n = 33, RR 0.10, 95% CI 0.01 to 0.74; cholesterol, corrected for cluster design: n = 35, RR 0.49, 95% CI 0.24 to 0.99; glucose, corrected for cluster design: n = 35, RR 0.58, 95% CI 0.28 to 1.21; BMI, corrected for cluster design: n = 34, RR 0.18, 95% CI 0.02 to 1.37; smoking status, corrected for cluster design: n = 32, RR 0.25, 95% CI 0.06 to 1.03; Framingham score, corrected for cluster design: n = 38, RR 0.71, 95% CI 0.48 to 1.03; Analysis 1.2).
2. Patient outcomes.
2.1 Global state.
Global state was measured in one study (Hamann 2006), which found no impact on psychopathology, as measured by the PANSS total score (Kay 1987) (n = 105, MD -1.30, 95% CI -8.21 to 5.61; corrected for cluster design: n = 59, MD -1.30, 95% CI -10.52 to 7.92, very low quality; Analysis 1.3).
2.2 Satisfaction with care.
Satisfaction with care was measured in one study (Hamann 2006), which found no impact in terms of satisfaction with care, as measured by the Patient Satisfaction Questionnaire (Langewitz 1995) (n = 83, MD 0.10, 95% CI -1.43 to 1.63; corrected for cluster design: n = 46, MD 0.10, 95% CI -1.96 to 2.16; Analysis 1.4).
2.3 Treatment adherence.
Treatment adherence was measured in one study (Hudson 2008). Although researchers found a 22.5% increase from baseline in the proportion of people rated as adherent in the experimental group versus a 15.1% increase from baseline in the control group, at follow-up the two groups did not differ in terms of adherence rates (n = 349, RR 0.87, 95% CI 0.66 to 1.15; corrected for cluster design: n = 52, RR 0.90, 95% CI 0.44 to 1.85; Analysis 1.5).
2.4 Drug attitude.
Drug attitude was examined in one study (Hamann 2006), which found no impact, as measured by the DAI (Awad 1993) (n = 57, MD -1.40, 95% CI -2.88 to 0.08; corrected for cluster design: n = 32, MD -1.40, 95% CI -3.38 to 0.58; Analysis 1.6).
2.5 Quality of life.
This outcome was not assessed in any of the included studies.
Summary of main results
The present systematic review found very limited evidence on how implementation programmes should be developed or implemented to bridge the guideline-practice gap in specialist mental health care. Only five randomised studies were included, and meta-analysis was carried out on one outcome only, as critical appraisal of included studies revealed substantial heterogeneity in terms of focus of guideline, target of intervention and implementation strategy. We would have expected that studies assessing the efficacy of different implementation programmes would have reported results in terms of practitioner impact and patient outcome, and we planned to extract data on several of these aspects, including global state, treatment adherence, satisfaction with care and quality of life. This expectation is reflected by the structure of the Summary of findings for the main comparison, where these outcomes are listed. However, studies reported either a measure of practitioner impact or a measure of participant outcome. Additionally, only one study reported more than one measure of participant outcome, and none of the included studies analysed quality of life outcomes.
In summary, these studies suggest that, although small changes in psychiatric practice have been shown, uncertainty remains in terms of clinically meaningful and sustainable effects on participant outcomes.
Overall completeness and applicability of evidence
The identified studies are not sufficient to address the objectives of the present review. For the primary outcome, pooling of results was possible in two studies only, and for many secondary outcomes, data were too sparse to allow reasonable conclusions to be drawn. This situation might change if future implementation studies focus on key process and participant outcomes, including polypharmacy, symptom improvement, treatment adherence, satisfaction with care and attitude towards antipsychotic drugs. We found no data at all on quality of life (Summary of findings for the main comparison). In short, many questions about implementation of guidelines remain unanswered.
The value of other interventions that may promote guideline use, including educational activities, social engagement, clinical support systems, incentives and audit and feedback exercises, has not been directly assessed in the included studies, although in some cases, the implementation strategy included some components of these elements.
Available evidence on guideline implementation in mental health care is very sparse and involves a challenging applicability issue: not only may it be difficult to extrapolate study findings to a context of care different from the one in which they were generated, but it also cannot be assumed that an implementation strategy that proved successful for one therapeutic behaviour will be similarly successful for other therapeutic behaviours.
Another challenging aspect not covered by the included studies is the possibility that characteristics of some guidelines may enhance their uptake in clinical practice. The content and format of guidelines, for example, may facilitate or impede their use, and taking these elements into consideration in the initial phases of guideline development may significantly increase the chance that a guideline is implemented. It would be relevant to formally test whether careful consideration of these aspects leads to better and sustained guideline implementation in mental health care. One study excluded from this review (because it enrolled a heterogeneous participant population and assessed outcomes with a non-validated rating scale) provides initial evidence on this compelling issue (Michie 2005).
Quality of the evidence
In all five studies, a cluster design was employed. A major problem with cluster trials is that identification and recruitment of individuals occur after random allocation of clusters has been carried out. It is therefore possible that investigators enrolled participants without being blind to allocation status, and this may have introduced a potential source of bias, as knowledge of whether each cluster is an 'intervention' or 'control' cluster could, in theory, have affected the types of participants recruited. Hence, the potential for selection bias within clusters cannot be dismissed (Barbui 2011a). Additionally, as the unit of allocation is the cluster and not the individual, comparability at baseline for individuals may not be straightforward. We note that in four of the five cluster trials, authors reported some baseline differences in terms of sociodemographic and clinical characteristics.
Another compelling aspect is that the characteristics of the interventions under study do not easily allow blinding of those delivering components of the intervention and those receiving the intervention. Although this inability to blind is a distinctive feature of cluster trials, it may be possible to assess outcomes blind to allocation status, for example, by employing outcome assessors who were not involved in the conduct of the study and are masked to the allocated interventions. It is unclear whether similar approaches were employed in the included studies.
A final issue is that despite our attempt to include all randomly assigned participants in the analyses, the cluster design posed some problems in specific circumstances. For example, in Osborn 2010, the six randomised clusters included a total of 59 participants in the experimental group and 62 in the control group. However, when the primary outcome, the proportion of participants who underwent cardiovascular screening, was measured, only those needing screening were included; this led to the exclusion of up to seven people in the experimental arm and up to 13 in the control arm. Clearly, although it seems clinically reasonable to exclude those who had already been screened, it remains unclear whether these exclusions might have had a negative impact from a methodological viewpoint.
Potential biases in the review process
The present systematic review has limitations. A first concern is the possibility that the search strategy may have missed some studies because publications did not use common keywords or were labelled with subject headings of guideline implementation initiatives that we did not capture. A second concern is that the definition of 'guideline' that we employed (systematically developed statements or algorithms, flow charts and tables to assist decisions about appropriate health care for specific clinical circumstances) (Barbui 2012) has inevitably left some subjectivity in deciding whether a strategy could be considered a guideline, especially when written instructions were embedded into more complex packages of care. We believe that this should not have had a major impact on the review, although such an impact remains a possibility.
Because of substantial heterogeneity in terms of focus of the guideline, target of the intervention and implementation strategy, a formal meta-analysis of individual effect sizes beyond the single pooled outcome (and the qualitative summary of findings) was not feasible.
Agreements and disagreements with other studies or reviews
In 2004 Grimshaw and colleagues published a landmark systematic review of the effects of guideline implementation strategies; this review included 235 studies reporting 309 comparisons of guideline dissemination and implementation strategies (Grimshaw 2004). Both randomised and non-randomised studies were considered. The review authors found that although some studies suggest it is possible to change healthcare practitioner behaviours, the overall evidence base is heterogeneous in terms of study designs employed, populations studied, implementation strategies applied and study quality assessed. Consequently, the review authors concluded that an imperfect evidence base supports decisions about which change strategies are likely to be efficient under different circumstances. The main findings of the present review are in line with this conclusion and suggest that similar considerations may apply to mental healthcare settings.
In the field of mental health care, the issue of whether guidelines may have an impact on doctor/practitioner performance and on patient outcome has been investigated in a systematic review of randomised and non-randomised studies that enrolled participants with any psychiatric disorders. This review included only 18 studies, nine of which were randomised trials (Weinmann 2007). Although 12 studies evaluated the implementation of psychiatric guidelines in primary care settings, only five studies were carried out in mental healthcare settings. Implementation methods ranged from simple interventions, such as dissemination of educational materials, to more complex and multifaceted interventions, including tutorial and consultation sessions and use of treatment algorithms, reminder systems, audit and feedback and psychological theories to overcome obstacles. Analysis of these 18 studies revealed that multifaceted interventions were more likely to have an impact on doctor performance and patient outcome, albeit effect sizes were generally modest. In total, only four studies showed a significant effect on participant outcomes (Weinmann 2007).
Audit of clinical activities and feedback to doctors may be a relevant component of any implementation strategy. Knaup and colleagues, who systematically reviewed controlled studies that evaluated audit and feedback strategies, showed a positive effect on mental health outcomes, at least in the short term (Knaup 2009). This finding seems intuitive, as guideline implementation is meant to be iterative, in the sense that after implementation, guideline use and outcomes should be monitored and the findings used to inform ongoing quality improvement efforts, as the ultimate goal of any implementation activity is continuous quality improvement. We note that none of the studies included in the present systematic review employed formal audit and feedback activities as part of the implementation strategies applied.
Implications for practice
1. For people with schizophrenia
The findings of this review have few implications for people with schizophrenia. Patients and their families should continue to question clinicians about the basis of their care and the reasoning behind using a specific drug, care package or psychological intervention. In this way, people can encourage clinicians to think about the reasons for using one particular approach rather than another, and this, in turn, could encourage use of guidance.
2. For clinicians
The few studies assessing the impact of guideline implementation strategies for people with schizophrenia or related psychotic disorders suggest that, although small changes in psychiatric practice can be achieved, uncertainty remains about clinically meaningful and sustainable effects on participant outcomes. It is surprising that although the pathway from evidence generation to evidence synthesis and guideline development is highly developed, the pathway from evidence-based guidelines to evidence-based practice is much less developed and is examined in only a few studies. This is very relevant for healthcare professionals, who are left with limited instructions on how to make best use of available guidelines. If such instructions are given to practitioners, they should be rolled out within the context of a real-world randomised trial of substantial size for evaluation of their worth.
3. For policy makers and funders
The present systematic review found scant and imperfect evidence to support decisions about which change strategies are likely to be efficient in mental healthcare settings. However, the following practical considerations may be implicitly derived from the existing literature (Barbui 2012a).
Treatment guidelines should be developed as locally as possible, or should be adapted locally, to take into account issues such as value judgements, resource use, local context characteristics and feasibility, aspects that may differ widely across contexts. This may have a profound impact on the likelihood that the guidance is implemented, as healthcare professionals may be reluctant to adhere to standards of care set by others. Recommendations should reflect a balanced approach between care of individual patients and how work is organised. This may be particularly relevant in mental health care because new and better interventions (e.g. early interventions for psychotic patients, assertive community treatment, community mental health interventions, vocational and rehabilitative interventions) cannot be delivered in the absence of functioning mental healthcare systems.
Existing evidence suggests that audit and feedback systems are relevant for fidelity reasons, that is, to check the degree of coherence between what is recommended and what is actually done. Audit and feedback of patient outcomes is additionally essential for internal accountability reasons, that is, to provide continuous feedback to professionals, who need to know the true impact of their practice, and to mental healthcare planners, who may wish to include in their decision-making process, among other considerations, local outcome data. Audit and feedback of patient outcomes may be relevant for external accountability reasons as well, that is, to provide patients, families and the public with data that may be used in making more informed choices, and to provide feedback to science by producing processes and outcome data that may generate new research hypotheses, which may be formally tested using experimental designs.
Implications for research
We recommend that any guideline implementation programme should be described and documented, thereby increasing our knowledge of how to make best use of available evidence to improve practice. Successful and unsuccessful experiences should be given visibility, even if they have not been studied in experimental conditions, as happens in clinical medicine, where reports on single cases are given visibility to describe new clinical scenarios and new solutions.
If feasible, the impact of guideline implementation programmes should be studied using reliable study designs, such as randomised trials (Cipriani 2009) and cluster-randomised trials (Barbui 2011a), and a pragmatic approach, to decrease the huge imbalance between what we know and what we actually do. We realise that the design of such studies takes greater care and attention to detail than is possible to provide within a review, but we have considered available data at some length and suggest a broad outline of a design in Table 2.
Most excluded studies enrolled heterogeneous participant populations, including people with affective and non-affective psychosis and unipolar depression. These studies may find their way into reviews related to the impact of guideline implementation strategies in more general populations of psychiatric patients.
Acknowledgements
We would like to thank members of the editorial base of the Cochrane Schizophrenia Group for their help, and we acknowledge the use of the Group template, which we adapted for the Methods section of this review. We additionally thank Dr Johannes Hamann for providing us with additional data.
Data and analyses
Contributions of authors
Corrado Barbui—study identification, critical appraisal, data entry, interpretation and writing.
Francesca Girlanda—study identification, critical appraisal and data entry.
Esra Ay—study identification and critical appraisal.
Andrea Cipriani—interpretation and writing.
Thomas Becker—interpretation and writing.
Markus Koesters—interpretation and writing.
Declarations of interest
Sources of support
- University of Verona, Italy.
- Federal Ministry of Education and Research, Germany (grant number 01KG1109).
Differences between protocol and review
At the protocol stage, the 'Summary of findings' outcomes were not listed; in the review, we have specified that the 'Summary of findings' outcomes correspond to the primary and secondary outcomes of the review and that we would have preferred these data to be binary for the purposes of the table.
Medical Subject Headings (MeSH)
*Guideline Adherence; *Outcome and Process Assessment (Health Care); *Practice Guidelines as Topic; *Specialization; Antipsychotic Agents [*therapeutic use]; Mental Health; Randomized Controlled Trials as Topic; Schizophrenia [*drug therapy]
MeSH check words