Appendix 1. Previous plain language summary
Sulpiride versus placebo for schizophrenia
Schizophrenia is a severe mental illness characterised by a mixture of symptoms such as hallucinations, delusions, disorganisation and social withdrawal. For some it can be a life-long condition, and people with this diagnosis are usually treated with antipsychotic drugs. There can be quite a large difference in cost between recently developed antipsychotics (second generation) and the older ones (first generation), but the older drugs can have considerably more movement side effects and many people find them difficult to tolerate. In developing countries the cost of medication can be a major factor in prescribing, so the first generation drugs are used the most.
Sulpiride is a first generation antipsychotic which is said to cause fewer adverse effects. In addition, people whose main symptoms are aspects of social withdrawal may respond better to sulpiride than to some of the other older antipsychotics. This review reports trials comparing sulpiride with placebo for people with schizophrenia or similar psychotic illnesses. The two studies contained a total of 113 people with chronic (long-term) schizophrenia; both were 12 weeks long and set in hospital. Most of the data from these trials were not reported in a way that would give meaningful statistics. In one trial sulpiride was not significantly better than placebo in improving negative symptoms when all such symptoms were measured together; however, the single negative symptom of social behaviour showed a significant improvement in the sulpiride group. The potential side effects of the medication were not measured, but the number of people leaving the trial early was not significantly different between the two groups.
Sulpiride is an inexpensive antipsychotic drug that is used all over the world; a well planned, conducted and reported randomised controlled trial would therefore contribute to our knowledge about this drug. (Plain language summary prepared for this review by Janey Antoniou of RETHINK, UK www.rethink.org).
Appendix 2. Details of past searches for earlier versions of this review
The following search phrase was constructed to assist identification for previous versions of this review (Soares 1999).
(sulpiride-phrase)=(abilit or championyl or coolspan or col-sulpir or digton or dixibon or dobren or dogmatil or dolmatil or drominetas or eglonyl or equilid or eusulpid or guastil or isnamid or kapiride or lavodina or lebopride or lusedan or miradol or mirbanil or misulvan or neuromyfar or normum or omperan or psicocen or quiridil or sato or sernevin or sicofrenol or sulpiride or sulpisedan or suprium or sursumid or tepavil or tonofit or ulpir or vipral)
1. Biological Abstracts (January 1982 to December 1997) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
2. CINAHL (January 1982 to March 1998) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
3. Cochrane Schizophrenia Group's Register (March 1998) was searched using:
[(sulpiride-phrase) or #42=110 or #42=563] (#42 is the field in the Register where each intervention is coded. 110 is sulpiride and 563 Dogmatil or Dolmatil).
4. Cochrane Library (Issue 1, 1998) was searched using:
[(sulpiride-phrase) or SULPIRIDE/explode in MeSH]
5. EMBASE (January 1980 to January 1998) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
[and ((sulpiride-phrase) or explode SULPIRIDE / all)]
6. MEDLINE (January 1966 to April 1998) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
[and ((sulpiride-phrase) or SULPIRIDE / explode in MeSH)]
7. PsycLIT (January 1974 to September 1997) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
[and ((sulpiride-phrase) or SULPIRIDE / explode in MeSH)]
8. SIGLE (January 1994 to December 1997) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
9. Sociofile (January 1974 to December 1997) was searched using the Cochrane Schizophrenia Group's phrase for randomised controlled trials and for schizophrenia (see Group search strategy) combined with:
10. The Cochrane Schizophrenia Group Trials Register was searched (September 2008) using the phrase:
[(abilit* or championyl* or coolspan* or col-sulpir* or digton* or dixibon* or dobren* or do?matil* or drominetas* or eglonyl* or equilid* or eusulpid* or guastil* or isnamid* or kapirid* or lavodina* or leboprid* or lusedan* or miradol* or mirbanil* or misulvan* or neuromyfar* or normum* or omperan* or psicocen* or quiridil* or sato* or sernevin* or sicofrenol* or sulp?ride* or sulpisedan* or suprium* or sursumid* or tepavil* or tonofit* or ulpir* or vipral*) in title, abstract and index fields in REFERENCE) OR (sulp?rid* in interventions field in STUDY)]
This register is compiled by systematic searches of major databases, hand searches and conference proceedings (see Group Module). The Cochrane Schizophrenia Group Trials Register is maintained on Meerkat 1.5. This version of Meerkat stores references as studies. When an individual reference is selected through a search, all references which have been identified as the same study are also selected.
Appendix 3. Details of previous methods and data analysis
1. Data Extraction
IMO and JW extracted data from included studies. Any disagreement was discussed, decisions were documented and, if necessary, authors of studies were contacted for clarification. When this was not possible and further information was necessary to resolve the dilemma, we did not enter the data and added the trial to the list of those awaiting assessment.
We extracted the data onto standard, simple forms. Where possible, data were entered into RevMan in such a way that the area to the left of the 'line of no effect' indicates a 'favourable' outcome for sulpiride. Where this was not possible, for example for scales on which higher scores indicate improvement, graphs in RevMan analyses were labelled accordingly so that the direction of effect was clear.
3. Scale-derived data
3.1 Valid scales
A wide range of instruments are available to measure outcomes in mental health studies. These instruments vary in quality and many are not validated, or are even ad hoc. It is accepted generally that measuring instruments should have the properties of reliability (the extent to which a test effectively measures anything at all) and validity (the extent to which a test measures that which it is supposed to measure) (Rust 1989). Unpublished scales are known to be subject to bias in trials of treatments for schizophrenia (Marshall 2000). Therefore continuous data from rating scales were included only if the measuring instrument had been described in a peer-reviewed journal. In addition, the following minimum standards for instruments were set: the instrument should either be (a) a self-report or (b) completed by an independent rater or relative (not the therapist) and (c) the instrument should be a global assessment of an area of functioning.
3.2 Binary outcomes from scale data
Where possible, efforts were made to convert outcome measures to binary data. This can be done by identifying cut-off points on rating scales and dividing participants accordingly into "clinically improved" or "not clinically improved". It was generally assumed that if there had been a 50% reduction in a scale-derived score such as the Brief Psychiatric Rating Scale (BPRS, Overall 1962) or the Positive and Negative Syndrome Scale (PANSS, Kay 1986), this could be considered as a clinically significant response (Leucht 2005a, Leucht 2005b). It was recognised that for many people, especially those with chronic or severe illness, a less rigorous definition of important improvement (e.g. 25% on the BPRS) would be equally valid. If individual patient data were available, the 50% cut-off was used for the definition in the case of non-chronically ill people and 25% for those with chronic illness. If data based on these thresholds were not available, we used the primary cut-off presented by the original authors.
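Assuming individual patient data were available, the cut-off rule above can be sketched as follows. The function name is illustrative, and the rescaling by the scale minimum (relevant for scales such as the PANSS, whose minimum score is 30) is an assumption; some reviews compute the percentage reduction on raw scores instead.

```python
def clinically_improved(baseline: float, endpoint: float,
                        chronic: bool, scale_min: float = 0.0) -> bool:
    """Classify a participant as 'clinically improved' using a
    percentage reduction in scale score: 50% for non-chronically ill
    people, 25% for those with chronic illness.

    scale_min lets scales with a positive starting point (e.g. the
    PANSS, minimum 30) be rescaled before the reduction is computed.
    """
    threshold = 0.25 if chronic else 0.50
    reduction = (baseline - endpoint) / (baseline - scale_min)
    return reduction >= threshold

# PANSS, non-chronic: baseline 90, endpoint 55 -> reduction 35/60 ~ 58%
```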
Assessment of risk of bias in included studies
IMO and JW worked independently to assess risk of bias, using the criteria described in the Cochrane Collaboration Handbook (Higgins 2008) to assess trial quality. This set of criteria is based on evidence of associations between overestimation of effect and high risk of bias in domains such as sequence generation, allocation concealment, blinding, incomplete outcome data and selective reporting.
The categories are defined below:
YES - low risk of bias
NO - high risk of bias
UNCLEAR - uncertain risk of bias
If the sequence generation process within a trial was by quasi-random means, such as allocation by odd or even numbers or by hospital record numbers, this was noted and the study was given a "NO - high risk of bias" rating. If data from such studies did not differ from the results of higher grade trials, they were presented. If disputes arose as to the category to which a trial should be allocated, resolution was again made by discussion, after working with the Cochrane Schizophrenia Group's Co-ordinating Editor (CEA).
Measures of treatment effect
1. Binary data
The review uses the relative risk (RR) and its 95% confidence interval (CI), calculated with a random-effects model, as the preferred summary statistic for binary data; the random-effects model takes account of differences between studies even when heterogeneity is not statistically significant. The relative risk is more intuitive than the odds ratio (Boissel 1999), and odds ratios tend to be interpreted as RRs by clinicians (Deeks 2000); this misinterpretation leads to an overestimate of the impression of the effect. Data were inspected to see whether analysis using a Mantel-Haenszel odds ratio and a fixed-effect model made any substantive difference. For statistically significant results we calculated the number needed to treat/harm statistic (NNT/H) and its 95% CI using Visual Rx (http://www.nntonline.net/), taking account of the event rate in the control group.
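Visual Rx performs this calculation; as a rough illustration of the underlying arithmetic (not of the tool itself), the NNT can be derived from the relative risk and the control event rate. The function name is an assumption made for this sketch.

```python
def nnt_from_rr(rr: float, cer: float) -> float:
    """Number needed to treat, from the relative risk (rr) and the
    control event rate (cer).

    The absolute risk reduction is ARR = cer * (1 - rr),
    and NNT = 1 / ARR. A negative result corresponds to a
    number needed to harm (NNH).
    """
    arr = cer * (1.0 - rr)
    if arr == 0.0:
        raise ValueError("no risk difference; NNT is undefined")
    return 1.0 / arr

# RR = 0.75 with a 40% control event rate:
# ARR = 0.40 * 0.25 = 0.10, so NNT = 10
```

Confidence intervals for the NNT are conventionally obtained by inverting the limits of the CI for the absolute risk difference, which is why the control event rate matters for the summary.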
Where possible, we attempted to convert outcome measures to binary data. This can be done by identifying cut-off points on rating scales and dividing participants accordingly into “clinically improved” or “not clinically improved”. It was generally assumed that if there had been a 50% reduction in a scale-derived score such as the Brief Psychiatric Rating Scale (BPRS, Overall 1962) or the Positive and Negative Syndrome Scale (PANSS, Kay 1986), this could be considered as a clinically significant response (Leucht 2005a, Leucht 2005b). It was recognised that for many people, especially those with chronic or severe illness, a less rigorous definition of important improvement (e.g. 25% on the BPRS) would be equally valid. If individual patient data were available, we used the 50% cut-off for the definition in the case of non-chronically ill people and 25% for those with chronic illness. If data based on these thresholds were not available, we used the primary cut-off presented by the original authors.
2. Continuous data
2.1 Rating scales
A wide range of instruments are available to measure mental health outcomes. These instruments vary in quality and many are not validated, or are even ad hoc. Some minimum standards therefore have to be set for outcome instruments. They were that: (i) the psychometric properties of the instrument should have been described in a peer-reviewed journal (Marshall 2000); and (ii) the instrument should either be (a) a self-report, or (b) completed by an independent rater or relative (not the therapist).
2.2 Summary statistic
For continuous outcomes we estimated a random-effects weighted mean difference (WMD) between groups. We did not calculate effect size measures.
2.3 Endpoint versus change data
We preferred to use scale endpoint data, which typically cannot have negative values and are easier to interpret from a clinical point of view. Change data are more problematic, as the skew test described under 'Skewed data' below cannot be applied to them. Where both endpoint and change data were available for the same outcome, we presented the former in preference.
2.4 Skewed data
Mental health continuous data are often not normally distributed. To avoid applying parametric tests to non-parametric data, the following standards were applied to all data before inclusion: (i) standard deviations and means were reported in the paper or were obtainable from the authors; (ii) when a scale starts from the finite number zero (for example, 0 to 100), the standard deviation, when multiplied by two, should be less than the mean; otherwise the mean is unlikely to be an appropriate measure of the centre of the distribution (Altman 1996); (iii) if a scale starts from a positive value (such as the PANSS, which can have values from 30 to 210), the calculation described above is modified to take the scale starting point into account. In these cases skew is present if 2SD > (S - Smin), where S is the mean score and Smin is the minimum score. Endpoint scores on scales often have finite start and end points, so these rules can be applied.
When continuous data are presented on a scale which includes a possibility of negative values (such as change data), it is difficult to tell whether data are skewed or not. Skewed data from studies of less than 200 participants were entered in additional tables rather than into an analysis. Skewed data pose less of a problem when looking at means if the sample size is large and were entered into syntheses.
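The Altman 1996 skew check described above amounts to a one-line test. This sketch is illustrative; the scale minimum is a parameter so that scales with a positive starting point, such as the PANSS, are handled by the same rule.

```python
def suspect_skew(mean: float, sd: float, scale_min: float = 0.0) -> bool:
    """Flag likely skew in endpoint scale data (Altman 1996 rule):
    skew is suspected when 2 * SD exceeds (mean - scale minimum)."""
    return 2.0 * sd > (mean - scale_min)

# PANSS example (minimum score 30): mean 70, SD 25
# 2 * 25 = 50 > (70 - 30) = 40 -> skew suspected
```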
Unit of analysis issues
1. Cluster trials
Studies increasingly employ 'cluster randomisation' (such as randomisation by clinician or practice) but analysis and pooling of clustered data poses problems. Firstly, authors often fail to account for intraclass correlation in clustered studies, leading to a 'unit of analysis' error (Divine 1992) whereby p values are spuriously low, confidence intervals unduly narrow and statistical significance overestimated. This causes type I errors (Bland 1997, Gulliford 1999).
Where clustering is not accounted for in primary studies, we presented data in a table, with a (*) symbol to indicate the presence of a probable unit of analysis error. In subsequent versions of this review we will seek to contact first authors of studies to obtain intraclass correlation coefficients of their clustered data and to adjust for this by using accepted methods (Gulliford 1999). Where clustering had been incorporated into the analysis of primary studies, we present these data as if from a non-cluster randomised study, but adjusted for the clustering effect.
We have sought statistical advice and have been advised that the binary data as presented in a report should be divided by a 'design effect'. This is calculated using the mean number of participants per cluster (m) and the intraclass correlation coefficient (ICC) [Design effect=1+(m-1)*ICC] (Donner 2002). If the ICC was not reported it was assumed to be 0.1 (Ukoumunne 1999).
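A minimal sketch of this design-effect adjustment, assuming the mean cluster size is known and defaulting the ICC to the 0.1 assumption of Ukoumunne 1999; the function names and the returned "effective" counts are illustrative conventions.

```python
def design_effect(m: float, icc: float = 0.1) -> float:
    """Design effect for cluster-randomised data:
    DE = 1 + (m - 1) * ICC (Donner 2002), where m is the mean
    number of participants per cluster."""
    return 1.0 + (m - 1.0) * icc

def adjust_cluster_counts(events: int, total: int, m: float,
                          icc: float = 0.1) -> tuple[float, float]:
    """Divide binary counts by the design effect to obtain an
    effective event count and effective sample size."""
    de = design_effect(m, icc)
    return events / de, total / de

# Clusters of 20 participants, ICC 0.1:
# design effect = 1 + 19 * 0.1 = 2.9
```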
If cluster studies had been appropriately analysed taking into account intraclass correlation coefficients and relevant data documented in the report, synthesis with other studies would have been possible using the generic inverse variance technique.
2. Cross-over trials
A major concern with cross-over trials is the carry-over effect. This occurs when an effect (e.g. pharmacological, physiological or psychological) of the treatment in the first phase is carried over into the second phase. As a consequence, on entry to the second phase participants can differ systematically from their initial state despite a wash-out phase. For the same reason, cross-over trials are not appropriate when the condition of interest is unstable (Elbourne 2002). As both problems are very likely in schizophrenia, we will only use data from the first phase of cross-over studies.
3. Studies with multiple treatment groups
Where a study involved more than two treatment arms, if relevant, the additional treatment arms were presented in comparisons. Where the additional treatment arms were not relevant, these data were not reproduced.
Dealing with missing data
1. Overall loss of credibility
At some degree of loss to follow-up, data must lose credibility (Xia 2007). We are forced to make a judgement about where this point lies for the trials likely to be included in this review. Should more than 40% of data be unaccounted for by eight weeks, we did not reproduce these data or use them within analyses.
Where attrition for a binary outcome is between 0 and 40%, and outcomes of these people are described, we included these data as reported. Where the outcomes of such people were not clearly described, we assumed the worst primary outcome, and rates of adverse effects similar to those who did continue to have their data recorded.
In the case where attrition for a continuous outcome is between 0 and 40% and completer-only data were reported, we have reproduced these.
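The attrition rules above can be sketched as follows. The 40% threshold and the convention of assuming the worst primary outcome for missing participants follow the text; the function name and return convention are illustrative assumptions.

```python
def include_binary_outcome(completed: int, randomised: int,
                           events_observed: int,
                           assume_worst_for_missing: bool = True):
    """Apply the review's attrition rules to a binary outcome.

    Returns (usable, event_count): the outcome is excluded when more
    than 40% of randomised participants are unaccounted for;
    otherwise missing participants may be counted as having the
    worst (primary) outcome, i.e. as events.
    """
    attrition = 1.0 - completed / randomised
    if attrition > 0.40:
        return False, 0  # data not reproduced or analysed
    events = events_observed
    if assume_worst_for_missing:
        events += randomised - completed  # dropouts counted as events
    return True, events
```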
Assessment of heterogeneity
1. Clinical heterogeneity
We considered all included studies, without reference to comparison data, to judge clinical heterogeneity.
2. Statistical heterogeneity
2.1 Visual inspection
We visually inspected graphs to investigate the possibility of statistical heterogeneity.
2.2 Employing the I-squared statistic
This provided an estimate of the percentage of variation across studies that is due to heterogeneity rather than chance. An I-squared estimate greater than or equal to 50% was interpreted as evidence of high levels of heterogeneity (Higgins 2002).
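For reference, I-squared is computed from Cochran's Q statistic and its degrees of freedom (number of studies minus one); a minimal sketch of the Higgins 2002 formula:

```python
def i_squared(q: float, df: int) -> float:
    """I-squared from Cochran's Q and degrees of freedom:
    I2 = max(0, (Q - df) / Q) * 100 (Higgins 2002).
    Values at or above 50 were read as high heterogeneity."""
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Q = 10 over 4 degrees of freedom -> I2 = 60%
```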
Assessment of reporting biases
Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results (Egger 1997). We are aware that funnel plots may be useful in investigating reporting biases but are of limited power to detect small-study effects. We did not use funnel plots for outcomes where there were ten or fewer studies, or where all studies were of similar sizes. In other cases, where funnel plots were possible, we sought statistical advice in their interpretation.
Data synthesis
Where possible we employed a fixed-effect model for analyses. We understand that there is no closed argument for preferring fixed or random-effects models. The random-effects method incorporates an assumption that the different studies are estimating different, yet related, intervention effects. This does seem true to us; however, the random-effects model puts added weight onto the smaller of the studies, those trials that are most vulnerable to bias. For this reason we favour fixed-effect models, employing random-effects only when investigating heterogeneity.
Subgroup analysis and investigation of heterogeneity
1. Subgroup analysis
It was expected that several subgroup analyses could be undertaken within this review. The following hypotheses were tested: when compared with placebo, for the primary outcomes of interest (see Criteria for considering studies for this review), sulpiride is differentially effective for:
a. Men and women
b. People who are under 18 years of age (adolescent patients), between 18 and 64 (adult patients), or over 65 years of age (elderly patients).
c. People who became ill recently (i.e. acute episode approximately less than one month's duration) as opposed to people who have been ill for longer.
d. People who are given low doses (1-800mg/day) and those given high doses (over 800 mg/day).
e. People who have schizophrenia diagnosed according to any operational criterion (i.e. a pre-stated checklist of symptoms/ problems/ time periods/ exclusions) as opposed to those who have entered the trial with loosely defined illness.
f. People treated earlier (pre-1990) and people treated in recent years (1990 to 2002).
g. Duration of study: short term (less than 3 months), medium term (3-12 months) and long term (more than 1 year).
2. Investigation of heterogeneity
If data were clearly heterogeneous, we checked that they had been correctly extracted and entered and that we had made no unit of analysis errors. If high levels of heterogeneity remained, we did not undertake a meta-analysis at this point: where there is considerable variation in results, and particularly where there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect. We would then have wanted to explore the heterogeneity. We pre-specified no characteristics of studies that may be associated with heterogeneity except quality of trial method. If sorting studies by quality of methods showed no clear association, a random-effects meta-analysis was performed. Should another characteristic of the studies have been highlighted by the investigation of heterogeneity, perhaps some clinical heterogeneity not hitherto predicted but a plausible cause of heterogeneity, these post-hoc reasons would be discussed and the data analysed and presented. However, should the heterogeneity be substantially unaffected by use of random-effects meta-analysis and no other reasons for the heterogeneity be clear, the final data were presented without a meta-analysis.
If necessary, we analysed the effect of including studies with high attrition rates in a sensitivity analysis. We also aimed to include quasi-randomised trials in a sensitivity analysis. If we found no substantive differences in the primary outcome when these high-attrition and quasi-randomised studies were added to the overall results, we included them in the final analysis. However, if there was a substantive difference, we only used clearly randomised trials and those with attrition lower than 25%.