Appendix 2. Previous data collection and analysis
1. Selection of trials
Material downloaded from electronic sources will include details of author, institution or journal of publication.
We (MM and AA) will inspect each report in order to ensure reliable selection. We will resolve any disagreement by discussion, and where there is still doubt, we will acquire the full article for further inspection. Once the full articles are obtained, we (MM and AA) will independently decide whether the studies meet the review criteria. If disagreement cannot be resolved by discussion, we will seek further information and add these trials to the list of those awaiting assessment.
2. Assessment of methodological quality
We will assess the methodological quality of each trial included in this review using the criteria described in the Cochrane Handbook (Higgins 2005) and the Jadad Scale (Jadad 1996). The former is based on evidence of a strong relationship between allocation concealment and direction of effect (Schulz 1995). The categories are defined below:
A. Low risk of bias (adequate allocation concealment)
B. Moderate risk of bias (some doubt about the results)
C. High risk of bias (inadequate allocation concealment). For the purpose of the analysis in this review, we will include trials only if they meet Cochrane Handbook criterion A or B.
The Jadad Scale measures a wider range of factors that affect the quality of a trial. The scale includes three items:
1. Was the study described as randomised?
2. Was the study described as double-blind?
3. Was there a description of withdrawals and dropouts?
Each item receives one point if the answer is positive. In addition, a point can be deducted if either the randomisation or the blinding/masking procedures described are inadequate. For this review we will use a cut-off of two points on the Jadad scale to check the assessment made by the Handbook criteria. However, we will not use the Jadad Scale to exclude trials.
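The scoring rule described above can be sketched as a short function (a minimal illustration of the rule as stated in this protocol, not part of the protocol itself; the parameter names are our own):

```python
def jadad_score(randomised, double_blind, withdrawals_described,
                randomisation_inadequate=False, blinding_inadequate=False):
    """Score a trial on the three Jadad items as described above.

    One point per positive answer; one point deducted for each
    procedure that is described but inadequate.
    """
    score = int(randomised) + int(double_blind) + int(withdrawals_described)
    score -= int(randomisation_inadequate) + int(blinding_inadequate)
    return score


def passes_cutoff(score, cutoff=2):
    # A score at or above the cut-off corroborates the Handbook assessment;
    # trials below it are flagged, not excluded.
    return score >= cutoff
```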
3. Data collection
We (MM and AA) will independently extract data from selected trials, while KB and MAM will separately re-extract information from the same trials. When disputes arise we will attempt to resolve these by discussion. When this is not possible and further information is necessary to resolve the dilemma, we will not enter data but we will add the trial to the list of those awaiting assessment.
4. Data synthesis
4.1 Data types
We will assess outcomes using continuous (for example changes on a behaviour scale), categorical (for example, one of three categories on a behaviour scale, such as 'little change', 'moderate change' or 'much change') or dichotomous (for example, either 'no important changes' or 'important changes' in a person's behaviour) measures. Currently RevMan does not support categorical data so we will be unable to analyse these.
4.2 Incomplete data
We will not include outcomes from a trial if more than 50% of participants are not reported in the final analysis.
4.3 Dichotomous - yes/no - data
We will carry out an intention-to-treat analysis. On the condition that more than 50% of people complete the study, we will count everyone allocated to the intervention, whether they completed the follow-up or not. We will assume that those who dropped out had the negative outcome, with the exception of death. Where possible, we will make efforts to convert outcome measures to dichotomous data. This can be done by identifying cut-off points on rating scales and dividing participants accordingly into 'clinically improved' or 'not clinically improved'. If the authors of a study have used a predefined cut-off point for determining clinical effectiveness, we will use this where appropriate. Otherwise, we will generally assume that if there has been a 50% reduction in a scale-derived score, this could be considered as a clinically significant response. Similarly, we will consider a rating of 'at least much improved' according to the Clinical Global Impression Scale (Guy 1976) as a clinically significant response.
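The 50% reduction rule can be expressed as a simple predicate (an illustrative sketch under our own naming, not a prescribed implementation; study-specific cut-off points take precedence where they exist):

```python
def clinically_improved(baseline_score, endpoint_score):
    """Dichotomise a scale-derived score: a reduction of at least 50%
    from baseline is treated as a clinically significant response."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline_score - endpoint_score) / baseline_score >= 0.5
```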
We will calculate the relative risk (RR) and its 95% confidence interval (CI) using a fixed-effect model. For outcomes showing statistically significant heterogeneity, we will recalculate the relative risk using a random-effects model. When the overall result is significant, we will calculate the number needed to treat (NNT) and the number needed to harm (NNH) as the inverse of the risk difference.
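These calculations can be sketched as follows (using the standard log-RR confidence interval; a minimal illustration only, since in practice RevMan performs these computations):

```python
import math


def rr_ci_nnt(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with a 95% CI (log method), and the NNT/NNH
    as the inverse of the absolute risk difference."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rr = p_t / p_c
    # Standard error of log(RR) for two independent proportions.
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    ci = (math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se))
    nnt = 1 / abs(p_t - p_c)  # NNT if beneficial, NNH if harmful
    return rr, ci, nnt
```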
4.4 Continuous data
4.4.1 Normally distributed data: continuous data on clinical and social outcomes are often not normally distributed. To avoid the pitfall of applying parametric tests to non-parametric data, we will apply the following standards to all data before inclusion: (a) standard deviations (SDs) and means are reported in the paper or are obtainable from the authors; (b) when a scale starts from the finite number zero, the SD, when multiplied by two, is less than the mean (as otherwise the mean is unlikely to be an appropriate measure of the centre of the distribution (Altman 1996)); (c) if a scale starts from a positive value (such as the PANSS, which can have values from 30 to 210), the calculation described above will be modified to take the scale starting point into account. In these cases skew is present if 2 SD > (S - Smin), where S is the mean score and Smin is the minimum score. Endpoint scores on scales often have a finite start and end point, and these rules can be applied to them. When continuous data are presented on a scale that includes the possibility of negative values (such as change on a scale), it is difficult to tell whether data are skewed or not. We will enter skewed data from studies of fewer than 200 participants in additional tables rather than into an analysis. Skewed data pose less of a problem when looking at means if the sample size is large, and we will enter these into a synthesis.
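Checks (b) and (c) reduce to a single rule, since for a scale starting at zero Smin = 0 (an illustrative sketch only):

```python
def skew_suspected(mean_score, sd, scale_min=0.0):
    """Flag probable skew in scores on a bounded scale: skew is
    suspected when twice the SD exceeds the distance from the mean
    to the scale minimum, i.e. 2 SD > (S - Smin)."""
    return 2 * sd > (mean_score - scale_min)
```

For example, a PANSS mean of 75 with SD 20 passes the check (2 × 20 = 40 does not exceed 75 − 30 = 45), whereas a mean of 50 with SD 15 is flagged as probably skewed.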
For change data (endpoint minus baseline), the situation is even more problematic. In the absence of individual patient data it is impossible to know whether data are skewed, though this is likely. After consulting the ALLSTAT electronic statistics mailing list, we have decided to present change data in MetaView in order to summarise available information. In doing this, we will assume either that data are not skewed or that the analyses can cope with the unknown degree of skew. Without individual patient data it is impossible to test this assumption. Where both change and endpoint data are available for the same outcome category, we will present only endpoint data. We acknowledge that by doing this we will exclude much of the published change data, but we argue that endpoint data are more clinically relevant, and that presenting change data alongside endpoint data would give them undeserved equal prominence. We will contact authors of studies reporting only change data for endpoint figures. We will report non-normally distributed data in the 'other data types' tables.
4.4.2 Rating scales: A wide range of instruments are available to measure mental health outcomes. These instruments vary in quality, and many have not been validated or are ad hoc. For outcome instruments some minimum standards have to be set. The use of rating scales that have not been described in a peer-reviewed journal has been shown to be associated with bias (Marshall 2000); we will therefore exclude the results of such scales. Furthermore, we stipulate that the instrument should either be a self-report or be completed by an independent rater or relative (not the therapist), and that the instrument should be a global assessment of an area of functioning. However, as the therapist is frequently expected also to be the rater, we will include such data but comment on them as 'prone to bias'.
Whenever possible we will take the opportunity to make direct comparisons between trials that use the same measurement instrument to quantify specific outcomes. Where continuous data are presented from different scales rating the same effect, we will present both sets of data and inspect the general direction of effect.
4.4.3 Summary statistic
For continuous outcomes, we will estimate a weighted mean difference (WMD) between groups, again using a fixed-effect model. We will then carry out a sensitivity analysis to identify heterogeneous data, which we will re-assess using a random-effects model.
4.5 Cluster trials
Studies increasingly employ 'cluster randomisation' (such as randomisation by clinician or practice) but analysis and pooling of clustered data poses problems. Firstly, authors often fail to account for intraclass correlation in clustered studies, leading to a 'unit of analysis' error (Divine 1992) whereby P values are spuriously low, confidence intervals unduly narrow and statistical significance overestimated. This causes type I errors (Bland 1997; Gulliford 1999).
Where clustering is not accounted for in primary studies, we will present the data in a table, with a (*) symbol to indicate the presence of a probable unit of analysis error. In subsequent versions of this review we will seek to contact first authors of studies to obtain intra-class correlation co-efficients (ICCs) of their clustered data and to adjust for this using accepted methods (Gulliford 1999). Where clustering has been incorporated into the analysis of primary studies, we will also present these data as if from a non-cluster randomised study, but adjust for the clustering effect.
We have sought statistical advice and have been advised that the binary data as presented in a report should be divided by a 'design effect'. This is calculated using the mean number of participants per cluster (m) and the ICC [Design effect = 1+(m-1)*ICC] (Donner 2002). If the ICC is not reported it will be assumed to be 0.1 (Ukoumunne 1999).
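The adjustment can be sketched as follows (ICC default of 0.1 as stated above; the helper names are our own):

```python
def design_effect(mean_cluster_size, icc=0.1):
    """Design effect = 1 + (m - 1) * ICC (Donner 2002); the ICC
    defaults to 0.1 when not reported (Ukoumunne 1999)."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_counts(events, total, mean_cluster_size, icc=0.1):
    """Divide raw binary counts by the design effect to obtain
    'effective' counts for use in meta-analysis."""
    deff = design_effect(mean_cluster_size, icc)
    return events / deff, total / deff
```

For example, with a mean cluster size of 11 and the default ICC, the design effect is 1 + 10 × 0.1 = 2, so a reported 30/110 would be entered as an effective 15/55.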
If cluster studies are appropriately analysed taking into account ICCs and relevant data documented in the report, synthesis with other studies will be possible using the generic inverse variance technique.
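In its fixed-effect form, the generic inverse variance technique mentioned here can be sketched as (illustrative only; in practice RevMan performs this pooling):

```python
import math


def pool_inverse_variance(estimates, standard_errors):
    """Fixed-effect generic inverse variance pooling: each study
    is weighted by 1 / SE^2; returns the pooled estimate and its
    standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))
```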
5. Investigation for heterogeneity
Firstly, we will consider all the included studies within any comparison to judge clinical heterogeneity. Then we will visually inspect graphs to investigate the possibility of statistical heterogeneity, supplemented primarily by the I-squared statistic, which estimates the percentage of variability that is due to heterogeneity rather than chance alone. Where the I-squared estimate is greater than or equal to 75%, we will interpret this as indicating a high level of heterogeneity (Higgins 2003). If inconsistency is high, we will not pool the data, but will present results separately and investigate the reasons for heterogeneity.
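I-squared is derived from Cochran's Q statistic and its degrees of freedom (a sketch of the Higgins 2003 formula; RevMan computes this automatically):

```python
def i_squared(q, k):
    """I^2 = max(0, (Q - df) / Q) * 100, where Q is Cochran's Q
    and df = k - 1 for k studies (Higgins 2003)."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def high_heterogeneity(i2, threshold=75.0):
    # At or above 75% we treat inconsistency as high and do not pool.
    return i2 >= threshold
```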
6. Addressing publication bias
We will enter data from all identified and selected trials into a funnel graph (trial effect versus trial size) in an attempt to investigate the likelihood of overt publication bias.
7. Subgroup analyses
We will carry out a subgroup analysis to compare results between different interventions, as defined in 'Types of interventions'.
Where possible, we will enter data in such a way that the area to the left of the line of no effect indicates a favourable outcome for intermittent treatment.