Criteria for considering studies for this review
Types of studies
We will include randomised controlled trials, including cluster-randomised trials and cross-over trials, published or unpublished. While every effort will be made to obtain individual participant data (IPD) for trials which meet the selection criteria, an eligible trial will not be excluded if we cannot obtain access to the raw data.
Types of participants
We will include male and female participants over the age of 17, irrespective of culture and setting. As several sets of criteria are currently used to diagnose CFS (Sharpe 1991; Fukuda 1994; Carruthers 2011), we will include trials provided that the patients fulfil the following diagnostic criteria for CFS:
Fatigue or a synonym is a prominent symptom;
Fatigue is medically unexplained (i.e. other diagnoses known to cause fatigue, such as psychiatric disorders and cancer, should be excluded);
Fatigue is sufficiently severe to significantly disable or distress the patient; and
Fatigue has persisted for at least six months.
We will include trials which include patients with disorders other than CFS as long as more than 90% of the patients had a primary CFS diagnosis according to the criteria above. Trials in which less than 90% of participants had a primary diagnosis of CFS will only be included in the analysis of this review if data for CFS are reported separately.
Studies involving participants with comorbid physical or common mental disorders are eligible for inclusion, as long as the diagnosis of CFS is not excluded by the comorbid condition.
Types of interventions
Exercise therapy as monotherapy or as an adjunctive treatment (e.g. exercise combined with pharmacological treatment). We define exercise therapy as aerobic or anaerobic interventions aimed at exercising big muscle groups, for example walking, swimming, jogging, strength or stabilising exercises. Both individual and group treatment modalities are eligible, but interventions should be clearly described and supported by appropriate references. We do not impose restrictions with regard to the duration of each treatment session, number of sessions or time between each session.
Trials presenting data from one of the following comparisons are eligible for inclusion:
Standard care – to include passive conditions of treatment as usual and waiting list, and active conditions of relaxation/flexibility, pacing and supportive listening.
Psychological therapies – to include cognitive behavioural therapies, psychodynamic therapies and humanistic/supportive therapies.
Pharmacological treatments – to include antidepressants, hypnotics, antiviral drugs and immunotherapy.
Non-active supportive listening is regarded as non-active by the trialists but as active by participants, whereas supportive therapy is regarded as active by both trialists and participants.
Types of outcome measures
Note: 'Validated' is defined as having undergone psychometric development and having been published in a peer-reviewed journal.
1. Fatigue, measured using a validated scale (e.g. Fatigue Scale (FS) (Chalder 1993) or the Fatigue Severity Scale (FSS) (Krupp 1989)).
2. Drop out from treatment.
3. Physical functioning, measured using a validated scale (e.g. SF-36, physical functioning sub scale (Ware 1992)).
4. Serious adverse reactions and events, measured using any reporting system (e.g. Serious Adverse Reactions (SAR) (European Union Clinical Trials Directive 2001)).
5. Pain, measured using a validated scale (e.g. Visual Analogue Scale for Pain (VASpain) (Finch 2002)).
6. Mood disorders, measured using a validated scale (e.g. Hospital Anxiety and Depression Scale (Zigmond 1983)).
7. Sleep duration and quality, measured using a validated scale (e.g. Pittsburgh Sleep Quality Index (Buysse 1989)).
8. Self-perceived changes in overall health, measured using a validated scale (e.g. Global Impression Scale (Guy 1976)).
9. Symptom severity, treatment response and efficacy, measured using a validated scale (e.g. Clinical Global Impression - Severity scale (CGI-S) (Guy 1976)).
10. Overall functional status, measured using a validated scale (e.g. Work and Social Adjustment Scale (Mundt 2002)).
11. Objective measures of fitness e.g. VO2max or VO2 at subjective maximal effort, measured for example by 100 watts on an exercise bicycle or running at six miles per hour.
Timing of outcome assessment
Data on each outcome will be extracted for short-term (end of treatment), medium-term follow up (three to nine months) and long-term follow up (nine months or more).
Search methods for identification of studies
The Cochrane Collaboration's Depression, Anxiety and Neurosis (CCDAN) Review Group's Trials Search Coordinator (TSC) will search their Group's Specialized Register (CCDANCTR-Studies and CCDANCTR-References). This register is created from routine generic searches of MEDLINE (1950- ), EMBASE (1974- ) and PsycINFO (1967- ). Details of CCDAN's generic search strategies, used to inform the CCDANCTR, can be found on the Group's website.
The CCDANCTR-Studies Register will be searched using the following terms:
Diagnosis = ("Chronic Fatigue Syndrome" or fatigue) and Free Text = (exercise or sport* or relaxation or "multi convergent" or "tai chi")
The CCDANCTR-References Register will be searched using a more sensitive list of free-text search terms to identify additional untagged/uncoded references, e.g. fatigue*, myalgic encephalomyelitis*, exercise, physical active* and taiji. The full search strategy is listed in Appendix 1.
A complementary search of the following bibliographic databases and international trial registers will also be conducted (see Appendix 2):
SPORTDiscus (1985 to present);
The Cochrane Central Register of Controlled Trials (CENTRAL, all years); and
WHO International Clinical Trials Portal.
Searching other resources
We will contact the authors of included studies, and screen reference lists to identify additional published or unpublished data. We will also conduct citation searches using the ISI Science Citation Index on the Web of Science.
Data collection and analysis
Selection of studies
Two review authors will screen the titles and abstracts obtained from the searches, independently. Trials that appear to fulfil the selection criteria will be noted and full-text articles retrieved. Two review authors will independently assess the full-text articles for adherence to the selection criteria. In the case of disagreement, we will attempt to reach a resolution through discussion. Should this prove unsuccessful, a third review author or staff at the CCDAN editorial base will be consulted.
The trialists of the included trials will be invited to take part in a collaborative group by a letter (Appendix 3) stating the main aims and purpose, importance of contribution, publication policy and confidentiality of data. The trialists who provide data will be offered co-authorship according to the Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals (ICMJE 2014). The local secretariat will be based at the Norwegian Knowledge Centre of the Health Services, and the first author will act as project manager. The CCDAN editorial group will act as advisory board. We will ask the trialists to provide their raw data to enable us to assemble the most complete data set possible, including all randomised participants, using flexible data formats containing pseudo-anonymised patient data. The data will be kept and used in accordance with the Norwegian Data Inspectorate's recommendations, and the parties will sign a contract (Appendix 4) to this end. An eligible trial will not be excluded if we cannot obtain access to the raw data.
We will prepare a flow diagram depicting the flow of references/studies through the different phases of the review in accordance with the PRISMA statement (Moher 2009). The flow diagram will map out the number of records identified, included and excluded and reasons for exclusion.
Data extraction and management
Trialists of included studies will be invited to collaborate in accordance with section 18.2 of the Cochrane Handbook (Stewart 2011). They will be offered authorship and asked to provide IPD for all randomised participants to be used in the review. A list of variables that we will particularly request is attached (Appendix 5), but we will accept data sets containing more variables than requested. We will accept all data sent in data formats that can be read by SAS or SPSS, Microsoft Access databases, Excel and delimited or (comma-)separated text-files. The review authors will adopt data security measures to ensure data protection, and to ensure that the data cannot be violated or tracked. All data sets will be stored securely and pseudo-anonymously; that is, all identifiers that potentially could be linked directly to the actual participants will be deleted, and identifiers will only be identifiable to the original investigators.
Once the raw data are received from the trialists, JO-J will check them for consistency and compare them with the results presented in the journal papers. Any queries arising from these checks will be resolved in cooperation with the trialists. All analyses based on the IPD from the included trials for this review will be undertaken by JO-J, while meta-analyses will be performed by JO-J and KGB, independent of trialists. Descriptive data (methodology, treatment, comparator, and instruments used for measuring outcomes) for each of the included studies will be independently extracted by JO-J and LL using a standardised data collection form (Appendix 6), which will include the following variables.
Study methodology specific:
Diagnostic criteria used for identifying eligible patients;
Method of recruitment for trial;
Allocation concealment; and
Deliverer of intervention;
Explanation and material;
Type of exercise;
Duration of sessions;
Initial exercise level;
Patient self monitoring; and
Criteria for (non) increment.
Scales used for assessment
Quality of life.
Measurement time points/follow-up will be collected and study authors contacted to provide additional information where gaps have been identified.
For included studies from which we are not able to gain access to IPD, relevant information (both the above-mentioned data and results for the prespecified outcomes) will be extracted. For adverse events, we will extract the number of events and number of participants in each group. For the remaining (continuous) outcomes, we will extract N, mean and standard deviation.
Exercise therapy versus treatment as usual/waiting list/supportive listening/pacing/cognitive treatment. We also plan to conduct subgroup comparisons for each comparison above to see if one of the more passive comparators is more effective than another.
Exercise therapy (as monotherapy or adjunctive therapy) versus cognitive behavioural therapy. Monotherapy and adjunctive therapy will be analysed separately.
Exercise therapy (as monotherapy or adjunctive therapy) versus pharmacological treatment.
If there are several drug trials we will divide the comparators into antidepressants, hypnotics, antiviral and immunotherapy. Monotherapy and adjunctive therapy will be analysed separately.
Data checking and cleaning
To reduce potential bias (Tierney 2005), we will request information for all randomised patients including those who had been excluded from the investigators' original analyses. A number of standard checks will be applied to all incoming trials, including checks for missing values, data validity and consistency across variables. To assess the randomisation integrity, we will look for unusual patterns in the sequencing of allocation or imbalances in baseline characteristics between treatment arms. Follow-up of patients will be assessed to ensure that it was balanced across treatment arms and as up-to-date as possible.
All incoming data will be checked thoroughly for consistency and completeness of follow-up. We will tabulate summary measures (frequencies for categorical variables and mean, standard deviations, minimum, maximum, median, 25th and 75th percentile for continuous variables) for the individual parameters (demographic variables, illness-specific variables and outcomes) for each study to identify missing data and outliers, and to describe differences in distribution between the studies. We will analyse patient and disease characteristics and treatment outcome by trial and treatment arm to check consistency with published results. If we encounter any problems regarding missing data, obvious errors, discrepancies (e.g. between published data and raw data), inconsistencies between variables or extreme values or inability to replicate the results presented in the retrieved papers, we will resolve these in discussion with the original investigator. We will maintain a log of all changes made to the data originally supplied by the trialists, and the reasons for these changes. Any queries will be resolved and the final database entries verified by the responsible trial investigator or statistician.
As the summary statistics for outcomes in this review may differ from the summary statistics presented in the retrieved papers due to the possible use of imputed data in the original analyses by trial investigators, we will present summary statistics for all outcomes by trial and treatment arm in a table (showing sample size, mean, standard deviation, minimum and maximum).
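The per-trial, per-arm summary table described above could be produced along the following lines. This is a minimal sketch using only the Python standard library; the record layout (trial, arm, value) and the example values are hypothetical, and missing values are simply skipped, in line with the available-data approach.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_trial_and_arm(records):
    """Summarise (trial, arm, value) records; None marks a missing value."""
    groups = defaultdict(list)
    for trial, arm, value in records:
        if value is not None:  # available data only, no imputation
            groups[(trial, arm)].append(value)
    summary = {}
    for key, values in sorted(groups.items()):
        summary[key] = {
            "n": len(values),
            "mean": mean(values),
            # the sample SD is undefined for a single observation
            "sd": stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical fatigue scores, for illustration only
example_records = [
    ("trial_A", "exercise", 20.0),
    ("trial_A", "exercise", 24.0),
    ("trial_A", "control", 28.0),
    ("trial_A", "control", None),
    ("trial_A", "control", 30.0),
]
example_summary = summarise_by_trial_and_arm(example_records)
```

Comparing such a table with the published figures makes it easy to spot trials where the original analysis used imputed data.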
Assessment of risk of bias in included studies
Two authors (LL and JO-J) will use the Cochrane Collaboration's tool for assessing risk of bias (Higgins 2011a), published in the most recent version of the Cochrane Handbook (Higgins 2011b). This tool encourages consideration of how the allocation sequence was generated, how allocation was concealed, the integrity of blinding at outcome level, the completeness of outcome data, selective outcome reporting (only applicable in cases where we do not gain access to complete data sets) and other potential sources of bias. When it comes to blinding, we will distinguish between performance bias (blinding of participants and personnel) and detection bias (blinding of outcome assessors; for outcomes not reported/assessed by the participants). As all the outcomes we intend to look at are subjective (self report by participants, or the use of scales that are based on judgements) we will include an item for objective outcomes. Each item in the 'Risk of bias' assessment will be assessed as low, high or unclear risk of bias using the guidelines outlined in the Cochrane Handbook (Higgins 2011a). In addition, we will make an overall assessment of the risk of bias across all items for each included study. If one or more of the items sequence generation, allocation concealment or completeness of outcome data are assessed as being at high risk of bias, the overall assessment of the study will be high risk of bias.
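The rule for the overall judgement can be expressed as a small function. This is only a sketch: the item keys and the dictionary layout are hypothetical names chosen for illustration, and the function returns only whether the study is rated high overall, because the text does not specify how the remaining cases are labelled.

```python
# Hypothetical keys for the three items that drive the overall judgement
KEY_ITEMS = (
    "sequence_generation",
    "allocation_concealment",
    "incomplete_outcome_data",
)


def overall_high_risk(judgements):
    """Return True if any key item is judged 'high' (values: low/high/unclear)."""
    return any(judgements.get(item) == "high" for item in KEY_ITEMS)


example_study = {
    "sequence_generation": "low",
    "allocation_concealment": "high",
    "incomplete_outcome_data": "unclear",
    "blinding_participants": "high",
}
example_overall = overall_high_risk(example_study)
```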
We will perform sensitivity analyses in which studies assessed to be at high risk of bias (across all items) are excluded. The reasons for the judgement 'high risk of bias' might vary between studies where IPD are available and studies where IPD are not available. Selective outcome reporting will only be a problem for non IPD-studies as we have access to all data from the included studies with IPD. Furthermore we can reduce the risk of bias due to non-completeness of outcome data for studies with IPD by using statistical methods that do not exclude participants based on missing data (such as analysis of longitudinal data, or the use of censoring when analysing time-to-event data).
Measures of treatment effect
1. Binary data
We will calculate the odds ratio (OR) and its 95% confidence interval (CI).
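As an illustration, the odds ratio and a Wald-type 95% CI can be computed from the 2x2 counts on the log scale. This is a sketch rather than the review's prescribed procedure: the function name is hypothetical, and the 0.5 continuity correction for zero cells is an assumption that the text does not state.

```python
import math


def odds_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    if 0 in (a, b, c, d):
        # Assumption: add 0.5 to every cell when any cell is zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 10/50 events with exercise versus 5/50 with control
example_or, example_lower, example_upper = odds_ratio_ci(10, 50, 5, 50)
```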
2. Continuous data
When the same scale has been used in all included studies, we will calculate the mean differences (MD) and their 95% CI, as it preserves the original units and is therefore easier to interpret. Where different scales are used for the same outcome, effect sizes will be calculated separately for each scale.
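For a single trial with summary data, the mean difference and its 95% CI could be computed as sketched below. The function name and the example values are hypothetical, and the normal approximation (z = 1.96) is an assumption; a t-based interval may be preferable for small trials.

```python
import math


def mean_difference_ci(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl, z=1.96):
    """Mean difference (treatment minus control), its SE and an approximate CI."""
    md = mean_trt - mean_ctl
    se = math.sqrt(sd_trt ** 2 / n_trt + sd_ctl ** 2 / n_ctl)
    return md, se, md - z * se, md + z * se


# Hypothetical fatigue scores measured on the same scale in both arms
example_md, example_se, example_lo, example_hi = mean_difference_ci(
    20.0, 5.0, 25, 24.0, 5.0, 25
)
```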
2.1 Change versus endpoint data
Mean differences will primarily be based on endpoint data. For studies where IPD are not available, we will only use change data when endpoint data are not available. In cases where results from some studies are based on change data and results from other studies are based on endpoint data, all studies will be included in the same meta-analysis with change data and endpoint data as subgroups.
Unit of analysis issues
Studies increasingly employ 'cluster randomisation' (such as randomisation by clinician or practice) but analysis and pooling of clustered data poses problems. Firstly, authors often fail to account for intra-class correlation in clustered studies, leading to a 'unit of analysis' error (Divine 1992) whereby P values are spuriously low, confidence intervals unduly narrow and statistical significance overestimated. This causes type I errors (Bland 1997; Gulliford 1999).
Where clustering is not accounted for in primary studies, we will present data in a table, with a (*) symbol to indicate the presence of a probable unit of analysis error. We will seek to contact first authors of studies to obtain the intra-class correlation coefficient (ICC) of their clustered data and to adjust for this using accepted methods (Gulliford 1999). Where clustering is incorporated, we will present the data as if from a parallel-group randomised study, but adjusted for the clustering effect. We will additionally exclude such studies in a sensitivity analysis.
If cluster studies are appropriately analysed taking into account ICC and relevant data documented in the report, synthesis with other studies will be possible using the generic inverse variance technique.
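A common way to adjust for clustering once an ICC is available is the design effect, 1 + (m - 1) x ICC, where m is the average cluster size. The sketch below assumes this standard approximation is the adjustment method used; the function names and numbers are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Variance inflation caused by clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size that an individually randomised trial would need for equal precision."""
    return n / design_effect(mean_cluster_size, icc)


def inflate_standard_error(se, mean_cluster_size, icc):
    """Standard error corrected for a unit of analysis error."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical trial: 200 participants, clusters of 21, ICC = 0.05
example_deff = design_effect(21, 0.05)
example_n_eff = effective_sample_size(200, 21, 0.05)
example_se_adj = inflate_standard_error(1.0, 21, 0.05)
```

The inflated standard error can then enter a generic inverse variance meta-analysis alongside individually randomised trials.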
A major concern with cross-over trials is the potential for a carryover effect. It occurs if an effect (e.g. pharmacological, physiological or psychological) of the treatment in the first phase is carried over to the second phase. As a consequence, on entry to the second phase the participants can differ systematically from their initial state despite a wash-out phase. For the same reason cross-over trials are not appropriate if the condition of interest is unstable (Elbourne 2002). As both effects are very likely in CFS/ME, randomised cross-over studies will be eligible for inclusion, but only data up to the point of first cross-over will be used, while data from the following (second) period of the cross-over trial will not be considered for analysis. This might introduce bias due to the possibility of selectively reporting results from the first period based on the results. We will exclude cross-over studies in a sensitivity analysis.
Studies with multiple treatment groups
1. Multiple dose groups
We expect that some studies will address the effects of different levels of supervision and follow-up with regard to the exercise intervention compared with the comparator (e.g. sessions for designing exercise therapy, sessions for designing exercise therapy and planned telephone contacts, sessions for designing exercise therapy and seven face-to-face treatment sessions, and usual care). In the case of dichotomous outcomes we will sum up the sample sizes and the number of people with events across all intervention groups. For continuous outcomes, we will combine means and standard deviations using methods described in Chapter 7 (section 7.7.3.8) of the Cochrane Handbook (Higgins 2011b).
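For continuous outcomes, two intervention groups can be merged into a single group using the standard formula for combining sample sizes, means and standard deviations. The sketch below shows that calculation; the function name and example values are hypothetical, and more than two groups can be handled by applying it repeatedly.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Combine two groups into one: returns (N, mean, SD) of the merged group."""
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, m, math.sqrt(variance)


# Check against raw data: groups [1, 3] and [5, 7] merge into [1, 3, 5, 7],
# which has mean 4 and sample variance 20/3
example_n, example_mean, example_sd = combine_groups(
    2, 2.0, math.sqrt(2), 2, 6.0, math.sqrt(2)
)
```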
2. Multiple medications
We expect that some other studies will combine several interventions with one comparison group. In this case we will analyse the effects of each intervention group versus placebo separately, but will divide up the total number of participants in the placebo group. In the case of continuous outcomes the total number of participants in the placebo group will again be divided up, but the means and standard deviations will be left unchanged (see chapter 16, section 16.5.4 in Higgins 2011b).
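Dividing a shared comparison group between several pairwise comparisons could look like the sketch below. The helper names are hypothetical, and distributing any remainder to the first comparisons is an arbitrary choice made here so that the parts remain whole numbers.

```python
def split_evenly(total, parts):
    """Split an integer count into near-equal whole-number parts."""
    base, remainder = divmod(total, parts)
    return [base + (1 if i < remainder else 0) for i in range(parts)]


def split_continuous_control(n, mean, sd, n_comparisons):
    """Divide the shared group's N; the mean and SD are left unchanged."""
    return [(size, mean, sd) for size in split_evenly(n, n_comparisons)]


# Hypothetical shared comparison group of 40 participants, two intervention arms
example_split = split_continuous_control(40, 25.0, 6.0, 2)
```

Splitting the group in this way avoids counting the same participants twice when both comparisons enter one meta-analysis.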
Dealing with missing data
Analyses of all endpoints, subsets and subgroups will be carried out on the basis of the intention-to-treat principle but based on the available data; that is, participants will be analysed according to their allocated treatment, irrespective of whether they received that treatment or not, but no attempt will be made to impute missing data. In our request for the raw data from the included trials it will be made clear that data are needed for all randomised participants.
Assessment of heterogeneity
We will assess clinical heterogeneity across the included trials in terms of interventions, participants and settings. We expect that the trials might differ when it comes to the ingredients of the active interventions and the components of the passive controls (treatment as usual, waiting list, etc.). We furthermore anticipate that the severity and duration of CFS might differ between the trials. If we judge that the included trials are too heterogeneous (e.g. we expect the effect sizes across trials to be unrelated) to warrant a formal meta-analysis, we will not perform meta-analysis but present the results of the included trials narratively.
We will assess statistical heterogeneity on the basis of the Cochrane Handbook recommendations (I² values of 0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity; 50% to 90%: may represent substantial heterogeneity; 75% to 100%: considerable heterogeneity). In addition to the I² value (Higgins 2003), we will present the χ² statistic and its P value and consider the direction and magnitude of the treatment effects. Because the χ² test is underpowered to detect heterogeneity in meta-analyses with few studies, we will use a P value of 0.10 as the threshold for statistical significance.
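The heterogeneity statistics can be computed from the study estimates and their standard errors. The sketch below returns Cochran's Q (the chi-squared heterogeneity statistic), its degrees of freedom and the I-squared percentage; the function name and inputs are hypothetical, and the P value for Q is omitted because it needs a chi-squared distribution function that the standard library does not provide.

```python
def heterogeneity(effects, standard_errors):
    """Return Cochran's Q, its degrees of freedom and I-squared (in percent)."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q is smaller than its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical mean differences from two trials with equal precision
example_q, example_df, example_i2 = heterogeneity([0.0, 4.0], [1.0, 1.0])
```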
Assessment of reporting biases
Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results. These biases include publication bias and selective outcome reporting bias. Funnel plots can be useful in investigating reporting biases (Sterne 2011, section 10.4). It is important to bear in mind when interpreting a funnel plot that publication and selective outcome reporting biases are not the only reasons for asymmetry. Poor methodological quality of studies, inappropriate analysis and true heterogeneity between trials can also lead to funnel plot asymmetry.
We will only produce funnel plots if at least 10 studies are included for that specific outcome and the studies are not similar in size. Funnel plots will be inspected visually and Egger's test for asymmetry (Egger 1997) used to assess the risk of reporting bias. We will interpret the results with caution; all test results will be assessed in light of the visual inspection. We acknowledge that tests for funnel plot symmetry in general have relatively low power to detect funnel plot asymmetry. Thus bias cannot be excluded even if the test does not provide evidence of funnel plot asymmetry.
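Egger's test regresses the standardised effect (estimate divided by its standard error) on precision (one divided by the standard error); an intercept far from zero suggests funnel plot asymmetry. The sketch below computes only the regression intercept and slope by ordinary least squares. The function name, the enforcement of the 10-study minimum inside the function and the synthetic inputs are illustrative assumptions, and the significance test of the intercept is left out.

```python
def egger_intercept(effects, standard_errors, min_studies=10):
    """Ordinary least squares fit of (effect / SE) on (1 / SE): (intercept, slope)."""
    if len(effects) != len(standard_errors):
        raise ValueError("effects and standard_errors must have equal length")
    if len(effects) < min_studies:
        raise ValueError("too few studies for a funnel plot asymmetry test")
    x = [1 / se for se in standard_errors]                  # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardised effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data built so that small studies show larger effects:
# effect = 0.5 + 2 * SE, which gives an intercept of 2 and a slope of 0.5
example_ses = [0.10 + 0.05 * i for i in range(10)]
example_effects = [0.5 + 2.0 * se for se in example_ses]
example_intercept, example_slope = egger_intercept(example_effects, example_ses)
```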
For studies where we obtain IPD, we do not consider selective outcome reporting to be a problem, as we have access to all data collected. For the remaining non-IPD studies, selective outcome reporting can be an issue, which will be addressed during the 'Risk of bias' assessment.
Data from the included studies will be analysed using a two-step approach (Riley 2008).
At the first step, we will analyse the IPD for each trial, separately. For continuous outcomes the study-level analyses will be based on repeated measurements with a reference group coding of independent factors, thus taking into account the correlation between baseline and post-intervention measurements. Data from all measurement points will be included in one single model. The post-intervention measurements will be modelled as depending on the baseline measurement, time, group (intervention or control) and the interaction between time and group. The repeated measurements (from the same person) will be assumed to have an unstructured covariance structure. The analyses of data from the individual included trials will be conducted using the MIXED procedure in SAS (SAS 2009). For each trial the estimate of effect at any given measurement point will be calculated as the difference between the estimated value of the dependent variable in the intervention and control groups, respectively (at that measurement point); the corresponding 95% confidence intervals will also be calculated.
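One way to write down the study-level model described above is sketched here. The notation is introduced for illustration only and is not taken from the protocol: $y_{ij}$ is the post-intervention measurement of participant $i$ at measurement point $j$, $x_i$ the baseline value, $g_i$ the group indicator (1 for intervention, 0 for the control reference group), and $\tau_j$ and $\delta_j$ the time and time-by-group effects, both set to zero at the reference measurement point.

```latex
y_{ij} = \beta_0 + \beta_1 x_i + \tau_j + \gamma\, g_i + \delta_j\, g_i + \varepsilon_{ij},
\qquad
\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \dots, \varepsilon_{iJ})^{\top}
\sim N(\mathbf{0}, \boldsymbol{\Sigma})
```

Here $\boldsymbol{\Sigma}$ is an unstructured covariance matrix shared by all participants in the trial. Under this parameterisation the estimated treatment effect at measurement point $j$ is $\gamma + \delta_j$.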
If a study has performed more than one measurement of the same outcome at baseline, the baseline value for our analysis will be defined as the last value. If measurement of an outcome has been performed more than once at each measurement point, then we will base our analysis on the first-measured value.
For studies where no IPD are available, the estimate of effect (MD with standard error) will be based on the sample sizes, means, standard deviations, confidence intervals and P values extracted from papers.
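When a paper reports a mean difference with a 95% confidence interval but no standard error, the standard error can be recovered from the interval width. This sketch assumes a symmetric interval based on the normal distribution; the function name and example values are hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


# Hypothetical published result: MD -4.0 with 95% CI -6.0 to -2.0
example_se_from_ci = se_from_ci(-6.0, -2.0)
```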
At the second step we will combine the estimates of effect across studies in meta-analysis. The primary analyses will be based on all included studies, both IPD and non-IPD, but we will conduct sensitivity analyses in which studies where IPD are not available will be excluded. The estimates of effect from all included studies will be pooled using the generic inverse variance technique in a random-effects model.
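The second step can be sketched as a generic inverse variance random-effects pooling of the study-level estimates and standard errors. The text does not name an estimator for the between-study variance; the DerSimonian and Laird moment estimator is assumed here because it is the usual default for this technique. The function name and inputs are hypothetical.

```python
import math


def random_effects_pool(effects, standard_errors, z=1.96):
    """Inverse variance random-effects pooling (DerSimonian-Laird, an assumption)."""
    variances = [se ** 2 for se in standard_errors]
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, pooled - z * se, pooled + z * se, tau2


# Hypothetical step-one estimates (mean differences) from two trials
example_pooled, example_pooled_se, example_ci_lo, example_ci_hi, example_tau2 = (
    random_effects_pool([0.0, 4.0], [1.0, 1.0])
)
```

Because only an estimate and its standard error are needed per study, IPD and non-IPD trials can be pooled together in the same way.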
Adverse events will be counted and reported for each study using odds ratios (ORs) and their 95% confidence intervals. A random-effects model will be used to estimate effects across studies.
We will use GRADE to examine the quality of evidence and the strength of recommendations. Judgment of the strength of a recommendation will require consideration of the following factors: the balance between benefit and harm, the quality of the evidence, translation of the evidence into specific circumstances and the certainty of the baseline risk (Guyatt 2008).
Subgroup analysis and investigation of heterogeneity
To examine the potential impact of trial design and the treatments used, we plan to group trials by important aspects that might influence the effect of exercise therapy.
Heterogeneity will be explored on the following trial-specific items:
Control groups (treatment as usual/waiting list versus relaxation/flexibility);
Diagnostic criteria used for assessing eligibility of participants (Oxford (Sharpe 1991) versus CDC1994 (Fukuda 1994) versus London ME criteria (The London Criteria 1994) versus ICC (Carruthers 2011)); and
Setting (primary versus secondary versus tertiary care).
For each of these analyses a pooled measure of treatment effects will be calculated for each group of trials and for all trials together.
Heterogeneity will also be explored on the following participant-specific items:
Age (less than 45 years of age versus 45 years or older);
Length of syndrome history (less than 5 years versus more than 5 years);
Baseline illness severity (dichotomised);
Baseline anxiety (Yes/No);
Baseline depression (Yes/No);
Diagnostic criteria met (Met least strict criteria versus met two sets of criteria versus met three sets of criteria versus met all four sets of criteria); and
Illness beliefs (virus/psychological/combination).
We will furthermore explore heterogeneity on the following intervention-specific items:
Baselining (no determination/patient-centred/physiological);
Type of exercise (aerobic/anaerobic-strengthening/anaerobic-non-strengthening);
Explanations and materials (no cognitive component/educational-didactic/educational-didactic plus therapist using cognitive approaches); and
Incremental steps (none/pacing/mutually planned and expected/physiological response to exercise).
For the participant-specific items we will calculate separate estimates of treatment effect for each subgroup using the same methods as for the main analyses. Pooled measures of treatment effect will be calculated for each subgroup of the population, but not across subgroups due to the dependency between estimates of effect from the same trial.
If enough trials are included (a minimum of 10 studies with all relevant data available per comparison), meta-regression (random-effects) will be performed to formally explore heterogeneity or differences between subgroups of trials or populations. We will perform one meta-regression for each variable we wish to explore (not adjusted for the other proposed subgroup variables). The meta-regression will be conducted using the MIXED procedure with random error terms for each trial in SAS (SAS 2009). Due to the number of subgroup analyses we will adjust the level of significance to 0.05/(number of subgroup analyses) using the Bonferroni correction method.
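The Bonferroni adjustment is a single division, shown here as a short sketch. The function name is hypothetical, and the count of 14 analyses is only an illustration obtained by counting every trial-specific, participant-specific and intervention-specific item listed above.

```python
def bonferroni_alpha(n_analyses, alpha=0.05):
    """Per-analysis significance level after Bonferroni correction."""
    if n_analyses < 1:
        raise ValueError("n_analyses must be at least 1")
    return alpha / n_analyses


# Illustration: 14 subgroup analyses give a threshold of about 0.0036
example_alpha = bonferroni_alpha(14)
```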
Sensitivity analysis will be performed, separately, by excluding:
studies assessed as being at high risk of bias on one or more of the 'Risk of bias' items: sequence generation, allocation concealment or completeness of outcome data;
cross-over trials (possible selective reporting of results from first period in non-IPD studies); and
studies where IPD are not available (estimates of treatment effect depend on the method of analysis).
We will, in addition, perform sensitivity analyses based on standardised mean differences (SMD) for continuous outcomes that have been measured using different instrument/scales in the included studies. For the outcome Clinical Global Impression Scale (CGI) we will perform sensitivity analyses based on dichotomised values:
'Summary of findings' tables
We will prepare 'Summary of findings' tables to summarise the key findings of the systematic review in line with the standard methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011b) using GRADEpro (Brozek 2008). These findings will include: