Criteria for considering studies for this review
Types of studies
We will include:
randomised controlled trials
randomised cross-over trials, in which participants are randomly allocated to one of two groups; one group receives the active intervention and then the control intervention, and the other group receives the control and active interventions in the other order (see Higgins 2011, Chapter 16.4)
cluster randomised trials
controlled before-after trials: non-randomised trials in which the intervention takes place in one group but not in another, which serves as a control group. The outcomes are measured once before and once after the intervention.
(non-randomised) cross-over trials: the same as cross-over trials above, but allocation is not random
interrupted time series: uncontrolled before-after trials that have measured the outcomes at least three times before the intervention and three times after the intervention.
For possible meta-analyses we will have the following groups:
randomised trials including cluster randomised trials and randomised cross-over trials
controlled before-after trials including non-randomised cross-over trials
interrupted time series.
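The grouping rules above can be written as a small lookup table. The following is a hypothetical sketch: the design labels and the function name `pooling_group` are illustrative, not part of the protocol. The check of at least three measurements before and three after the intervention comes from the definition of interrupted time series given above.

```python
# Hypothetical encoding of the meta-analysis groups listed above.
POOLING_GROUP = {
    "randomised controlled trial": "randomised trials",
    "cluster randomised trial": "randomised trials",
    "randomised cross-over trial": "randomised trials",
    "controlled before-after trial": "controlled before-after trials",
    "non-randomised cross-over trial": "controlled before-after trials",
    "interrupted time series": "interrupted time series",
}


def pooling_group(design, n_pre=0, n_post=0):
    """Return the meta-analysis group for a study design, or None if it is not pooled.

    n_pre and n_post are the numbers of outcome measurements before and after
    the intervention; they are only checked for interrupted time series.
    """
    if design not in POOLING_GROUP:
        return None  # e.g. laboratory trials, which are tabulated separately
    if design == "interrupted time series" and (n_pre < 3 or n_post < 3):
        return None  # fewer than three measurements on either side of the intervention
    return POOLING_GROUP[design]
```

For example, `pooling_group("non-randomised cross-over trial")` returns "controlled before-after trials".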
We will also examine laboratory trials. We define laboratory trials as trials in which recruited individuals are subject to the intervention in a laboratory setting, not their respective workplace. We will present data from laboratory studies in a separate table, and use the data for comparison in the Discussion section, rather than for primary decision making.
We will include studies reported as full-text, those published as abstract only, and unpublished data.
Types of participants
We will include any adult workers (age > 18) in shift work schedules that include night shift work, irrespective of industry, country, age or co-morbidities. We will describe and analyse studies examining treatments for workers with sleep disorders separately from those examining general, unselected worker populations. We will also include laboratory studies of people in simulated night shift work in a table in the Discussion section (see above).
Types of interventions
We will include any intervention that deals with a shift work schedule. Both the intervention and the control group (or data) should examine the effects of shift work schedules. Any intervention that includes a change in the shift work schedule will be included.
Types of outcome measures
Sleep-wake disturbance associated with shift work is one of the core health problems of shift workers. In order to characterise the sleep-wake disturbance, we will include studies that have measured intervention effects with the following outcome measures:
Sleep quality off shift: measured with a validated questionnaire, such as the Bergen Insomnia Scale (Pallesen 2008), Pittsburgh Sleep Quality Index (Buysse 1989), Basic Nordic Sleep Questionnaire (Partinen 1995), Jenkins Sleep Questionnaire (Lallukka 2011), Karolinska Sleepiness Scale (Kaida 2006) or relevant questions in the Standard Shift Work Index, or with wrist-worn actigraphy-based measures. These questionnaires measure more than just sleep quality; however, we are only interested in the questions relating to sleep quality and sleep length. We will also accept sleep quality as measured in sleep diaries.
Sleep length off shift: average length of sleep based on the relevant questions in validated questionnaires (see examples above), sleep diaries or wrist-worn actigraphy.
Sleepiness during shift: Sleepiness measured at the beginning, middle and end of the shift. The disadvantage of sleepiness outcomes is that they are measured at specific time points and do not provide overall measures for sleepiness. Sleepiness can be operationalised as:
Self-rated (subjective) sleepiness measured with a validated questionnaire such as the Karolinska Sleepiness Scale (Kaida 2006), Stanford Sleepiness Scale (Herscovitch 1981; Hoddes 1972), relevant questions in the Standard Shift Work Index (Barton 1995), or other visual analogue scales, or
Physiological sleepiness measured by electrophysiological methods while working (e.g. electroencephalogram or electro-oculogram measurement while driving a train) or by standardised physiological tests of sleepiness, such as the Multiple Sleep Latency Test (Carskadon 1986), the Maintenance of Wakefulness Test (Mitler 1982) or pupillometric assessment, or
Behavioural sleepiness measured as performance in a validated vigilance test, such as the Psychomotor Vigilance Test (e.g. Basner 2011; Thorne 2005), the Mackworth Clock Test (Mackworth 1950), or single or multiple-choice reaction time tests, or
Behavioural sleepiness measured as characteristics of overt behaviour identified through video recording methods, such as an Observer Rating of Drowsiness (e.g. Wierwille 1994) or PERCLOS (percentage of eyelid closure) (Dinges 1998; Sommer 2010).
Fatigue usually refers to exhaustion or tiredness due to long-lasting exertion. However, because there are some differences in the use of these terms in different countries (e.g. between Europe and Australia), we will also include fatigue as an outcome measure when it is used as a measure of sleepiness at work. Therefore, we will include studies measuring fatigue at any moment during the shift as a self-reported outcome measured with a validated questionnaire or interview.
In studies that report the primary outcomes of this review, we will examine the following secondary outcomes:
number of staff
number of hours worked
A full cost-effectiveness analysis is beyond the scope of this review, as it would require information not only on our primary outcomes and their 'value' (e.g. willingness to pay) but also on potential adverse effects of shift systems, such as errors or injuries, and their costs and 'values'. Errors and injuries in shift workers are being evaluated in another Cochrane review (Ker 2009).
Search methods for identification of studies
We will search the following databases from inception to the present:
Cochrane Central Register of Controlled Trials (CENTRAL, The Cochrane Library)
Web of Knowledge (http://isiknowledge.com/)
ProQuest Dissertations & Theses
We present the search strategies for the first six databases in Appendix 1, Appendix 2, Appendix 3, Appendix 4, Appendix 5 and Appendix 6. As we will conduct another Cochrane review in conjunction with this one, assessing the effects of person-directed non-pharmacological interventions for preventing and treating sleep disturbances caused by shift work (Herbst 2013), we will run a single joint systematic search to avoid needless duplication of work. We will search ProQuest using only subject headings and keywords (as defined by the authors of the publications).
Since the search term 'shift' alone retrieves a very high number of citations, we have included many relevant combinations of the term 'shift' with other terms used to describe specific shifts, for example shift work, night shift, shift schedule and graveyard shift. We also account for terms that describe shift work without using the word 'shift', such as duty time or duty hours (e.g. in the transport industry), rota (in medicine) or the 4-day week, also called the compressed work week, which denotes a series of 12-hour shifts. The search is limited by terms for different outcomes or types of interventions. Since only abstracts are searched, we have included terms close to, but not exactly covered by, the inclusion criteria. A second limitation is by type of trial (not for all databases).
Searching other resources
We (CH, MK) will check the reference lists of all primary studies and review articles for additional references. We will contact experts in the field to identify additional unpublished materials. We (GC, MK) will search the conference proceedings of the biennial symposium on shift and night work. We (CH, MK) will search the World Health Organization Trial Register (www.who.int/ictrp/) as well as the most important trial registers within this register directly (www.clinicaltrials.gov; https://www.clinicaltrialsregister.eu/).
Data collection and analysis
Selection of studies
Two review authors (CH, MK) will independently screen titles and abstracts for inclusion of all the potential studies we identify as a result of the search and code them as 'retrieve' (eligible or potentially eligible/unclear) or 'do not retrieve'. We will retrieve the full-text study reports and two review authors per study (CH, MK) will independently screen them for inclusion. They will also identify and record reasons for the exclusion of ineligible studies. We will resolve any disagreement through discussion or, if required, we will consult a third person (TE). We will identify and exclude duplicates and collate multiple reports of the same study so that we include studies rather than reports of studies in the review. We will record the selection process in sufficient detail to complete a PRISMA flow diagram and 'Characteristics of excluded studies' table. We will also seek to obtain further information from the study authors when a paper is found to contain insufficient information to enable us to reach a decision on eligibility.
Data extraction and management
Two review authors per study (CH, TD, LF, MK, GC, JL, RF) will independently extract data from each of the included trials. We will extract the following information and present it in the review:
methods: type of trial, allocation, inclusion criteria, statistical analysis
basic information: country, dates of study (beginning and end of allocation or study), duration of study, number of participants, number of participants evaluated, information about shift schedules
basic information about the participants: age, sex, occupations, chronotype (morningness-eveningness score or similar)
intervention: details of interventions being compared, other interventions also performed at the same time
which outcomes were measured, their definitions, which ones are reported
outcome data for the outcomes relevant to this systematic review
funding for trial, and notable conflicts of interest of trial authors.
For randomised laboratory studies we will briefly extract the following:
details of the interventions compared, including any interventions performed at the same time in both groups,
number of participants,
country and duration of the trial,
which outcomes were measured and how, and
results of outcomes relevant to this review.
Assessment of risk of bias in included studies
Two review authors per study (CH, MK, TD, LF, JL, RF) will independently assess the risk of bias of the included studies. We will consult a third review author (TE) when disagreements occur, until mutual agreement is reached. If information needed to evaluate the methodological criteria is absent, we will contact the trial authors to seek additional information. Where possible, we will use quotes from the text to support the review authors' judgements.
We will use the Cochrane 'Risk of bias' tool for all study types. In order to avoid empty Risk of Bias tables for certain studies and many additional tables, we will construct a single list of domains that can be used to assess all studies in the review. We will write "not relevant to this study type" in Risk of Bias domains not relevant to the study type.
Randomised controlled trials
We will use the 'Risk of bias' tool according to the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) as implemented in the RevMan software program. We will select 'Unclear' risk of bias if there is insufficient information to evaluate the domain.
We will evaluate the following domains (taken directly or modified where applicable from Higgins 2011):
random sequence generation:
We will consider a trial to be low risk of bias if it describes a random element in sequence generation, such as using:
a random number table,
a computer random number generator,
shuffling cards or envelopes,
We will consider a trial to be high risk of bias if the authors describe sequence generation using:
odd or even date of birth,
or a rule based on e.g. work record number.
allocation concealment:
We will consider a trial to be low risk of bias if it reports:
central allocation (including telephone, web-based and pharmacy-controlled randomisation),
or sequentially-numbered, opaque, sealed envelopes.
We will consider a trial to be high risk of bias if it reports:
using an open random allocation schedule (e.g. a list of random numbers),
assignment envelopes without appropriate safeguards (e.g. if envelopes were unsealed or nonopaque or not sequentially numbered),
alternation or rotation,
date of birth,
or any other explicitly unconcealed procedure.
blinding of participants and personnel: We will omit this domain as it is not possible to blind participants or organising personnel to different shift schedules.
blinding of outcome assessors (evaluated for each outcome separately):
We will consider a trial to be low risk of bias if:
there is no blinding of outcome assessment, but the review authors judge that the outcome measurement is not likely to be influenced by lack of blinding, or
blinding of outcome assessment is ensured, and it is unlikely that the blinding could have been broken.
We will consider a trial to be high risk of bias if:
there is no blinding of outcome assessment, and the outcome measurement is likely to be influenced by lack of blinding, or
there is blinding of outcome assessment, but it is considered likely that the blinding could have been broken, and the outcome measurement is likely to be influenced by lack of blinding.
incomplete outcome data (evaluated for each outcome separately):
We will consider a trial to be low risk of bias if:
there are no missing outcome data,
reasons for missing outcome data are unlikely to be related to the true outcome,
missing outcome data are balanced in numbers across intervention groups, with similar reasons for missing data across groups,
in dichotomous outcome data, the proportion of missing outcomes compared with the observed event risk is not large enough to have a clinically-relevant impact on the intervention effect estimate,
in continuous outcome data, a plausible effect size (difference in means or standardised difference in means) among missing outcomes is not large enough to have a clinically-relevant impact on observed effect size, or
missing data have been imputed using appropriate methods.
We will consider a trial to be high risk of bias if:
the reason for missing outcome data is likely to be related to the true outcome, with either imbalance in numbers or reasons for missing data across intervention groups,
in dichotomous outcome data, the proportion of missing outcomes compared with observed event risk is large enough to induce clinically-relevant bias in the intervention effect estimate,
in continuous outcome data, the plausible effect size (difference in means or standardised difference in means) among missing outcomes is large enough to induce clinically-relevant bias in observed effect size,
an 'as-treated' analysis in which the number of participants analysed departs substantially from the number assigned at randomisation (or at the beginning of the trial),
there is potentially inappropriate application of simple imputation, or
in cluster randomised trials, loss of full clusters is likely to introduce bias.
selective outcome reporting
We will consider a trial to be low risk of bias if:
the study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way, or
the study protocol is not available but it is clear that the published reports include all expected outcomes, including those that were pre-specified (convincing text of this nature may be uncommon).
We will consider the trial to be at high risk of bias if:
not all of the study’s pre-specified primary outcomes have been reported,
one or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. sub-scales) that were not pre-specified,
one or more reported primary outcomes were not pre-specified (unless clear justification for their reporting is provided, such as an unexpected adverse effect),
one or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta-analysis, or
the study report fails to include results for a key outcome that would be expected to have been reported for such a study.
outcome reliable or objectively measured (for each of the outcomes relevant to the review)
We will consider the outcome to have a low risk of bias if the outcome is measured objectively (e.g. psychomotor vigilance test) or if two or more raters have an agreement > 90% or kappa ≥ 0.8.
We will consider the outcome to have a high risk of bias if two or more raters have an agreement < 90% or kappa < 0.8.
other sources of bias. We will mention any other sources of bias identified in this field.
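The reliability thresholds in the 'outcome reliable or objectively measured' domain can be made concrete with percent agreement and Cohen's kappa. This is a minimal sketch that assumes two raters' categorical ratings of the same items; the function names are hypothetical. Where the two rules overlap (e.g. agreement above 90% but kappa below 0.8), the sketch assumes that the low-risk criterion takes precedence, and it treats agreement of exactly 90% as not meeting the low-risk threshold. The protocol states neither point.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical category for every item
    return (p_o - p_e) / (1 - p_e)


def reliability_risk(rater_a=None, rater_b=None, objective=False):
    """Hypothetical classifier for the 'outcome reliable' domain."""
    if objective:
        return "low"  # e.g. psychomotor vigilance test
    if rater_a is None or rater_b is None:
        return "unclear"  # no reliability information reported
    if percent_agreement(rater_a, rater_b) > 0.9 or cohens_kappa(rater_a, rater_b) >= 0.8:
        return "low"
    return "high"
```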
Randomised cross-over trials
We will assess all items for randomised controlled trials and in addition we will assess the following domain. We will report 'unclear' risk of bias if there is insufficient information to evaluate the domain.
Cluster randomised trials
We will assess all items for randomised controlled trials in addition to the domains below. We will report 'unclear' risk of bias if there is insufficient information to evaluate the domain.
recruitment bias (e.g. Puffer 2003). Recruitment of individuals to different clusters after randomisation may occur. This may lead to different types of participants being recruited to the different clusters. Should there be evidence of this which may change the interpretation of the results, we will consider the trial to be at high risk of bias. We will consider trials to be of low risk of bias if we judge the effect of recruitment to different clusters after randomisation not to influence the outcome, or if the trial reports no or minimal recruitment after randomisation.
appropriate statistical analyses. These trials do not always take the cluster effect into account.
We will consider the trial to be of low risk of bias if:
the analysis takes the cluster effect into account (see section 16.3 in Higgins 2011), or
the analysis can be corrected, provided the following information is available: (a) the number of clusters (or groups) randomised to each intervention group, or the average (mean) size of each cluster; (b) the outcome data for the total number of individuals, ignoring the cluster design (for example, number or proportion of individuals with events, or means and standard deviations); and (c) an estimate of the intracluster (or intraclass) correlation coefficient (ICC) (section 16.3.4 in Higgins 2011).
We will consider the trial to be at high risk of bias if the analysis does not take the cluster effect into account and the analyses cannot be corrected.
Interrupted time series
Ramsay 2003 offers a risk of bias assessment method for interrupted time series. We will assess all items included in Ramsay 2003, with the addition of selective outcome reporting. Items 3 (blinding of outcome assessors), 4 (outcome measure reliable) and 5 (incomplete data) in that list are already included in the 'Risk of bias' tool for assessing randomised controlled trials described above.
We will use domains four to seven from the above list for randomised controlled trials in addition to the domains below. We are therefore following Ramsay 2003, but adding 'selective outcome reporting'. We will report 'unclear' risk of bias if there is insufficient information to evaluate the domain.
intervention done independently of other changes over time
We will consider the study to have a low risk of bias if we judge based on reports that the intervention was independent of other changes over time.
We will consider the study to have a high risk of bias if we judge based on reports that the intervention was not independent of other changes over time.
intervention unlikely to affect data collection
We will consider the study to have a low risk of bias if we judge based on reports that the intervention was unlikely to affect data collection e.g. same sources and methods of data collection before and after the intervention.
We will consider the study to have a high risk of bias if we judge based on reports that data collection was likely affected by the intervention.
shape of the intervention effect prespecified
We will consider the study to have a low risk of bias if a rational explanation for the shape of intervention effect was given by the author(s) of the study.
We will consider the study to have a high risk of bias if the explanation for the shape of intervention effect is inadequate.
rationale for the number and spacing of data points
We will consider the study to have a low risk of bias if an (adequate) rationale for the number of points is stated (e.g., monthly data for 12 months post-intervention was used because the anticipated effect was expected to decay) or a sample size calculation was performed that influenced the study design and used reasonable assumptions.
We will consider the study to have a high risk of bias if an (adequate) rationale for the number of points is not available from the author(s) and no, or an inadequate, sample size calculation was performed.
appropriate statistical analyses
We will consider studies to have a low risk of bias if:
autoregressive integrated moving average (ARIMA) models were used, or
time series regression models were used to analyse the data and serial correlation was adjusted/tested for, or
we can correct the analyses.
We will consider the trial to be at high risk of bias if none of the above were done/are possible.
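One widely used way to check that serial correlation was tested for in a time series regression is the Durbin-Watson statistic computed on the model residuals. The sketch below is illustrative only: the protocol does not name a specific test, the function name is hypothetical, and formal inference requires the tabulated Durbin-Watson bounds, which are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order serial correlation in residuals.

    Values near 2 suggest no lag-1 autocorrelation; values towards 0 suggest
    positive and values towards 4 suggest negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared successive differences over the residual sum of squares.
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den
```

A value well below 2 in a study that used ordinary regression without any adjustment would point towards the high-risk category for this domain, unless the analysis can be corrected.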
Controlled before-after studies
We will use domains four through eight of the randomised controlled trials 'Risk of bias' tool, and items one and two of the assessment for interrupted time series, to assess the risk of bias in controlled before-after studies. However, two important sources of bias relevant for controlled before-after trials are not covered by the above. We will report 'unclear' risk of bias if there is insufficient information for evaluation.
baseline differences between groups. We will consider the variables type and place of work, age, sex and chronotype.
We will consider studies to have a low risk of bias if these four variables are reported to be similar.
We will consider studies to have a high risk of bias if the authors judge based on the reports that one of these variables differs enough to introduce bias.
appropriate statistical analyses
We will consider studies to have a low risk of bias if the analysis is considered adequate or can be corrected.
We will consider studies to have a high risk of bias if the analysis is considered inadequate, for example if:
it does not report baseline data and changes from baseline for both participants in the intervention group and controls, or
confounding is not adequately addressed in the analysis.
Non-randomised cross-over trials
In addition to the assessment of risk of bias for randomised cross-over trials, we will assess the additional items listed under controlled before-after studies above.
Assessment of bias in conducting the systematic review
We will conduct the review according to this published protocol and report any deviations from it in the 'Differences between protocol and review' section of the systematic review.
Measures of treatment effect
We will enter the outcome data for each study into the data tables in RevMan software (RevMan 2012). For continuous outcomes, we will enter point estimates as means and standard deviations (SDs), and analyse them as mean differences when the same scale is used or as standardised mean differences (SMDs) when different scales are combined. Should we obtain only dichotomised data for a continuous outcome, we will use these data, but we will also contact the study authors to try to obtain the continuous data. If studies report only effect estimates and their 95% confidence intervals (CIs) or standard errors, we will enter these data into RevMan using the generic inverse variance method. When the results cannot be entered in either way, we will describe them in the 'Characteristics of included studies' table or enter the data into Additional tables. We will reverse the scoring of scales if needed, so that a high score denotes the same direction (good or bad) in all outcomes. We will not mix different study designs. We will use STATA for calculations not possible within RevMan.
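To make these steps concrete, here is a minimal Python sketch of two calculations: the standardised mean difference as Hedges' adjusted g (the SMD that RevMan reports), with an optional sign reversal for scales whose direction must be aligned, and the recovery of a standard error from a 95% CI for the generic inverse variance method. Function and parameter names are hypothetical. The 3.94 constant in the standard error follows the formula documented for RevMan 5 as we understand it, and the CI conversion assumes a normal approximation.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2, reverse=False):
    """Hedges' adjusted g (intervention group 1 vs control group 2) and its SE.

    Set reverse=True for a scale on which a high score means the opposite
    direction to the other outcomes; this flips the sign of g only.
    """
    n = n1 + n2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n - 2))
    d = (m1 - m2) / pooled_sd
    g = d * (1 - 3 / (4 * n - 9))  # small-sample bias correction
    se = math.sqrt(n / (n1 * n2) + g**2 / (2 * (n - 3.94)))
    return (-g if reverse else g), se


def se_from_ci(lower, upper, ratio=False):
    """Standard error from a 95% CI; ratio measures are handled on the log scale."""
    if ratio:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * 1.96)
```

For a ratio measure, the resulting SE refers to the log ratio, which is the scale on which the generic inverse variance method operates.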
For controlled before-after studies, we will plot the outcome measurements both at baseline and follow-up to ensure that baseline imbalances are taken into account.
For cross-over studies we will examine the interaction between the order of treatments and outcome, where possible.
For interrupted time series studies, we will extract data from the original papers and re-analyse them according to the recommended methods for analysing interrupted time series designs for inclusion in systematic reviews (Ramsay 2003). These methods use segmented time series regression to estimate the effect of an intervention while taking into account secular time trends and any autocorrelation between individual observations. If an included interrupted time series study uses a control group, we will use the difference in rates between the intervention and the control group as the outcome. For each study, we will fit a first-order autoregressive time series model to the data using a modification of the parameterisation of Ramsay (Ramsay 2003). The model is specified as follows:
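$$Y_t = \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,(\mathrm{time}_t - p)\,I(\mathrm{time}_t \ge p) + \beta_3\,I(\mathrm{time}_t \ge p) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$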
Here time = 1, ..., T; p is the time of the start of the intervention; I(time ≥ p) is an indicator function that takes the value 1 if time is p or later and 0 otherwise; and the errors are assumed to follow a first-order autoregressive process (AR1). The parameters β have the following interpretation: β0 is the intercept; β1 is the pre-intervention slope; β2 is the difference between the post- and pre-intervention slopes; and β3 is the change in level at the beginning of the intervention period, that is, the difference between the observed level at the first intervention time point and the level predicted by the pre-intervention time trend.
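The segmented regression model can be fitted in several ways. The sketch below shows one possibility, not necessarily the procedure the review will use. It estimates the four coefficients by ordinary least squares and then applies Cochrane-Orcutt iteration, a standard technique for AR(1) errors, in plain Python so that it is self-contained. The function names are hypothetical, time is indexed 1, ..., T as in the model, and standard errors are omitted for brevity. In practice a statistics package such as Stata would be used.

```python
def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def segmented_ar1(y, p, max_iter=50, tol=1e-8):
    """Fit Y = b0 + b1*time + b2*(time - p)*I(time >= p) + b3*I(time >= p) + e
    with AR(1) errors by Cochrane-Orcutt iteration; returns ([b0..b3], rho)."""
    rows = []
    for t in range(1, len(y) + 1):
        post = 1.0 if t >= p else 0.0
        rows.append([1.0, float(t), (t - p) * post, post])
    beta, rho = ols(rows, y), 0.0
    for _ in range(max_iter):
        resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
        den = sum(e * e for e in resid[:-1])
        num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
        new_rho = num / den if den else 0.0
        # Quasi-difference the data to remove the AR(1) dependence, then refit.
        star_rows = [[a - new_rho * b for a, b in zip(rows[t], rows[t - 1])]
                     for t in range(1, len(rows))]
        star_y = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        beta = ols(star_rows, star_y)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho
```

Because the intercept column is quasi-differenced along with the other columns, the refitted coefficients are estimates of β0 to β3 on the original scale.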
We will standardise the data of interrupted time series studies to obtain effect sizes by dividing the outcome and its standard error by the pre-intervention standard deviation, as recommended by Ramsay 2003. We will thus have two separate outcomes for each interrupted time series study: the short-term change in the level of the outcome due to the intervention, which can be interpreted as an additive effect, and the long-term change in the time trend (change of slope), which indicates whether the effect of the intervention grows over time.
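As a sketch of the standardisation step, assuming that "pre-intervention standard deviation" means the sample SD of the observed pre-intervention data points (the protocol does not say whether the raw or the residual SD is meant), each segmented regression estimate and its standard error can be rescaled as follows. The function name and its arguments are hypothetical.

```python
import statistics


def standardise_its(level_change, level_se, slope_change, slope_se, pre_values):
    """Divide ITS level and slope changes (and their SEs) by the pre-intervention SD."""
    sd = statistics.stdev(pre_values)  # sample SD of the pre-intervention observations
    return {
        "level": (level_change / sd, level_se / sd),
        "slope": (slope_change / sd, slope_se / sd),
    }
```

For example, with hypothetical pre-intervention values 10, 12 and 14 (SD 2), a level change of 3.0 with SE 0.6 becomes a standardised level change of 1.5 with SE 0.3.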
Unit of analysis issues
For studies that employ a cluster-randomised design and report sufficient data to be included in the meta-analysis, but do not make an allowance for the design effect, we will calculate the design effect based on the methods described in section 16.3.6 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). If no reliable estimate of the intracluster correlation coefficient (ICC) is available from the literature but the other data are available, we will include the trial using an ICC of 0.1. We are currently unaware of any ICCs for trials in shift workers. Of the published ICCs, we consider those provided by the Health Services Research Unit to be the most applicable to trials in shift workers (ICC Database). These ICCs range from 0 to 0.32, with most coefficients below 0.1.
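The design effect adjustment reduces a cluster trial to its effective sample size. A minimal sketch, assuming the standard approximation described in the Cochrane Handbook: design effect = 1 + (M − 1) × ICC, where M is the mean cluster size. Sample sizes (and, for dichotomous outcomes, event counts) are divided by the design effect, while means and SDs of continuous outcomes are left unchanged. The function names are hypothetical; the default ICC of 0.1 is the fallback value stated above.

```python
def design_effect(mean_cluster_size, icc=0.1):
    """Design effect for a cluster-randomised trial: 1 + (M - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample(n, mean_cluster_size, icc=0.1, events=None):
    """Effective sample size (and events, if dichotomous) after the design effect."""
    de = design_effect(mean_cluster_size, icc)
    if events is None:
        return n / de
    return n / de, events / de
```

For example, 200 participants in clusters of mean size 11 with an ICC of 0.1 give a design effect of 2.0 and an effective sample size of 100.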
Dealing with missing data
We will contact trial authors to obtain data not found in their reports that are needed for the assessment of risk of bias, or data for outcomes relevant to this systematic review. We will use all reports of trials, including presentations if found, to obtain missing data. Where statistics (e.g. standard deviations or correlation coefficients) can be calculated from other reported values, we will calculate them using the methods presented in the Cochrane Handbook (Higgins 2011).
If possible, we will use intention-to-treat (ITT) analyses in randomised trials and similar analyses of the full group in non-randomised trials. The following information may be relevant for assessing the impact of missing data: distribution by intervention group and by baseline variables (age, sex, etc.), as well as by responses on a first questionnaire (when multiple questionnaires are used). We will examine reasons for drop-out and missing data if these data are available. We will record the methods study authors used for dealing with missing data.
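Two common cases of calculating a missing standard deviation from other reported values can be sketched as follows. This assumes the standard relationships for a single group mean: SD = SE × √N, and, for a 95% CI in a large sample, SD = √N × (upper − lower) / 3.92. For small samples, the Handbook replaces 3.92 with twice the appropriate t-distribution quantile, which this sketch does not do. Function names are hypothetical.

```python
import math


def sd_from_se(se, n):
    """SD of a group mean from its standard error: SD = SE * sqrt(N)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """SD from a 95% CI for a group mean (large-sample normal approximation)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)
```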
Missing data may be a strong source of bias. Sensitivity analyses will examine the effect of trials with high and low rates of missing data.
We do not aim to undertake imputation of data. The extent of missing data may be so severe that even imputation may lead to biased estimates of the effect.
Assessment of heterogeneity
Within each comparison of interventions and each outcome, we will assess the homogeneity of the results of included studies based on similarity of interventions, populations, exact outcome definition, outcome timing and follow-up.
We will separately analyse studies in shift workers and persons with shift work disorder. We will explore other differences in population in subgroup analyses (see Subgroup analysis and investigation of heterogeneity).
We will consider studies to be similar enough to be combined with regard to outcome if they use the same outcome (e.g. sleepiness) measured at a similar time with regard to the shifts examined. All different ways of measuring one outcome will be considered similar enough for the primary analysis. Differences in outcome definitions will be examined in subgroup analyses. When a study reports an outcome in more than one way, we will use the subjective measure of sleepiness, sleep quality from sleep diaries and sleep length from sleep diaries in the main analysis.
With regard to interventions, we anticipate combining the following (types of) comparisons:
comparison of two different lengths of shift (e.g. 8 versus 12 hour shifts) in studies where the speed (slow, fast, fixed) and direction (backward, forward, none) of rotation are the same in both groups. We will combine these studies in a primary analysis irrespective of the speed and direction of rotation.
comparison of backward versus forward rotation in studies where the length of shifts and the speed of rotation are the same in both groups. We will combine these studies in a primary analysis irrespective of length of shift, speed of rotation and other details of the shifts.
comparison of slow versus fast rotation in studies where the length of shifts and the direction of rotation are the same in both groups. We will combine studies if shift lengths constitute the same number of hours, irrespective of the direction of rotation and the length of shifts.
mixed comparisons, where more than one element of shift schedules differs between the two groups.
flexible and fixed shift scheduling.
different starting times of shifts.
We will test for statistical heterogeneity by means of the Chi² test as implemented in the forest plot in Review Manager 5 software (RevMan 2012), using a significance level of P < 0.1 to indicate a problem with heterogeneity. We will also quantify the degree of heterogeneity using the I² statistic, where an I² value of 25% to 50% indicates a low degree of heterogeneity, 50% to 75% a moderate degree and > 75% a high degree (Higgins 2011).
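The heterogeneity statistics can be reproduced outside RevMan with a few lines of code. A minimal sketch, assuming inverse-variance weights 1/SE². Cochran's Q is compared with a chi-squared distribution on k − 1 degrees of freedom at the P < 0.1 threshold, and I² = (Q − df)/Q, truncated at zero. The chi-squared tail is computed from its closed forms for integer degrees of freedom so that only the standard library is needed. The exact placement of the 50% and 75% boundaries and the label for I² below 25% are our assumptions, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df + 1) // 2):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def heterogeneity(effects, ses):
    """Cochran's Q, its chi-squared P value and I^2 from inverse-variance weights."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    p = chi2_sf(q, df)
    i2 = 100 * (q - df) / q if q > df else 0.0
    if i2 > 75:
        degree = "high"
    elif i2 >= 50:
        degree = "moderate"
    elif i2 >= 25:
        degree = "low"
    else:
        degree = "below 25%"
    return {"Q": q, "df": df, "p": p, "problem": p < 0.1, "I2": i2, "degree": degree}
```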
If we identify substantial heterogeneity we will report it and explore possible causes by prespecified subgroup analysis.
Assessment of reporting biases
We will reduce the effect of reporting bias by using studies rather than publications as the unit of inclusion, to avoid introducing duplicated data (i.e. two articles could be duplicate publications of the same study). Following the Cho 2000 statement on redundant publications, we will attempt to detect duplicate publications and, if more than one article reports on the same study, we will extract data only once. We will minimise location bias by searching multiple databases, and language bias by not excluding any article based on language. If more than five trials are included in a comparison, we will construct and analyse funnel plots to assess the likelihood of small-study bias; if more than 10 trials are included, we will also use the test proposed by Egger 1997. We will assess selective reporting of outcomes in sensitivity analyses (see Sensitivity analysis).
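The regression behind the Egger 1997 test can be sketched as follows: the standardised effect (effect divided by its standard error) is regressed on precision (one divided by the standard error) by ordinary least squares, and an intercept far from zero suggests small-study effects. The function name is hypothetical, and the sketch stops at the intercept and its standard error; the t test on n - 2 degrees of freedom would be done in statistical software.

```python
import math
from typing import Sequence, Tuple


def egger_regression(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, SE of intercept) of Egger's regression."""
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in ses]                          # precision
    z = [y / se for y, se in zip(effects, ses)]         # standardised effect
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (n - 2)                                   # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + x_bar**2 / sxx))
    return intercept, slope, se_intercept
```

The ratio of the intercept to its standard error is the test statistic; a symmetric funnel plot corresponds to an intercept close to zero.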
We will first present results separately for randomised studies, controlled before-after studies and interrupted time series. We will pool data from studies judged to be homogeneous (see Assessment of heterogeneity) using Review Manager 5 software. Where possible, we will combine studies using incidence data and, for trials reporting continuous data, using standardised mean differences (SMDs). To make the SMDs more readily interpretable for clinicians, we will then convert the pooled SMD back into a mean difference by multiplying it by the median standard deviation of the included studies that used the preferred scale in question. We will meta-analyse sleep length as mean differences.
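The SMD and its back-transformation can be sketched as below. This assumes the SMD is Hedges' adjusted g, the form described in the Review Manager 5 documentation; the function names and the example numbers are hypothetical.

```python
import math
from statistics import median
from typing import Sequence


def hedges_g(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """SMD with Hedges' small-sample correction 1 - 3 / (4N - 9), N = n1 + n2."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def smd_to_md(pooled_smd: float, study_sds: Sequence[float]) -> float:
    """Re-express a pooled SMD on the preferred scale: SMD x median SD."""
    return pooled_smd * median(study_sds)
```

For instance, a pooled SMD of 0.5 and standard deviations of 10, 12 and 14 points from studies using the preferred scale give a mean difference of 0.5 × 12 = 6 points.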
When studies are found to be clinically heterogeneous (i.e. dissimilar in terms of intervention, outcome, population or follow-up time), we expect them also to be statistically heterogeneous. Therefore, we consider a random-effects model more appropriate for meta-analysis. All estimates will include a 95% confidence interval (CI). For analyses not possible within Review Manager software, we will use Stata or other statistical software.
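A random-effects pooled estimate with its 95% CI can be sketched with the DerSimonian and Laird method, which is the random-effects method offered in Review Manager 5. The function name is hypothetical and the sketch ignores refinements such as the Hartung-Knapp adjustment.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float], ses: Sequence[float],
                      level: float = 0.95) -> Tuple[float, float, float, Tuple[float, float]]:
    """Return (pooled effect, its SE, tau^2, confidence interval)."""
    k = len(effects)
    w = [1 / se**2 for se in ses]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1 / (se**2 + tau2) for se in ses]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% CI
    return pooled, se_pooled, tau2, (pooled - z * se_pooled, pooled + z * se_pooled)
```

With effects of 0.1, 0.3 and 0.5 and standard errors of 0.1, the sketch estimates τ² = 0.03 and a pooled effect of 0.3 with a standard error of about 0.115, wider than the fixed-effect standard error of about 0.058.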
We will use the GRADE approach as described in the Cochrane Handbook (Higgins 2011) and as implemented in the GRADEpro 3.2 software (GRADEpro 2008) to present the quality of evidence and 'Summary of findings' tables.
The downgrading of the quality of a body of evidence for a specific outcome will be based on five factors:
Limitations of studies,
Indirectness of evidence,
Inconsistency of results,
Imprecision of results,
Publication bias.
The GRADE approach specifies four levels of quality (high, moderate, low and very low).
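The downgrading logic can be sketched as a walk down the four levels. This is a simplified illustration: the function and constant names are hypothetical, upgrading is not modelled, and it assumes the usual GRADE starting points (high for randomised trials, low for non-randomised studies), with each factor lowering quality by one or two levels.

```python
from typing import Dict

# GRADE quality levels, lowest first.
LEVELS = ["very low", "low", "moderate", "high"]

# The five GRADE factors that can lower the quality of a body of evidence.
FACTORS = {
    "limitations of studies",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication bias",
}


def grade_quality(start: str, downgrades: Dict[str, int]) -> str:
    """Apply per-factor downgrades (0, 1 or 2 levels) to a starting level."""
    for factor, steps in downgrades.items():
        if factor not in FACTORS:
            raise ValueError(f"unknown GRADE factor: {factor}")
        if steps not in (0, 1, 2):
            raise ValueError("each factor lowers quality by 0, 1 or 2 levels")
    level = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(level, 0)]  # quality cannot fall below "very low"
```

For example, randomised evidence downgraded once for inconsistency and once for imprecision would be rated as low quality.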
Subgroup analysis and investigation of heterogeneity
If we find several studies that have investigated similar interventions and have used the same or very similar outcome measures at the same or similar follow-up times, we will conduct the following subgroup analyses:
Chronotype (or similar). Rationale: sleep-wake disturbances on, for example, night shifts differ between people with different chronotypes or different chronobiological propensity (Erren 2013).
Shift schedule details. Rationale: details of shift schedules may influence sleepiness:
for shift length studies: different types of rotation.
for direction of rotation studies: speed of rotation and length of shift.
for speed of rotation studies: direction of rotation and length of shift.
Occupational settings or branches of industry (e.g. hospital staff). Rationale: physical and psychological strain differ between industries, which may affect outcomes such as sleepiness.
Different ways of measuring the same outcome. Rationale: for example, actigraphy may measure sleep length more precisely than sleep diaries but usually covers a shorter period, whereas sleep diaries give a better overall picture.
Mean or median age. Rationale: older shift workers have more experience with shift work and may have adapted better to it; they may also be people whose chronotypes are better suited to shift work, because those not suited to it will have left shift work (selection effects).
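Whether the subgroups listed above differ can be assessed by comparing the subgroup pooled estimates, in the spirit of the test for subgroup differences reported by Review Manager. The sketch below is an illustration with a hypothetical function name: it takes one pooled estimate and standard error per subgroup and returns the between-subgroup Q, which is referred to a chi-squared distribution with (number of subgroups - 1) degrees of freedom, together with an I² for subgroup differences.

```python
from typing import Sequence, Tuple


def subgroup_difference(estimates: Sequence[float], ses: Sequence[float]) -> Tuple[float, int, float]:
    """Return (Q_between, degrees of freedom, I^2 in %) across subgroups."""
    w = [1 / se**2 for se in ses]  # inverse-variance weight of each subgroup
    overall = sum(wi * ei for wi, ei in zip(w, estimates)) / sum(w)
    q = sum(wi * (ei - overall) ** 2 for wi, ei in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```

For two subgroups (say, hospital staff and other industries) with pooled SMDs of 0.2 and 0.6, each with a standard error of 0.1, Q_between = 8 on 1 degree of freedom and I² = 87.5%.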
We plan to conduct the following sensitivity analyses:
each domain of the 'Risk of bias' assessment; for selective outcome reporting, each outcome according to whether the other outcomes were selectively reported,
differences in measuring the same outcome (e.g. self-reported versus physiological sleepiness)
different assumptions for imputation of missing data, different proportions of missing data
different assumptions for the intracluster correlation coefficient (for cluster randomised trials)
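The sensitivity analysis on the intracluster correlation coefficient (ICC) can be sketched with the approximate "effective sample size" approach described in the Cochrane Handbook: the sample size of each arm is divided by the design effect 1 + (m - 1) × ICC, where m is the mean cluster size. The function names, the cluster size and the ICC values below are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Sample size of a trial arm after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical arm: 200 workers in wards of about 10 workers each,
    # re-analysed under a range of assumed ICC values.
    for icc in (0.01, 0.05, 0.10):
        print(f"ICC={icc:.2f}: effective n = {effective_sample_size(200, 10, icc):.1f}")
```

Repeating the meta-analysis with the effective sample sizes obtained under each assumed ICC shows how sensitive the pooled estimate is to that assumption.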