Criteria for considering studies for this review
Types of studies
We will include individually-randomised or cluster-randomised trials. This includes randomised cross-over trials, in which individuals eventually receive both interventions but the order in which they receive the interventions is random. We will include studies reported as full-text, those published as abstract only, and unpublished data.
We will also examine laboratory trials. We define laboratory trials as trials in which recruited individuals are subject to the intervention in a laboratory setting, not in their respective workplace. We will present data from laboratory studies in a separate table and use the data for comparison in the Discussion section but not for drawing conclusions on intervention effects.
Types of participants
We will include studies conducted with adult workers engaged in shift work schedules that include night shift work, irrespective of industry, country, age or co-morbidities, who have not actively sought help for sleep disturbances. These studies address the 'prevention of sleep disturbances' aspect of this review. We will also include studies conducted with shift workers who suffer from sleep disorders and who have sought medical assistance for them. These studies address the 'treatment of sleep disturbances' aspect of this review.
We will also include laboratory studies conducted with persons in simulated night shift work in a table in the Discussion section (see above).
Types of interventions
We will include trials comparing any person-directed, non-pharmacological intervention with any other intervention. We anticipate the following types of interventions:
use of bright light (~10,000 lux for 30 minutes) off shift
use of bright light during shift
use of dim light (~100 lux) during shifts
use of dark goggles during shift
use of dark goggles off shift
napping before shift
napping during shift
exercise during shift
exercise off shift
information about amount and timing of sleep
information about sleeping conditions
other lifestyle education (e.g. exercise, dietary information)
For the light and goggle interventions we will distinguish between timing (e.g. beginning, middle and end of night shift), wavelengths (white light versus limited range of wavelengths) and intensity (strength and duration) of light or amount of light transmitted through the goggles.
We will include studies comparing any person-directed, non-pharmacological interventions with no intervention or an alternative intervention.
Types of outcome measures
Sleep-wake disturbance associated with shift work is at the root of health problems of shift workers. In order to characterise the sleep-wake disturbance, we will include studies that have measured intervention effects with the following outcome measures:
Sleep quality off shift: measured with a validated questionnaire such as the Bergen Insomnia Scale (Pallesen 2008), Pittsburgh Sleep Quality Index (Buysse 1989), Basic Nordic Sleep Questionnaire (Partinen 1995), Jenkins Sleep Questionnaire (Lallukka 2011), Karolinska Sleepiness Scale (Kaida 2006), relevant questions in the Standard Shift Work Index and wrist-worn actigraphy-based measures. These questionnaires measure more than just sleep quality. However, we are only interested in the questions relating to sleep quality and sleep length. We will also accept sleep quality as measured in sleep diaries.
Sleep length off shift: Average length of sleep based on the relevant questions in validated questionnaires (see examples above) or on sleep diaries, or wrist-worn actigraphy.
Sleepiness during shift: Sleepiness measured at the beginning, middle and end of the shift. The disadvantage of sleepiness outcomes is that they are measured at specific time points and do not give us an overall measure of sleepiness. Sleepiness can be operationalised as:
self-rated (subjective) sleepiness measured with a validated questionnaire such as the Karolinska Sleepiness Scale (Kaida 2006), Stanford Sleepiness Scale (Herscovitch 1981; Hoddes 1972), relevant questions in the Standard Shift Work Index (Barton 1995), or other visual analogue scales, or
physiological sleepiness measured by electrophysiological methods while working (e.g., electroencephalogram or electro-oculogram measurement while driving a train) or by standardised physiological tests of sleepiness such as the Multiple Sleep Latency Test (Carskadon 1986), the Maintenance of Wakefulness Test (Mitler 1982) or the pupillometric assessment, or
behavioural sleepiness measured as performance in a validated vigilance test such as the Psychomotor Vigilance Test (e.g. Basner 2011, Thorne 2005), the Mackworth Clock Test (Mackworth 1950), or single or multiple choice reaction time tests, or
behavioural sleepiness measured as characteristics of overt behaviour that are identified through video recording methods such as an Observer Rating of Drowsiness (e.g. Wierwille 1994), or PERCLOS (percentage of eyelid closure) (Dinges 1998; Sommer 2010).
Fatigue usually refers to exhaustion or tiredness due to long-lasting exertion. However, because there are some differences in the use of these terms in different countries (e.g. between Europe and Australia), we will also include fatigue as an outcome measure when it is used as a measure of sleepiness at work. Therefore, we will include studies measuring fatigue at any moment during the shift as a self-reported outcome measured with a validated questionnaire or interview.
In studies that report primary outcomes for this review we will examine the following secondary outcomes:
costs for lighting interventions (e.g. initial and running costs of the lighting equipment),
costs for napping interventions (e.g. number of staff and costs for covering the time when individuals sleep).
A full cost-effectiveness analysis is beyond the scope of this review, as it would require information not only on our primary outcomes and their 'value' (e.g. willingness to pay) but also of potential adverse effects of shift systems such as errors or injuries and their costs and 'values'. Errors and injuries in shift workers are being evaluated in another protocol (Ker 2009).
Search methods for identification of studies
We will search the following databases:
Cochrane Central Register of Controlled Trials (CENTRAL, The Cochrane Library)
Web of Knowledge (http://isiknowledge.com/)
ProQuest Dissertations & Theses (http://www.proquest.co.uk)
We present search strategies for the first six databases as Appendix 1; Appendix 2; Appendix 3; Appendix 4; Appendix 5 and Appendix 6. As we will conduct another Cochrane review assessing the effects of shift schedule interventions (Erren 2013a) in conjunction with this one, we will run only one joint systematic search to avoid needless duplication of work. We will search ProQuest using subject headings and keywords only.
Since the search term 'shift' alone leads to a very high number of citations, we have included many relevant combinations of the term 'shift' with other terms used to describe specific shifts. Examples are shift work, night shift, shift schedule and graveyard shift. We also account for terms that describe shift work, but do not use the word 'shift' such as duty time or hours (e.g. transport industry), rota (medicine) or the 4-day week alias compressed work week used to denote a series of 12-hour shifts. The search is limited by terms for different outcomes or types of interventions. Since only abstracts are searched, we have included terms near, but not exactly covered by the inclusion criteria. A second limitation is by type of trial (not for all databases).
Searching other resources
We will check reference lists of all primary studies and review articles for additional references. We will contact experts in the field to identify additional unpublished materials. We will search the conference proceedings of the biannual symposium on shift and night work. We will search the World Health Organisation Trial Register (www.who.int/ictrp/) as well as the most important trial registers within this register directly (www.clinicaltrials.gov, https://www.clinicaltrialsregister.eu/).
Data collection and analysis
Selection of studies
Two review authors (CH, MK) will independently screen titles and abstracts of all the studies we identify as a result of the search and code them as 'retrieve' (eligible or potentially eligible/unclear) or 'do not retrieve'. We will retrieve the full-text study reports and two review authors (CH, MK) will independently screen them for inclusion. They will also identify and record reasons for the exclusion of ineligible studies. We will resolve any disagreement through discussion or, if required, we will consult a third person (TE). We will identify and exclude duplicates and collate multiple reports of the same study so that we include studies rather than reports of studies in the review. We will record the selection process in sufficient detail to complete a PRISMA flow diagram and 'Characteristics of excluded studies' table. We will also seek to obtain further information from the study authors when a paper is found to contain insufficient information to enable us to reach a decision on eligibility.
Data extraction and management
Two review authors (CH, MK) will independently extract data from each of the included trials. We will extract the following information and present it in the review:
methods: type of trial (randomised controlled trial or cluster randomised trial, randomised cross-over trial), allocation, inclusion criteria, statistical analysis,
basic information: country, dates of study (beginning and end of allocation or study), duration of study, number of participants, number of participants evaluated, information about shift schedules,
basic information about the participants: age, sex, occupations, chronotype (morningness-eveningness score or similar),
intervention: details of interventions being compared, other interventions performed at the same time,
which outcomes were measured, their definitions, which outcomes are reported,
outcome data for the outcomes relevant to this systematic review, and
funding for trial, and notable conflicts of interest of trial authors.
For randomised laboratory studies we will briefly extract the following:
details of the interventions compared, including any interventions performed at the same time in both groups,
number of participants,
country and duration of the trial,
which outcomes were measured and how, and
results of outcomes relevant to this review.
Assessment of risk of bias in included studies
Two review authors per study (CH, MK, TD, RF, LF, JL, MS) will independently assess the risk of bias of the included studies. We will consult a third review author (TE) when disagreements occur, to make the final judgment. If information is absent for evaluation of the methodological criteria, we will contact the trial authors to request additional information. Where possible, we will use quotes from the text to support our judgements about the individual 'Risk of bias' items.
We will use the 'Risk of bias' tool described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) as implemented in Review Manager 5 (RevMan 2012). We will rate each domain as having a high, low or unclear risk of bias. We will rate a domain as having an unclear risk of bias if there is insufficient information to evaluate the domain.
We will evaluate the following domains (taken, added to, and modified where applicable from Higgins 2011):
random sequence generation:
We will consider trials to have a low risk of bias if they describe a random element in sequence generation such as:
using a random number table,
using a computer random number generator,
shuffling cards or envelopes,
We will consider trials to have a high risk of bias if the authors describe using:
odd or even date of birth,
or a rule based on e.g. work record number.
allocation concealment:
We will consider trials to have a low risk of bias if they report:
central allocation (including telephone, web-based and pharmacy-controlled randomisation),
or sequentially-numbered, opaque, sealed envelopes.
We will consider trials to have a high risk of bias if they report using:
an open random allocation schedule (e.g. a list of random numbers), or assignment envelopes without appropriate safeguards (e.g. if envelopes were unsealed or nonopaque or not sequentially numbered),
alternation or rotation,
date of birth or record number, or
any other explicitly unconcealed procedure.
blinding of participants and personnel:
We will consider trials to have a low risk of bias when:
the blinding of participants and key study personnel was ensured and it is unlikely that the blinding could have been broken, or
authors do not report about blinding or report lack of or incomplete blinding, but we judge that the outcome is not likely to be influenced by lack of blinding.
We will consider trials to have a high risk of bias when:
there was no blinding or incomplete blinding, and the outcome is likely to be influenced by lack of blinding, or
authors attempted to blind participants and personnel, but it is likely that the blinding could have been broken, and the outcome is likely to be influenced by lack of blinding.
blinding of outcome assessors (evaluated for each outcome separately)
We will consider trials to have a low risk of bias if:
there is no blinding of outcome assessment, but the review authors judge that the outcome measurement is not likely to be influenced by lack of blinding, or
blinding of outcome assessment is ensured, and it is unlikely that the blinding could have been broken.
We will consider trials to have a high risk of bias if:
there is no blinding of outcome assessment, and the outcome measurement is likely to be influenced by lack of blinding, or
there is blinding of outcome assessment, but it is considered likely that the blinding could have been broken, and the outcome measurement is likely to be influenced by lack of blinding.
incomplete outcome data (evaluated for each outcome separately)
We will consider trials to have a low risk of bias if:
there are no missing outcome data,
reasons for missing outcome data are unlikely to be related to true outcome,
missing outcome data are balanced in numbers across intervention groups, with similar reasons for missing data across groups,
in dichotomous outcome data, the proportion of missing outcomes compared with the observed event risk is not large enough to have a clinically-relevant impact on the intervention effect estimate,
in continuous outcome data, a plausible effect size (difference in means or standardised difference in means) among missing outcomes is not large enough to have a clinically-relevant impact on observed effect size, or
missing data have been imputed using appropriate methods.
We will consider trials to have a high risk of bias if:
the reason for missing outcome data is likely to be related to true outcome, with either imbalance in numbers or reasons for missing data across intervention groups,
in dichotomous outcome data, the proportion of missing outcomes compared with observed event risk is large enough to induce clinically-relevant bias in the intervention effect estimate,
in continuous outcome data, the plausible effect size (difference in means or standardised difference in means) among missing outcomes is large enough to induce clinically-relevant bias in observed effect size,
‘as-treated’ analysis done with substantial departure of the number of participants assigned at randomisation (or beginning of the trial),
there is potentially inappropriate application of simple imputation, or
in cluster randomised trials loss of full clusters is likely to introduce bias.
selective outcome reporting
We will consider trials to have a low risk of bias if:
the study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way,
or the study protocol is not available but it is clear that the published reports include all expected outcomes, including those that were pre-specified (convincing text of this nature may be uncommon).
We will consider trials to have a high risk of bias if:
not all of the study’s pre-specified primary outcomes have been reported,
one or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. sub scales) that were not pre-specified,
one or more reported primary outcomes were not pre-specified (unless clear justification for their reporting is provided, such as an unexpected adverse effect),
one or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta-analysis, or
the study report fails to include results for a key outcome that would be expected to have been reported for such a study.
outcome reliable or objectively measured (for each of the outcomes relevant to the review)
We will consider the outcomes to have a low risk of bias if the outcome is measured objectively (e.g. psychomotor vigilance test) or two or more raters have an agreement > 90% or kappa ≥ 0.8.
We will consider the outcomes to have a high risk of bias if two or more raters have an agreement < 90% or kappa < 0.8.
other sources of bias. We will mention any other sources of bias identified in this field.
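The agreement and kappa thresholds given above for the 'outcome reliable or objectively measured' domain can be illustrated with a minimal sketch. It assumes two raters and Cohen's kappa (the protocol does not name a particular kappa statistic), and the function names are hypothetical. Combinations that the stated rules do not cover, such as an agreement of exactly 90% with no kappa reported, are returned as 'unclear' here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items (0 to 1) on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the marginal frequencies of each category.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def reliability_risk(objective, agreement=None, kappa=None):
    """Apply the protocol's thresholds; the low-risk rule is checked first
    (an assumption for cases where the two rules would conflict)."""
    if objective:
        return "low"
    if (agreement is not None and agreement > 0.90) or (
        kappa is not None and kappa >= 0.8
    ):
        return "low"
    if (agreement is not None and agreement < 0.90) or (
        kappa is not None and kappa < 0.8
    ):
        return "high"
    return "unclear"


# Hypothetical ratings of four video segments by two observers.
a = ["drowsy", "alert", "alert", "drowsy"]
b = ["drowsy", "alert", "drowsy", "drowsy"]
risk = reliability_risk(False, percent_agreement(a, b), cohens_kappa(a, b))
```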
Randomised cross-over trials
We will assess all items for randomised controlled trials and in addition we will assess the following domain. We will report 'unclear' risk of bias if there is insufficient information to evaluate the domain.
Assessment of bias in conducting the systematic review
We will conduct the review according to this published protocol and report any deviations from it in the 'Differences between protocol and review' section of the systematic review.
Measures of treatment effect
We will enter the outcome data for each study into the data tables in RevMan (RevMan 2012). We will enter data as point estimates, standardised mean differences (SMDs) and their standard deviation (SD) when multiple scales are mixed, or mean and SD for continuous outcomes when the same scale is used. Should authors have dichotomised the relevant continuous outcomes we will use the data types presented by the authors if we are unable to obtain the data as continuous data. If only effect estimates and their 95% confidence intervals (CIs) or standard errors are reported in studies we will enter these data into RevMan using the generic inverse variance method. When the results cannot be entered in either way, we will enter them into Additional tables. We will reverse the scoring of scales if needed, so that a high score will denote the same direction (good or bad) in all outcomes. We will not mix different study designs. We will use STATA for calculations not possible within RevMan.
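Two of the steps described above lend themselves to a short illustration: recovering a standard error from a reported 95% CI for entry under the generic inverse variance method, and reversing a scale so that high scores point in the same direction. The sketch below assumes an effect estimate on a difference scale (a ratio measure would need its limits log-transformed first) and a normal approximation. The function names and example values are hypothetical.

```python
Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric confidence interval
    around a difference-type effect estimate."""
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    return (upper - lower) / (2 * z)


def reverse_scale(score, scale_min, scale_max):
    """Mirror a score within its scale range so that the direction
    of 'good' and 'bad' is flipped."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score lies outside the scale range")
    return scale_min + scale_max - score


# Hypothetical mean difference reported only with its 95% CI.
se = se_from_ci(-1.2, -0.2)

# Hypothetical score of 2 on a 1-to-9 scale, mirrored to 8.
mirrored = reverse_scale(2, 1, 9)
```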
Unit of analysis issues
For studies that employ a cluster-randomised design and report sufficient data to be included in the meta-analysis but do not make an allowance for the design effect, we will calculate the design effect based on the methods described in chapter 16.3.6 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). Should no reliable estimate of the intra-cluster correlation coefficient (ICC) be available from the literature but the other data be available, we will include the trial using an ICC of 0.1. Currently the authors are unaware of ICCs for trials in shift workers. Of published ICCs, we deem the ones from the Health Services Research Unit (ICC Database) most applicable to trials in shift workers. The ICCs range from 0 to 0.32, with most coefficients being below 0.1.
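As an illustration of the adjustment described above, the design effect is commonly computed as 1 + (m − 1) × ICC, where m is the average cluster size. The sketch below uses the protocol's fallback ICC of 0.1 as the default; the function names and trial figures are hypothetical.

```python
import math

DEFAULT_ICC = 0.1  # fallback ICC stated in the protocol


def design_effect(mean_cluster_size, icc=DEFAULT_ICC):
    """Design effect for a cluster-randomised trial: 1 + (m - 1) * ICC."""
    if mean_cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("cluster size must be >= 1 and ICC within [0, 1]")
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc=DEFAULT_ICC):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


def inflated_se(se, mean_cluster_size, icc=DEFAULT_ICC):
    """Standard error inflated by the square root of the design effect."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical trial: 300 workers in wards averaging 21 workers each.
deff = design_effect(21)
n_eff = effective_sample_size(300, 21)
```

Re-running the same calculation with other ICC values gives the inputs for a sensitivity analysis on the ICC assumption.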
Dealing with missing data
We will contact trial authors to obtain data not found in their reports that are needed for the assessment of risk of bias, or data for outcomes relevant to this systematic review. We will use all reports of trials in order to obtain missing data, including presentations, if found. We will use the methods presented in the Cochrane Handbook (Higgins 2011) chapter 22.214.171.124 to calculate statistics (e.g. standard deviations or correlation coefficients) that can be calculated from other values.
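The kind of calculation meant here can be sketched as follows: a standard deviation recovered from a standard error or from the CI of a single group mean, and the SD of a change score derived from baseline and final SDs with a correlation coefficient. The sketch assumes large samples (a normal quantile of 1.96 rather than a t-distribution value). The function names are hypothetical, and any correlation used would itself be an assumption or be borrowed from another study.

```python
import math


def sd_from_se(se, n):
    """SD of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci_of_mean(lower, upper, n, z=1.96):
    """SD of a group from the 95% CI of its mean (large-sample form)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)


def sd_of_change(sd_baseline, sd_final, corr):
    """SD of change from baseline, given the correlation between
    baseline and final measurements."""
    if not -1 <= corr <= 1:
        raise ValueError("correlation must lie within [-1, 1]")
    variance = sd_baseline**2 + sd_final**2 - 2 * corr * sd_baseline * sd_final
    return math.sqrt(variance)


# Hypothetical sleep-length data in hours.
sd_group = sd_from_se(0.5, 16)
sd_change = sd_of_change(2.0, 2.0, 0.5)
```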
If possible, we will use intention-to-treat analyses in randomised trials. The following information may be of relevance in order to assess the impact of missing data: distribution by intervention group and by baseline variables (age, sex, etc) as well as by responses on a first questionnaire (when multiple questionnaires are used). We will examine reasons for drop-out and missing data if these data are available. We will record the methods study authors used for dealing with missing data.
Missing data may be a strong source of bias. Sensitivity analyses will examine the effect of trials with high and low numbers of missing data.
We aim not to undertake imputation of data. The extent of missing data may be so severe that even imputation may lead to biased estimates of the effect.
Assessment of heterogeneity
Within each comparison of interventions and each outcome, we will assess the homogeneity of the results of included studies based on similarity of interventions, populations, exact outcome definitions, outcome timing and follow-up.
We will separately analyse studies in shift workers and persons with shift work disorder. We will explore other differences in population in subgroup analyses (see Subgroup analysis and investigation of heterogeneity).
We will consider studies to be similar enough to be combined with regard to outcome if they use the same outcome (e.g. sleepiness) measured at a similar time with regard to the shifts examined. All of the different ways of measuring one outcome will be considered similar enough for the primary analysis. Differences in outcome definitions will be examined in subgroup analyses. When a study reports an outcome in more than one way we will use the subjective measure of sleepiness, sleep quality from sleep diaries and sleep length from sleep diaries in the main analysis.
We anticipate several categories of interventions (light, goggles, exercise, educational interventions). For light or goggle interventions we will consider the time of day, duration of light, strength and wavelengths of light (or similar for goggles). We will consider educational interventions to be similar enough provided they address similar topics (e.g. sleep times with regard to shift, sleep conditions, exercise) and have a similar duration. We will combine all exercise interventions.
In addition, we will test for statistical heterogeneity by means of the Chi² test as implemented in the forest plot in Review Manager 5 software (RevMan 2012). We will use a significance level of P < 0.1 to indicate whether there is a problem with heterogeneity. Moreover, we will quantify the degree of heterogeneity using the I² statistic, where an I² value of 25% to 50% indicates a low degree of heterogeneity, 50% to 75% a moderate degree of heterogeneity and > 75% a high degree of heterogeneity (Higgins 2011). If we identify substantial heterogeneity we will report it and explore possible causes by prespecified subgroup analysis.
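For orientation, the heterogeneity statistics named above can be computed by hand from study effect estimates and their standard errors. This is a minimal sketch, not RevMan's implementation: it computes Cochran's Q with inverse-variance weights and I² as (Q − df) / Q, and maps I² to the bands stated above. Assigning a value of exactly 50% to 'moderate' is an assumption, since the stated bands share that boundary. The P value is left out to keep the sketch within the standard library; it could be obtained from a chi-squared distribution with k − 1 degrees of freedom, for example with `scipy.stats.chi2.sf(q, k - 1)`.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q: weighted squared deviations from the
    inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_studies):
    """I² in percent, truncated at zero."""
    df = n_studies - 1
    if df <= 0 or q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def describe_heterogeneity(i2):
    """Map I² (percent) to the bands used in this protocol."""
    if i2 > 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "below the lowest band"


# Two hypothetical trials reporting SMDs with equal precision.
q = cochran_q([0.0, 1.0], [0.5, 0.5])
i2 = i_squared(q, 2)
band = describe_heterogeneity(i2)
```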
Assessment of reporting biases
We will reduce the effect of reporting bias by including studies and not publications in order to avoid the introduction of duplicated data (i.e. two articles could represent duplicate publications of the same study). Following the Cho 2000 statement on redundant publications, we will attempt to detect duplicate studies and, if more articles report on the same study, we will extract data only once. We will prevent location bias by searching across multiple databases. We will prevent language bias by not excluding any article based on language. We will construct and analyse funnel plots to assess the likelihood of publication bias if more than five trials are included in a comparison. We will use the test proposed by Egger 1997 if we include more than 10 trials in a comparison. We will assess selective reporting of outcomes in sensitivity analyses (see Sensitivity analysis).
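The thresholds above, and the regression behind the Egger test, can be sketched as follows. The regression is the usual formulation, in which each study's standardised effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. The function names are hypothetical. The sketch returns the intercept and its standard error only; a P value would additionally require a t-distribution with k − 2 degrees of freedom.

```python
import math


def planned_checks(n_trials):
    """Checks foreseen by the protocol for a comparison of a given size."""
    checks = []
    if n_trials > 5:
        checks.append("funnel plot")
    if n_trials > 10:
        checks.append("Egger test")
    return checks


def egger_intercept(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE.
    Returns (intercept, standard error of the intercept)."""
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    x = [1 / se for se in std_errors]
    y = [e / se for e, se in zip(effects, std_errors)]
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx < 1e-12:
        raise ValueError("standard errors must differ between studies")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (k - 2)
    se_intercept = math.sqrt(residual_var * (1 / k + mean_x**2 / sxx))
    return intercept, se_intercept


# Hypothetical studies with identical effects: no asymmetry expected.
intercept, se_int = egger_intercept([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.25, 0.5])
```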
Data synthesis
We will pool data from studies judged to be homogeneous enough (see Assessment of heterogeneity) using Review Manager 5 software. If possible, we will combine studies using incidence data or, for trials reporting continuous data, standardised mean differences (SMDs). To make the SMDs more readily interpretable for clinicians, we will then recalculate the pooled SMD into a mean difference by multiplying the SMD by the median standard deviation taken from included studies using the preferred scale in question. We will meta-analyse sleep length as mean differences.
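The re-expression of a pooled SMD as a mean difference amounts to a single multiplication, shown here as a small sketch. The function name and all numbers are hypothetical; the SDs would come from the included studies that used the preferred scale.

```python
from statistics import median


def smd_to_md(pooled_smd, ci_lower, ci_upper, study_sds):
    """Re-express a pooled SMD and its CI on the preferred scale by
    multiplying with the median SD of studies that used that scale."""
    if not study_sds:
        raise ValueError("at least one SD on the preferred scale is needed")
    sd = median(study_sds)
    return pooled_smd * sd, ci_lower * sd, ci_upper * sd


# Hypothetical pooled SMD of -0.5 (95% CI -0.9 to -0.1) and three
# studies reporting SDs of 1.8, 2.0 and 2.4 on the preferred scale.
md, md_lower, md_upper = smd_to_md(-0.5, -0.9, -0.1, [1.8, 2.0, 2.4])
```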
When studies are found to be heterogeneous (i.e. dissimilar in terms of intervention, outcome, population or follow-up time) we expect to find them also statistically heterogeneous. Therefore, we consider a random-effects model to be more appropriate for meta-analysis. All estimates will include a 95% CI. For analyses not possible within Review Manager we will use STATA or other statistical software.
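A random-effects pooled estimate with its 95% CI can be sketched as below. The protocol does not name an estimator for the between-study variance; this sketch assumes the DerSimonian-Laird method, and the function name and data are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors, z=1.96):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled estimate, CI lower, CI upper, tau squared)."""
    k = len(effects)
    if k == 0 or k != len(std_errors):
        raise ValueError("effects and standard errors must match")
    variances = [se**2 for se in std_errors]
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w**2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, pooled - z * se, pooled + z * se, tau2


# Two hypothetical trials with SMDs of 0.0 and 1.0, each with SE 0.5.
pooled, lower, upper, tau2 = random_effects_pool([0.0, 1.0], [0.5, 0.5])
```

When the estimated between-study variance is zero, the result coincides with the inverse-variance fixed-effect estimate.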
We will use the GRADE approach as described in the Cochrane Handbook (Higgins 2011) and as implemented in the GRADEPro 3.2 software (GRADEpro 2008) to present the quality of evidence and 'Summary of findings' tables.
The downgrading of the quality of a body of evidence for a specific outcome will be based on five factors:
Limitations of study.
Indirectness of evidence.
Inconsistency of results.
Imprecision of results.
Publication bias.
The GRADE approach specifies four levels of quality (high, moderate, low and very low).
Subgroup analysis and investigation of heterogeneity
If we find several studies that have investigated similar interventions and have used the same or very similar outcome measures at the same or similar follow-up times, we will conduct the following subgroup analyses:
Chronotype (or similar). Rationale: sleep-wake disturbances on, for example, night shifts are different for people with different chronotypes or different chronobiological propensity (Erren 2013).
Differences in the intervention. Rationale: details of, for example, light scheduling may influence sleepiness.
Occupational settings or branches of industry (e.g. hospital staff). Rationale: work differs in different industries by physical and psychological strain, thus affecting sleepiness, for example.
Different ways of measuring the same outcome. Rationale: for example, actigraphy for sleep length may be more exact, yet limited to a smaller time range than sleep diaries, which give a better overall picture.
Mean or median age. Rationale: older shift workers have more experience with shift work and may have adapted better to shift work or may be persons whose chronotypes are better suited to shift work, as people not suited to shift work will have left shift work (selection effects)
Sensitivity analysis
If possible we will conduct the following sensitivity analyses:
each of the domains of the 'Risk of bias' assessment; for selective outcome reporting each outcome by selective outcome reporting of the other outcomes
different assumptions for imputation of missing data, different proportions of missing data
different assumptions for ICCs (for cluster-randomised trials)