Efficacy, acceptability, and tolerability of all available treatments for insomnia in the elderly: a systematic review and network meta‐analysis

Symptoms of insomnia are highly prevalent in the elderly. A large number of pharmacological and non‐pharmacological interventions exist, but to date their comparative efficacy and safety have not been sufficiently assessed.


Introduction
Approximately 50% of older adults report symptoms of insomnia (1). Insomnia reduces quality of life (2,3), impairs psychosocial and cognitive functioning, facilitates other mental disorders such as depressive disorders or substance abuse (4,5), and may increase the risk of cardiovascular and metabolic diseases (6)(7)(8)(9). People with insomnia also use healthcare services more often (10) and thereby incur higher costs (11).
A broad range of pharmacological and non-pharmacological interventions for sleep disorders exists. Sedating drugs such as benzodiazepines or the so-called z-drugs are still used very frequently in the elderly population, although the choice of substance has changed over the last decades and new substances have become available (12). Older people, with their age-related changes in brain structure and drug metabolism and their high rate of comorbidities, are especially susceptible to adverse events of these substances. Adverse events related to insomnia and sedating drugs are often severe and include risk of falls and fractures, oversedation, and confusion (13)(14)(15)(16). Therefore, several authors suggest that non-pharmacological interventions should be considered first-line treatment options for insomnia in the elderly (17). Non-pharmacological options include approaches such as sleep hygiene, relaxation techniques, or cognitive behavioral therapy, which attempt to modify sleep-related cognitions and behaviors (18), but also other interventions such as acupuncture, music therapy, bright-light therapy, or yoga.
The comparative efficacy and safety of these newer and older pharmacological and non-pharmacological interventions have not yet been sufficiently assessed (12,19). Few meta-analyses exist, and these were published at least ten years ago (20,21); thus, for most interventions, no appraisal of efficacy and safety is available.

Aims of the study
It is currently unclear which of the available interventions should be preferred in terms of efficacy and safety for the treatment of insomnia in the elderly. Therefore, we decided to conduct a comprehensive systematic review of all currently available treatment options and assess their relative effects via network meta-analysis (NMA).

Materials and methods
The study protocol was written a priori, registered in PROSPERO (CRD42018106411), and can be found in Appendix S2.

Participants and interventions
Our analysis included all randomized controlled trials (RCTs) that examined treatment options for insomnia in elderly patients (>65 years). All available interventions were included. Minimum duration of RCTs was set at 5 days for drug interventions; for non-drug interventions, the study duration criterion did not apply. No maximum duration of RCTs was set.

Search strategy and selection criteria
We identified RCTs in elderly patients with insomnia through a comprehensive, systematic literature search in MEDLINE, Embase, PsycINFO, the Cochrane Central Register of Controlled Trials, the Cochrane Database of Systematic Reviews (CDSR), ClinicalTrials.gov, and the WHO ICTRP up to May 25, 2019 (Appendix S3). Moreover, we inspected the reference lists of the included studies and of previous reviews on the same topic (20,21). We excluded cluster-randomized trials (22). Studies at high risk of bias for sequence generation or allocation concealment were excluded (23). If a trial was described as double-blind but randomization was not explicitly mentioned, we assumed that participants were randomized and excluded such trials in a sensitivity analysis. Risk of bias in the included studies was assessed independently by two reviewers (M.T.S. and M.H.) using the Cochrane Collaboration's risk-of-bias tool (23). We emailed the first and corresponding authors of all included studies to request missing data.
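The eligibility rules of the review (participants, minimum duration, design, and risk of bias) can be expressed as a simple screening predicate. The Python sketch below is illustrative only: the `Trial` record and all of its field names are hypothetical, and reading the age criterion as "all participants older than 65" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical screening record; field names are illustrative only.
    name: str
    randomization_stated: bool   # randomization explicitly mentioned
    double_blind: bool
    cluster_randomized: bool
    min_age: float               # youngest participant in years (assumption)
    drug_trial: bool
    duration_days: int
    rob_sequence: str            # "low", "unclear" or "high"
    rob_allocation: str          # "low", "unclear" or "high"


def is_eligible(t: Trial) -> bool:
    """Apply the inclusion/exclusion rules described in the methods."""
    # Double-blind trials without explicit randomization are assumed randomized.
    if not (t.randomization_stated or t.double_blind):
        return False
    if t.cluster_randomized:
        return False
    if t.min_age <= 65:                        # only patients older than 65
        return False
    if t.drug_trial and t.duration_days < 5:   # 5-day minimum for drugs only
        return False
    # High risk of bias for sequence generation or allocation concealment.
    if "high" in (t.rob_sequence, t.rob_allocation):
        return False
    return True


def excluded_in_sensitivity(t: Trial) -> bool:
    """Eligible trials that were only assumed randomized are dropped here."""
    return is_eligible(t) and not t.randomization_stated
```

Note that a non-drug trial of 3 days passes the duration check, mirroring the rule that the minimum duration applied to drug interventions only.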

Outcome measures and data extraction
The primary outcomes were (a) nocturnal total sleep time measured in minutes and (b) sleep quality as measured by any validated self-rating scale, for example, the Pittsburgh Sleep Quality Index (24) or the Insomnia Severity Index (25).
Secondary outcomes were sleep onset latency, defined as the time taken to fall asleep, measured in minutes; number of nocturnal awakenings; nocturnal time awake after sleep onset, defined as the total minutes spent awake after sleep onset until the end of sleep; daytime impairment, measured by performance tasks and self-reports such as the Epworth Sleepiness Scale (26) or the Stanford Sleepiness Scale (27); subjective well-being/quality of life, measured by any validated scale such as the Short Form-36 (SF-36) (28); polysomnographic or actigraphic recordings of the primary outcome total nocturnal sleep time, measured in minutes; drop-outs due to any reason and due to adverse events; total number of adverse events; and the occurrence of important adverse events such as sedation and subsequent impaired daytime functioning, risk of falls, paradoxical drug reactions (e.g., agitation or anxiety), dependency, and cardiovascular, hematological, and endocrinological adverse effects (AEs).
When authors of original studies used imputation methods to handle missing data, imputed data were preferred over completers' data. In crossover trials, we used the first phase where possible to avoid carryover effects (29); otherwise, we included the results as presented by the authors if there was an adequate washout period between phases, defined as at least five times the elimination half-life of each drug (30). Study selection and data extraction were performed independently by at least two reviewers (M.T.S., M.H.). Missing SDs were estimated from P values or substituted with the mean SD of the other included studies.
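The text does not give the exact formula used to recover SDs from P values. A common approach (described in the Cochrane Handbook) back-calculates the test statistic implied by the P value, converts it to a standard error of the mean difference, and then to a common SD. The Python sketch below assumes that approach with a normal approximation; the function names are hypothetical, and the washout check simply encodes the five-half-life rule stated above.

```python
from math import sqrt
from statistics import NormalDist, mean
from typing import Sequence


def sd_from_p_value(mean_diff: float, p_value: float, n1: int, n2: int) -> float:
    """Back-calculate a common SD from a two-sided P value for a mean difference.

    Uses a normal (z) approximation; a t distribution with n1 + n2 - 2
    degrees of freedom would be more exact for small trials.
    """
    if not 0 < p_value < 1:
        raise ValueError("P value must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p_value / 2)   # test statistic implied by P
    se = abs(mean_diff) / z                     # standard error of the MD
    return se / sqrt(1 / n1 + 1 / n2)           # SE = SD * sqrt(1/n1 + 1/n2)


def sd_fallback(other_sds: Sequence[float]) -> float:
    """Substitute the mean SD of the other included studies."""
    return mean(other_sds)


def washout_adequate(washout_h: float, half_life_h: float) -> bool:
    """Crossover washout counts as adequate if it lasts >= 5 half-lives."""
    return washout_h >= 5 * half_life_h
```

For example, a hypothetical trial reporting a mean difference of 10 min with P = 0.05 and 50 patients per arm gives a common SD of about 25.5 min, and a 48-hour washout is adequate for a drug with an 8-hour half-life (48 ≥ 40) but a 24-hour washout is not.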

Statistical analysis
We performed pairwise meta-analyses and NMAs in a frequentist setting using the R packages meta (31) and netmeta (32). We used the random-effects model and assumed common heterogeneity across all comparisons. For continuous outcomes, we primarily used absolute values, for example, total sleep time in minutes, and presented them as mean differences (MDs). If different scales were used, as for the assessment of sleep quality, effect sizes were calculated as standardized mean differences (SMDs; Hedges' g). For binary outcomes, effect sizes were calculated as odds ratios (ORs). Both types of effect sizes are presented in league tables with their 95% confidence intervals (CIs). To avoid redundancy with the tables, only large and relatively precise associations are discussed in the text. When possible, inconsistency was evaluated using the SIDE test, which separates indirect from direct estimates (33), and the design-by-treatment interaction test, which assesses global inconsistency in the network (34,35). We assessed the plausibility of the transitivity assumption by examining whether treatments differed importantly in key study characteristics that could act as effect modifiers, and we planned several subgroup analyses and meta-regressions. The variables considered a priori were (i) the percentage of female participants, (ii) baseline severity of the primary outcomes, (iii) study duration, (iv) sponsorship by the pharmaceutical industry and allegiance bias, that is, whether the developers of a psychotherapy were also investigators in the trial, and (v) in-patients versus out-patients.
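The effect measures named above can be computed per study from summary data. The Python sketch below uses the standard textbook formulas for Hedges' g (Cohen's d with a small-sample correction) and for the log odds ratio; the function names are hypothetical, and the meta and netmeta packages may differ in details such as variance approximations and zero-cell handling. For sleep-quality scales on which lower scores mean better sleep (e.g., PSQI, ISI), a negative SMD of treatment versus control favors the treatment.

```python
from math import log, sqrt


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (bias-corrected SMD) of group 1 vs group 2 and its variance."""
    df = n1 + n2 - 2
    sd_pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled                    # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def log_odds_ratio(events1, n1, events2, n2):
    """Log OR and its variance from a 2x2 table (no zero-cell correction)."""
    a, b = events1, n1 - events1
    c, d = events2, n2 - events2
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: a continuity correction would be needed")
    return log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def ci95(estimate, variance):
    """Wald-type 95% CI; exponentiate the bounds of a log OR to get an OR CI."""
    half = 1.96 * sqrt(variance)
    return estimate - half, estimate + half
```

With hypothetical inputs, two arms of 20 patients with means 10 and 5 and SD 10 give g ≈ 0.49, and 10/20 versus 5/20 events give a log OR of log 3 ≈ 1.10.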
Similarly, the following sensitivity analyses of the primary outcomes were planned a priori: (i) a fixed-effects instead of a random-effects model, (ii) exclusion of open-label and single-blind studies, (iii) exclusion of studies that did not use operationalized criteria to diagnose insomnia, (iv) exclusion of studies that presented only completers' analyses, (v) exclusion of studies with high risk of bias in blinding, missing outcome data, selective reporting, and other biases, and (vi) additional inclusion of studies involving patients with secondary insomnia or with severe somatic or psychiatric conditions, as long as not all patients had the same underlying disorder. We also re-analyzed the two primary outcomes after grouping interventions into classes (e.g., benzodiazepines, antipsychotics, antidepressants, psychotherapeutic interventions).
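Sensitivity analysis (i) contrasts fixed-effect and random-effects pooling. The sketch below shows the pairwise version with inverse-variance weights and the DerSimonian-Laird estimator of the between-study variance τ². This is an assumption for illustration: the text does not state which τ² estimator was used, and netmeta fits a network model with a common τ² across comparisons rather than pooling each comparison separately.

```python
from math import sqrt
from typing import Dict, Sequence


def pool(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (v + tau2) for v in variances]
    re_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "se_fixed": sqrt(1 / sw),
        "random": re_est,
        "se_random": sqrt(1 / sum(w_re)),
        "tau2": tau2,
    }
```

With a single study per comparison, as was common in this review, τ² cannot be estimated from that comparison alone (the sketch returns 0), so both models give the same result; this is one reason the common-heterogeneity assumption matters in sparse networks.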
We planned to investigate small-study effects for the primary outcomes using comparison-adjusted funnel plots (36,37). We assessed confidence in the estimates of the primary outcomes with the Confidence in Network Meta-Analysis (CINeMA) framework and web application (38)(39)(40).

Description of included studies
Through the literature search, we identified 53 RCTs with 6832 unique participants, published from 1980 to 2019. The PRISMA flowchart is shown in Fig. 1, and details of all included studies are given in Appendix S6. Of the 43 studies that examined the efficacy of one or more drugs, 31 had a placebo arm. The drug involved in most comparisons was melatonin (seven trials), followed by nitrazepam, triazolam, and zolpidem (five trials each), whereas few trials were available for most other drugs. The remaining 10 RCTs examined the efficacy of other interventions such as acupressure, auricular acupuncture, magneto-auriculotherapy (MAT), laser auriculotherapy (LAT), brief behavioral therapy, hand bath plus massage, massage, a mindfulness-based stress reduction program, tart cherry juice, and therapeutic touch. These non-drug interventions were disconnected from the network for all outcomes; therefore, only the results of their pairwise meta-analyses are presented. For drug trials, the network plots of eligible comparisons for the primary outcomes are presented in Figs 2, 3a, and 3b; network plots for secondary outcomes are presented in Appendix S11. For each outcome, some drugs, although included in the systematic review, were not included in the network meta-analysis because they were either not connected to the network or had no usable data. For all interventions, pairwise meta-analytic results are presented in Appendix S10.
Of 5209 patients whose sex was reported, 3300 (63.4%) were women. The mean (SD) age of participants was 75.2 (4.2) years. The median trial duration was 21 days (range, 3 to 168). The risk-of-bias assessment is presented in Appendix S7. The trial reports often did not provide details about randomization procedures and allocation concealment; three studies were single-blind, three were open-label, one used a single-blind design for two arms and an open-label design for the third arm, and the remaining studies were double-blind. The mean drop-out rate was 8.4% for the studies included in our systematic review, and we found evidence of a high risk of bias for selective reporting in 23 studies (44.9%).

Primary outcomes of NMA
The results of the pairwise meta-analyses and the NMA for total sleep time are summarized in Table 1. The NMA results agreed with the pairwise results where the latter were available. Compared with placebo, total sleep time was on average 62 min longer with the food supplement (a specific combination containing 5 mg melatonin, 225 mg magnesium, and 11.25 mg zinc) (41), 50 min longer with diazepam, 40 min longer with promethazine and propiomazine, 36 min longer with temazepam, 31 min longer with doxepin, and 24 min longer with eszopiclone.
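Because NMA estimates obey the consistency relation MD(A vs B) = MD(A vs placebo) − MD(B vs placebo), the placebo-anchored figures above imply the head-to-head point estimates in the league table, up to rounding. The Python sketch below uses only the rounded point estimates reported in the text; it does not reproduce the CIs, which require the full network variance-covariance matrix, and the names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Tuple

# Rounded mean differences in total sleep time (min) versus placebo, as
# reported in the text; uncertainty is deliberately ignored here.
MD_VS_PLACEBO: Dict[str, float] = {
    "placebo": 0,
    "food supplement": 62,
    "diazepam": 50,
    "promethazine": 40,
    "propiomazine": 40,
    "temazepam": 36,
    "doxepin": 31,
    "eszopiclone": 24,
}


def league_table(md_vs_ref: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """MD of treatment a versus treatment b under the consistency assumption."""
    return {(a, b): md_vs_ref[a] - md_vs_ref[b]
            for a, b in permutations(md_vs_ref, 2)}
```

For instance, the implied point estimate for the food supplement versus eszopiclone is 62 − 24 = 38 min; given the wide uncertainty noted in the text, such differences should not be read as evidence that one drug is superior.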
Nevertheless, the majority of the estimates were uncertain because there were few studies per intervention and even fewer closed loops.

Consistency of the network and confidence in the estimates (CINeMA)
Inconsistency could not be measured because each network contained no, or only one or two, closed loops of evidence. The few studies available per comparison did not allow firm conclusions about whether potential effect modifiers were balanced across comparisons, and for most comparisons only one study was available (Appendix S4). Consequently, the plausibility of the transitivity assumption could not be evaluated. Confidence in the estimates, assessed with CINeMA, was very low, primarily because of within-study bias, across-studies bias, imprecision, and the inability to evaluate the synthesis assumptions (incoherence) (Appendix S12).
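Why closed loops matter can be illustrated with the simplest case: a local inconsistency test compares a direct estimate of A vs B with an indirect estimate formed through a common comparator C, which exists only when A, B, and C form a closed loop. The Python sketch below shows this Bucher-style calculation with hypothetical inputs. It is a simplification, not the netmeta implementation of the SIDE test, which derives the indirect estimate from the whole network.

```python
from math import sqrt
from statistics import NormalDist


def indirect_estimate(d_ac: float, var_ac: float, d_bc: float, var_bc: float):
    """Indirect A-vs-B effect via common comparator C (independent sources)."""
    return d_ac - d_bc, var_ac + var_bc


def inconsistency_z(d_direct: float, var_direct: float,
                    d_indirect: float, var_indirect: float):
    """Difference between direct and indirect evidence, z score, two-sided P."""
    diff = d_direct - d_indirect
    se = sqrt(var_direct + var_indirect)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p
```

For example, hypothetical direct and indirect estimates of 40 and 20 min, each with variance 50, differ by 20 min (z = 2, P ≈ 0.046). Without a loop there is no indirect estimate for the comparison, so no such test can be computed.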

Secondary outcomes of NMA
Sleep onset latency. The league table is presented in Appendix S9.1. Ordered from most to least effective, diazepam, propiomazine, promethazine, doxepin, eszopiclone, temazepam, chlormethiazole, ramelteon, and suvorexant performed better than placebo, with MDs ranging from −25 to −7 min in the time taken to fall asleep, but there was much uncertainty.
Number of nocturnal awakenings. The league table is presented in Appendix S9.2. From most to least effective, zolpidem, temazepam, propiomazine, and diphenhydramine outperformed placebo, with MDs ranging from −0.96 to −0.30, but the estimates were very uncertain.
Nocturnal time awake after sleep onset. From most to least effective, suvorexant, melatonin, esmirtazapine, doxepin, zolpidem, and eszopiclone were associated with less nocturnal time awake than placebo, with MDs ranging from −24 to −12 min (Appendix S9.3).
Daytime impairment. For this outcome, two different subnetworks were formed (Appendices S9.4a and b). In the first subnetwork, propiomazine performed better than the other interventions (Appendix S9.4a). In the second subnetwork, no important differences were detected due to large uncertainty (Appendix S9.4b).
Quality of life. Only three studies provided useful data for this outcome. Food supplement performed better than placebo with an SMD of 0.61, but there was large uncertainty (Appendix S9.5).
Drop-outs due to any reason and due to adverse events. No important differences were detected in drop-outs due to any reason or due to adverse events between any of the interventions either in the pairwise meta-analyses or in the NMA due to large uncertainty (Appendices S9.7 and S9.8).
Total number of patients with adverse events and important individual adverse events. The small number of studies and the large uncertainty in the results did not allow firm conclusions for these outcomes (Appendices S9.9–S9.13b). Only for sedation was there some evidence that esmirtazapine performed worse than chlormethiazole, eszopiclone, and placebo (Appendix S9.14).
Pairwise meta-analyses of non-pharmacological interventions. The results of all pairwise meta-analyses for all outcomes are presented in Appendix S10. For non-pharmacological interventions, nine of the ten included studies provided usable data. Acupressure performed better than sham acupressure in terms of sleep quality (SMD −1.58) and quality of life (SMD 5.09). Auricular acupuncture performed better than its control in terms of sleep quality (SMD −147.29) without more drop-outs, but the single trial was of very low quality and its sample was small (n = 44), making the results unreliable. No important differences were detected between brief behavioral therapy and self-monitoring in terms of total sleep time, sleep quality, sleep onset latency, waking after sleep onset, total sleep time by polysomnography, number of nocturnal awakenings, and total drop-outs. Similarly, no important differences were detected between laser auriculotherapy (LAT), magneto-auriculotherapy (MAT), and their combination in any of the examined outcomes (e.g., total sleep time, sleep quality, sleep onset latency, waking after sleep onset, total sleep time by actigraphy, quality of life, and total drop-outs). Massage had drop-outs comparable to no intervention. The mindfulness-based stress reduction program performed better than waitlist in terms of sleep quality (SMD −1.04) and daytime impairment (SMD −0.62), with no observed difference in drop-outs. Tart cherry juice performed better than artificial juice in terms of sleep quality (SMD −0.51) and time awake after sleep onset (MD −17.00 min), but no differences were detected in total sleep time, sleep onset latency, and drop-outs. No important differences were detected among therapeutic touch, mimic therapeutic touch, and no intervention in terms of sleep quality. Side-effects were not reported for any of the non-pharmacological interventions.
Small-study effects and publication bias. Comparison-adjusted funnel-plots were not produced as they would not be meaningful because comparisons included three studies at most.

Discussion
This is the first evidence synthesis to evaluate all pharmacological and non-pharmacological interventions in older patients with insomnia, including 53 studies with a large number of patients (6832) and assessing a total of 16 efficacy and safety outcomes. For the pharmacological interventions, we found that the food supplement (containing melatonin, magnesium, and zinc), propiomazine, temazepam, doxepin, and eszopiclone improved total sleep time and sleep quality more than placebo. Diazepam and promethazine were better than placebo in sleep duration, and melatonin in sleep quality.

[Table 1 caption (league table, total sleep time): Comparisons between treatments should be read from left to right; the estimate is in the cell common to the column-defining and the row-defining treatment. In the lower-left half, weighted mean differences (MDs) higher than 0 favor the column-defining treatment; in the upper-right half, MDs higher than 0 favor the row-defining treatment. Cells in bold print indicate significant results. Food supplement refers to a specific combination of melatonin, magnesium, and zinc. NA = not available.]

[League table caption (standardized mean differences): Comparisons between treatments should be read from left to right; the estimate is in the cell common to the column-defining and the row-defining treatment. In the lower-left half, standardized mean differences (SMDs) lower than 0 favor the column-defining treatment; in the upper-right half, SMDs lower than 0 favor the row-defining treatment. Cells in bold print indicate significant results. Food supplement refers to a specific combination of melatonin, magnesium, and zinc. NA = not available.]
Few trials examined total sleep time with non-pharmacological interventions, and none found an important effect. In terms of sleep quality, arguably a more subjective outcome, acupressure, auricular acupuncture, the mindfulness-based stress reduction program, and tart cherry juice performed better than their controls.
Regarding safety, no differences were detected among interventions in any of the various outcomes due to small sample sizes and resulting large uncertainty; only for sedation, esmirtazapine was worse than chlormethiazole, eszopiclone, and placebo.
In our meta-analysis, we applied broad inclusion criteria, accepting, for example, insomnia or sleep disorder as defined by the authors of individual studies, as long as not all patients suffered from the same comorbid medical condition; this restriction is particularly important for transitivity when conducting a network meta-analysis. In addition, and in contrast to previous reviews on the same topic, which also included younger patients (42)(43)(44)(45), only patients older than 65 were selected. In many countries, the age of 65 is associated with marked changes in life such as retirement or the loss of close relatives. Moreover, the physical health of people in their late 50s or early 60s nowadays often does not differ from that of 'general' adults. To examine older patients who clearly differ from 'general' adults (usually defined in studies as 18 to 65 years), we focused on an age group that would be classified as geriatric by recent definitions (46), mainly comprising people over 70. While such patients are usually excluded from studies in the general population, the mean age in the included studies of our meta-analysis was 75.2 years, with a range of 68 to 87 years.
Owing to several limitations, our findings are not definitive. First, few comparisons had data from more than one trial, limiting the robustness of the results. Because of this low reliability, treatment hierarchies were not produced. The trials were less well connected (Figs 2 and 3, and Appendix S11) than in other NMAs (47), and consistency could not be assessed because each network had no or very few closed loops. Furthermore, few trials lasted longer than 3 weeks. Although our analysis showed many efficacy differences, it identified no differences in drop-outs or side-effects; trials of longer duration may be needed to detect such differences. Until 2005, the FDA recommended that drug treatment for insomnia should not exceed 4 weeks; since 2005, treatment duration has not been addressed (48). Also, non-pharmacological treatments were disconnected from the network, and the available evidence was scarce and of questionable quality. Moreover, many other drugs that are routinely used to treat insomnia in the elderly, such as bromazepam, mirtazapine, and quetiapine, had no available RCT. Finally, the results of a meta-analysis cannot be better than those of the included studies. In our NMA, reporting bias was present in a considerable number of studies, highlighting one of the intrinsic difficulties of the insomnia literature: the multitude of possible outcomes, including subjective and polysomnographic ones. Agreed-upon core outcome sets for insomnia studies are needed, so that selective outcome reporting is discouraged and more adequate evidence synthesis becomes possible.
At present, there is insufficient evidence on which intervention is most efficacious for elderly patients with insomnia. Cognitive behavioral therapy is the standard first-line treatment for all adults with insomnia and is considered preferable especially in the elderly, since it is thought to have fewer side-effects (49), but evidence from RCTs is lacking. Our analysis suggests that more trials of longer duration, examining more interventions and several outcomes, are warranted. The evidence contributing to our findings is of low credibility, and the results could therefore change as further studies are published.

[League table caption (standardized mean differences): Treatments are presented in alphabetical order. Results of the network meta-analysis are presented in the lower-left half and results from pairwise comparisons in the upper-right half, if available. Comparisons between treatments should be read from left to right; the estimate is in the cell common to the column-defining and the row-defining treatment. In the lower-left half, standardized mean differences (SMDs) lower than 0 favor the column-defining treatment; in the upper-right half, SMDs lower than 0 favor the row-defining treatment. Cells in bold print indicate significant results. NA = not available.]