The effectiveness of generic self‐management interventions for patients with chronic musculoskeletal pain on physical function, self‐efficacy, pain intensity and physical activity: A systematic review and meta‐analysis

Abstract Generic self‐management programs aim to facilitate behavioural adjustment and therefore have considerable potential for patients with chronic musculoskeletal pain. Our main objective was to collect and synthesize all data on the effectiveness of generic self‐management interventions for patients with chronic musculoskeletal pain in terms of physical function, self‐efficacy, pain intensity and physical activity. Our secondary objective was to describe the content of these interventions, by means of classification according to the Behaviour Change Technique Taxonomy. We searched PubMed, CENTRAL, Embase and Psycinfo for eligible studies. Study selection, data extraction and risk of bias were assessed by two researchers independently. Meta‐analyses were only performed if the studies were sufficiently homogeneous and GRADE was used to determine the quality of evidence. We identified 20 randomized controlled trials that compared a self‐management intervention to any type of control group. For post‐intervention results, there was moderate quality evidence of a statistically significant but clinically unimportant effect for physical function and pain intensity, both favouring the self‐management group. At follow‐up, there was moderate quality evidence of a small clinically insignificant effect for self‐efficacy, favouring the self‐management group. All other comparisons did not indicate an effect. Classification of the behaviour change techniques showed large heterogeneity across studies. These results indicate that generic self‐management interventions have a marginal benefit for patients with chronic musculoskeletal pain in the short‐term for physical function and pain intensity and for self‐efficacy in the long‐term, and vary considerably with respect to intervention content. Significance This study contributes to a growing body of evidence that generic self‐management interventions have limited effectiveness for patients with chronic musculoskeletal pain. 
Furthermore, this study has identified substantial differences in both content and delivery mode across self‐management interventions.


Introduction
Chronic musculoskeletal pain negatively influences daily life functioning, emotional well-being and social participation (Turk et al., 2011). Low back pain and neck pain alone contribute to 1694 years lost to disability (YLD) per 100,000 persons annually, placing these conditions, respectively, first and fourth in the ranking of diseases on global years lived with disability (Vos et al., 2012).
The experience of pain interrupts individuals' ongoing activities, forcing them to choose between pursuit of their intended action or activity, disengagement or avoidance behaviours. Motivational conflicts such as these constantly interfere with daily life activities and are assumed to have a negative effect on an individual's well-being and identity (Vlaeyen et al., 2016). In order to maintain sufficient quality of life, successful self-management (the ability to manage symptoms, treatment, physical, psychological and social consequences, and lifestyle changes related to one's chronic condition) is essential (Barlow et al., 2002; Lorig and Holman, 2003). To facilitate this process, generic self-management interventions are designed to teach persons how to self-regulate their chronic condition. Rather than providing unilateral solutions to disease-specific problems, self-management interventions provide a generic set of skills and competencies (e.g. problem-solving and decision making) in order to facilitate living a meaningful life despite chronic pain.
The definition of self-management does not specify how this behavioural adjustment should be achieved. This allows for a large variety of content and delivery modes in self-management interventions. In order to provide more clarity in comparing interventions, Michie and colleagues have developed a taxonomy of behaviour change techniques (BCTs) that enables more precise reporting (Michie et al., 2013). Moreover, classification of components according to this taxonomy facilitates comparison of intervention content and is expected to provide insight into the various intended mechanisms of action.
As self-management programmes aim to facilitate behavioural adjustment, they have considerable potential for positive long-term effects on outcomes of importance to patients. However, as newly learned behaviours in the context of physical activity (Sullum et al., 2000) or pain rehabilitation (Turk and Rudy, 1991) are difficult to maintain, it is important to study the long-term outcomes of these interventions.

The present study
Our primary aim was to collect and synthesize all available data on the immediate and long-term (more than six months) effectiveness of generic self-management interventions for patients with chronic musculoskeletal pain in terms of physical function, self-efficacy, pain intensity and physical activity. We hypothesized that self-management interventions would improve self-efficacy, enabling patients with chronic pain to increase their physical activity, consequently reducing their perceived limitations in physical function, at least in the short-term. Owing to a shift in attention from disease-related problems to engagement in daily life activities, patients might even perceive less pain after the intervention.
Our secondary aim was to describe the intervention content, by means of classification according to the Behaviour Change Technique Taxonomy (v1). This aim builds on recent efforts to acquire more insight into theory and techniques behind self-management interventions (e.g. Keogh et al., 2015).

Protocol and registration
The review protocol has been registered in the Prospero database (CRD42015024417).

Information sources
We searched MEDLINE via PubMed, CENTRAL, Embase and Psycinfo databases for eligible studies from inception up to May 2017. The search strategy was designed in collaboration with a medical informatics specialist and contained a combination of thesaurus terms and free text words. The PubMed search string (see Appendix S1) was constructed first and was used as a template for the other databases. The database search was extended in the following ways: first, reference lists of included articles were screened by one of the researchers (SE) and eligible studies underwent the same reviewing process (i.e. backward citation tracking). Second, when a study was included in the analysis, PubMed was used to search for eligible studies that cited this study (i.e. forward citation tracking). Third, to minimize publication bias, we also searched for unpublished studies and grey literature in DART-Europe E-thesis portal, the Open Access Thesis and Dissertations database (OATD), the WHO International Clinical Trials Registry Platform (WHO-ICTRP), and the Networked Digital Library of Theses and Dissertations (NDLTD) with the combined terms 'chronic pain' and 'self-management' as entry terms.

Eligibility criteria
We included randomized controlled trials that met the following eligibility criteria: The study sample had to consist of adult patients with chronic musculoskeletal pain, defined as pain that persists for longer than 3 months and that is perceived in the musculoskeletal system (i.e. bones, joints, tendons or muscles). Although self-management principles have been incorporated in multicomponent treatment programmes (e.g. Meng et al., 2011; Du et al., 2017), and self-management skill training can overlap with other types of interventions with different underlying theoretical approaches (e.g. action planning in the Health Action Process Approach; Schwarzer, 2008), we were only interested in generic interventions that focused on improving behavioural adjustment by training self-management skills. Therefore, the intervention had to address at least one of the following five self-management skills: problem-solving, decision making, resource utilization, forming a partnership with a health care provider and taking action (Lorig and Holman, 2003). In addition, the intervention had to include both an element of information transfer on self-management principles (e.g. education session or lecture) and a training component where self-management skills were actually rehearsed or performed. The intervention had to be focused on improving generic self-management skills, rather than on training disease-specific skills (e.g. joint protection techniques). The study had to include a control intervention that was not a self-management intervention. Lastly, the study had to include at least one of the following outcome measures: physical function, self-efficacy, pain intensity, or physical activity. For physical function, we included self-report instruments that measured the degree of interference that chronic pain had on daily life activities and social participation. For self-efficacy, we included self-report instruments that measured the level of confidence in patients' capabilities to perform daily life tasks or activities. Pain intensity measures were included if they solely measured the degree of pain experienced on a scale from low to high intensity. Composite scores of various moments of pain intensity were also included (e.g. Von Korff scales), as well as sum scores of pain intensity for each tender point. For physical activity, we included both self-report instruments and activity trackers that provided an indication of how often certain types of physical activities were performed.
We excluded studies with samples that solely consisted of patients with osteoarthritis, because Kroon et al. (2014) had recently published a systematic review of self-management interventions for this subgroup. When a composite sample included patients with osteoarthritis, at least 50% of the sample had to consist of patients with other forms of chronic musculoskeletal pain. In addition, interventions that were designed to improve self-management in the context of pre-operative training, post-operative rehabilitation or palliative care were excluded, as we expected that this would lead to substantial heterogeneity regarding disease management and coping. To avoid heterogeneity, studies were also excluded if they only included patients on the basis of a specific comorbidity (e.g. psychiatric or obese patients), or if they combined the self-management intervention with other chronic pain treatment modalities (e.g. graded activity, exposure in vivo, Acceptance and Commitment Therapy, interdisciplinary pain management programmes). We also excluded e-health interventions that did not include any form of face-to-face contact during treatment, because a recent systematic review had been conducted on this topic (Eccleston et al., 2014). Only studies that were published in Dutch or English languages were included. We used the online application software 'Rayyan' to screen the abstracts (Ouzzani et al., 2016).

Study selection, data collection and risk of bias
Two researchers (HW and SE) independently performed the study selection, data collection and assessment of risk of bias in five stages. For each stage (abstract screening; full text inclusion; BCT data extraction; patient, intervention, comparison, outcome and study design data extraction; risk of bias assessment), we held pilot test sessions where we calibrated our procedures. At regular intervals within each stage, meetings were held to compare results and to reach consensus. If differences in scoring remained, a third researcher (JP) made the final decision. In the first stage, all abstracts were screened on eligibility criteria with respect to study design and patients. In the second stage, full text articles were read and checked on all eligibility criteria. Data collection started in the third stage and involved (1) copying all information regarding the intervention that was provided in the study or in the protocol; (2) extracting all individual intervention components from this information; and (3) classifying these components according to the BCT taxonomy v1 (Michie et al., 2013). In the fourth stage, we extracted all relevant data with respect to our analysis, including patient characteristics, means and standard deviations for all outcome measures of interest. For each study, we selected the measures that best fitted our definition for the primary outcomes. In accordance with the Cochrane Handbook, we considered studies as our primary source of interest (Higgins and Green, 2011). As a consequence, we also extracted data from study protocols and articles with follow-up data, when they were available. In the fifth stage, we determined risk of bias using the Cochrane Collaboration's tool for assessing risk of bias. 
The following types of bias were assessed: random sequence generation (selection bias); allocation concealment (selection bias); blinding of outcome assessment (detection bias); incomplete outcome data (attrition bias), selective reporting (reporting bias) and other sources of bias. Blinding of participants and personnel was not included in the bias assessment, as the characteristics of self-management interventions do not allow for appropriate blinding. Due to the nature of the studies, we scored the default blinding of outcome assessment as high risk of bias, but upgraded to unclear or low if attempts to blind the outcome assessment for patients or assessors were described (e.g. blinding of patients to former assessment). All other types of bias were assessed according to the guidelines in the Cochrane Handbook (Higgins and Green, 2011). The risk of bias was used as input for the assessment of the quality of evidence for each outcome measure. Studies were considered high risk of bias when three or more items were scored unclear or high, or when two items were scored high.
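The study-level decision rule stated above (high risk of bias when three or more items are scored unclear or high, or when two items are scored high) can be sketched in a few lines. This is only an illustration: the function name, the domain labels and the rating strings are assumptions made for the example and are not taken from the review's materials.

```python
def overall_risk_of_bias(ratings):
    """Apply the review's stated rule to one study.

    `ratings` maps a bias domain (hypothetical labels) to one of
    'low', 'unclear' or 'high'.
    """
    high = sum(1 for r in ratings.values() if r == "high")
    unclear_or_high = sum(1 for r in ratings.values() if r in ("unclear", "high"))
    # High risk: >= 3 items unclear or high, or >= 2 items high.
    if unclear_or_high >= 3 or high >= 2:
        return "high"
    return "not high"


# Hypothetical study: outcome assessment scored high by default,
# one further item unclear, the rest low.
example = {
    "random_sequence_generation": "low",
    "allocation_concealment": "unclear",
    "blinding_of_outcome_assessment": "high",
    "incomplete_outcome_data": "low",
    "selective_reporting": "low",
    "other": "low",
}
print(overall_risk_of_bias(example))
```

Because blinding of outcome assessment was scored high by default, a study in this sketch needs only one further high item, or two further unclear items, to be classified as high risk.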

Outcome reporting and data synthesis
Between-group comparisons for post-intervention (within one month of the end of the intervention) and follow-up (at minimum six months post-intervention) were calculated per study for each of the outcomes of interest, using RevMan 5.3 software (Cochrane, 2014). In case of more than one follow-up measurement, we included the last time point in our analysis. If more than one self-management group was included within a study, we used only the intervention group that best fitted our definition of self-management interventions. In the situation of more than one control group within a study, we included only the most active control group in our comparisons. Results were presented for each outcome separately. If the GRADE analyses revealed both directness and consistency as a serious risk of bias, we concluded that the data were too heterogeneous to perform a meta-analysis and presented the results narratively. Each outcome was expected to be measured with varying questionnaires. Therefore, standardized mean differences (SMD) with 95% confidence intervals were used. A priori, we decided to select random effects models because we assumed differences in the true outcomes across studies, based on between-study variation in duration, intensity and patient characteristics. If the pooled SMD was significant, we re-expressed this effect on one of the outcome measures to examine the clinical importance. This was performed by multiplying the SMD by the standard deviation of the control group of one of the included studies that adopted this measure. Subsequently, we compared this effect with available estimates of the minimal important change to assess the clinical importance. When it was not possible to obtain measures of central tendency or dispersion, the results were narratively presented and compared to the results of the meta-analysis.
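As a minimal sketch of the two calculations described above, the snippet below computes a standardized mean difference with the small-sample (Hedges) adjustment and re-expresses an SMD on the scale of a specific instrument. The function names are illustrative, and the adjustment factor 1 - 3/(4N - 9) is the commonly used approximation, assumed here rather than taken from the review.

```python
import math


def hedges_g(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference with small-sample adjustment."""
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
        / (n_tx + n_ctrl - 2)
    )
    d = (mean_tx - mean_ctrl) / pooled_sd
    # Approximate bias correction (assumed form: 1 - 3 / (4N - 9)).
    correction = 1 - 3 / (4 * (n_tx + n_ctrl) - 9)
    return d * correction


def reexpress_smd(smd, control_sd):
    """Re-express an SMD in the units of one instrument by multiplying
    it by a control-group standard deviation on that instrument."""
    return smd * control_sd


# Illustration with the kind of values the procedure calls for:
# an SMD of -0.28 and a control-group SD of 14.7 scale points.
print(round(reexpress_smd(-0.28, 14.7), 2))
```

The re-expressed value is then compared with a published minimal important change for the same instrument; an effect smaller than that threshold is treated as clinically unimportant even when it is statistically significant.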
BCTs were graphically visualized in a table. Relative differences between studies and between domains of the taxonomy were calculated and presented narratively.

Assessment of the quality of evidence
For each comparison in the meta-analysis, we used the GRADEpro Guideline Development Tool (Evidence Prime I, 2015) to determine the quality of evidence. As only randomized controlled trials were included, the initial quality of evidence started as 'high' and was downgraded as a result of limitations with respect to risk of bias, inconsistency, indirectness, imprecision or publication bias.
For each comparison, we downgraded the level of evidence when (1) more than 25% of the sample came from studies with high risk of bias; (2) the I² was more than 60% combined with a limited overlap of confidence intervals (inconsistency); (3) substantial differences were present in study population, intervention protocol, control group or outcome measures (indirectness); or (4) the total sample size of all included studies was less than the optimal information size of n = 400 (imprecision). We determined the optimal information size with a sample size calculation with α = 0.05, β = 0.8, SD = 0.2 as parameters (Schünneman et al., 2013). To assess publication bias, funnel plot symmetry and distribution of effect sizes were inspected. We based our quality of evidence criteria on the GRADE Handbook (Schünneman et al., 2013) and the Cochrane Handbook (Higgins and Green, 2011).
Eur J Pain 22 (2018) 1577-1596
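The downgrading rules and the sample size calculation can be sketched as follows. Two interpretations are assumptions on our part and not statements from the review: the value 0.8 is read as statistical power, and 0.2 is read as the standardized effect size to be detected. Under those assumptions, the standard two-group formula gives roughly 400 participants per group. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

LEVELS = ["very low", "low", "moderate", "high"]


def n_per_group(alpha=0.05, power=0.8, effect_size=0.2):
    """Two-sided, two-group sample size for a standardized effect
    (normal approximation): n = 2 * (z_alpha/2 + z_power)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def grade_quality(share_high_rob, i_squared, limited_ci_overlap,
                  indirect, total_n, ois=400, publication_bias=False):
    """Start at 'high' (randomized trials only) and drop one level per
    criterion that is met, following the four rules listed above."""
    downgrades = 0
    if share_high_rob > 0.25:                  # (1) risk of bias
        downgrades += 1
    if i_squared > 60 and limited_ci_overlap:  # (2) inconsistency
        downgrades += 1
    if indirect:                               # (3) indirectness
        downgrades += 1
    if total_n < ois:                          # (4) imprecision
        downgrades += 1
    if publication_bias:
        downgrades += 1
    return LEVELS[max(0, 3 - downgrades)]


print(n_per_group())
# Hypothetical comparison downgraded for indirectness only.
print(grade_quality(0.10, 40, False, True, 900))
```

With a single downgrade for indirectness, the sketch returns a moderate rating; adding a second downgrade, for example for imprecision or inconsistency, yields a low rating.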

Study selection
The search yielded 7843 hits. After removal of duplicates and the screening of abstracts, 102 full-text articles were assessed for eligibility. Eighty-two studies were excluded and 20 studies were selected for data extraction and analysis (see Fig. 1).

Patient and study characteristics
The total study population consisted of 3557 patients. Seventy-five percent of the study population was female. All studies were performed in Western Europe, Australia or the United States. Average pain duration characteristics were only reported in eight studies and the means ranged from 2.3 to 20 years with a median of 8.4 years. Patient eligibility criteria varied across studies and were based on localization (e.g. back pain), specific diagnosis group (e.g. fibromyalgia syndrome), or duration of pain. Table 1 provides an overview of all participant characteristics within each study.
The included studies show substantial variation regarding intervention content, delivery and measurement instruments (Table 2). For example, the median number of face-to-face sessions was 6 (range: 3-15), and the median duration was 15 h (range: 2.8-45 h). Furthermore, fifteen studies included a follow-up measurement of at least six months post-intervention, with a mean of 10.53 (SD = 2.59) months. The mean number of BCTs was 12.6 (range: 5-26). Forty-three of the 93 available BCTs in the taxonomy were identified in the studies and we identified BCTs in all domains of the taxonomy, except for scheduled consequences and covert learning. The domains with the highest numbers of BCTs were goals and planning (accounting for 27.8% of the total BCTs), and social support (10.6%). Six BCTs were frequently used in the interventions: 'Social support (unspecified) provided by group interventions', 'credible source provided by an experienced health care provider or patient', and 'goal setting (behaviour)' were present in at least 90% of the interventions. 'Problem-solving', 'instruction on how to perform the behaviour' and 'information about health consequences (education)' were present in 80-90% of the interventions. Appendix S2 provides a full overview of the BCT profiles per study.
Sixteen studies provided sufficient data to perform meta-analyses. One of these studies used change scores to control for baseline differences (Manning et al., 2014), whereas the other studies used final value scores of their outcome measures. As the two types of scores cannot be combined within one calculation of a standardized mean difference, we analyzed the comparisons of Manning et al. (2014) separately. The four studies that could not be included in the meta-analyses were presented narratively (Taal et al., 1993; Burckhardt et al., 1994; Dworkin et al., 2002; Hutting et al., 2015).

Physical function
For eight of the 11 studies that reported outcomes on physical function we were able to calculate standardized mean differences. Statistical pooling was considered appropriate despite a high I², as a sensitivity analysis revealed that the heterogeneity was mainly attributable to one study (Asenlof et al., 2005), and that the confidence intervals showed substantial overlap. The pooled effect was calculated with Hedges' (adjusted) g and was significant; SMD -0.28 [-0.52, -0.03], z = 2.23, p = 0.03 (see Fig. 3). When this effect is re-expressed on a Pain Disability Index (PDI), using the baseline standard deviation (SD = 14.7) of the control group of Asenlof et al. (2005), this effect corresponds to a between-group difference of 4.12 points on a PDI favouring the self-management group. This is lower than the minimal clinically important change of 8.5 points that was calculated by Soer et al. (2012). The between-group comparison of post minus pre scores of Manning et al. (2014) was not significant; SMD -0.40 [-0.82, 0.02], z = 1.88, p = 0.06. Burckhardt et al. (1994) and Dworkin et al. (2002) reported no between-group differences post-intervention. Taal et al. (1993) found a statistically significant difference on the Modified Health Assessment Questionnaire

Self-efficacy
Ten studies reported post-intervention comparisons for self-efficacy. However, due to the statistical heterogeneity (I² = 65%, χ² (7965) = 17.39, p = 0.02), an overall effect was not calculated. Four comparisons showed statistically significant differences, favouring the experimental group (Lefort et al., 1998; Asenlof et al., 2005; Ersek et al., 2008; Nicholas et al., 2013) with SMD ranging from -0.74 to -0.32. Furthermore, Taal et al. (1993) reported an effect favouring the self-management intervention, with a change score of the experimental group (0.17) that significantly differed from the change score of the control group (-0.13), p < 0.05. Burckhardt et al. (1994) also found a statistically significant difference between both groups, with the self-management group reporting higher scores on the function subscale of the self-efficacy scale (620.7) than the control group (467.5). Four comparisons did not show an effect (King et al., 2002; Stuifbergen et al., 2010; Manning et al., 2014; Knittle et al., 2015). The unpooled comparisons, indicating a trend favouring self-management, are shown in Fig. 4.

Pain intensity
Eight studies reported comparisons for pain intensity. Although the four studies that reported endpoint data showed substantial overlap of their confidence intervals, the I² was 55%. As a sensitivity analysis showed that the heterogeneity was attributable to only one study, a meta-analysis was performed. The results indicate a statistically significant difference favouring the self-management group, SMD -0.28 [-0.56, -0.01], z = 2.03, p = 0.04 (see Fig. 5). We re-calculated this effect on an 11-point NRS (0-10), using the baseline standard deviation of the control group in Nicholas et al. (2013). This effect corresponds to a 0.48 difference in pain intensity, measured on a 0-10 NRS, which is lower than the minimal clinically important difference (MCID) of 2.0 (Salaffi et al., 2004). Manning et al. (2014) reported a similar result: SMD -0.44 [-0.86, -0.02], z = 2.05, p = 0.04. Burckhardt et al. (1994), Dworkin et al. (2002) and Taal et al. (1993) reported no statistically significant post-intervention differences for pain intensity between the self-management and control groups.
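The reported z and p values can be cross-checked against the confidence interval. The sketch below recovers an approximate standard error from a 95% interval under a normal approximation and derives z and a two-sided p from it. The function name is hypothetical, and because the interval bounds are rounded to two decimals the recomputed values will only approximate the published ones.

```python
from statistics import NormalDist


def z_and_p_from_ci(smd, lower, upper, level=0.95):
    """Approximate z statistic and two-sided p value from an SMD and
    its confidence interval, assuming a symmetric normal interval."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (upper - lower) / (2 * z_crit)   # half-width divided by z_crit
    z = abs(smd) / se
    p = 2 * (1 - NormalDist().cdf(z))
    return z, p


# Pooled post-intervention pain intensity estimate reported above.
z, p = z_and_p_from_ci(-0.28, -0.56, -0.01)
print(round(z, 2), round(p, 3))

# Clinical importance: re-expressed effect versus the cited MCID.
print(0.48 < 2.0)
```

For the pooled pain intensity estimate this gives a z close to 2 and a p value just under 0.05, in line with the reported z = 2.03 and p = 0.04, while the re-expressed effect of 0.48 points remains well below the cited MCID of 2.0.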

Physical activity
Only three studies compared differences in changes of physical activity immediately post-treatment. Two studies provided sufficient information for a meta-analysis (Stuifbergen et al., 2010; Knittle et al., 2015). We pooled these study outcomes as the I² was 0% and there was substantial overlap in the two confidence intervals. There was no significant difference between the intervention group and the control group, SMD 0.14 [-0.14, 0.38], z = 1.18, p = 0.24 (see Fig. 6). This is in line with Taal et al. (1993), who also did not find a difference between self-management and control groups on physical activity.
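To make the pooling step concrete, the following sketch implements a random-effects meta-analysis with the DerSimonian-Laird estimator and reports I² alongside the pooled estimate. The review names only RevMan 5.3 and a random-effects model, so the choice of this estimator is an assumption, and the study-level effects and variances in the example are invented for illustration.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled effect, standard error, I-squared in percent).
    """
    w = [1 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2


# Hypothetical SMDs and variances for two studies (not the review's data).
pooled, se, i2 = random_effects_pool([0.10, 0.20], [0.03, 0.04])
print(round(pooled, 3), round(se, 3), round(i2, 1))
```

When the observed heterogeneity statistic does not exceed its degrees of freedom, the between-study variance is set to zero and I² is 0%, in which case the random-effects estimate coincides with the fixed-effect one.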

Evaluation of the evidence
The GRADE evidence plot (Table 3) shows the post-intervention comparisons combined with the quality of evidence. For each outcome measure, fewer than 25% of the participants were from high risk of bias studies. The inconsistency was high for self-efficacy, due to high statistical heterogeneity combined with low overlap in confidence intervals. For the other outcome measures, the statistical heterogeneity was either limited or mainly attributable to one study. As a result of substantial variations in intervention content and outcome measures, all comparisons were downgraded for indirectness. Physical activity was the only comparison downgraded for imprecision, because the combined sample size was smaller than the optimal information size. Visual inspection of the funnel plots (see Appendix S3) did not indicate any publication bias. This resulted in the following evidence statements: For physical function and pain intensity, there is moderate quality evidence for a small but clinically insignificant effect favouring self-management. For physical activity, there is low quality evidence for no effect of self-management compared to a control group. Although we did not calculate standardized mean differences for self-efficacy, based on the range of effects, we conclude that there is low quality evidence for a trend favouring the self-management intervention. The studies that were not included in the meta-analysis showed similar results and support these conclusions.
European Pain Federation - EFIC®

Limitations in physical function
Twelve out of 15 studies with follow-up data were eligible for pooling (see Fig. 7). The median follow-up time of all 15 studies was 12 months. As the statistical heterogeneity was low (I² = 0%), we performed a meta-analysis. The pooled effect of 11 studies with endpoint data was not statistically significant, SMD -0.07 [-0.16, 0.02], z = 1.60, p = 0.11, and this was also the case for Manning et al. (2014), SMD -0.06 [-0.47, 0.36], z = 0.27, p = 0.78. In addition, Hutting et al. (2015) and Taal et al. (1993) also reported no effects at follow-up. The only study that reported a long-term positive effect on physical function was Dworkin et al. (2002); at 12 months follow-up, the self-management group showed less limitation in physical function compared to control, p = 0.01.

Self-efficacy
For self-efficacy, the median follow-up time was 12 months. Six of eight studies were included in the meta-analysis (see Fig. 8). We found an I² of 0%, indicating homogeneous results across studies. For  Taal et al. (1993) reported a significant difference for self-efficacy favouring the self-management group (p < 0.05) with a positive change score of 0.17 for the self-management group and a change score of -0.06 for the control group at 13 months follow-up, whereas Hutting et al. (2015) did not find a difference at follow-up.

Pain intensity
Ten of the 13 studies were included in the meta-analysis (see Fig. 9). The median follow-up time for all 13 studies was 12 months. We decided to pool the results as the I² was 40% and the overlap of confidence intervals was sufficient.  Taal et al. (1993) and Hutting et al. (2015) also did not report a significant difference at follow-up, but Dworkin et al. (2002) indicated that, at 12 months follow-up, pain intensity was lower for the experimental group, compared to the control group (p = 0.036).

Physical activity
Four studies provided information on follow-up time for physical activity, with a median follow-up time of 9 months. Three studies were eligible for pooling. We performed a meta-analysis as the confidence intervals largely overlapped and the I² was 0% (see Fig. 10). There was no overall group-effect on physical activity, SMD 0.15 [-0.07, 0.38], z = 1.34, p = 0.18. Taal et al. (1993) also reported no differences between both groups at follow-up.

Evaluation of the evidence
The GRADE evidence plot (Table 4) shows the standardized mean differences in combination with the quality of evidence ratings. The outcomes were evaluated similarly to the post-intervention results, with the exception of a high consistency score for self-efficacy. This resulted in the following evidence statements: At six to thirteen months follow-up, there is moderate quality evidence that self-management interventions have a statistically significant, but clinically unimportant effect on self-efficacy. For pain and physical function, there is moderate quality evidence that self-management intervention groups are not more effective than control groups. For physical activity, there is low quality evidence that self-management interventions are not more effective than control groups. The studies that could not be included in the GRADE analysis showed similar trends and corroborated these conclusions.

Summary of main results
The primary aim of this study was to investigate the effectiveness of self-management interventions on physical function, self-efficacy, pain intensity and physical activity for patients with chronic musculoskeletal pain. We identified 20 randomized controlled trials that compared a self-management intervention to a control group. For post-intervention results, we found moderate quality evidence for a statistically significant, but clinically unimportant effect on physical function and pain intensity, both favouring the self-management group. We also found low quality evidence for a trend favouring self-management interventions on self-efficacy and for no effect on physical activity. There was moderate quality evidence for a small, clinically insignificant, effect on self-efficacy at follow-up. We found moderate quality evidence for no between-group differences at follow-up for the remaining outcome measures. The results from the meta-analyses were corroborated by the studies that could not be included in the pooling. These findings indicate that self-management interventions have only a marginal benefit for patients with chronic musculoskeletal pain both in the short and long-term. Furthermore, we found a large variety in BCTs used, indicating substantial differences between interventions in how to teach self-management skills.

Similarities to and differences with other systematic reviews
We identified four related systematic reviews that show similar trends in effectiveness. Jordan et al. (2010) identified one subgroup of self-management interventions that specifically targeted patients with osteoarthritis. For both short- and long-term comparisons with control groups, the effects on clinical outcomes were inconclusive: three of seven studies showed improvement on pain intensity; and for functional disability and quality of life, one out of five studies reported better results for the self-management group than the control group. In addition, Nolte and Osborne (2013) evaluated the outcomes of 18 self-management interventions that adopted the Stanford criteria and concluded that these interventions were only marginally effective for pain, disability and depression. Only the median effect sizes for self-efficacy, d = 0.30 (range: 0.05-0.72), and for knowledge, d = 0.78 (range: -0.05 to 1.11), were medium to large at post-intervention. Warsi et al. (2004) did not find a significant improvement on pain and disability associated with self-management interventions for patients with arthritis. This also holds for Kroon et al. (2014), who performed a systematic review and meta-analysis to assess the effectiveness of self-management programmes in patients with osteoarthritis. They concluded that self-management interventions caused small to no benefits, which is in line with the current findings. We also found two systematic reviews with contrasting findings. Du et al. (2011) studied self-management interventions for patients with chronic musculoskeletal pain and concluded that these were effective on pain intensity and disability. However, the pooled results only showed a trend in favour of self-management for patients with chronic low back pain and a statistically significant but small change in disability and pain intensity for patients with arthritis. For Du et al. (2017), the pooled comparisons (intervention vs. control) were statistically significant at all time points for patients with chronic low back pain, but the effect sizes were small (ranging from -0.20 to -0.29 for pain intensity and -0.19 to -0.28 for disability). Differences in inclusion criteria concerning the interventions could further explain the variations in outcomes.

Future directions
Although the results of this study may not be surprising in light of the previous findings from systematic reviews, there is a large body of evidence that shows how psychological adjustment in the situation of a chronic disease may lead to favourable outcomes, such as improved well-being and adaptive lifestyle changes (Stanton et al., 2007; de Ridder et al., 2008; Kamper et al., 2015). Below we will discuss three ideas that may explain why generic self-management interventions are not as effective as expected and that could direct future research and intervention design. First, lasting behaviour change is a daunting challenge, which involves not only motivational factors such as self-efficacy and intention, but also automatic processes such as habit formation (Webb and Sheeran, 2006; Strack and Deutsch, 2004; Papies, 2016). For patients with enduring pain, these automatic factors may be of particular importance as they have coped with their pain often for several years, thereby allowing habitual routines to develop in response to pain perception. This could explain the marginal long-term effects because habits are difficult to modify, especially when interventions do not take these automatic behavioural processes into account (Papies, 2016). In order to successfully counter these habitual behaviours in interventions, Papies (2016) proposes a different approach with more emphasis on analysing and modifying these specific routines. This personalized approach differs from generic self-management interventions that provide one set of skills expected to benefit all patients. In order to capture the individual tailoring that is required in these interventions, we endorse the recommendation of Morley et al. (2013) to further explore the potential of single-case methodology.
For example, experience sampling technology, where multiple (near) real-time self-reports of thoughts, feelings or activities can be obtained, could provide a more detailed insight into longitudinal individual response patterns to treatment (Vlaeyen et al., 2001; Maes et al., 2015). Second, Keogh et al. (2015) attribute the limited effectiveness and large variety in content and delivery of self-management interventions to limited and inconsistent application of behaviour change theory throughout the intervention. Increased self-efficacy is often mentioned as an explanatory (mediating) factor, but it remains unclear how more confidence in the capability to live a meaningful life with pain would explain all post-intervention results, including pain intensity. In particular, as we only identified a post-intervention trend for self-efficacy favouring self-management interventions, other mechanisms that have not yet been identified could be responsible for the small short-term effects on pain intensity and physical function. We believe that future research on moderators and mediators of the relationship between self-management interventions and outcome measures could provide insight into how to optimize the effectiveness of this type of intervention. Third, despite the limited effectiveness of standalone generic interventions, self-management skills such as problem-solving, action-planning and decision-making have the potential to reinforce existing pain management treatments. Indeed, self-management is regarded as a common component in interdisciplinary pain management programmes and is expected to facilitate more active and resilient coping (McCracken and Turk, 2002; Turk et al., 2011). Future studies should investigate the interaction between self-management skill training and disease-specific treatment components. This would lead to more insights into the contribution of self-management skill training to long-term effects of pain management programmes.

Strengths and limitations
Although all included studies focused on enhancing generic self-management skills in order to improve clinical outcomes, there was large variation in how this was achieved and measured. As a consequence, the methodological heterogeneity of the included studies negatively influenced the robustness of the outcomes. Therefore, the overall quality of evidence was downgraded for each comparison on indirectness. This also caused us to select a random-effects model, which made the pooled results difficult to interpret (Higgins and Green, 2011), even when the effect was re-expressed on the measurement scale of interest. Although this re-expression provides an indication of the clinical importance of the effect, it cannot be regarded as a conclusive result. This is mainly because MCIDs are concerned with the effect at individual patient level rather than with mean scores at group level. However, an advantage of statistical pooling over qualitative forms of synthesis is that sample weights are included in the calculation of the overall effect. Visual inspection of the forest plots showed that only a few individual studies reported small but statistically significant effects, indicating that other forms of synthesis probably would have yielded similar interpretations. A second limitation is that our conclusions relate to average group effects and do not provide more detailed information on the proportion of patients that respond well to self-management interventions. Although a responder analysis is recommended (Henschke et al., 2014), very few studies provided such details. The consequence is that we were unable to explore beyond an average effect at study level.
Furthermore, we aimed to expose the various mechanisms of self-management interventions by identifying and classifying the behaviour change strategies as much as possible. This method revealed commonly used strategies (e.g. a focus on goals and planning) as well as variation in the selection of techniques to support adaptive behaviour change for patients with chronic pain. This approach opened the black box of self-management interventions to a certain extent. Although it seemed a logical next step to investigate whether specific combinations of BCTs influence the outcomes (e.g. Michie et al., 2009), we refrained from doing these analyses. Due to the generally small standardized mean differences throughout the comparisons (range SMD between studies = −1 to 0.41), we hypothesized that further exploration would not yield meaningful information.
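
To illustrate how random-effects pooling weights studies and how a pooled standardized mean difference (SMD) can be re-expressed on a measurement scale, a minimal Python sketch follows. It rests on assumptions that are not taken from this review: the DerSimonian-Laird estimator of between-study variance is used for illustration only, re-expression multiplies the SMD by a typical standard deviation of the instrument, the function names are hypothetical, and all input values are invented.

```python
import math


def pool_random_effects(effects, variances):
    """Pool per-study effect sizes with a random-effects model.

    Assumption: between-study variance (tau^2) is estimated with the
    DerSimonian-Laird method; the review may have used another estimator.
    Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Truncate at zero: negative estimates are treated as no heterogeneity.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def smd_to_scale(smd, scale_sd):
    """Re-express an SMD in scale units, assuming a typical SD for the scale."""
    return smd * scale_sd


if __name__ == "__main__":
    # Invented SMDs and variances for three hypothetical trials.
    smds = [-0.30, -0.10, 0.05]
    variances = [0.02, 0.03, 0.04]
    pooled, se, tau2 = pool_random_effects(smds, variances)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"pooled SMD = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
    # Assumed SD of 2.0 points on a 0-10 pain scale (illustrative only).
    print(f"re-expressed = {smd_to_scale(pooled, 2.0):.2f} points")
```

The re-expressed value is a difference in group means, so comparing it with an MCID gives only an indication of clinical importance, in line with the caveat stated above.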

Conclusion
There is moderate quality evidence that generic self-management interventions have a small, clinically unimportant post-intervention effect on physical function and pain intensity. For physical activity and self-efficacy, there is low quality evidence for no post-intervention effect, although we identified a trend favouring self-management interventions for self-efficacy. At follow-up, there is moderate quality evidence for no effect of self-management interventions on physical function and pain, and low quality evidence for no effect on physical activity. In addition, we found a small but clinically unimportant long-term effect for self-management interventions on self-efficacy. Overall, these findings indicate that self-management interventions only have a marginal benefit on self-efficacy, pain intensity, physical function, and physical activity for patients with chronic musculoskeletal pain.