The importance of context (placebo effects) in conservative interventions for musculoskeletal pain: A systematic review and meta‐analysis of randomized controlled trials

Contextual effects (e.g. patient expectations) may play a role in treatment effectiveness. This study aimed to estimate the magnitude of contextual effects for conservative, non‐pharmacological interventions for musculoskeletal pain conditions. We conducted a systematic review and meta‐analysis of randomized controlled trials (RCTs) that compared placebo conservative non‐pharmacological interventions to no treatment for musculoskeletal pain. The outcomes assessed included pain intensity, physical functioning, health‐related quality of life, global rating of change, depression, anxiety and sleep at immediate, short‐, medium‐ and/or long‐term follow‐up.


| INTRODUCTION
Non-pharmacological conservative interventions can improve a range of patient-reported outcomes across several musculoskeletal conditions. For example, exercise training was shown to improve pain and/or disability in acute (Gianola et al., 2022) and chronic back pain (Owen et al., 2019), neck pain (Gross et al., 2016) and osteoarthritis of the knee (Fransen et al., 2015) or hip (Fransen et al., 2014). Similarly, spinal manipulation improved pain and/or disability for acute (Paige et al., 2017) and chronic back (Rubinstein et al., 2019) or neck pain (Gross et al., 2015). Moreover, acupuncture was shown to lessen chronic pain (Vickers et al., 2012) and shockwave therapy improved pain outcomes in patients with plantar heel pain (Li et al., 2019) or frozen shoulder (Zhang et al., 2021). However, systematic reviews of non-pharmacological interventions for musculoskeletal pain reveal a persistent struggle in establishing the superiority of active and passive treatments over their sham counterparts (Howick et al., 2013; Machado et al., 2009). This dilemma becomes particularly salient in the context of subjective outcomes like pain, where the line between true therapeutic impact and placebo-driven improvement becomes blurred (Machado et al., 2009). Within this context, it is plausible that contextual factors, which encompass physical, psychological and social elements defining the therapeutic interaction, might serve as causal mediators and/or moderators for these treatments (Rossettini et al., 2018; Testa & Rossettini, 2016). To improve the effectiveness of non-pharmacological interventions for musculoskeletal pain, it is essential to identify the contextual factors that influence treatment outcomes.
The effects of medical interventions (i.e. the total treatment effect) may be broken down into three components: (1) non-specific, (2) contextual and (3) specific effects (Cashin et al., 2021; Hróbjartsson & Gøtzsche, 2010). Non-specific effects describe those effects associated with the natural history of disease (Herbert et al., 2011), natural fluctuations in disease severity (Herbert et al., 2011), regression to the mean (Barnett et al., 2005), measurement error (Streiner et al., 2015), random error (Streiner et al., 2015), spontaneous remission and the Hawthorne effect (McCambridge et al., 2014). These are not inherent to the treatment and occur naturally over time. Contextual effects occur as a result of exposure to contextual factors associated with the clinical interaction and include features of the patient (e.g. treatment expectations) or therapist (e.g. friendliness) (Sherriff et al., 2022), the patient-therapist relationship (Kinney et al., 2020) and the setting of the intervention (Sandal et al., 2019). These produce a treatment effect independent of the specific effect of the intervention. The specific effect is the effect inherent to the treatment, such as via the physiological mechanism of action of the treatment itself, and is calculated by subtracting the contextual effects and non-specific effects from the total treatment effect (Ernst & Resch, 1995). To estimate the contextual effects, an RCT design that contains at least a true control group and a placebo intervention is necessary (Gotzsche, 1994).
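The decomposition described in this paragraph can be written compactly. The symbols below are illustrative shorthand and do not appear in the cited sources; the second line additionally assumes that non-specific effects are equal in the placebo and no-treatment arms, which is the premise behind comparing the two.

```latex
% Assumes the amsmath package. \Delta denotes the mean change observed in a trial arm.
\begin{align}
E_{\text{total}} &= E_{\text{non-specific}} + E_{\text{contextual}} + E_{\text{specific}} \\
E_{\text{contextual}} &\approx \Delta_{\text{placebo}} - \Delta_{\text{no treatment}} \\
E_{\text{specific}} &= E_{\text{total}} - E_{\text{contextual}} - E_{\text{non-specific}}
\end{align}
```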
In this study, we adopt the term 'contextual effects' to refer to the broader impact of various factors within the therapeutic encounter on treatment outcomes. While we recognize that placebo effects constitute a substantial portion of these contextual effects, we aim to explore their influence comprehensively. Our decision to investigate placebo interventions against no-treatment control groups stems from the recognition that placebo responses are inherently intertwined with contextual factors. Placebo interventions, often involving mechanisms such as patient-provider interactions, expectancy and psychological conditioning, are vehicles through which contextual effects can manifest. By comparing placebo interventions to no-treatment controls, we seek to discern the extent to which contextual effects contribute to treatment outcomes, not solely limited to traditional placebo responses.
However, given the known effect sizes of recommended evidence-based treatments for musculoskeletal conditions, the contextual effect may still contribute an important component.
Significance: Contextual effects of non-pharmacological conservative interventions for musculoskeletal conditions are likely to be small for a broad range of patient-reported outcomes (pain intensity, physical function, quality of life, global rating of change and depression). Contextual effects are unlikely, in isolation, to offer much clinical care. But these factors do have relevance in an overall treatment context as they provide almost 30% of the minimally clinically important difference.
The size of these contextual effects is under contention, especially for subjective outcomes (e.g. pain). From their comprehensive meta-analysis comparing various placebo interventions with a no-treatment control group across a diverse range of conditions, Hróbjartsson and Gøtzsche (2010) concluded that the clinical impact of placebo interventions is generally nuanced. While their analysis suggested that placebo interventions may not consistently yield significant direct objective clinical effects, it's notable that they can exert influence over subjective outcomes, particularly in the context of pain and nausea. Other researchers found higher effect sizes for placebo interventions on pain reduction in an experimental setting but not in a clinical setting (Price et al., 2003; Vase et al., 2002). However, the interpretation of these findings continues to be debated (Hróbjartsson & Gøtzsche, 2006; Vase et al., 2009), and despite the passage of more than two decades, the ongoing discourse underscores their enduring significance. This ongoing exchange of ideas indicates that the complexity of the psychological mechanisms underpinning these effects often eludes thorough comprehension through conventional meta-analytic approaches (Einarson et al., 2001). Two more recent systematic reviews (Strijkers et al., 2021; van Lennep et al., 2021) looked at the effectiveness of placebo interventions in a low back pain population. The estimated effects differed and were small to medium for chronic low back pain patients. However, it is still unclear how large and clinically relevant the contextual effects for non-pharmacological conservative interventions for musculoskeletal pain are.
The aim of this systematic review is to (a) estimate the magnitude of contextual effects for non-pharmacological conservative interventions for musculoskeletal pain conditions for multiple patient-reported outcomes and (b) assess the clinical significance of these effects.

| METHODS
This review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Ardern et al., 2021; Page et al., 2021) and was prospectively registered (PROSPERO: CRD42021282620). Data and statistical code can be found in an online repository (https://osf.io/2ghm4/).

| Patient involvement
There was no patient or public involvement in creating this systematic review.

| Search strategy
An electronic database search of MEDLINE, EMBASE, CINAHL, Web of Science Core Collection, CENTRAL and SPORTDiscus was conducted (Supplement 1). Searches were performed from their inception to September 2021. The search terms were identified after preliminary searches of the literature and by comparing them against previous systematic reviews (Hróbjartsson & Gøtzsche, 2010; Strijkers et al., 2021). Unpublished and ongoing trials were searched via the WHO International Clinical Trials Registry Platform (http://www.who.int/ictrp/en/) and the US National Institutes of Health (https://clinicaltrials.gov/). A search for previously published systematic reviews was completed via the Cochrane Database of Systematic Reviews (search terms: placebo, pain, limits: none) and Google Scholar (search terms: placebo, pain, review, limits: first 10 pages). We also performed forward and backward citation tracking. Forward citation tracking was performed via Web of Science Core Collection. Two independent reviewers (TS, AR) evaluated all trials against pre-specified inclusion/exclusion criteria based on title/abstract and subsequently full text. Disagreements were settled through discussion among the reviewers (TS, AR). A third reviewer (AA) adjudicated any remaining disagreement.

| Inclusion criteria
Included studies were required to be randomized trials (individual, cluster or cross-over) in English or German with at least a comparison of placebo intervention versus no intervention. Quasi-RCTs and non-RCTs were excluded given they do not offer an unbiased estimate of the effect size (Herbert et al., 2011). The terms placebo and sham, as they refer to an intervention arm within controlled trials, are used interchangeably within the literature. For our study, the term placebo was used to describe a control arm that received a placebo or sham intervention. The study had to be available as a full text article (i.e. grey literature excluded). Inclusion criteria followed the Participants, Interventions, Comparators, Outcomes, Study design (PICOS) framework (Page et al., 2021).
Population: Adults (≥18 years) with any primary musculoskeletal pain (Treede et al., 2015). There were no restrictions on sex or race.
Interventions: A placebo intervention was defined as 'any intervention defined as a placebo or sham intervention by the study author' (Hróbjartsson & Gøtzsche, 2010). The actual intervention investigated in the trial was a conservative non-pharmacological intervention as defined in our Supplement 2.
Comparators: A no-treatment control group including wait-list control was included. We also included trials in which both the placebo group and no-treatment control group received the same basic treatment, which refers to interventions sharing fundamental therapeutic components (e.g. both trial arms received additional physical therapy).
Outcomes: Primary outcomes were any general or disease-specific measures of pain intensity and any general or disease-specific measures of physical functioning (see Supplement 3 for the list of example instruments). Secondary outcomes were: global ratings of improvement by the study participant, any measure of self-reported quality of life, any measure of self-reported depression, any measure of self-reported anxiety and any measure of self-reported sleep.
Time: We considered and grouped the outcomes according to the following follow-up time-points: immediate (<1 day), short-term (≥1 day but <3 months), intermediate-term (≥3 but <12 months) and long-term (≥12 months). If multiple follow-ups existed within each timeframe, we extracted the follow-up closest to 1 month for short-term, 3 months for intermediate-term and 1 year for long-term.
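The follow-up grouping and the "closest to target" selection rule can be sketched as follows. This is a minimal illustration, not the review's extraction code: the function names are hypothetical, and the day equivalents for 1, 3 and 12 months (30, 90 and 365 days) are assumptions made here so that a 1-year follow-up falls into the long-term category.

```python
from typing import Optional, Sequence

# Assumed day equivalents for the month-based cut-offs in the text.
THREE_MONTHS_DAYS = 90
TWELVE_MONTHS_DAYS = 365

# Target follow-up (in days) used to choose among several follow-ups in one category.
TARGET_DAYS = {
    "short-term": 30,            # closest to 1 month
    "intermediate-term": 90,     # closest to 3 months
    "long-term": 365,            # closest to 1 year
}


def classify_follow_up(days: float) -> str:
    """Map a follow-up time in days to one of the four categories."""
    if days < 0:
        raise ValueError("follow-up time cannot be negative")
    if days < 1:
        return "immediate"
    if days < THREE_MONTHS_DAYS:
        return "short-term"
    if days < TWELVE_MONTHS_DAYS:
        return "intermediate-term"
    return "long-term"


def pick_follow_up(days_list: Sequence[float], category: str) -> Optional[float]:
    """Return the follow-up in `category` closest to its target, or None if there is none."""
    if category not in TARGET_DAYS:
        raise ValueError(f"no target follow-up is defined for {category!r}")
    candidates = [d for d in days_list if classify_follow_up(d) == category]
    if not candidates:
        return None
    target = TARGET_DAYS[category]
    return min(candidates, key=lambda d: abs(d - target))
```

For a trial with follow-ups at 14, 28, 56 and 180 days, `pick_follow_up([14, 28, 56, 180], "short-term")` selects the 28-day assessment.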
Exclusion criteria can be found in Supplement 4.

| Data extraction
Study information was independently extracted by two authors (TS, AA), with disagreement settled via discussion. If disagreement could not be settled, a third adjudicator (AR) decided. The following information was extracted: author, year, primary musculoskeletal condition, study design, sample size, age, percentage of females, duration of complaints, intervention type, total duration of treatment, number of treatments, outcome measures and follow-up time points. Extracted data was compared for differences between extractors via the diffdf package in R. If a study report did not report relevant data for extraction, the corresponding author was contacted on two occasions over a 2-week period. Data for the main results were extracted as mean and standard deviation (baseline, post-treatment and change from baseline) and, for the outcome pain intensity, as ANCOVA effect estimates (adjusted for baseline values) where possible, or as the number of events (n) and non-events (N) where applicable.
When these were not reported and we did not receive data from the study author, we used a specific calculator (https://smcgrath.shinyapps.io/estmeansd/) to convert data given as medians and interquartile ranges or ranges (McGrath et al., 2020). Studies with multiple treatment groups were preferably combined to create a single pair-wise comparison. If that was not feasible, we split the 'shared' group into two or more groups with smaller sample size, so that two or more (reasonably independent) comparisons were analysed (Higgins et al., 2019). If multiple outcomes were reported on different scales in one trial, we standardized them to one scale and combined them to one result using a within-trial synthesis that accounts for correlation between the different scales. The outcomes within the cluster (study) were inverse-variance weighted (Viechtbauer, 2010). Cluster randomized trials and crossover trials were included in the analysis as per Cochrane guidance (Higgins et al., 2019) (see Supplement 5 for further details). Sensitivity analyses were conducted in pairwise analyses with a range of different ICCs to check the robustness of the results (Higgins et al., 2021). Where data were presented in a figure only, GetData Graph Digitizer (http://www.getdata-graph-digitizer.com) was used to extract the values by measuring the length of the axes in pixels followed by the length of the relevant data of interest (Vucic et al., 2015).
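Combining and splitting groups can be illustrated with a short sketch. It assumes the standard Cochrane Handbook formula for merging two arms into one (pooled sample size, weighted mean and a combined SD that includes the between-arm difference in means), and an even split of a shared group's sample size with mean and SD left unchanged. The function names are hypothetical and this is not the review's own code.

```python
import math
from typing import List, Tuple


def combine_groups(n1: int, mean1: float, sd1: float,
                   n2: int, mean2: float, sd2: float) -> Tuple[int, float, float]:
    """Merge two trial arms into one (Cochrane Handbook formula for combining groups)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-arm sums of squares plus the between-arm component.
    between = (n1 * n2 / n) * (mean1 - mean2) ** 2
    variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2 + between) / (n - 1)
    return n, mean, math.sqrt(variance)


def split_shared_group(n: int, k: int) -> List[int]:
    """Split a shared group's sample size into k near-equal parts (mean and SD stay the same)."""
    if k < 1 or n < k:
        raise ValueError("need at least one participant per split group")
    base, remainder = divmod(n, k)
    return [base + 1] * remainder + [base] * (k - remainder)
```

As a check, merging two arms with raw values {1, 3} and {5, 7} (each n = 2, SD = √2) reproduces the mean of 4 and the variance of 20/3 of the four pooled values.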

| Risk of bias assessment and GRADE
Risk of bias was assessed via the Cochrane Risk of Bias Tool 2.0 according to the study design (i.e. parallel, crossover or cluster RCT) (Sterne et al., 2019). An overall risk of bias judgement was made for the primary subjective outcome self-reported pain intensity. If a pain intensity outcome was not available, a suitable outcome for physical function was chosen. Assessment of risk of bias was based on results of the last follow-up time point of the individual study. We chose the last follow-up point to give the most conservative estimate of the risk of bias because the effect of missing data is likely to be most pronounced at the last follow-up time point (Flemyng et al., 2023). Two independent assessors (TS, AA) performed the assessment. Disagreements were resolved through discussion or by a third reviewer (AR).
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach was used to assess the certainty of evidence. The criteria we used are detailed in Supplement 6 (Higgins et al., 2019). All ratings started at a high level of certainty given guidelines for meta-analyses including RCTs only. Two authors (ST, SK) performed the GRADE assessment. TS was the adjudicator if no agreement was reached.

| Statistical analysis
For the outcome of pain intensity, we prioritized ANCOVA data (mean differences with baseline score as a covariate) over change-from-baseline data (Daly et al., 2021). For all other outcomes, we used change-from-baseline data.
If the correlation coefficient of pre- and post-treatment SDs was not available, we used ρ = 0.59 as a correlational value (Balk et al., 2012). We performed a sensitivity analysis with the 25th and 75th percentile of ρ = (0.40, 0.81) (Balk et al., 2012). We used mean differences (MD) as effect size for pain scales by conversion to a common 0-100 scale (Thorlund et al., 2011). Physical function and the secondary outcomes were analysed with standardized mean differences (SMD) with an internal reference SD as there were different scales for these outcomes (Daly et al., 2021). SMD with sample-based standardization have diverse inefficiencies and thus were not applied (Daly et al., 2021). The outcome global rating of improvement was analysed with sample-based SMD because no baseline SD were available. Some trials did not report the outcome global rating of improvement on a continuous scale, so we converted the odds ratio to SMD via the established formula $\mathrm{SMD} = \log(\mathrm{OR}) \times \sqrt{3}/\pi$ with variance $V_{\mathrm{SMD}} = V_{\log(\mathrm{OR})} \times 3/\pi^{2}$ (Borenstein et al., 2009). Effect sizes of cross-over RCTs (mean differences and SMD) were calculated approximately as no data was available for an exact analysis (Higgins et al., 2019). We assumed a conservative correlation coefficient corr = 0.5 to calculate the standard errors of the effect sizes. For cluster RCTs, there was also no data available for an exact analysis, so we adjusted the sample size for clustering and used an intracluster (or intraclass) correlation coefficient (ICC) of 0.05 (Campbell et al., 2000) to calculate the design effect = 1 + (M − 1) * ICC, where M is the mean cluster size (Higgins et al., 2019).
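Three of the conversions in this paragraph can be sketched in a few lines. The sketch assumes the usual Cochrane Handbook expression for the SD of a change score given a pre/post correlation, the logit-based odds ratio to SMD conversion of Borenstein et al. (2009), which includes the factor π, and the design effect as stated in the text. Function names are hypothetical; the default values for ρ and the ICC are the ones reported above.

```python
import math
from typing import Tuple


def change_sd(sd_pre: float, sd_post: float, rho: float = 0.59) -> float:
    """SD of the change from baseline, given an assumed pre/post correlation rho."""
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * rho * sd_pre * sd_post)


def odds_ratio_to_smd(odds_ratio: float, var_log_or: float) -> Tuple[float, float]:
    """Convert an odds ratio and the variance of its log to an SMD and its variance."""
    smd = math.log(odds_ratio) * math.sqrt(3) / math.pi
    var_smd = var_log_or * 3 / math.pi ** 2
    return smd, var_smd


def design_effect(mean_cluster_size: float, icc: float = 0.05) -> float:
    """Design effect for a cluster RCT: 1 + (M - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float = 0.05) -> float:
    """Sample size after adjusting for clustering."""
    return n / design_effect(mean_cluster_size, icc)
```

With a mean cluster size of 21 and ICC = 0.05 the design effect is 2, so a cluster-randomized arm of 100 participants contributes an effective sample size of 50.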
We focused our analysis on outcomes where the number of studies was equal to or greater than five, as the meta-analysis utilizing robust variance estimation requires at least four degrees of freedom to maintain the assumption of a t-distribution (Tanner-Smith et al., 2016). Effect sizes were analysed via random effects meta-analysis with robust variance estimation to account for correlated effects of multiple time points of one study (Fisher & Tipton, 2015; Tanner-Smith et al., 2016). Measures of heterogeneity used were Cochran's Q and the resulting chi-squared statistic and I². 95% prediction intervals were used to assess the impact of heterogeneity (Borenstein, 2019).
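The heterogeneity measures named here can be illustrated with a simplified sketch. Note the substitution: it uses a plain DerSimonian-Laird random-effects model with one effect per study, not the robust variance estimation used in this review, so it only shows how Q, I², τ² and a prediction interval relate to each other. All names are hypothetical, and the t critical value for the prediction interval (commonly taken with k − 2 degrees of freedom) must be supplied by the caller.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class RandomEffectsResult(NamedTuple):
    q: float        # Cochran's Q
    df: int         # degrees of freedom (k - 1)
    i2: float       # I-squared in percent
    tau2: float     # between-study variance (DerSimonian-Laird)
    pooled: float   # random-effects pooled estimate
    se: float       # standard error of the pooled estimate


def random_effects_dl(effects: Sequence[float],
                      variances: Sequence[float]) -> RandomEffectsResult:
    """DerSimonian-Laird random-effects meta-analysis for independent effects."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return RandomEffectsResult(q, df, i2, tau2, pooled, se)


def prediction_interval(pooled: float, se: float, tau2: float,
                        t_crit: float) -> Tuple[float, float]:
    """Prediction interval: pooled +/- t_crit * sqrt(tau^2 + se^2)."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width
```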

| Sensitivity analysis
Since correlational values for robust variance estimation were unknown, we assumed a correlational value of ρ = 0.8 for all main analyses. We performed sensitivity analysis with a range of different correlational parameters of ρ = (0.0, 0.2, 0.4, 0.6, 1.0). We calculated standard random effects meta-analysis at each time point as a sensitivity analysis to compare these estimated time-points with the results of the multivariate meta-analysis with robust variance estimation. The heterogeneity parameter τ (tau) of the standard random effects analysis was estimated via restricted maximum likelihood (REML) estimation. The confidence interval of the pooled effect of the standard analysis was calculated via the Knapp-Hartung method with an ad hoc adjustment following Jackson et al. (2017).
Additional sensitivity analyses were performed via outlier identification and influence analysis (Viechtbauer & Cheung, 2010). The impact of the choice of correlational values and the impact of imputed values were also assessed.

| Reporting bias assessment
Small study effects and publication bias were assessed via funnel plots, Egger's test and trim and fill methods if at least 10 studies were included in the meta-analysis (Mavridis & Salanti, 2014).
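The regression behind Egger's test can be sketched with ordinary least squares: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. The function name is hypothetical, and the sketch returns only the intercept and slope; a full test would also need the intercept's standard error and a t-based p-value.

```python
from typing import Sequence, Tuple


def egger_regression(effects: Sequence[float],
                     standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the Egger regression of effect/SE on 1/SE."""
    if len(effects) != len(standard_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching standard errors")
    precision = [1.0 / se for se in standard_errors]
    standardized = [y / se for y, se in zip(effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_z = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all standard errors are identical; slope is undefined")
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(precision, standardized))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope
```

When every study estimates the same effect regardless of its size, the intercept is zero and the slope equals that common effect.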

| Meta-regression
Univariate random effects meta-regression with robust variance estimation was performed if at least 10 studies were available (Higgins et al., 2019). We fitted robust variance estimation meta-regressions with the following prespecified covariates when 10 or more studies were available: study year, country of study origin, nonstandardized co-analgesia (yes or no), co-intervention (yes or no), pain type (acute, subacute, chronic, mixed), pain condition (e.g. low back pain, hip pain, etc.), placebo type (needle, manual, machine, etc.), experimental trial versus clinical trial, risk of bias (low, some concerns, high) and type of analysis applied by the trial (intention-to-treat or per-protocol). Choice of covariates was partially based on a study by Kamper et al. (2008). It should be noted that degrees of freedom (df) < 4 means that the results of this particular analysis (category) cannot be trusted because the t distribution approximation does not hold (Tanner-Smith et al., 2016). Hence, we only interpret results with df ≥ 4. The intercept of the dichotomous and categorical covariates was removed to enable an easier interpretation.

| Clinical significance
Minimally clinically important differences were prospectively defined for pain intensity as a between-group difference of 20% (Christiansen et al., 2018; Ferreira et al., 2013), which corresponds to 20 points on a 0-100 pain scale. For outcomes assessed via SMDs, we considered a value of 0.5 as clinically relevant (Norman et al., 2003).

| Patient and public involvement
The review team did not engage with consumer groups due to limited funding and the review protocol was drafted before the involvement of patients and the public in reviews became standard practice. Instead, the review team provided the results of the review to their clinical colleagues and individuals from the public with whom they had personal relationships. The team sought informal feedback from these individuals based on their experiences with musculoskeletal pain as either patients or clinicians.

TABLE 2 Overview of all meta-analytic outcomes with GRADE ratings. Column headers: Outcome (follow-up, effect size [ES]); Included studies.

| RESULTS
We identified 8898 reports through database searching and manual search of reference lists of relevant literature reviews. After removing duplicates and screening titles and abstracts of all remaining unique reports, 184 full-text reports were assessed for eligibility. We included 64 studies (see Supplement 7 for citations) among 65 study reports (Figure 1). Notably, 6 studies (Bush et al., 1985; Kim et al., 2020; Lewis et al., 2010; Lichtbroun et al., 2001; Shiasy et al., 2020; Vlaeyen et al., 1996) were eligible for inclusion but failed to report the relevant outcomes sufficiently. We contacted the authors but received no response. Literature sources and reasons for exclusion of ineligible studies/reports are reported in Supplement 8.

| Risk of bias and GRADE assessment
In parallel RCTs, the primary outcome pain intensity was judged at low risk of bias for one outcome, at some concerns for 10 outcomes and at high risk of bias for 41 outcomes. In the cluster and cross-over RCTs, the outcomes were all judged to be at high risk of bias (Supplement 11). Detailed summary risk of bias plots are given in Supplement 12. The certainty of the evidence was rated as low for one meta-analytic outcome, as very low for all other meta-analytic outcomes and as very low for individual study outcomes (Supplement 13). Main reasons for downgrading the evidence were risk of bias, publication bias and indirectness.

| Data handling
All applied data handling methods (median conversion, transformation of standard errors, transformation of confidence intervals, pooling/splitting of groups, extraction from figures, etc.) are listed for each study and outcome in Supplement 14. We defined two comparators for analysis: placebo versus control (no treatment group or usual care group). All outcomes and studies employed in analysis can be seen in Table 2. Estimated internal reference standard deviations for the SMD can be found in Supplement 20.

| Primary outcome pain intensity
We included 57 studies with 74 time-points in the main analysis for all time-points. We included 10 studies for immediate-term follow-up, 44 studies for short-term follow-up, 17 studies for intermediate-term follow-up and 3 studies for long-term follow-up.

| Primary outcome physical function
Collectively, 37 studies were analysed for all time-points with 48 total time-points. Analyses of follow-up included the following number of studies: immediate-term 2 studies, short-term 31 studies, intermediate-term 12 studies and long-term 3 studies.

| Quality of life
Fourteen studies with 16 time-points were analysed. Eleven studies were included for short-term follow-up and four studies for intermediate-term follow-up. Only one study reported long-term results. No study reported results for immediate follow-up.

| Global rating of change
Seven studies with 14 time-points were included in the analysis for all time-points combined. Six studies each were analysed for short-term and intermediate-term follow-up. For the long-term follow-up, there were only two studies available.

| Depression
Six studies with 6 time-points were analysed for the outcome depression. Five studies reported short-term outcomes and one study reported intermediate-term outcomes.

| Other secondary outcomes
Results, forest plots and GRADE ratings for the Hospital Anxiety and Depression Scale (HADS) [N = 2 studies], anxiety [N = 2 studies] and sleep outcomes [N = 1 study] can be found in Supplement 16. These outcomes could not be meta-analysed due to the low number of studies.

| Results of meta-regressions
Robust variance estimation meta-regressions with one covariate each were performed for the outcomes pain intensity, physical function and quality of life to assess the impact of the prespecified covariates. For the other outcomes, the number of studies was too low (n ≤ 10 studies). All results can be found in Supplement 17.

| Meta-regression for pain intensity
None of the analysed covariates showed a magnitude of a difference between subgroups that could be considered clinically important. This is not surprising as the 95% prediction interval contained no clinically important effects (95% PI: −15.28, 4.64) for pain intensity reduction.

| Meta-regression for quality of life
None of the analysed covariates showed a clinically meaningful effect. This is not surprising as the 95% prediction interval contained no clinically important effects (95% PI: −0.39, 0.04) for the improvement of quality of life.

| Results for sensitivity analyses
Results were robust to different values of the assumed correlation used to calculate the change-from-baseline standard deviations and to different correlation parameters (rho) for robust variance estimation (Supplement 15). Results were also robust to the removal of outlier and influential studies and to the removal of studies with imputed values (medians and IQR, values extracted from images and imputed standard deviations) (Supplement 15).

| Reporting biases
Our empirical assessment of publication bias is detailed in Supplement 18. We rated down one step for publication bias for the outcomes pain intensity and physical function because funnel plots, Egger's test and a trim and fill analysis showed evidence of small study effects, which could indicate publication bias.

| Amendments to the protocol
We changed our inclusion criteria by removing the inclusion of usual care studies because they undermine the rationale for the estimation of contextual effects and create clinical heterogeneity. To ensure transparent reporting, we retain the originally planned analyses in Supplement 19. We removed the covariate 'placebo indistinguishability' ("Can an outside observer tell a difference between placebo and intervention?") from our meta-regression due to reliability and validity concerns in rating this outcome.

| DISCUSSION
This systematic review assesses the magnitude of contextual effects of non-pharmacological conservative interventions for musculoskeletal pain conditions. Our key finding was that, for a broad range of patient-reported outcomes (pain intensity, physical function, quality of life, global rating of change and depression), we estimated mostly very low certainty trivial effect sizes that did not appear to reach clinical significance and showed a longitudinal trend of diminishing effect over time. For example, for pain intensity, we estimated an effect size of MD: −5.32 for contextual effects with a confidence interval from −7.20 to −3.44 on a 0-100-point VAS. Importantly, all results were robust to various sensitivity analyses.

| Results in perspective of the available literature
A Cochrane Review by Hróbjartsson and Gøtzsche (2010) found a trivial effect for placebo interventions for the outcome of pain intensity for a mixture of populations and interventions. This is comparable to our results in a more homogenous musculoskeletal pain population which excluded medication or injection therapies. A more recent systematic review (Strijkers et al., 2021) included pharmacological and injection therapies and reported similar results as our review for pain intensity, disability and quality of life for participants with chronic low back pain. [...] with the placebo intervention and the clinical encounter as the type of placebo, the colour of the pills, the intensity of the treatment, branding or marketing, cost of treatment, and invasiveness of the treatment may mediate the placebo effect (Doherty & Dieppe, 2009). Therefore, this may partially explain the effect size reported by van Lennep et al. We completed a subgroup analysis via meta-regression accounting for the type of placebo intervention; however, the findings show that the type of placebo intervention did not reveal clinically meaningful effects. The results of our review as well as the results of the other analyses do not support a powerful effect of placebo interventions, which is sometimes stated in the literature (Beecher, 1955; Colloca & Barsky, 2020). The contextual effects from these interventions are trivial or small and not clinically meaningful.
We can conclude that contextual effects are unlikely, in isolation, to offer much clinical care. Importantly, these factors do have relevance in an overall treatment context, that is, an (evidence-based) treatment might not lead to clinically significant effects if contextual factors are minimized. Exercise, psychosocial interventions, multidisciplinary management and education/information are treatment options commonly recommended for musculoskeletal pain in international evidence-based guidelines (Babatunde et al., 2017; Lin et al., 2020). High-quality systematic reviews with meta-analyses have shown the impact on chronic musculoskeletal pain to be ~19.88 (95% CI: 13.2, 26.4) VAS points for exercise and ~14.6 (95% CI: 4.8, 24.4) for multidisciplinary interventions compared to no intervention or usual care (Kamper et al., 2015; Miller et al., 2022). Furthermore, the MCID for pain intensity is often taken to be 20 points (Christiansen et al., 2018; Ferreira et al., 2013). If we compare our estimated contextual effect for pain intensity of 5.32 to these data, then we can see that it makes up almost 30% of the MCID, and 30%-40% of the effects of recommended treatments. This is a sizable component that should not be disregarded. We must note that this contention is particularly relevant in trials that include a no treatment comparison. The impact of contextual factors becomes less pronounced in head-to-head comparisons of treatments, as these factors can exert influence on both groups. It is possible that larger contextual effects (Kaptchuk et al., 2006) can be achieved by using certain types of placebo interventions (e.g. needles, certain forms of manual therapy, etc.), but we could not find convincing evidence for this.
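The comparison in this paragraph is simple arithmetic on the point estimates quoted in the text, and the short sketch below reproduces it. The variable and function names are hypothetical; confidence intervals are ignored, so the resulting percentages are rough shares rather than estimates with their own uncertainty.

```python
# Point estimates quoted in the text (0-100 pain scale).
CONTEXTUAL_EFFECT = 5.32          # pooled contextual effect on pain intensity
MCID = 20.0                       # minimally clinically important difference
EXERCISE_EFFECT = 19.88           # exercise vs no intervention or usual care
MULTIDISCIPLINARY_EFFECT = 14.6   # multidisciplinary care vs no intervention or usual care


def share_of(reference: float, effect: float = CONTEXTUAL_EFFECT) -> float:
    """Contextual effect expressed as a percentage of a reference value."""
    return 100.0 * effect / reference


if __name__ == "__main__":
    for label, reference in [
        ("MCID", MCID),
        ("exercise", EXERCISE_EFFECT),
        ("multidisciplinary", MULTIDISCIPLINARY_EFFECT),
    ]:
        print(f"{label}: {share_of(reference):.1f}%")
```

The unrounded shares are about 27% of the MCID, about 27% of the exercise effect and about 36% of the multidisciplinary effect, which the text rounds to "almost 30%" and "30%-40%".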

| Strengths and limitations
The pooling of different pain types, pain conditions and placebo/sham interventions might be seen as a limitation leading to heterogeneity. To consider this, we used random-effects meta-analysis and performed meta-regressions with pain types, pain conditions and interventions as covariates. The amount of heterogeneity exemplified by the 95% prediction intervals for most analyses did not capture clinically meaningful effects. Furthermore, we could not identify covariates that explained the remaining amount of heterogeneity in a clinically meaningful way. It is important to note that the limited number of studies available for the meta-regression analysis warrants caution when drawing definitive conclusions (Thompson & Higgins, 2002). No-treatment control groups create performance bias because of a lack of blinding (Hróbjartsson, 2002), co-intervention bias (Hróbjartsson, 2002), attrition bias, response bias, compensatory rivalry, resentful demoralization or nocebo effects (Gerdesmeyer et al., 2017). It should be noted that response bias and co-intervention bias might be partly cancelled out by each other (Hróbjartsson & Gøtzsche, 2010). These biases might lead to an underestimation of the contextual effects (Einarson et al., 2001). For example, a wait list group had worse outcomes than a true no treatment group in a psychotherapy context (Furukawa et al., 2014). We used meta-regression to control for co-interventions and co-analgesia and could not find a clinically relevant difference that showed major discrepancies to the main analyses. But we cannot absolutely rule out that different types of no treatment control groups have important differences. One other limitation of the current paper is that it does not address the issue of placebo effect size in relation to the true intervention effect size. Placebo interventions that mimic non-pharmacological interventions, which already have a relatively small treatment effect, may produce even smaller and clinically irrelevant placebo effects (Howick & Hoffmann, 2018). This is illustrated by the results of Miller et al. (2022) who found that non-pharmacological interventions were only superior to no-treatment or usual care, but not to sham interventions. In contrast, placebo interventions that mimic high-impact clinical treatments, such as surgery, may have substantial effects that challenge the validity of some surgical procedures (Louw et al., 2017).
We did not search for the possibility of different types of responders to placebo interventions which might exist in this patient population (Vase, 2020).
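The random-effects pooling and 95% prediction intervals discussed above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator for the between-study variance, which may differ from the estimator used in this review, and all effect sizes and variances below are hypothetical values chosen for illustration only. The function names `random_effects_dl` and `prediction_interval` are introduced here and do not come from any library.

```python
import math
from statistics import NormalDist


def random_effects_dl(effects, variances):
    """Pool study effects with the DerSimonian-Laird estimator (assumed choice).

    Requires at least two studies.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def prediction_interval(pooled, se, tau2, crit=None):
    """Approximate 95% prediction interval for the effect in a new study.

    `crit` is the critical value; a t quantile with k - 2 degrees of freedom
    is commonly used. If omitted, the normal quantile is used as a rough
    approximation.
    """
    if crit is None:
        crit = NormalDist().inv_cdf(0.975)
    half_width = crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width


# Hypothetical standardized mean differences and variances from five trials
effects = [-0.10, -0.35, -0.20, -0.05, -0.30]
variances = [0.02, 0.03, 0.015, 0.025, 0.04]

pooled, se, tau2 = random_effects_dl(effects, variances)
# 3.182 is the 97.5th percentile of the t distribution with 5 - 2 = 3 df
low, high = prediction_interval(pooled, se, tau2, crit=3.182)
print(f"pooled = {pooled:.3f}, tau^2 = {tau2:.3f}, 95% PI = [{low:.3f}, {high:.3f}]")
```

Comparing the resulting prediction interval with a pre-specified threshold of clinical relevance shows whether a clinically meaningful effect could plausibly be expected in a new, similar trial.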

| Future directions
Future RCTs should be designed to measure contextual effects at low risk of bias (Kamper & Williams, 2013) and to ascertain the contribution of specific, non-specific and contextual factors to the overall treatment effect. An example would be a three-arm trial with an exercise intervention, a sham exercise intervention and a no-intervention control, to examine the contextual effects of exercise in more detail. No sham exercise trial was available in the literature for inclusion in this review. Intervention and sham intervention need to conform to the TIDieR checklist (Hoffmann et al., 2014) to ensure adequate reporting of details and replicability. The surrounding contexts of the interventions must be identical. This includes how the treatment is administered, the treatment rooms, the training, the presentation of the clinician, the verbal interaction with the patient, the advice given, etc. (Kamper & Williams, 2013). To reduce bias due to unblinding, the wait-list group in the third arm could be told that they are participating in an observational study (Hinman et al., 2014).

Important contextual factors, such as expectations, previous treatment experiences, patient beliefs and treatment satisfaction, should be collected and assessed. Incorporating these factors into the analysis could help determine whether they act as potential mediators, being affected by the treatment and in turn influencing the treatment's impact on the outcome. Additionally, it is essential to explore whether these contextual factors function as moderators, altering the direction or strength of the relationship between the treatment and the outcome (Baron & Kenny, 1986).
Furthermore, identifying whether there are responders and non-responders to placebo interventions could inform clinical practice by assessing the variability of the contextual effects (Mills et al., 2021). Studies of longer duration should also be undertaken, as studies that reported longer-term follow-ups (≥12 months) were underrepresented, with a maximum of three trials reporting on any single outcome.
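The three-arm design proposed above (active intervention, sham intervention, no-intervention control) allows the total effect to be split into a contextual and a specific component. The following is a minimal sketch under strong assumptions: the components are treated as additive, sampling uncertainty is ignored, and the change scores are hypothetical values on a 0-100 pain scale where negative numbers indicate improvement. The function name `decompose_three_arm` is introduced here for illustration.

```python
from statistics import mean


def decompose_three_arm(active, sham, control):
    """Split mean change scores from a three-arm trial into components.

    total      = active - control  (overall treatment effect)
    contextual = sham - control    (placebo versus no treatment)
    specific   = active - sham     (effect beyond the sham)
    """
    total = mean(active) - mean(control)
    contextual = mean(sham) - mean(control)
    specific = mean(active) - mean(sham)
    return {"total": total, "contextual": contextual, "specific": specific}


# Hypothetical change in pain intensity per participant (negative = improvement)
active_arm = [-30.0, -25.0, -35.0]
sham_arm = [-20.0, -15.0, -25.0]
control_arm = [-10.0, -5.0, -15.0]

parts = decompose_three_arm(active_arm, sham_arm, control_arm)
share = parts["contextual"] / parts["total"]
print(parts, f"contextual share of total effect = {share:.0%}")
```

By construction the contextual and specific components sum to the total effect. An actual trial analysis would additionally report confidence intervals and could model measured contextual factors, such as expectations, as mediators or moderators.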

| CONCLUSION
The contextual effects of non-pharmacological conservative interventions for musculoskeletal conditions were statistically significant but small in magnitude for the patient-reported outcomes of pain intensity, physical function, quality of life, global rating of change and depression. Nonetheless, contextual effects could make up a substantial proportion of the effect of commonly recommended evidence-based treatments. In comparison to a commonly used threshold of clinical significance for pain intensity, contextual effects could make up ~30%. Future research should focus on low risk of bias RCTs that enable robust quantification of what contributes to the total treatment effect.
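One plausible way to read such a percentage is as a simple ratio of the pooled contextual effect to the chosen threshold, both expressed on the same scale. The symbols below are notation introduced here as an assumption, not taken from the analysis itself: $\widehat{MD}$ denotes the pooled mean difference between placebo and no treatment, and $\Delta_{\text{clin}}$ denotes the threshold of clinical significance.

```latex
\text{contextual share} \;=\; \frac{\lvert \widehat{MD} \rvert}{\Delta_{\text{clin}}} \times 100\%
```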

FIGURE 2 Placebo/Sham intervention versus control for the outcome pain intensity. A minus sign signifies an effect in favour of the placebo intervention. Dashed lines on the forest plot diamonds represent the 95% prediction intervals. The pooled effect size (MD) for all time points is in black, and the pooled effect sizes for individual time points are in grey. The dashed red lines represent the region of equivalence.

FIGURE 3 Placebo/Sham intervention versus control for the outcome physical function. A minus sign signifies an effect in favour of the placebo intervention. Dashed lines on the forest plot diamonds represent the 95% prediction intervals. The pooled effect size (SMD) for all time points is in black, and the pooled effect sizes for individual time points are in grey. The dashed red lines represent the region of equivalence.

FIGURE 4 Placebo/Sham intervention versus control for the outcome quality of life. A minus sign signifies an effect in favour of the placebo intervention. Dashed lines on the forest plot diamonds represent the 95% prediction intervals. The pooled effect size (SMD) for all time points is in black, and the pooled effect sizes for individual time points are in grey. The dashed red lines represent the region of equivalence.
Conversely, van Lennep et al. (2021) found a medium reduction in pain intensity and disability in a chronic low back pain population that included medication and injection placebo interventions. The discrepancy between these findings and our results (n = 59 trials) and the results of Strijkers et al. (n = 19 trials) might be due to the lower number of studies (n = 5) included in the review by van Lennep et al. Another explanation for the discrepancy may be differences in the placebo interventions investigated. Although not conclusive, the type of placebo, and in particular its meaning (inclusive of expectancy), may influence the magnitude of effect on outcomes such as pain intensity (Doherty & Dieppe, 2009; Fässler et al., 2015). Attributes associated

FIGURE 5 Placebo/Sham intervention versus control for the outcome global rating of change. A minus sign signifies an effect in favour of the placebo intervention. Dashed lines on the forest plot diamonds represent the 95% prediction intervals. The pooled effect size (SMD) for all time points is in black, and the pooled effect sizes for individual time points are in grey. The dashed red lines represent the region of equivalence.

FIGURE 6 Placebo/Sham intervention versus control for the outcome depression. A minus sign signifies an effect in favour of the placebo intervention. Dashed lines on the forest plot diamonds represent the 95% prediction intervals. The pooled effect size (SMD) for all time points is in black, and the pooled effect sizes for individual time points are in grey. The dashed red lines represent the region of equivalence.

TABLE 1 (Continued)
Column headings: Study (year); Primary MSK condition; Study design; Sample size placebo; Sample size control; Age placebo (SD); Age control (SD); Percentage female placebo; Percentage female control; Duration of complaints; Placebo control group; Comparison control group; Number of treatments placebo; Number of treatments control; Total duration of treatment (weeks); Outcome measures available; Follow-ups (weeks)
