

Identifying factors that predict who is likely to gain the greatest benefit from different treatments for low back pain is an important research priority. Here we report moderator analyses of the Back Skills Training Trial (BeST) that tested a cognitive–behavioral approach for low back pain.


We recruited 701 participants ages ≥18 years with at least moderately troublesome low back pain present for >6 weeks from 56 general practices in 7 localities across England to a trial adding a group cognitive–behavioral approach to active management advice. The cognitive–behavioral package had a moderate effect on primary outcomes (Roland Morris Disability Questionnaire [RMDQ] and modified Von Korff scales). At 12-month followup, we tested for interaction between randomized groups on 2 prespecified baseline variables (troublesomeness and fear avoidance) and 10 post hoc (exploratory) variables identified from previous studies.


Neither troublesomeness nor fear avoidance moderated treatment effect on any of our primary outcomes. In the final model, the only moderation by baseline variables of the effect of randomization was on the RMDQ outcome. Being younger and currently working both moderated treatment effect, resulting in larger improvements as a response to treatment.


Although BeST is one of the larger trials of back pain treatment, it is still too small to reliably detect moderation if it exists. Since the significant moderation effects were only observed for 1 outcome measure in 2 of 10 post hoc analyses, we cannot conclude that these are true moderation effects.



Several therapist-delivered intervention packages are effective and cost effective for chronic nonspecific low back pain. Treatment packages of acupuncture needling, exercise, and manual therapy are recommended for the early management of persistent back pain by the National Institute for Health and Clinical Excellence guidelines (1). We have added to these data by demonstrating that a group cognitive–behavioral approach delivered by a range of health professionals has a sustained effect on low back pain disability, at a very modest cost to the health care provider (Back Skills Training Trial n = 701) (2). Most of these interventions produce small to moderate mean benefits that are likely to be important at a population level, but that may not be important to an individual patient. Identifying moderators (effect modifiers), factors that predict who is likely to gain the greatest benefit from different treatments for low back pain, is an important research priority because it will allow us to deliver the best treatment for an individual patient (1, 3, 4). There are few randomized studies of moderators of treatments for nonspecific low back pain and even fewer that have used prospectively defined moderators (5). Here we report prespecified moderator analyses and further exploratory moderator analyses of the Back Skills Training Trial data.

Significance & Innovations

  • Baseline characteristics were not shown to predict response to a cognitive–behavioral approach to treatment of low back pain in secondary analysis of a large randomized controlled trial.

  • No evidence to support selective provision of such services was found.

  • Future research into back pain subgroups may need to use different approaches.



Trial design.

We have reported the design, intervention, and main analyses of the Back Skills Training Trial in detail elsewhere (2, 6–8). Briefly, we recruited 701 participants ages 18 years or older with at least moderately troublesome nonspecific low back pain present for greater than 6 weeks from 56 general practices in 7 localities across England. We randomized these, 2:1 in favor of the intervention, using a remote telephone randomization service. All baseline variables were collected using a self-completed questionnaire prior to randomization.

All participants received a 15-minute session of active management advice, covering the benefits of remaining active and how to do so, avoidance of bed rest, appropriate use of pain medication, and symptom management, together with a copy of The Back Book (9). Those in the intervention group were offered an individual assessment lasting up to 1.5 hours and 6 sessions of group therapy using a cognitive–behavioral approach lasting 1.5 hours per session.

Our primary outcomes, the Roland Morris Disability Questionnaire (RMDQ; scale 0–24, where lower scores indicate less severe disability) and modified von Korff (MVK) scales of pain and disability (scale 0–100%, where lower scores indicate less pain and disability), were measured at 12 months using a self-administered postal questionnaire (10, 11). Nonresponders to the postal questionnaire received a telephone interview to collect core outcome data. Collection of the RMDQ data was not included in the telephone interview process due to its length.

We found that the intervention package was moderately effective at 12 months based on the 598 subjects who responded at 12 months (2) (Table 1). To reduce hazards from making multiple comparisons, we have not included 3- and 6-month followup data in this secondary analysis.

Table 1. Main effect of BeST intervention package*

|  | Mean treatment difference (95% CI) |
|---|---|
| All participants at 12-month followup |  |
| RMDQ† (n = 498) | 1.3 (0.56, 2.06) |
| MVK disability‡ (n = 552) | 8.4 (4.47, 12.32) |
| MVK pain‡ (n = 583) | 7.0 (3.12, 10.81) |
| Participants with pain for >3 months at randomization |  |
| 3-month followup |  |
| RMDQ† (n = 509) | 1.0 (0.38, 1.69) |
| MVK disability‡ (n = 515) | 4.3 (0.45, 8.17) |
| MVK pain‡ (n = 534) | 6.6 (3.21, 10.08) |
| 6-month followup |  |
| RMDQ† (n = 524) | 1.4 (0.64, 2.18) |
| MVK disability‡ (n = 547) | 8.1 (4.25, 12.00) |
| MVK pain‡ (n = 565) | 7.9 (4.11, 11.61) |
| 12-month followup |  |
| RMDQ† (n = 494) | 1.3 (0.53, 2.02) |
| MVK disability‡ (n = 548) | 8.2 (4.24, 12.16) |
| MVK pain‡ (n = 579) | 6.8 (2.92, 10.57) |

* 95% CI = 95% confidence interval; RMDQ = Roland Morris Disability Questionnaire; MVK = modified Von Korff.
† Range 0–24.
‡ Range 0–100.

Potential effect moderators.

At the design stage of the trial, we prespecified 3 hypotheses (confirmatory analyses) (12). First, the benefits from treatment would be greater in those with more troublesome low back pain (moderately versus very/extremely troublesome). We measured this during the participants' initial assessment prior to randomization using a 5-point Likert scale from not at all troublesome to extremely troublesome, with participants only being included if they reported low back pain of at least moderate troublesomeness (13, 14). Those with more troublesome pain have a greater capacity for improvement and arguably have more to gain from additional improvements from treatment. Assessing troublesomeness of pain is an attractive and simple approach that could be used in the consulting room to decide which patients should be considered for further treatment. Second, the intervention would have a greater effect in those with high levels of fear avoidance. Fear avoidance is well established as a potentially modifiable predictor of poor outcome from back pain (15). Our intervention was designed to target fear avoidance among a range of unhelpful beliefs about back pain and to promote physical activity. We measured this using the Fear-Avoidance Beliefs Questionnaire (FABQ) at baseline (16). Third, the benefits of treatment would be greater for those with subacute low back pain (<3 months) than those with chronic low back pain. We were unable to perform this analysis because too few participants had subacute pain.

At baseline, we collected data on other potential predictors or moderators of treatment effect (Table 2). These included demographic information, whether or not participants received state benefits, depression and anxiety using the Hospital Anxiety and Depression Scale (HADS), self-efficacy using the Pain Self-Efficacy Questionnaire, and health-related quality of life using the physical and mental health component scores of the Short Form 12 (SF-12) (17–19). We dichotomized potential moderators that were continuous variables to facilitate the analyses, using cut points available in the literature. We used a cut point of <14 for the FABQ and <11 for both the anxiety and depression components of the HADS (17, 20). For all other continuous variables where generally accepted cut points were not available, we used a median cut point (21).
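The dichotomization rule described above can be sketched as follows. This is a minimal illustration, not the trial's analysis code (the trial used Stata): the function name, the dictionary keys, and the choice to place scores equal to the cut point in the upper group (following the "≥14 to <14" style of labeling in Table 4) are assumptions made for the example.

```python
from statistics import median

# Literature cut points quoted in the text; the key names are hypothetical labels.
FIXED_CUT_POINTS = {"fabq": 14, "hads_anxiety": 11, "hads_depression": 11}


def dichotomize(values, cut_point=None):
    """Split scores into 0 (below cut point) and 1 (at or above cut point).

    If no cut point is given, the median of the observed (non-missing)
    values is used. Missing values (None) are passed through unchanged.
    Returns the coded values and the cut point that was applied.
    """
    observed = [v for v in values if v is not None]
    if cut_point is None:
        cut_point = median(observed)
    coded = [None if v is None else int(v >= cut_point) for v in values]
    return coded, cut_point


# Fixed cut point, as for the FABQ.
fabq_groups, _ = dichotomize([10, 14, 20, None], FIXED_CUT_POINTS["fabq"])

# Median cut point, as for variables without an accepted threshold.
efficacy_groups, efficacy_cut = dichotomize([40, 42, 50])
```

Returning the applied cut point alongside the coded values makes it easy to report which threshold defined each subgroup.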

Table 2. Demographic characteristics and outcome measures at baseline of the sample providing followup at 12 months*

|  | Control (advice only) (n = 199) | Advice plus cognitive–behavioral intervention (n = 399) | Total (n = 598) |
|---|---|---|---|
| Age, years |  |  |  |
| Mean ± SD | 54.2 ± 14.46 | 53.7 ± 14.38 | 53.8 ± 14.39 |
| Missing, no. | 0 | 1 | 1 |
| Male, no. (%) | 77 (38.7) | 164 (41.1) | 241 (40.3) |
| Female, no. (%) | 121 (60.8) | 235 (58.9) | 356 (59.5) |
| Missing, no. | 1 | 0 | 1 |
| Ethnicity, no. (%) |  |  |  |
| White | 175 (88.0) | 350 (87.7) | 525 (87.8) |
| Asian or Asian British | 6 (3.0) | 18 (4.5) | 24 (4.0) |
| Black or black British | 3 (1.5) | 6 (1.5) | 9 (1.5) |
| Chinese or other | 4 (2.0) | 5 (1.3) | 9 (1.5) |
| Missing | 11 (5.5) | 20 (5.0) | 31 (5.2) |
| Left full-time education, no. (%) |  |  |  |
| Age ≤16 years | 101 (50.8) | 225 (56.4) | 326 (54.5) |
| Age >16 years | 90 (45.2) | 154 (38.6) | 244 (40.8) |
| Missing | 8 (4.0) | 20 (5.0) | 28 (4.7) |
| Employed, no. (%) |  |  |  |
| Not employed | 103 (51.8) | 192 (48.1) | 295 (49.3) |
| Employed | 95 (47.7) | 206 (51.6) | 301 (50.3) |
| Missing | 1 (0.50) | 1 (0.3) | 2 (0.4) |
| Frequency of back pain (past 6 weeks), no. (%) |  |  |  |
| Comes and goes + getting better | 49 (24.6) | 98 (24.6) | 147 (24.6) |
| Fairly constant + getting worse | 148 (74.4) | 299 (74.9) | 447 (74.7) |
| Missing | 2 (1.0) | 2 (0.5) | 4 (0.7) |
| Troublesomeness, no. (%) |  |  |  |
| Moderately troublesome | 113 (56.8) | 214 (53.6) | 327 (54.7) |
| Very/extremely troublesome | 86 (43.2) | 185 (46.4) | 271 (45.3) |
| Fear avoidance† |  |  |  |
| Mean ± SD | 13.9 ± 6.35 | 13.4 ± 6.47 | 13.5 ± 6.43 |
| Missing, no. | 12 | 23 | 35 |
| Duration of back pain, years since first onset |  |  |  |
| Mean ± SD | 13.1 ± 12.16 | 13.1 ± 13.23 | 13.1 ± 12.87 |
| Missing, no. | 5 | 18 | 23 |
| Benefits, no. (%) |  |  |  |
| No benefits | 164 (82.4) | 327 (81.9) | 491 (82.1) |
| Benefits | 33 (16.6) | 65 (16.3) | 98 (16.4) |
| Missing | 2 (1.0) | 7 (1.8) | 9 (1.5) |
| HADS anxiety‡ |  |  |  |
| Mean ± SD | 7.5 ± 4.43 | 8.1 ± 4.29 | 7.9 ± 4.34 |
| Missing, no. | 3 | 9 | 12 |
| HADS depression‡ |  |  |  |
| Mean ± SD | 5.3 ± 3.47 | 5.9 ± 3.75 | 5.7 ± 3.66 |
| Missing, no. | 2 | 3 | 5 |
| Pain self-efficacy§ |  |  |  |
| Mean ± SD | 42.1 ± 11.82 | 39.9 ± 13.49 | 40.6 ± 13.00 |
| Missing, no. | 9 | 12 | 21 |
| RMDQ¶ |  |  |  |
| Mean ± SD | 8.2 ± 4.49 | 8.7 ± 5.01 | 8.5 ± 4.84 |
| Missing, no. | 0 | 0 | 0 |
| MVK disability# |  |  |  |
| Mean ± SD | 45.8 ± 23.41 | 47.8 ± 23.86 | 47.1 ± 23.71 |
| Missing, no. | 3 | 10 | 13 |
| MVK pain# |  |  |  |
| Mean ± SD | 58.3 ± 18.72 | 58.7 ± 19.17 | 58.6 ± 19.01 |
| Missing, no. | 1 | 3 | 4 |

* HADS = Hospital Anxiety and Depression Scale; RMDQ = Roland Morris Disability Questionnaire; MVK = modified Von Korff.
† Scale 0–24; a lower score indicates lower fear-avoidance beliefs.
‡ Scale 0–21; a lower score indicates less anxiety and depression.
§ Scale 0–60; a lower score indicates lower self-efficacy.
¶ Scale 0–24; lower scores indicate less severe disability.
# Scale 0–100%; lower scores indicate less pain and disability.

Statistical analysis.

The analysis was on an intent-to-treat basis. We analyzed our primary outcome measures as change from baseline (baseline minus followup). All statistical tests were 2-sided and statistical significance was assessed at the 5% level for the univariate and exploratory analyses only. For the prespecified subgroup analyses, the significance level was adjusted for multiple comparisons using the Bonferroni correction and was therefore assessed at the 2.5% level, i.e., α = 0.025 (22). All of the analyses were carried out using Stata, version 10.1.

Initially, we did an exploratory univariate analysis of all participant demographics and baseline outcomes using simple linear regression in order to identify potential predictors of 12-month outcome. Although identifying predictors of outcome was not our primary focus, we report them here for completeness. We fitted linear regression models for each of the primary outcome measures (change from baseline) with the inclusion of an interaction term to directly examine whether the treatment difference depends on the moderators, both prespecified and exploratory. A statistical test for interaction is the most appropriate method to evaluate and draw inferences from subgroup analyses (23–25). We fitted the following models: 1) model 1, unadjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables; 2) model 2, adjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables adjusted for age, sex, and baseline value of the dependent variable; and 3) model 3, adjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables adjusted for baseline and demographic covariates selected using a forward stepwise variable-selection algorithm with a stringent significance level of 0.01 due to multiple testing to determine whether a variable is added to or removed from the model (26).
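The structure of models 1 and 2 can be sketched in a few lines. The example below is an illustration on simulated data, not the trial's Stata code: the function name, the variable names, and the simulated effect sizes are all assumptions. It fits change from baseline on treatment assignment, a dichotomized moderator, and their product, with optional covariates, using ordinary least squares.

```python
import numpy as np


def fit_interaction_model(change, treatment, moderator, covariates=None):
    """Ordinary least squares fit of

        change = b0 + b1*treatment + b2*moderator
                 + b3*(treatment * moderator) [+ covariate terms]

    Returns a dict of coefficients; "interaction" (b3) estimates how much
    the treatment difference changes between the two moderator subgroups.
    """
    change = np.asarray(change, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    moderator = np.asarray(moderator, dtype=float)

    columns = [np.ones_like(change), treatment, moderator, treatment * moderator]
    names = ["intercept", "treatment", "moderator", "interaction"]
    if covariates is not None:
        for name, values in covariates.items():
            columns.append(np.asarray(values, dtype=float))
            names.append(name)

    design = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(design, change, rcond=None)
    return dict(zip(names, coefficients))


# Simulated data only: 600 hypothetical participants with a built-in
# interaction of 2 points between treatment and employment.
rng = np.random.default_rng(0)
n = 600
treatment = rng.integers(0, 2, n)
employed = rng.integers(0, 2, n)
age = rng.normal(54, 14, n)
change = (0.5 + 1.0 * treatment + 0.5 * employed
          + 2.0 * treatment * employed + rng.normal(0, 4, n))

unadjusted = fit_interaction_model(change, treatment, employed)             # like model 1
adjusted = fit_interaction_model(change, treatment, employed, {"age": age})  # like model 2
print(unadjusted["interaction"], adjusted["interaction"])
```

A full analysis would also need standard errors and confidence intervals for the interaction coefficient, which a statistical package reports directly.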

The unadjusted model was a comparator for the adjusted models to see whether covariate adjustment altered the conclusions of the analyses. The second model adjusted for clinically relevant factors that were prespecified in the study protocol. Selection of which covariates to adjust for can be problematic in subgroup analyses; therefore, baseline factors that predict outcome can be considered using an appropriate statistical selection procedure (27). For this reason, the final model was fitted using a forward stepwise selection procedure to identify covariates that are predictors of outcome to put into the model. Where the model adjusts for a baseline variable that is also a potential moderator (e.g., age ≤54 years versus age >54 years), we excluded that variable from the model to avoid any issues of collinearity.
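A forward selection procedure of the kind described for model 3 can be sketched as below. This is a simplified illustration and not the procedure implemented in Stata: it only adds variables (there is no removal step), it uses a normal approximation for the P values, and the function and variable names are hypothetical. The entry threshold of 0.01 mirrors the stringent significance level quoted in the text.

```python
import math
import numpy as np


def p_value_of_last_column(design, y):
    """Two-sided P value (normal approximation) for the last design column."""
    n_obs, n_terms = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = (residuals @ residuals) / (n_obs - n_terms)
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    z = beta[-1] / math.sqrt(covariance[-1, -1])
    return math.erfc(abs(z) / math.sqrt(2))


def forward_stepwise(y, candidates, p_enter=0.01):
    """Add, one at a time, the candidate covariate with the smallest P value,
    stopping when no remaining candidate has P < p_enter."""
    y = np.asarray(y, dtype=float)
    candidates = {k: np.asarray(v, dtype=float) for k, v in candidates.items()}
    selected = []
    remaining = list(candidates)
    while remaining:
        current = [np.ones_like(y)] + [candidates[name] for name in selected]
        p_values = {
            name: p_value_of_last_column(
                np.column_stack(current + [candidates[name]]), y)
            for name in remaining
        }
        best = min(p_values, key=p_values.get)
        if p_values[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because selection is data driven, the chosen covariates can differ between outcomes, which is consistent with the different adjustment sets listed in the footnotes to Table 4.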

For the benefit of those doing systematic reviews of treatments for chronic low back pain, we also present here the main analysis just for those reporting pain present for 3 months or more at randomization.



We obtained 12-month followup data on 598 (85%) of 701 of our participants (199 control and 399 intervention). Demographic and outcome measure data collected at baseline were well balanced across both arms for the sample providing 12-month followup (Table 2). The number of participants contributing to the univariate analyses ranged from 456 for the SF-12 to 583 for troublesomeness and MVK pain. Initial univariate analyses showed a consistent pattern that age, employment, benefits, and MVK disability score were all predictors of outcome in all 3 outcome measures. Troublesomeness, duration, baseline RMDQ, and MVK pain predicted outcome in some, but not all, outcome measures. None of the other baseline variables, including the FABQ, showed any association with outcome (Table 3).

Table 3. Univariate analyses to identify potential baseline predictors of outcome at 12 months of followup*

|  | RMDQ change from baseline: P | RMDQ change from baseline: Difference (95% CI) | RMDQ change from baseline: No. | MVK disability change from baseline: P | MVK disability change from baseline: Difference (95% CI) | MVK disability change from baseline: No. | MVK pain change from baseline: P | MVK pain change from baseline: Difference (95% CI) | MVK pain change from baseline: No. |
|---|---|---|---|---|---|---|---|---|---|
| Baseline variables of subgroups specified in protocol |  |  |  |  |  |  |  |  |  |
| FABQ (positive values favor those with greater fear-avoidance beliefs) (16) | 0.94 | 0.002 (−0.06, 0.06) | 470 | 0.48 | 0.13 (−0.23, 0.48) | 523 | 0.19 | −0.20 (−0.50, 0.10) | 551 |
| Troublesomeness of back pain (positive values indicate better outcome for those with greater troublesomeness) (13, 14) | 0.02 | 0.90 (0.16, 1.64) | 498 | 0.13 | 3.38 (−1.04, 7.80) | 552 | 0.02 | 4.59 (0.89, 8.30) | 583 |
| Baseline variables of other subgroups |  |  |  |  |  |  |  |  |  |
| Duration of back pain, years (positive values favor those with a longer duration of back pain) | 0.04 | −0.03 (−0.06, −0.01) | 478 | 0.03 | −0.20 (−0.37, −0.02) | 531 | 0.11 | −0.12 (−0.27, 0.03) | 560 |
| Age, years (positive values favor those who are older) | 0.04 | −0.03 (−0.05, −0.01) | 498 | < 0.001 | −0.32 (−0.48, −0.17) | 551 | 0.003 | −0.20 (−0.32, −0.07) | 582 |
| Sex (positive values indicate better outcome for women) | 0.58 | 0.21 (−0.54, 0.97) | 497 | 0.06 | 4.37 (−0.10, 8.83) | 551 | 0.88 | 0.29 (−3.50, 4.07) | 582 |
| Age left full-time education (positive values indicate better outcome for those who left education at an older age or are still in education) | 0.09 | 0.42 (−0.07, 0.91) | 475 | 0.21 | 1.88 (−1.03, 4.78) | 527 | 0.41 | 1.01 (−1.41, 3.43) | 558 |
| Frequency of back pain, past 6 weeks (positive values favor those with improved frequency of back pain) | 0.86 | −0.05 (−0.58, 0.49) | 494 | 0.20 | 2.05 (−1.10, 5.21) | 548 | 0.98 | 0.02 (−2.63, 2.68) | 581 |
| In employment (positive values indicate better outcome for those in employment) | < 0.001 | 1.45 (0.72, 2.18) | 496 | < 0.001 | 9.35 (5.00, 13.70) | 550 | < 0.001 | 6.97 (3.30, 10.64) | 581 |
| Benefits (positive values indicate better outcome for those with no benefits) | 0.02 | 1.19 (0.17, 2.22) | 490 | 0.04 | 6.31 (0.42, 12.19) | 545 | 0.01 | 6.81 (1.81, 11.81) | 574 |
| HADS anxiety (positive values favor those with anxiety symptoms) (17) | 0.89 | 0.01 (−0.08, 0.09) | 490 | 0.31 | 0.26 (−0.25, 0.78) | 541 | 0.54 | −0.13 (−0.56, 0.30) | 571 |
| HADS depression (positive values favor those with depressive symptoms) (17) | 0.97 | 0.002 (−0.10, 0.11) | 493 | 0.29 | 0.33 (−0.28, 0.94) | 549 | 0.32 | −0.26 (−0.76, 0.25) | 578 |
| Pain self-efficacy (positive values favor those with stronger self-efficacy beliefs) (18) | 0.32 | 0.02 (−0.01, 0.05) | 480 | 0.74 | −0.03 (−0.20, 0.14) | 536 | 0.09 | 0.12 (−0.02, 0.27) | 564 |
| Other baseline variables tested for inclusion in final model |  |  |  |  |  |  |  |  |  |
| RMDQ (range 0–24, where lower scores indicate less severe disability) | < 0.001 | 0.22 (0.15, 0.30) | 498 | 0.36 | 0.21 (−0.25, 0.67) | 552 | 0.97 | −0.01 (−0.39, 0.37) | 583 |
| MVK disability (range 0–100, where lower scores indicate less disability) (11) | < 0.001 | 0.03 (0.01, 0.04) | 485 | < 0.001 | 0.52 (0.44, 0.60) | 552 | < 0.001 | 0.15 (0.08, 0.23) | 572 |
| MVK pain (range 0–100, where lower scores indicate less pain) (11) | 0.06 | 0.02 (−0.01, 0.04) | 494 | < 0.001 | 0.24 (0.13, 0.36) | 549 | < 0.001 | 0.32 (0.23, 0.42) | 583 |
| Ethnicity (UK Census categories; positive values indicate better outcome for those from a nonwhite background) | 0.54 | 0.22 (−0.49, 0.94) | 474 | 0.79 | 0.51 (−3.28, 4.30) | 523 | 0.96 | 0.09 (−3.08, 3.25) | 553 |
| SF-12 physical component score (positive values favor those with better physical quality of life) (19) | 0.80 | 0.01 (−0.03, 0.04) | 456 | 0.75 | −0.04 (−0.28, 0.20) | 510 | 0.13 | 0.15 (−0.04, 0.35) | 538 |
| SF-12 mental component score (positive values favor those with better mental quality of life) (19) | 0.57 | −0.01 (−0.04, 0.02) | 456 | 0.16 | −0.15 (−0.35, 0.06) | 510 | 0.97 | 0.003 (−0.17, 0.17) | 538 |

* RMDQ = Roland Morris Disability Questionnaire; MVK = modified Von Korff; difference = mean effect size of the interaction term; 95% CI = 95% confidence interval; FABQ = Fear-Avoidance Belief Questionnaire; HADS = Hospital Anxiety and Depression Scale; SF-12 = Short Form 12.

Models including interaction term between randomized group and moderator.

The goodness of fit was assessed for models 1, 2, and 3 using the adjusted R² statistic. This statistic can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit. If the model contains terms that do not aid in predicting response, then negative values can occur. The adjusted R² values for model 1 (unadjusted model: treatment + moderator + interaction) ranged from 0.02 to 0.06, for model 2 (adjusted for age, sex, and baseline) ranged from 0.09 to 0.33, and for model 3 (adjusted for significant predictors from stepwise selection) ranged from 0.16 to 0.43. Clearly, model 3 explained the most variance and was therefore chosen as our final model.
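For reference, the usual definition of the adjusted statistic shows why it cannot exceed 1 and why it can fall below 0. The symbols n (number of observations) and p (number of predictors, excluding the intercept) are introduced here for illustration and do not appear in the original text.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\bar{R}^2 < 0 \iff R^2 < \frac{p}{n - 1}
```

In words, each added term is penalized through the factor (n − 1)/(n − p − 1), so a term that explains almost no variance lowers the adjusted value even though the unadjusted R² never decreases.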

In our third model based on the stepwise selection process, we adjusted for different baseline variables for each of our 3 primary outcomes. Only for the RMDQ did any interactions reach statistical significance: age and employment status. Being younger and being employed were both statistically significant moderators for gaining additional benefit, as measured by the RMDQ, from treatment. This means, for example, that on average those who were employed gained an additional benefit from the treatment of 1.89 (95% confidence interval 0.43, 3.35) points in the RMDQ compared to those who were not employed (Table 4). These effects were not seen with either the MVK disability or MVK pain.

Table 4. Multivariate analyses of outcomes at 12 months of followup (n = 598)*

| Subgroup† | Model 3: RMDQ‡ | Model 3: MVK disability§ | Model 3: MVK pain¶ |
|---|---|---|---|
| Prespecified subgroup analyses |  |  |  |
| Fear avoidance, included % (R²) | 75 (0.21) | 80 (0.43) | 89 (0.18) |
| ≥14 to <14 | −0.01 (−1.53, 1.50) | −2.56 (−10.35, 5.23) | 2.18 (−5.35, 9.71) |
| Troublesomeness, included % (R²) | 79 (0.21) | 83 (0.42) | 93 (0.17) |
| Very/extremely to moderately | −1.01 (−2.52, 0.50) | −4.42 (−12.11, 3.27) | −5.04 (−12.47, 2.40) |
| Exploratory subgroup analyses |  |  |  |
| Duration, included % (R²) | 76 (0.22) | 81 (0.42) | 90 (0.17) |
| ≥3 years to <3 years | 0.14 (−1.55, 1.83) | −3.00 (−11.59, 5.60) | −3.46 (−11.81, 4.89) |
| Age, included % (R²) | 79 (0.21) | 83 (0.42) | 93 (0.18) |
| ≥54 years to <54 years | −1.58 (−3.05, −0.12) | −1.67 (−9.28, 5.94) | −3.00 (−10.35, 4.35) |
| Sex, included % (R²) | 78 (0.21) | 83 (0.43) | 93 (0.17) |
| Female to male | −1.27 (−2.79, 0.25) | −3.59 (−11.30, 4.12) | −3.27 (−10.83, 4.28) |
| Left full-time education, included % (R²) | 75 (0.21) | 79 (0.42) | 89 (0.17) |
| Age >16 years to age ≤16 years | 1.29 (−0.24, 2.82) | 3.01 (−4.90, 10.92) | 4.15 (−3.47, 11.77) |
| Frequency of back pain, included % (R²) | 78 (0.21) | 83 (0.43) | 93 (0.18) |
| Comes and goes + getting better to fairly constant + getting worse | −0.12 (−1.81, 1.57) | 2.81 (−5.73, 11.35) | 0.15 (−8.25, 8.55) |
| Benefits, included % (R²) | 79 (0.21) | 82 (0.43) | 93 (0.17) |
| Benefits to no benefits | 0.32 (−1.67, 2.31) | −5.56 (−15.94, 4.81) | −0.22 (−10.06, 9.62) |
| Employed, included % (R²) | 79 (0.22) | 83 (0.43) | 93 (0.18) |
| Employed to not employed | 1.89 (0.43, 3.35) | 3.16 (−4.44, 10.75) | 5.01 (−2.33, 12.34) |
| HADS anxiety, included % (R²) | 78 (0.21) | 82 (0.43) | 92 (0.17) |
| ≥11 to <11 | −1.12 (−2.83, 0.58) | −2.15 (−10.97, 6.67) | −2.41 (−10.83, 6.01) |
| HADS depression, included % (R²) | 78 (0.22) | 83 (0.43) | 93 (0.17) |
| ≥11 to <11 | −2.07 (−4.79, 0.65) | −14.58 (−29.19, 0.03) | −4.82 (−17.98, 8.33) |
| Pain self-efficacy, included % (R²) | 79 (0.19) | 83 (0.42) | 93 (0.16) |
| ≥42 to <42 | −0.15 (−1.65, 1.36) | 2.05 (−5.67, 9.78) | 0.60 (−6.91, 8.11) |

* Values are the mean estimate of the interaction term (95% confidence interval) of the effect of the treatment difference between subgroups unless otherwise indicated. RMDQ = Roland Morris Disability Questionnaire; MVK = modified Von Korff; included % = percentage of 598 subjects included in this analysis; HADS = Hospital Anxiety and Depression Scale.
† The direction of the analyses is interpreted as presented, i.e., the former subgroup minus the latter.
‡ Adjusted for baseline RMDQ, employed, pain self-efficacy, and benefits.
§ Adjusted for baseline MVK disability, baseline MVK pain, baseline Short Form 12 (SF-12) physical, baseline SF-12 mental, and pain self-efficacy.
¶ Adjusted for baseline MVK pain, pain self-efficacy, and benefits.

No statistically significant moderation effects were observed for the prespecified subgroup analyses in any of the 3 models; however, some significant moderators of treatment effect were observed in the exploratory subgroup analyses.



This analysis contributes to the body of research seeking to identify subgroups of patients with low back pain who are likely to achieve greater benefit from particular treatments. Particular strengths are that the analysis is based on a large, well-conducted randomized controlled trial and that we were able to do confirmatory analyses on 2 variables, fear avoidance and troublesomeness, that were prespecified before the trial started. The study was not, however, originally powered for these confirmatory analyses. Because we had prespecified duration of back pain as a subgroup, we have included an analysis of pain of more or less than 3 years since onset as an exploratory analysis; however, this may not be a meaningful distinction in the clinical situation. The remainder of our subgroups are known sociodemographic prognostic indicator variables or psychological factors that might moderate the effect of a cognitive–behavioral approach. This reduces the risk of spurious positive results arising purely by chance.

With 598 participants included in this analysis, this is one of the largest studies of moderators of treatment for nonspecific low back pain. Witt et al included substantially more participants (n = 3,093), the UK Back Pain Exercise and Manipulation Trial (UK BEAM) included nearly twice as many participants (n = 1,116), and Sherman et al included a similar number of participants (n = 638). In these latter 2 cases, the participants were split between 4 treatment groups, meaning that the number included in each comparison was much smaller (28–30). Nevertheless, our statistical power to identify moderators is poor (31). Consequently, we present only the point estimate and 95% confidence interval for each interaction in preference to the P value (32). In our study, we had sufficient power to detect a between-subgroup standardized mean difference ranging from 0.2 to 0.3 in the primary outcomes.

We make a clear distinction between confirmatory prespecified subgroup analyses and exploratory subgroup analyses, where the former is used for hypothesis testing and the latter is hypothesis generating (33, 34). We used the Bonferroni correction to adjust for multiple testing for the prespecified subgroup analyses only. Had the significance level not been adjusted, the conclusions drawn still would have been the same. For our exploratory analyses, we did not make this correction, to ensure that we did identify variables worthy of further exploration. Initially we identified potential predictors of outcome using univariate analyses, followed by a more formal approach of forward stepwise selection (model 3). The stepwise selection process does not identify the same predictors as the univariate analyses, as some covariates become either significant or nonsignificant in the presence of other covariates during the modeling process (35).

Our subgroup analyses considered 3 models: an unadjusted model and 2 adjusted models. The first of the adjusted models (model 2) adjusted for clinically relevant covariates that were prespecified in the protocol. The focus of this model was to try to estimate subgroup effects, correcting for relevant predictors as presented in the literature. The second of the adjusted models (model 3) contained statistically significant predictors of outcome selected using forward stepwise selection, where the choice of covariates in these models was data driven. Based on this particular study, the latter adjusted model (model 3) offered more precision when estimating the subgroup effects compared to the former adjusted model (model 2). Because of our concerns about the scaling and sensitivity to change of the RMDQ, we included MVK disability and pain scores as additional primary outcomes. Consequently, this analysis included 36 individual comparisons, increasing the risk of any statistically significant interactions being chance findings. Such statistically significant interactions that we have found are not consistent across the 3 outcome measures, and thus caution is needed in their interpretation. This does raise for us a slight concern that positive findings in previous studies might not have been consistent if different outcome measures had been used.
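A rough calculation illustrates the scale of the multiple-comparison problem for 36 tests at the 5% level, alongside the Bonferroni threshold used for the 2 prespecified tests. The sketch below assumes the tests are independent, which is not true here (the same participants and correlated outcomes are involved), so the family-wise figure is only indicative; the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true."""
    return alpha * n_tests


print(bonferroni_alpha(0.05, 2))                      # 0.025
print(round(familywise_error_rate(0.05, 36), 3))      # 0.842
print(round(expected_false_positives(0.05, 36), 1))   # 1.8
```

Under these simplifying assumptions, roughly 2 nominally significant interactions would be expected by chance alone, which is the number observed in the final model.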

In common with other studies, even using a model fitted using a selection procedure, the proportion of the variance in outcome explained by baseline variables is modest. This is greatest for a pure disability measure (MVK disability), least for a pure pain measure (MVK pain), and intermediate for a mixed measure (RMDQ).

Troublesomeness was identified as a potential predictor of outcome in 2 of our 3 primary outcome measures in the univariate analyses, but was not a predictor in the multivariate analyses (model 3) and did not moderate treatment effect. All of the Back Skills Training Trial participants had at least moderately troublesome low back pain. It is possible, had we also included subjects with slightly troublesome pain, that we would have observed a difference, but knowing whether or not our intervention works for those who are only slightly troubled by their back pain may not be a high research priority.

Fear avoidance did not predict outcome in either our univariate analyses or multivariate analyses (model 3). Also, it did not moderate treatment effect in any of our 3 outcomes. Our participants had all consulted for low back pain in the previous 6 months and had significant ongoing problems. They might, however, not be the same population as those currently attending for treatment who will tend to have more severe symptoms (36, 37).

Only for the RMDQ is there any statistically significant moderation of outcome in our final model (model 3). Were it not for our concerns about the measurement properties of the RMDQ, we would only have these data to consider. Since similar effects were not seen in the MVK disability and MVK pain scores, these are unlikely to be true moderation effects. If these were true results, they would suggest that we should focus our efforts on the younger population that is currently working and not on the older unemployed population.

We have presented the data here just for those with chronic low back pain. We make no further comment on these data, as they are not the focus of this study but will be of use to others.

There is considerable research interest in identifying back pain subgroups. This analysis, in common with secondary analyses of the UK BEAM data set and a trial of acupuncture by Sherman et al, has failed to find convincing data to suggest that subgroups can be identified in existing trial data (29, 30). Together they have considered a range of potential moderators for 5 different treatment packages in rigorous analyses. As a rule of thumb, to show an interaction between a potential moderator and treatment effect of a similar size to the main treatment effect, a 4-fold increase in sample size is required (31). The size of both of these studies was based on finding a main treatment effect rather than a moderation of such an effect. To our knowledge, only 1 trial has been explicitly powered to show moderator effects (n = 3,093) (28). In that trial, Witt et al found that acupuncture was more effective for those with worse initial back function, younger patients, and those with >10 years of schooling (28). Although we consider we have used the most appropriate statistical approach to our data, this approach may not yield positive results unless there are resources to run more trials of a similar size to the acupuncture trial by Witt et al. It is, therefore, perhaps surprising that some other studies have found apparently important effects on much smaller numbers. Great care is needed in interpretation of data from such studies. We now have a number of treatments of proven modest effectiveness. Any future studies will, therefore, need to compare 2 active treatments. The mean differences between the 2 treatments are likely to be much smaller than those comparing an active treatment to no treatment. To show a main effect will require a trial substantially larger than the Back Skills Training Trial; to show a statistically significant interaction, the sample size will need to be multiplied further. Any such trial would only test moderators as a single comparison between 2 treatments. It is unlikely that the very substantial funding needed for many such trials will be forthcoming, and any further research on subgrouping for those with nonspecific low back pain will need to consider adopting different approaches.
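The 4-fold rule of thumb can be motivated with a short variance calculation. The sketch below assumes a simplified setting rather than the BeST design: N participants in total, 1:1 allocation, 2 subgroups (A and B) of equal size, and a common outcome variance σ². These symbols are introduced here for illustration only.

```latex
\begin{align*}
\operatorname{Var}(\hat{\Delta})
  &= \frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2} = \frac{4\sigma^2}{N}
  && \text{(main effect: treated minus control)} \\
\operatorname{Var}(\hat{\Delta}_A - \hat{\Delta}_B)
  &= 4 \cdot \frac{\sigma^2}{N/4} = \frac{16\sigma^2}{N}
  && \text{(interaction: difference between subgroup effects)} \\
\frac{\operatorname{Var}(\hat{\Delta}_A - \hat{\Delta}_B)}{\operatorname{Var}(\hat{\Delta})}
  &= 4
\end{align*}
```

Because variance scales with 1/N, estimating an interaction of the same magnitude as the main effect with the same precision takes about 4 times as many participants. Unequal subgroup sizes make the ratio larger than 4.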

We suggest 2 alternative approaches that might be more fruitful. First, there are now many thousands of individuals with back pain who have been recruited to randomized controlled trials. If the research community were to collaborate to develop a repository of individual patient data, it may be possible, with the large number of subjects available, to develop statistical techniques that would allow moderators to be identified and clinical prediction rules to be developed (38). The acupuncture research community is already making progress in this direction (30). The back pain research community also needs to measure potential moderators and outcomes in a similar manner that is congruent with existing suggestions to facilitate such pooling (39–41).

Second, the research community should work together to develop theoretically informed descriptors of back pain syndromes, which may include subject and clinical characteristics, that may respond to specific treatment approaches, and then test the interventions in people meeting these criteria. The headache research community, for example, has developed a largely clinical classification of more than 200 different headache types that is now used to inform entry criteria for trials and clinical management without seeking to prove statistically that any one patient characteristic predicts response to treatment (42).

A robust secondary analysis of a large trial of a cognitive–behavioral approach did not identify baseline characteristics that modify treatment effects, although the analysis allowed only a between-subgroup standardized mean difference of 0.2 to 0.3 to be detected in the primary outcomes. Much larger studies would be needed to be confident that important moderators had not been overlooked; it is unlikely that such studies will appeal to funders. New research approaches are needed to confidently identify back pain subgroups.



All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Underwood had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Underwood, Mistry, Lall, Lamb.

Acquisition of data. Underwood, Lall, Lamb.

Analysis and interpretation of data. Underwood, Mistry, Lall, Lamb.


  • 1
    Savigny P, Watson P, Underwood M. Early management of persistent non-specific low back pain: summary of NICE guidance. BMJ 2009; 338: b1805.
  • 2
    Lamb SE, Hansen Z, Lall R, Castelnuovo E, Withers EJ, Nichols V, et al. Group cognitive behavioural treatment for low-back pain in primary care: a randomised controlled trial and cost-effectiveness analysis. Lancet 2010; 375: 916–23.
  • 3
    Borkan JM, Koes B, Reis S, Cherkin DC. A report from the Second International Forum for Primary Care Research on Low Back Pain: reexamining priorities. Spine (Phila Pa 1976) 1998; 23: 1992–6.
  • 4
    Kraemer HC, Stice E, Kazdin A, Offord D, Kupfer D. How do risk factors work together? Mediators, moderators, and independent, overlapping, and proxy risk factors. Am J Psychiatry 2001; 158: 848–56.
  • 5
    Kamper SJ, Maher CG, Hancock MJ, Koes BW, Croft PR, Hay E. Treatment-based subgroups of low back pain: a guide to appraisal of research studies and a summary of current evidence. Best Pract Res Clin Rheumatol 2010; 24: 181–91.
  • 6
    Lamb SE, Lall R, Hansen Z, Withers EJ, Griffiths FE, Szczepura A, et al. Design considerations in a clinical trial of a cognitive behavioural intervention for the management of low back pain in primary care: Back Skills Training Trial. BMC Musculoskelet Disord 2007; 8: 14.
  • 7
    Hansen Z, Daykin A, Lamb SE. A cognitive-behavioural programme for the management of low back pain in primary care: a description and justification of the intervention used in the Back Skills Training Trial (BeST; ISRCTN 54717854). Physiotherapy 2010; 96: 87–94.
  • 8
    Lamb SE, Lall R, Hansen Z, Castelnuovo E, Withers EJ, Nichols V, et al. A multicentred randomised controlled trial of a primary care-based cognitive behavioural programme for low back pain: the Back Skills Training (BeST) trial. Health Technol Assess 2010; 14: 1–253.
  • 9
    Roland M, Waddell G, Klaber Moffett J, Burton K, Main C. The back book. 2nd ed. Norwich: The Stationery Office; 2002.
  • 10
    Roland M, Morris R. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain. Spine (Phila Pa 1976) 1983; 8: 141–4.
  • 11
    Underwood MR, Barnett AG, Vickers MR. Evaluation of two time-specific back pain outcome measures. Spine (Phila Pa 1976) 1999; 24: 1104–12.
  • 12
    Kent P, Keating JL, Leboeuf-Yde C. Research methods for subgrouping low back pain. BMC Med Res Methodol 2010; 10: 62.
  • 13
    United Kingdom back pain exercise and manipulation (UK BEAM) randomised trial: effectiveness of physical treatments for back pain in primary care. BMJ 2004; 329: 1377.
  • 14
    Parsons S, Carnes D, Pincus T, Foster N, Breen A, Vogel S, et al. Measuring troublesomeness of chronic pain by location. BMC Musculoskelet Disord 2006; 7: 34.
  • 15
    Pincus T, Vogel S, Burton AK, Santos R, Field AP. Fear avoidance and prognosis in back pain: a systematic review and synthesis of current evidence. Arthritis Rheum 2006; 54: 3999–4010.
  • 16
    Waddell G, Newton M, Henderson I, Somerville D, Main CJ. A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability. Pain 1993; 52: 157–68.
  • 17
    Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 1983; 67: 361–70.
  • 18
    Nicholas MK. The Pain Self-Efficacy Questionnaire: taking pain into account. Eur J Pain 2007; 11: 153–63.
  • 19
    Ware J Jr, Kosinski M, Keller SD. A 12-item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34: 220–33.
  • 20
    Cleland JA, Childs JD, Fritz JM, Whitman JM, Eberhart SL. Development of a clinical prediction rule for guiding treatment of a subgroup of patients with neck pain: use of thoracic spine manipulation, exercise, and patient education. Phys Ther 2007; 87: 9–23.
  • 21
    Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ 2006; 332: 1080.
  • 22
    Lagakos SW. The challenge of subgroup analyses: reporting without distorting. N Engl J Med 2006; 354: 1667–9.
  • 23
    Brookes ST, Whitley E, Peters TJ, Mulheran PA, Egger M, Davey Smith G. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess 2001; 5: 1–56.
  • 24
    Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000; 355: 1064–9.
  • 25
    Matthews JN, Altman DG. Interaction 3: how to examine heterogeneity. BMJ 1996; 313: 862.
  • 26
    Petrie A, Sabin C. Medical statistics at a glance. 3rd ed. Chichester (UK): Wiley-Blackwell; 2009.
  • 27
    Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 2002; 21: 2917–30.
  • 28
    Witt CM, Jena S, Selim D, Brinkhaus B, Reinhold T, Wruck K, et al. Pragmatic randomized trial evaluating the clinical and economic effectiveness of acupuncture for chronic low back pain. Am J Epidemiol 2006; 164: 487–96.
  • 29
    Underwood MR, Morton V, Farrin A. Do baseline characteristics predict response to treatment for low back pain? Secondary analysis of the UK BEAM dataset [ISRCTN32683578]. Rheumatology (Oxford) 2007; 46: 1297–302.
  • 30
    Sherman KJ, Cherkin DC, Ichikawa L, Avins AL, Barlow WE, Khalsa PS, et al. Characteristics of patients with chronic back pain who benefit from acupuncture. BMC Musculoskelet Disord 2009; 10: 114.
  • 31
    Brookes ST, Whitley E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol 2004; 57: 229–36.
  • 32
    Matthews JN, Altman DG. Statistics notes. Interaction 2: compare effect sizes not P values. BMJ 1996; 313: 808.
  • 33
    Kent DM, Rothwell PM, Ioannidis JP, Altman DG, Hayward RA. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal. Trials 2010; 11: 85.
  • 34
    Pincus T, Miles C, Froud R, Underwood M, Carnes D, Taylor S. Methodological criteria for the assessment of moderators in systematic reviews of randomised controlled trials: a consensus study. BMC Med Res Methodol 2011; 11: 14.
  • 35
    Tu YK, Gunnell D, Gilthorpe MS. Simpson's paradox, Lord's paradox, and suppression effects are the same phenomenon: the reversal paradox. Emerg Themes Epidemiol 2008; 5: 2.
  • 36
    Poiraudeau S, Rannou F, Baron G, Le Henanff A, Coudeyre E, Rozenberg S, et al. Fear-avoidance beliefs about back pain in patients with subacute low back pain. Pain 2006; 124: 305–11.
  • 37
    George SZ, Fritz JM, Childs JD. Investigation of elevated fear-avoidance beliefs for patients with low back pain: a secondary analysis involving patients enrolled in physical therapy clinical trials. J Orthop Sports Phys Ther 2008; 38: 50–8.
  • 38
    Stewart LA, Parmar MK. Meta-analysis of the literature or of individual patient data: is there a difference? Lancet 1993; 341: 418–22.
  • 39
    Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and general recommendations. Spine 2000; 25: 3100–3.
  • 40
    Deyo RA, Battie M, Beurskens AJ, Bombardier C, Croft P, Koes B, et al. Outcome measures for low back pain research: a proposal for standardized use [published erratum appears in Spine 1999;24:418]. Spine 1998; 23: 2003–13.
  • 41
    Vickers AJ, Cronin AM, Maschino AC, Lewith G, MacPherson H, Victor N, et al. Individual patient data meta-analysis of acupuncture for chronic pain: protocol of the Acupuncture Trialists' Collaboration. Trials 2010; 11: 90.
  • 42
    The International Classification of Headache Disorders: 2nd edition. Cephalalgia 2004; 24 Suppl: 9–160.