Pain and Symptoms
Predicting response to a cognitive–behavioral approach to treating low back pain: Secondary analysis of the BeST data set†
Version of Record online: 29 AUG 2011
Copyright © 2011 by the American College of Rheumatology
Arthritis Care & Research
Volume 63, Issue 9, pages 1271–1279, September 2011
How to Cite
Underwood, M., Mistry, D., Lall, R. and Lamb, S. (2011), Predicting response to a cognitive–behavioral approach to treating low back pain: Secondary analysis of the BeST data set. Arthritis Care Res, 63: 1271–1279. doi: 10.1002/acr.20518
- Issue online: 29 AUG 2011
- Accepted manuscript online: 13 JUN 2011 01:18PM EST
- Manuscript Accepted: 20 MAY 2011
- Manuscript Received: 23 DEC 2010
- UK National Health Service Health Technology Assessment Programme. Grant Number: Project 01/75/01
- Birmingham Science City Translational Medicine Clinical Research and infrastructure Trials platform
- Advantage West Midlands
Identifying factors that predict who is likely to gain the greatest benefit from different treatments for low back pain is an important research priority. Here we report moderator analyses of the Back Skills Training Trial (BeST) that tested a cognitive–behavioral approach for low back pain.
We recruited 701 participants ages ≥18 years with at least moderately troublesome low back pain present for >6 weeks from 56 general practices in 7 localities across England to a trial adding a group cognitive–behavioral approach to active management advice. The cognitive–behavioral package had a moderate effect on primary outcomes (Roland Morris Disability Questionnaire [RMDQ] and modified Von Korff scales). At 12-month followup, we tested for interaction between randomized groups on 2 prespecified baseline variables (troublesomeness and fear avoidance) and 10 post hoc (exploratory) variables identified from previous studies.
Neither troublesomeness nor fear avoidance moderated treatment effect on any of our primary outcomes. In the final model, the only moderation by baseline variables of the effect of randomization was on the RMDQ outcome. Being younger and currently working both moderated treatment effect, resulting in larger improvements as a response to treatment.
Although BeST is one of the larger trials of back pain treatment, it is still too small to reliably detect moderation if it exists. Since the significant moderation effects were only observed for 1 outcome measure in 3 of 10 post hoc analyses, we cannot conclude that these are true moderation effects.
Several therapist-delivered intervention packages are effective and cost effective for chronic nonspecific low back pain. Treatment packages of acupuncture needling, exercise, and manual therapy are recommended for the early management of persistent back pain by the National Institute for Health and Clinical Excellence guidelines (1). We have added to these data by demonstrating that a group cognitive–behavioral approach delivered by a range of health professionals has a sustained effect on low back pain disability, at a very modest cost to the health care provider (Back Skills Training Trial, n = 701) (2). Most of these interventions produce small to moderate mean benefits that are likely to be important at a population level, but that may not be important to an individual patient. Identifying moderators (effect modifiers), that is, factors that predict who is likely to gain the greatest benefit from different treatments for low back pain, is an important research priority because it will allow us to deliver the best treatment for an individual patient (1, 3, 4). There are few randomized studies of moderators of treatments for nonspecific low back pain and even fewer that have used prospectively defined moderators (5). Here we report prespecified moderator analyses and further exploratory moderator analyses of the Back Skills Training Trial data.
Significance & Innovations
Baseline characteristics were not shown to predict response to a cognitive–behavioral approach to treatment of low back pain in secondary analysis of a large randomized controlled trial.
No evidence to support selective provision of such services was found.
Future research into back pain subgroups may need to use different approaches.
MATERIALS AND METHODS
We have reported the design, intervention, and main analyses of the Back Skills Training Trial in detail elsewhere (2, 6–8). Briefly, we recruited 701 participants ages 18 years or older with at least moderately troublesome nonspecific low back pain present for greater than 6 weeks from 56 general practices in 7 localities across England. We randomized these, 2:1 in favor of the intervention, using a remote telephone randomization service. All baseline variables were collected using a self-completed questionnaire prior to randomization.
All participants received a 15-minute session of active management advice, including the benefit of and how to remain active, avoidance of bed rest, appropriate use of pain medication and symptom management, and a copy of The Back Book (9). Those in the intervention group were offered an individual assessment lasting up to 1.5 hours and 6 sessions of group therapy using a cognitive–behavioral approach lasting 1.5 hours per session.
Our primary outcomes, the Roland Morris Disability Questionnaire (RMDQ; scale 0–24, where lower scores indicate less severe disability) and modified von Korff (MVK) scales of pain and disability (scale 0–100%, where lower scores indicate less pain and disability), were measured at 12 months using a self-administered postal questionnaire (10, 11). Nonresponders to the postal questionnaire received a telephone interview to collect core outcome data. Collection of the RMDQ data was not included in the telephone interview process due to its length.
We found that the intervention package was moderately effective at 12 months, based on the 598 participants who responded at that followup (2) (Table 1). To reduce the risk of chance findings from multiple comparisons, we have not included 3- and 6-month followup data in this secondary analysis.
|Mean treatment difference (95% CI)|
|All participants at 12-month followup|
|RMDQ (n = 498)†||1.3 (0.56, 2.06)|
|MVK disability (n = 552)‡||8.4 (4.47, 12.32)|
|MVK pain (n = 583)‡||7.0 (3.12, 10.81)|
|Participants with pain for >3 months at randomization|
|RMDQ (n = 509)†||1.0 (0.38, 1.69)|
|MVK disability (n = 515)‡||4.3 (0.45, 8.17)|
|MVK pain (n = 534)‡||6.6 (3.21, 10.08)|
|RMDQ (n = 524)†||1.4 (0.64, 2.18)|
|MVK disability (n = 547)‡||8.1 (4.25, 12.00)|
|MVK pain (n = 565)‡||7.9 (4.11, 11.61)|
|RMDQ (n = 494)†||1.3 (0.53, 2.02)|
|MVK disability (n = 548)‡||8.2 (4.24, 12.16)|
|MVK pain (n = 579)‡||6.8 (2.92, 10.57)|
Potential effect moderators.
At the design stage of the trial, we prespecified 3 hypotheses (confirmatory analyses) (12). First, the benefits from treatment would be greater in those with more troublesome low back pain (moderately versus very/extremely troublesome). We measured this during the participants' initial assessment prior to randomization using a 5-point Likert scale from not at all troublesome to extremely troublesome, with participants only being included if they reported low back pain of at least moderate troublesomeness (13, 14). Those with more troublesome pain have a greater capacity for improvement and arguably have more to gain from additional improvements from treatment. Assessing troublesomeness of pain is an attractive and simple approach that could be used in the consulting room to decide which patients should be considered for further treatment. Second, the intervention would have a greater effect in those with high levels of fear avoidance. Fear avoidance is well established as a potentially modifiable predictor of poor outcome from back pain (15). Our intervention was designed to target fear avoidance among a range of unhelpful beliefs about back pain and to promote physical activity. We measured this using the Fear-Avoidance Beliefs Questionnaire (FABQ) at baseline (16). Third, the benefits of treatment would be greater for those with subacute low back pain (<3 months) than those with chronic low back pain. We were unable to perform this analysis because too few participants had subacute pain.
At baseline, we collected data on other potential predictors or moderators of treatment effect (Table 2). These included demographic information on: 1) whether or not participants received state benefits, 2) depression and anxiety using the Hospital Anxiety and Depression Scale (HADS), 3) self-efficacy using the Pain Self-Efficacy Questionnaire, and 4) health-related quality of life using the physical and mental health component scores of the Short Form 12 (SF-12) (17–19). We dichotomized potential moderators that were continuous variables to facilitate the analyses using cut points available in the literature. We used a cut point of <14 for the FABQ and <11 for both the anxiety and depression components of the HADS (17, 20). For all other continuous outcomes where generally accepted cut points were not available, we used a median cut point (21).
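The dichotomization rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' Stata code: the function name `dichotomize`, the variable keys, and the example scores are hypothetical, and placing scores equal to the cut point in the higher group is an assumption (it matches the "≥14 to <14" and "≥42 to <42" labels used for the subgroup comparisons).

```python
from statistics import median

# Cut points stated in the text: FABQ < 14 and HADS < 11 form the lower group.
LITERATURE_CUTS = {"fabq": 14, "hads_anxiety": 11, "hads_depression": 11}


def dichotomize(values, name):
    """Return (groups, cut): 1 for scores at or above the cut point, 0 below it.

    Uses the published cut point when one is listed for `name`;
    otherwise falls back to the sample median (assumed tie handling: >= cut).
    """
    cut = LITERATURE_CUTS.get(name, median(values))
    return [1 if v >= cut else 0 for v in values], cut


# Hypothetical scores for illustration only (not trial data).
fabq_groups, fabq_cut = dichotomize([5, 13, 14, 20], "fabq")
pse_groups, pse_cut = dichotomize([30, 40, 44, 50], "pain_self_efficacy")
```

Dichotomizing simplifies the interaction models but discards information within each group, which is one reason a modest sample may struggle to detect moderation.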
|Control (advice only) (n = 199)||Advice plus cognitive–behavioral intervention (n = 399)||Total (n = 598)|
|Mean ± SD||54.2 ± 14.46||53.7 ± 14.38||53.8 ± 14.39|
|Male, no. (%)||77 (38.7)||164 (41.1)||241 (40.3)|
|Female, no. (%)||121 (60.8)||235 (58.9)||356 (59.5)|
|Ethnicity, no. (%)|
|White||175 (88.0)||350 (87.7)||525 (87.8)|
|Asian or Asian British||6 (3.0)||18 (4.5)||24 (4.0)|
|Black or black British||3 (1.5)||6 (1.5)||9 (1.5)|
|Chinese or other||4 (2.0)||5 (1.3)||9 (1.5)|
|Missing||11 (5.5)||20 (5.0)||31 (5.2)|
|Left full-time education, no. (%)|
|Age ≤16 years||101 (50.8)||225 (56.4)||326 (54.5)|
|Age >16 years||90 (45.2)||154 (38.6)||244 (40.8)|
|Missing||8 (4.0)||20 (5.0)||28 (4.7)|
|Employed, no. (%)|
|Not employed||103 (51.8)||192 (48.1)||295 (49.3)|
|Employed||95 (47.7)||206 (51.6)||301 (50.3)|
|Missing||1 (0.5)||1 (0.3)||2 (0.4)|
|Frequency of back pain (past 6 weeks), no. (%)|
|Comes and goes + getting better||49 (24.6)||98 (24.6)||147 (24.6)|
|Fairly constant + getting worse||148 (74.4)||299 (74.9)||447 (74.7)|
|Missing||2 (1.0)||2 (0.5)||4 (0.7)|
|Troublesomeness, no. (%)|
|Moderately troublesome||113 (56.8)||214 (53.6)||327 (54.7)|
|Very/extremely troublesome||86 (43.2)||185 (46.4)||271 (45.3)|
|Mean ± SD||13.9 ± 6.35||13.4 ± 6.47||13.5 ± 6.43|
|Duration of back pain, years since first onset|
|Mean ± SD||13.1 ± 12.16||13.1 ± 13.23||13.1 ± 12.87|
|Benefits, no. (%)|
|No benefits||164 (82.4)||327 (81.9)||491 (82.1)|
|Benefits||33 (16.6)||65 (16.3)||98 (16.4)|
|Missing||2 (1.0)||7 (1.8)||9 (1.5)|
|Mean ± SD||7.5 ± 4.43||8.1 ± 4.29||7.9 ± 4.34|
|Mean ± SD||5.3 ± 3.47||5.9 ± 3.75||5.7 ± 3.66|
|Mean ± SD||42.1 ± 11.82||39.9 ± 13.49||40.6 ± 13.00|
|Mean ± SD||8.2 ± 4.49||8.7 ± 5.01||8.5 ± 4.84|
|Mean ± SD||45.8 ± 23.41||47.8 ± 23.86||47.1 ± 23.71|
|Mean ± SD||58.3 ± 18.72||58.7 ± 19.17||58.6 ± 19.01|
The analysis was on an intent-to-treat basis. We analyzed our primary outcome measures as change from baseline (baseline minus followup). All statistical tests were 2-sided and statistical significance was assessed at the 5% level for the univariate and exploratory analyses only. For the prespecified subgroup analyses, the significance level was adjusted for multiple comparisons using the Bonferroni correction and was therefore assessed at the 2.5% level, i.e., α = 0.025 (22). All of the analyses were carried out using Stata, version 10.1.
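As a worked illustration of the Bonferroni adjustment mentioned above, the per-test significance level is the family-wise level divided by the number of tests. The function name below is hypothetical; the inputs (a 5% family-wise level and 2 prespecified subgroup analyses) are taken from the text.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Two prespecified moderators (troublesomeness and fear avoidance) at a 5% family-wise level.
per_test_alpha = bonferroni_alpha(0.05, 2)
```

With these inputs the per-test level is 0.025, matching the 2.5% threshold stated for the prespecified analyses.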
Initially, we did an exploratory univariate analysis of all participant demographics and baseline outcomes using simple linear regression in order to identify potential predictors of 12-month outcome. Although identifying predictors of outcome was not our primary focus, we report them here for completeness. We fitted linear regression models for each of the primary outcome measures (change from baseline) with the inclusion of an interaction term to directly examine whether the treatment difference depends on the moderators, both prespecified and exploratory. A statistical test for interaction is the most appropriate method to evaluate and draw inferences from subgroup analyses (23–25). We fitted the following models: 1) model 1, unadjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables; 2) model 2, adjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables adjusted for age, sex, and baseline value of the dependent variable; and 3) model 3, adjusted model: with treatment assignment, moderator, and the interaction term of these 2 variables adjusted for baseline and demographic covariates selected using a forward stepwise variable-selection algorithm with a stringent significance level of 0.01 due to multiple testing to determine whether a variable is added to or removed from the model (26).
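For the unadjusted model (model 1) with a dichotomized moderator, the coefficient on the interaction term has a simple interpretation: it is the treatment effect in one subgroup minus the treatment effect in the other. The sketch below computes that difference of differences from cell means. It is an illustration under stated assumptions, not the authors' analysis: the function name and the data are hypothetical, and models 2 and 3, which add covariates, would need a full regression routine instead.

```python
from statistics import mean


def interaction_estimate(rows):
    """Difference-in-differences estimate of the treatment-by-moderator interaction.

    `rows` holds (treated, moderator, change) tuples with treated and
    moderator coded 0/1. When treatment, moderator, and their product are
    the only terms in a linear model, the least-squares coefficient on the
    product term equals this difference of cell-mean differences.
    """
    cells = {(t, m): [] for t in (0, 1) for m in (0, 1)}
    for treated, moderator, change in rows:
        cells[(treated, moderator)].append(change)
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    effect_mod1 = cell_mean[(1, 1)] - cell_mean[(0, 1)]  # treatment effect, moderator = 1
    effect_mod0 = cell_mean[(1, 0)] - cell_mean[(0, 0)]  # treatment effect, moderator = 0
    return effect_mod1 - effect_mod0


# Hypothetical change scores (baseline minus followup), not trial data.
rows = [
    (0, 0, 1.0), (0, 0, 2.0),  # control, moderator = 0
    (1, 0, 2.0), (1, 0, 3.0),  # intervention, moderator = 0
    (0, 1, 1.0), (0, 1, 1.0),  # control, moderator = 1
    (1, 1, 4.0), (1, 1, 4.0),  # intervention, moderator = 1
]
estimate = interaction_estimate(rows)
```

A positive estimate here means the intervention produced a larger improvement in the subgroup coded 1 than in the subgroup coded 0.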
The unadjusted model was a comparator for the adjusted models to see whether covariate adjustment altered the conclusions of the analyses. The second model adjusted for clinically relevant factors that were prespecified in the study protocol. Selection of which covariates to adjust for can be problematic in subgroup analyses; therefore, baseline factors that predict outcome can be considered using an appropriate statistical selection procedure (27). For this reason, the final model was fitted using a forward stepwise selection procedure to identify covariates that are predictors of outcome to put into the model. Where the model adjusts for a baseline variable that is also a potential moderator (e.g., age ≤54 years versus age >54 years), we excluded that variable from the model to avoid any issues of collinearity.
For the benefit of those doing systematic reviews of treatments for chronic low back pain, we also present here the main analysis just for those reporting pain present for 3 months or more at randomization.
RESULTS
We obtained 12-month followup data on 598 (85%) of our 701 participants (199 control and 399 intervention). Demographic and outcome measure data collected at baseline were well balanced across both arms for the sample providing 12-month followup (Table 2). The number of participants contributing to the univariate analyses ranged from 456 for the SF-12 to 583 for troublesomeness and MVK pain. Initial univariate analyses showed a consistent pattern: age, employment, benefits, and baseline MVK disability score were predictors of outcome for all 3 outcome measures. Troublesomeness, duration, baseline RMDQ, and baseline MVK pain predicted outcome for some, but not all, outcome measures. None of the other baseline variables, including the FABQ, showed any association with outcome (Table 3).
|RMDQ change from baseline||MVK disability change from baseline||MVK pain change from baseline|
|P||Difference (95% CI)||No.||P||Difference (95% CI)||No.||P||Difference (95% CI)||No.|
|Baseline variables of subgroups specified in protocol|
|FABQ (positive values favor those with greater fear-avoidance beliefs) (16)||0.94||0.002 (−0.06, 0.06)||470||0.48||0.13 (−0.23, 0.48)||523||0.19||−0.20 (−0.50, 0.10)||551|
|Troublesomeness of back pain (positive values indicate better outcome for those with greater troublesomeness) (13, 14)||0.02||0.90 (0.16, 1.64)||498||0.13||3.38 (−1.04, 7.80)||552||0.02||4.59 (0.89, 8.30)||583|
|Baseline variables of other subgroups|
|Duration of back pain, years (positive values favor those with a longer duration of back pain)||0.04||−0.03 (−0.06, −0.01)||478||0.03||−0.20 (−0.37, −0.02)||531||0.11||−0.12 (−0.27, 0.03)||560|
|Age, years (positive values favor those who are older)||0.04||−0.03 (−0.05, −0.01)||498||< 0.001||−0.32 (−0.48, −0.17)||551||0.003||−0.20 (−0.32, −0.07)||582|
|Sex (positive values indicate better outcome for women)||0.58||0.21 (−0.54, 0.97)||497||0.06||4.37 (−0.10, 8.83)||551||0.88||0.29 (−3.50, 4.07)||582|
|Age left full-time education (positive values indicate better outcome for those who left education at an older age or are still in education)||0.09||0.42 (−0.07, 0.91)||475||0.21||1.88 (−1.03, 4.78)||527||0.41||1.01 (−1.41, 3.43)||558|
|Frequency of back pain, past 6 weeks (positive values favor those with improved frequency of back pain)||0.86||−0.05 (−0.58, 0.49)||494||0.20||2.05 (−1.10, 5.21)||548||0.98||0.02 (−2.63, 2.68)||581|
|In employment (positive values indicate better outcome for those in employment)||< 0.001||1.45 (0.72, 2.18)||496||< 0.001||9.35 (5.00, 13.70)||550||< 0.001||6.97 (3.30, 10.64)||581|
|Benefits (positive values indicate better outcome for those with no benefits)||0.02||1.19 (0.17, 2.22)||490||0.04||6.31 (0.42, 12.19)||545||0.01||6.81 (1.81, 11.81)||574|
|HADS anxiety (positive values favor those with anxiety symptoms) (17)||0.89||0.01 (−0.08, 0.09)||490||0.31||0.26 (−0.25, 0.78)||541||0.54||−0.13 (−0.56, 0.30)||571|
|HADS depression (positive values favor those with depressive symptoms) (17)||0.97||0.002 (−0.10, 0.11)||493||0.29||0.33 (−0.28, 0.94)||549||0.32||−0.26 (−0.76, 0.25)||578|
|Pain self-efficacy (positive values favor those with stronger self-efficacy beliefs) (18)||0.32||0.02 (−0.01, 0.05)||480||0.74||−0.03 (−0.20, 0.14)||536||0.09||0.12 (−0.02, 0.27)||564|
|Other baseline variables tested for inclusion in final model|
|RMDQ (range 0–24, where lower scores indicate less severe disability)||< 0.001||0.22 (0.15, 0.30)||498||0.36||0.21 (−0.25, 0.67)||552||0.97||−0.01 (−0.39, 0.37)||583|
|MVK disability (range 0–100, where lower scores indicate less disability) (11)||< 0.001||0.03 (0.01, 0.04)||485||< 0.001||0.52 (0.44, 0.60)||552||< 0.001||0.15 (0.08, 0.23)||572|
|MVK pain (range 0–100, where lower scores indicate less pain) (11)||0.06||0.02 (−0.01, 0.04)||494||< 0.001||0.24 (0.13, 0.36)||549||< 0.001||0.32 (0.23, 0.42)||583|
|Ethnicity (UK Census categories; positive values indicate better outcome for those from a nonwhite background)||0.54||0.22 (−0.49, 0.94)||474||0.79||0.51 (−3.28, 4.30)||523||0.96||0.09 (−3.08, 3.25)||553|
|SF-12 physical component score (positive values favor those with better physical quality of life) (19)||0.80||0.01 (−0.03, 0.04)||456||0.75||−0.04 (−0.28, 0.20)||510||0.13||0.15 (−0.04, 0.35)||538|
|SF-12 mental component score (positive values favor those with better mental quality of life) (19)||0.57||−0.01 (−0.04, 0.02)||456||0.16||−0.15 (−0.35, 0.06)||510||0.97||0.003 (−0.17, 0.17)||538|
Models including interaction term between randomized group and moderator.
The goodness of fit was assessed for models 1, 2, and 3 using the adjusted R2 statistic. This statistic can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit. If the model contains terms that do not aid in predicting response, then negative values can occur. The adjusted R2 values for model 1 (unadjusted model: treatment + moderator + interaction) ranged from 0.02 to 0.06, for model 2 (adjusted for age, sex, and baseline) ranged from 0.09 to 0.33, and for model 3 (adjusted for significant predictors from stepwise selection) ranged from 0.16 to 0.43. Clearly, model 3 explained the most variance and was therefore chosen as our final model.
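The behavior of the adjusted R2 statistic described above follows from its standard definition, 1 − (1 − R2)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below uses that textbook formula; the function name and the example inputs are illustrative and are not values from the trial.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / residual_df


# A weak fit with several predictors is penalized below zero.
weak_fit = adjusted_r2(0.01, 50, 3)
# A perfect fit stays at the upper bound of 1.
perfect_fit = adjusted_r2(1.0, 50, 3)
```

The penalty term (n − 1)/(n − p − 1) exceeds 1 whenever the model has at least one predictor, which is why uninformative terms can push the statistic negative.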
In our third model based on the stepwise selection process, we adjusted for different baseline variables for each of our 3 primary outcomes. Only for the RMDQ did any interactions reach statistical significance: age and employment status. Being younger and being employed were both statistically significant moderators for gaining additional benefit, as measured by the RMDQ, from treatment. This means, for example, that on average those who were employed gained an additional benefit from the treatment of 1.89 (95% confidence interval 0.43, 3.35) points in the RMDQ compared to those who were not employed (Table 4). These effects were not seen with either the MVK disability or MVK pain.
|RMDQ‡||MVK disability§||MVK pain¶|
|Prespecified subgroup analyses|
|Fear avoidance, included % (R2)||75 (0.21)||80 (0.43)||89 (0.18)|
|≥14 to <14||−0.01 (−1.53, 1.50)||−2.56 (−10.35, 5.23)||2.18 (−5.35, 9.71)|
|Troublesomeness, included % (R2)||79 (0.21)||83 (0.42)||93 (0.17)|
|Very/extremely to moderately||−1.01 (−2.52, 0.50)||−4.42 (−12.11, 3.27)||−5.04 (−12.47, 2.40)|
|Exploratory subgroup analyses|
|Duration, included % (R2)||76 (0.22)||81 (0.42)||90 (0.17)|
|≥3 years to <3 years||0.14 (−1.55, 1.83)||−3.00 (−11.59, 5.60)||−3.46 (−11.81, 4.89)|
|Age, included % (R2)||79 (0.21)||83 (0.42)||93 (0.18)|
|≥54 years to <54 years||−1.58 (−3.05, −0.12)||−1.67 (−9.28, 5.94)||−3.00 (−10.35, 4.35)|
|Sex, included % (R2)||78 (0.21)||83 (0.43)||93 (0.17)|
|Female to male||−1.27 (−2.79, 0.25)||−3.59 (−11.30, 4.12)||−3.27 (−10.83, 4.28)|
|Left full-time education, included % (R2)||75 (0.21)||79 (0.42)||89 (0.17)|
|Age >16 years to age ≤16 years||1.29 (−0.24, 2.82)||3.01 (−4.90, 10.92)||4.15 (−3.47, 11.77)|
|Frequency of back pain, included % (R2)||78 (0.21)||83 (0.43)||93 (0.18)|
|Comes and goes + getting better to fairly constant + getting worse||−0.12 (−1.81, 1.57)||2.81 (−5.73, 11.35)||0.15 (−8.25, 8.55)|
|Benefits, included % (R2)||79 (0.21)||82 (0.43)||93 (0.17)|
|Benefits to no benefits||0.32 (−1.67, 2.31)||−5.56 (−15.94, 4.81)||−0.22 (−10.06, 9.62)|
|Employed, included % (R2)||79 (0.22)||83 (0.43)||93 (0.18)|
|Employed to not employed||1.89 (0.43, 3.35)||3.16 (−4.44, 10.75)||5.01 (−2.33, 12.34)|
|HADS anxiety, included % (R2)||78 (0.21)||82 (0.43)||92 (0.17)|
|≥11 to <11||−1.12 (−2.83, 0.58)||−2.15 (−10.97, 6.67)||−2.41 (−10.83, 6.01)|
|HADS depression, included % (R2)||78 (0.22)||83 (0.43)||93 (0.17)|
|≥11 to <11||−2.07 (−4.79, 0.65)||−14.58 (−29.19, 0.03)||−4.82 (−17.98, 8.33)|
|Pain self-efficacy, included % (R2)||79 (0.19)||83 (0.42)||93 (0.16)|
|≥42 to <42||−0.15 (−1.65, 1.36)||2.05 (−5.67, 9.78)||0.60 (−6.91, 8.11)|
No statistically significant moderation effects were observed for the prespecified subgroup analyses in any of the models; however, some significant moderators of treatment effect were observed in the exploratory subgroup analyses.
DISCUSSION
This analysis contributes to the body of research seeking to identify subgroups of patients with low back pain who are likely to achieve greater benefit from particular treatments. Particular strengths are that the analysis is based on a large well-conducted randomized controlled trial and that we were able to do confirmatory analyses on 2 variables, fear avoidance and troublesomeness, that were prespecified before the trial started. The study was not, however, originally powered for these confirmatory analyses. Because we had prespecified duration of back pain as a subgroup, we have included an analysis of pain of more or less than 3 years since onset as an exploratory analysis; however, this may not be a meaningful distinction in the clinical situation. The remainder of our subgroups are known sociodemographic prognostic indicator variables or psychological factors that might moderate the effect of a cognitive–behavioral approach. Restricting the analysis to such variables reduces the likelihood of spurious positive results.
With 598 participants included in this analysis, this is one of the largest studies of moderators of treatment for nonspecific low back pain. Witt et al included substantially more participants (n = 3,093), the UK Back Pain Exercise and Manipulation Trial (UK BEAM) included nearly twice as many participants (n = 1,116), and Sherman et al included a similar number of participants (n = 638). In these latter 2 cases, the participants were split between 4 treatment groups, meaning that the number included in each comparison was much smaller (28–30). Nevertheless, our statistical power to identify moderators is poor (31). Consequently, we present only the point estimate and 95% confidence interval for each interaction in preference to the P value (32). In our study, we had sufficient power to detect a between-subgroup standardized mean difference ranging from 0.2 to 0.3 in the primary outcomes.
We make a clear distinction between confirmatory prespecified subgroup analyses and exploratory subgroup analyses, where the former is used for hypothesis testing and the latter is hypothesis generating (33, 34). We used the Bonferroni correction to adjust for multiple testing for the prespecified subgroup analyses only. Had the significance level not been adjusted, the conclusions drawn would still have been the same. For our exploratory analyses, we did not make this correction, to ensure that we did identify variables worthy of further exploration. Initially we identified potential predictors of outcome using univariate analyses, followed by a more formal approach of forward stepwise selection (model 3). The stepwise selection process does not identify the same predictors as the univariate analyses, because some covariates become significant, and others cease to be significant, in the presence of other covariates during the modeling process (35).
Our subgroup analyses considered 3 models: an unadjusted model and 2 adjusted models. The first of the adjusted models (model 2) adjusted for clinically relevant covariates that were prespecified in the protocol. The focus of this model was to try to estimate subgroup effects, correcting for relevant predictors as presented in the literature. The second of the adjusted models (model 3) contained statistically significant predictors of outcome selected using forward stepwise selection, where the choice of covariates in these models was data driven. Based on this particular study, the latter adjusted model (model 3) offered more precision when estimating the subgroup effects compared to the former adjusted model (model 2). Because of our concerns about the scaling and sensitivity to change of the RMDQ, we included MVK disability and pain scores as additional primary outcomes. Consequently, this analysis included 36 individual comparisons (12 potential moderators for each of 3 outcomes), increasing the risk of any statistically significant interactions being chance findings. Such statistically significant interactions that we have found are not consistent across the 3 outcome measures, and thus caution is needed in their interpretation. This does raise for us a slight concern that positive findings in previous studies might not have been consistent if different outcome measures had been used.
In common with other studies, even using a model fitted using a selection procedure, the proportion of the variance in outcome explained by baseline variables is modest. This is greatest for a pure disability measure (MVK disability), least for a pure pain measure (MVK pain), and intermediate for a mixed measure (RMDQ).
Troublesomeness was identified as a potential predictor of outcome in 2 of our 3 primary outcome measures in the univariate analyses, but was not a predictor in the multivariate analyses (model 3) and did not moderate treatment effect. All of the Back Skills Training Trial participants had at least moderately troublesome low back pain. It is possible, had we also included subjects with slightly troublesome pain, that we would have observed a difference, but knowing whether or not our intervention works for those who are only slightly troubled by their back pain may not be a high research priority.
Fear avoidance did not predict outcome in either our univariate analyses or multivariate analyses (model 3). Also, it did not moderate treatment effect in any of our 3 outcomes. Our participants had all consulted for low back pain in the previous 6 months and had significant ongoing problems. They might, however, not be the same population as those currently attending for treatment who will tend to have more severe symptoms (36, 37).
Only for the RMDQ is there any statistically significant moderation of outcome in our final model (model 3). Were it not for our concerns about the measurement properties of the RMDQ, these would be the only data we had to consider. Since similar effects were not seen in the MVK disability and MVK pain scores, these are unlikely to be true moderation effects. If these were true results, they would suggest that we should focus our efforts on the younger population that is currently working and not on the older unemployed population.
We have presented the data here just for those with chronic low back pain. We make no further comment on these data, as they are not the focus of this study but will be of use to others.
There is considerable research interest in identifying back pain subgroups. This analysis, in common with secondary analyses of the UK BEAM data set and a trial of acupuncture by Sherman et al, has failed to find convincing data to suggest that subgroups can be identified in existing trial data (29, 30). Together they have considered a range of potential moderators for 5 different treatment packages in rigorous analyses. As a rule of thumb, to show an interaction between a potential moderator and treatment effect of a similar size to the main treatment effect, a 4-fold increase in sample size is required (31). The size of both of these studies was based on finding a main treatment effect rather than a moderation of such an effect. To our knowledge, only 1 trial has been explicitly powered to show moderator effects (n = 3,093) (28). In that trial, Witt et al found that acupuncture was more effective for those with worse initial back function, younger patients, and those with >10 years of schooling (28). Although we consider we have used the most appropriate statistical approach to our data, this approach may not yield positive results unless there are resources to run more trials of a similar size to the acupuncture trial by Witt et al. It is, therefore, perhaps surprising that some other studies have found apparently important effects on much smaller numbers. Great care is needed in interpretation of data from such studies. We now have a number of treatments of proven modest effectiveness. Any future studies will, therefore, need to compare 2 active treatments. The mean differences between the 2 treatments are likely to be much smaller than those comparing an active treatment to no treatment. To show a main effect will require a trial substantially larger than the Back Skills Training Trial; to show a statistically significant interaction, the sample size will need to be multiplied further.
Any such trial would only test moderators as a single comparison between 2 treatments. It is unlikely that the very substantial funding needed for many such trials will be forthcoming, and any further research on subgrouping for those with nonspecific low back pain will need to consider adopting different approaches.
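One common way to arrive at the 4-fold rule of thumb quoted above is to compare the variance of a main-effect estimate with the variance of an interaction estimate. The sketch below does this under simplifying assumptions that are ours, not the source's: equal allocation to 2 arms, a moderator that splits each arm into equal halves, and a common outcome variance in every cell. The function names are hypothetical, and whether this is the derivation behind the cited reference is an assumption.

```python
def var_main_effect(sigma2, n_total):
    """Variance of a difference in means between 2 equal arms of n_total / 2."""
    per_arm = n_total / 2
    return 2 * sigma2 / per_arm


def var_interaction(sigma2, n_total):
    """Variance of a difference of differences across 4 equal cells of n_total / 4."""
    per_cell = n_total / 4
    return 4 * sigma2 / per_cell


# With the same total sample, the interaction estimate has 4 times the variance.
ratio = var_interaction(1.0, 600) / var_main_effect(1.0, 600)
# Quadrupling the sample brings the interaction variance down to the main-effect level.
matched = var_interaction(1.0, 2400) / var_main_effect(1.0, 600)
```

Unequal allocation (such as 2:1 randomization) or unbalanced subgroups would make the required multiplier larger than 4 under the same reasoning.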
We suggest 2 alternative approaches that might be more fruitful. First, there are now many thousands of individuals with back pain who have been recruited to randomized controlled trials. If the research community were to collaborate to develop a repository of individual patient data, it may be possible, with the large number of subjects available, to develop statistical techniques that would allow moderators to be identified and clinical prediction rules to be developed (38). The acupuncture research community is already making progress in this direction (30). The back pain research community also needs to measure potential moderators and outcomes in a similar manner that is congruent with existing suggestions to facilitate such pooling (39–41).
Second, the research community should work together to develop some theoretically informed descriptors of back pain syndromes, which may include subject and clinical characteristics, that may respond to specific treatment approaches and then test the interventions in people meeting these criteria. The headache research community, for example, has developed a largely clinical classification of more than 200 different headache types that is now used to inform entry criteria for trials and clinical management without seeking to prove statistically that any one patient characteristic predicts response to treatment (42).
A robust secondary analysis of a large trial of a cognitive–behavioral approach did not identify baseline characteristics that modify treatment effects; the data allowed only a between-subgroup standardized mean difference of 0.2 to 0.3 to be detected in the primary outcomes. Much larger studies would be needed to be confident that important moderators had not been overlooked; it is unlikely that such studies will appeal to funders. New research approaches are needed to confidently identify back pain subgroups.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Underwood had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Underwood, Mistry, Lall, Lamb.
Acquisition of data. Underwood, Lall, Lamb.
Analysis and interpretation of data. Underwood, Mistry, Lall, Lamb.
- 9. The back book. 2nd ed. Norwich: The Stationery Office; 2002.
- 26. Medical statistics at a glance. 3rd ed. Chichester (UK): Wiley-Blackwell; 2009.
- 42. The International Classification of Headache Disorders: 2nd edition. Cephalalgia 2004; 24 Suppl: 9–160.