Meaningfulness of mean group results for determining the optimal motor rehabilitation program for an individual child with cerebral palsy
As research on the efficacy or effectiveness of interventions to improve motor functioning in cerebral palsy (CP) has accumulated and been incorporated into systematic reviews, the foundation for evidence-based practice in CP is growing. To determine whether an intervention is effective, clinical trials report mean group differences. However, even if a statistically significant mean group effect is found, this does not imply that this intervention was effective for each study participant or ensure positive outcomes for all with CP. A personalized approach to medical care is currently being advocated based primarily on increasingly recognized genetic variations in individual responses to medications and other therapies. A similar approach is also warranted, and perhaps more justifiable, in CP which includes a heterogeneous group of disorders. Even interventions deemed highly effective in CP demonstrate a range of individual responses along a continuum from a negative or negligible response to a strong positive effect, the bases for which remain incompletely understood. This narrative review recommends that the next critical step in advancing evidence-based practice is to implement research strategies to identify patient factors that predict treatment responses so we can not only answer the question ‘what works’, but also ‘what works best, for whom’.
In recent decades, numerous systematic reviews have been published comparing the efficacy or effectiveness of therapeutic interventions for improving motor outcomes in children with cerebral palsy (CP). These reviews typically synthesize the statistically significant group differences in all measured outcomes for single or multiple intervention techniques or approaches. Although these may serve as a rich source of information for advancing clinical practice, they fail to provide all of the data necessary to make informed treatment decisions. Evidence-based practice, to which all should aspire, was defined by Sackett et al. years ago as the ‘conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients’. Although rehabilitation researchers have clearly been producing more and better clinical trial evidence on a broad range of interventions, precise prediction of a positive motor outcome at the level of an individual child with CP remains an elusive goal.
Although the following statement by Buford et al. was made in reference to conclusions from research on the effects of resistance and aerobic training in healthy adults, it is also applicable to issues related to the current state of evidence for motor rehabilitation in CP: ‘The vast majority of published studies have emphasized main effects and group differences, while paying little, if any, attention to individual differences. It needs to be recognized that contributions at the level of a group may not fully apply to each member of that group’. The population with CP is very heterogeneous, and CP is in reality a group of disorders with widely varying type, timing, location, and extent of brain injuries. The distribution, character, and severity of the resultant movement disorders also demonstrate tremendous variability even within specific diagnostic subgroups or functional classification levels. If such wide variability exists in physiological responses to exercise in a healthy compliant adult cohort, one would anticipate that this variability would be further magnified when intervening with a group of children or adults with CP.
Based on the underlying premise that identification of the sources of individual variation in treatment responses is a critical next step towards advancing evidence-based practice in rehabilitation for children with CP, this narrative review aims to (1) demonstrate why only considering mean group results is insufficient for evaluating whether an individual patient may or may not benefit from a given treatment; (2) discuss existing efforts in clinical trial (mean group studies) design and analyses that begin to either limit or explore sources of individual variability in treatment responses; and (3) discuss alternative research designs or computational strategies to more accurately identify in CP what treatment works best for whom.
Limitations of Mean Group Results
It was not that long ago that research evidence to support physical therapy practice for CP or other neurological disorders was virtually non-existent. Treatments were primarily based on experientially derived approaches developed by clinicians, which were loosely based, if at all, on existing knowledge of, or assumptions about, neurophysiology. The transition of clinicians and families from these more philosophical treatment approaches to programs or techniques with more and/or better evidence has been a tremendous challenge, particularly, it seems, in the field of pediatric neurological physical therapy. Accumulating results from randomized and non-randomized clinical trials and systematic reviews are contributing towards building a case to support recommendations that some of these more experiential approaches should be replaced by other more evidence-based interventions, for example constraint induced movement therapy. However, the case for or against a given intervention can only be based on the available evidence. Mean group results for the same intervention category may differ across studies, and these differences may be attributed to the types of studies that are included (e.g. only randomized clinical trials or all available evidence), intervention factors such as the specific parameters of the protocol being implemented including the dose and duration, and participant factors such as age and functional level, among others.
As with the studies comprising them, the conclusions from these reviews are not always consistent. For example, one systematic review stated that strengthening was not effective in CP, whereas another noted that strengthening showed the ‘best and most consistent’ evidence among all physical therapy techniques targeting the lower extremity for improving mobility. One apparent difference was that, in many of the studies included in the first review, basic principles on how to load muscles to induce strength changes were not adhered to, suggesting that intervention factors were the major source of the discrepancy. However, a recent study and a subsequent meta-analysis of three similar lower extremity strengthening studies, each of which aimed to alleviate a crouch gait posture when walking, showed considerable variability in individual responses, ranging from a substantial benefit to exacerbation of the abnormal pattern, depending on individual differences in the presence and degree of hamstring spasticity. The conclusion from that analysis was that strengthening the hip extensors in those with hamstring spasticity could lead to decreased passive range of motion in knee extension, that is, greater knee flexion or crouch, whereas those without or with less spasticity had a positive response. Merely reporting a mean group response can obscure these critically important differential effects and, in reality, may deprive some individuals of an effective intervention or, conversely, cause others to experience adverse effects on functioning.
If there is a wide range in the amount of change across individuals after an intervention, which is evident in CP on examination of the standard deviation of the differences between pre and post means, what can one really conclude about the clinical significance of a statistical mean difference for any given patient? Minimal detectable differences have been reported for some outcome measures, which basically indicate whether or not the change is clinically ‘observable’, but not necessarily clinically ‘significant’. Even if the mean were judged to be clinically significant by some standard, it is a group average: a portion of the individual participants did not reach that mean, so not every member of the group may have reached the predetermined threshold. Furthermore, the assumption that there may be a group level threshold for clinical significance is inherently flawed because the same increment of change may have very different clinical import for individual patients; some may have a small change that had a big impact on their functioning, whereas others may have experienced a large change in a target outcome with no functional or societal benefit. The true importance of any functional change can only be validly determined at the individual level, supporting the inclusion of measures such as the Canadian Occupational Performance Measure and Goal Attainment Scaling in clinical trials.
Personalized medicine is hardly a new concept and it has evolved mainly as a result of the tremendous variability observed in responses to medications, with no medication showing 100% efficacy and more likely showing a range of responses from a large benefit to serious, and maybe even life-threatening, adverse events. The field of pharmacogenetics is starting to unravel one of the major sources of these inter-individual differences. Although the stakes may not be as high and the sources of variability may not be as straightforward for therapeutic interventions in children with CP, the large standard deviations seen in clinical outcomes and often small or negligible effect sizes should compel us to explore this question of inter-individual responses to therapy more closely.
More Meaningful Mean Group Analyses
From the standpoint of statistics and research designs, there are several well-known methods to enhance the meaningfulness of a mean response that should really be standard in rehabilitation research conduct and reporting. These include, but are not limited to, the use of a control or comparison group to increase the confidence that the change was caused by the intervention, adequate power to be able to detect a true change, and the use of confidence intervals along with minimal clinically detectable difference cutoffs to better evaluate both statistically and clinically significant differences. An obvious method of reducing the variability within the sample is to design the clinical trial with narrower and well-defined inclusion criteria, such as including those with only unilateral or bilateral CP, or inclusion of individuals within a restricted number of Gross Motor Function Classification System (GMFCS) or Manual Ability Classification System levels, or even further refinements based on etiology (e.g. neonatal stroke or periventricular leukomalacia) or timing of the brain injury (pre- vs postnatal). An additional approach is to go beyond the primary mean group analysis by exploring variability in outcomes within a sample through correlation or regression techniques. These fairly simple modifications to randomized clinical trials or other cohort studies are becoming increasingly prevalent and should provide more information that brings us closer to an individual patient prescription.
As an example, Chen et al. evaluated the predictive effect of patient factors such as age, sex, and GMFCS level on physical therapy outcomes over a 3-year period. As this was a cohort study without a control group, the effects of physical and developmental maturation clearly confound the results making it impossible to determine the effects of therapy alone. Not surprisingly, younger children in the age range of 3 to 7 years had a higher rate of functional progress than those older than 7 years. Those who were GMFCS level II also had the highest rates compared with levels I, III, and IV. However, this study showed similar data to the original GMFCS curves where children in levels II and III seem to have similar absolute Gross Motor Function Measure scores and trajectories in the early years, then sharply diverge for no known reason at the present time – an intriguing phenomenon that warrants further study to identify the individual factors that may be responsible for this delayed deviation in functional prognoses.
In the previous study, the primary intervention was neurodevelopmental therapy which has failed to demonstrate efficacy in multiple systematic reviews,[4, 16] so the effects of the intervention may have been negligible compared with changes due to age that have been shown to vary by GMFCS level. It may be more illustrative instead to evaluate results from studies examining constraint induced movement therapy and other similar intensive unilateral and bilateral upper extremity training programs, which have the strongest and best evidence of all therapy approaches in children with CP. Nearly all individuals who have been studied with these interventions have unilateral or strongly asymmetrical upper limb involvement, which restricts the sample by anatomical subtype at the onset. These interventions typically show moderate to large mean effects that have been shown to be clinically meaningful and seem to be fairly robust across programs, training protocols, and settings.[17, 18] However, these interventions are not equally effective across individuals with some failing to show any benefit or in some cases, worsening,[19, 20] although the percentage of non-responders is not known for the majority of studies as only mean results are typically reported. Mean group studies have also been conducted to determine effective treatment dose but even within those studies, wide inter-individual variability has been reported in response and in the dose needed to achieve similar functional changes.
So what about those cases where no mean effect for an intervention is found; does that necessarily indicate that the hypothesis regarding the potential effectiveness of the treatment is unfounded? Hielkema et al. developed a novel early intervention approach based on a set of principles that have been emerging to guide therapists when working with infants at high risk for having CP and their families. They conducted a randomized clinical trial comparing their intervention, called Coping with and Caring for Infants with Special Needs (COPCA), to neurodevelopmental therapy, and found, unexpectedly, no mean difference between outcomes. When they looked more closely at the intervention and patient factors, they found that treatment fidelity was contaminated in the two groups, with therapists showing many similar behaviors regardless of group. They then examined treatment behaviors and found that certain behaviors led to better infant outcomes and others to worse outcomes, with the former adhering to principles of COPCA and the latter representative of behaviors that comprise neurodevelopmental therapy. They also found that infants who later were diagnosed with CP responded differentially to the behaviors compared with those who did not develop CP.
Several studies and reviews have examined the association of various patient factors with outcomes in an attempt to better predict individual responses to intense upper limb training using correlation and regression techniques or structural equation modeling. Age and functional severity have been considered most often, with inconsistent results. The consensus report by Eliasson et al. noted ‘the large standard deviations within all studies’ and summarized what is known about these two aforementioned factors. The only study to directly evaluate the effects of age, by Gordon et al., found no difference in outcomes between age groups, and it has been suggested that the greater potential for change in younger children may be balanced by the greater attentional capabilities in older children for performing the intensive training required. The effects of functional severity are even more unclear, which may be partly a consequence of the inclusion criteria which typically require at least some baseline level of functioning in the involved hand. The study which stated it was the first to use linear regression to investigate factors that predicted better outcomes found that total restraint time (assumed to be a measure of patient compliance) was the single best predictor of improved motor capacity. In that same analysis, younger age also emerged as being more likely to show a clinically meaningful improvement in motor outcome, such that a 5-year-old child was twice as likely to have a meaningful improvement as an 8-year-old. Baseline motor abilities were also a significant, but weaker, predictor of better outcomes. More recently, authors have considered the type and extent of brain lesions or cortical reorganization patterns on outcomes of unilateral and bilateral training.[19, 20] Kuhnke et al. found a differential response to bimanual training depending on patterns of brain reorganization. 
Specifically, they demonstrated that those with retained ipsilateral connections were worse after training and proposed that bilateral training should not be done in those who already demonstrated mirror movements that were likely the result of ipsilateral control of the paretic hand, and that unilateral training would be more effective. Islam et al. evaluated brain lesion type, extent, and reorganization in 16 children who had participated in constraint induced movement therapy programs and failed to find similar differences caused by these factors, although they acknowledged that the subgroups were small. Of note, individual data were included in the report, showing that two of the 16 children in that study had worse paretic hand function after training, with a very large range of benefits in the remaining children from 6 to 59 points on the Jebsen–Taylor Hand Function Test. This demonstrates the wide variability in outcomes even from a well-established treatment in a specific subtype of CP, similar to available results from other therapeutic interventions in CP. A recommendation from the recent consensus report on intensive upper limb training protocols, which acknowledged the wide variability in outcomes, was that the smallest detectable difference be reported at least for the primary outcome along with the number (proportion) of individuals in that study who exceed it. Although this would help to predict the likelihood of a positive response to a given protocol, it still falls short of predicting the outcome on an individual patient level or elucidating why the outcomes may differ across individuals.
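The reporting recommendation above (the smallest detectable difference together with the number and proportion of individuals who exceed it) can be illustrated with a minimal sketch. The change scores, the threshold of 5 points, and the function name `summarize_responses` are hypothetical and are not taken from any of the studies cited; the sketch only shows how a positive group mean can coexist with unchanged and worsened individuals.

```python
from statistics import mean, stdev


def summarize_responses(changes, sdd):
    """Summarize individual change scores against a smallest detectable difference (SDD).

    changes: post-minus-pre scores, one per participant (positive = improvement).
    sdd: threshold a change must exceed, in either direction, to count as real.
    """
    n = len(changes)
    n_improved = sum(1 for c in changes if c > sdd)
    n_worsened = sum(1 for c in changes if c < -sdd)
    return {
        "n": n,
        "mean_change": mean(changes),
        "sd_change": stdev(changes),
        "n_improved": n_improved,
        "n_worsened": n_worsened,
        "n_unchanged": n - n_improved - n_worsened,
        "proportion_improved": n_improved / n,
    }


# Hypothetical change scores for ten participants; NOT data from any study.
changes = [12, -6, 1, 25, 0, 8, -1, 30, 3, 9]
summary = summarize_responses(changes, sdd=5)  # the SDD of 5 is an assumed value
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this invented sample the mean change exceeds the assumed threshold, yet only half of the participants do, and one worsens by more than the threshold. A report limited to the mean would hide both facts.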
Given the failure of the already identified patient and intervention factors to consistently and strongly predict response to intervention, other factors obviously need to be explored. Genetic factors have clearly been shown to predict disease risk as well as response to medications, and data have been rapidly accumulating for many conditions and medications. Genetic factors that modify the risk of CP are beginning to be explored,[25, 26] but few investigations have been made into how individual genetic profiles may affect response to motor interventions. A fascinating study on the effects of dopamine and dopamine transmission genes on motor learning rates in healthy adults was published recently, which has relevance to CP and other physical disabilities. All received the same motor training for 2 weeks but were randomized to receiving either levodopa or a placebo during training. The resulting lack of a mean difference in training rates would suggest that the drug had no effect. However, when the authors related the genetic ‘dopamine transmission’ scores to the learning rates, they found that those who had higher gene scores did worse on levodopa whereas those with lower gene scores did better. So, the more likely conclusion is that dopamine is important for motor learning, but the effect of the medication varies with the baseline level of dopamine transmission. As dopamine transmission may be altered in many neurological disorders, this would appear to be an important variable to measure in training protocols in those with brain injuries such as CP. Further support for genetic influences on brain plasticity was presented in a study that compared cortical excitability in monozygotic and dizygotic twins and found a strong heritability component in their results.
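The logic of the levodopa example, in which a null mean difference conceals opposite effects in different participants, can be sketched numerically. All values below are invented for illustration, the variable names are hypothetical, and the simple within-arm regression slope is a stand-in assumption rather than the analysis used in the study described.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical gene scores (higher = greater baseline dopamine transmission)
# and hypothetical learning rates for two randomized arms of five participants.
gene_score = [0, 1, 2, 3, 4]
placebo_rate = [10, 10, 10, 10, 10]  # assumed: no relation to gene score
drug_rate = [14, 12, 10, 8, 6]       # assumed: benefit at low scores, harm at high

# The main effect of the drug is the difference between arm means.
mean_difference = mean(drug_rate) - mean(placebo_rate)

# The interaction is the difference between arms in the gene-score slope.
drug_slope = ols_slope(gene_score, drug_rate)
placebo_slope = ols_slope(gene_score, placebo_rate)
interaction = drug_slope - placebo_slope

print("mean difference (drug - placebo):", mean_difference)
print("slope on gene score, drug arm:", drug_slope)
print("slope on gene score, placebo arm:", placebo_slope)
print("treatment-by-gene-score interaction:", interaction)
```

With these invented numbers the arm means are identical, so a mean group comparison would report no drug effect, while the slope in the drug arm shows that the response depends strongly on the gene score.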
Another study on variability in response in healthy adults that has broad implications for CP was the report from Buford et al., who measured the range of responses to resistance and aerobic training in a group of compliant adults and found that resultant changes in strength, endurance, and muscle size ranged from no change to increases that in some cases exceeded 100%. Again, if this is the normal range of variability in those with healthy neuromuscular and cardiorespiratory systems, should we even be surprised when we see such variable responses across individuals in compromised patient groups?
The Way Forward: Determining What Works Best for Whom
Randomized clinical trials are clearly invaluable for evidence-based medicine, but are neither feasible nor affordable for every possible intra- or inter-intervention question or comparison, and tend to equalize or minimize, rather than explore, the nuances so common in clinical practice. Multiple research strategies can and should be employed to uncover the sources of subgroup and ultimately individual variability in therapy outcomes; some have been available for years, whereas others are newer, more statistically based, large-scale approaches. Incorporating wait-list controls or multiple baseline measures into clinical trials is a simpler strategy that reduces or accounts for individual variability and also facilitates recruitment. Anonymized data sharing is becoming more prevalent and is a very rich potential source for evaluating response variability across trials and within interventions. Widespread agreement on a set of common data elements, perhaps by intervention category, would also facilitate meta-analyses of available data. However, prospective strategies would be preferable. Restructuring registries to include intervention details and functional outcomes is one possible strategy, but this requires infrastructure changes that would involve multiple individuals or entities. Starting at the level of the investigator, encouraging more rigorously designed, conducted, and analyzed single-participant research would be a major step forward, but this would also need to be accompanied by a ‘cultural’ shift in the rehabilitation science peer review and editorial process. As an example, multiple baseline designs with replication across individuals can provide powerful evidence supporting or failing to support treatment efficacy.
If the outcome of interest can be shown to change more when the treatment is done than when it is not done on that same individual, the finding has strong internal validity, arguably even stronger than an effect across groups made up of distinct individuals who, when randomized into groups, are shown to be similar on only a few specified characteristics. However, the external validity of single-participant designs can be challenged if only one child is studied or the treatment effect is inconsistent when replicated across several participants. In any event, single-participant designs, which have been advocated by some investigators for many years in rehabilitation therapy settings, are now receiving stronger support in rehabilitation science,[30, 31] perhaps in part because of greater progress on the analytical and design side, and greater traction in pediatric rehabilitation journals. To evaluate feasibility and preliminary effectiveness of a novel intervention strategy using a computerized overground unweighting system to increase self-initiated mobility in infants and toddlers with CP, we recently employed a multiple baseline design with 6-week no treatment, treatment, and withdrawal phases. We used single-participant statistics to evaluate differences in developmental rates across phases, which demonstrated a powerful and fairly consistent response. These data have merit in and of themselves, but can also be helpful to justify and power a larger and longer clinical trial, preferably using a mixed methods design that retains the individual analyses but combines them with group analyses, as demonstrated so effectively in a recent study by Gannotti et al. on gait and participation outcomes in adults with CP. Finally, single-participant designs could be conducted far more frequently and cost effectively if these were integrated into clinical practice.
Clinicians could either receive more academic or post-professional research training in these methodologies or they could partner with researchers who could assist with design and analyses.
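As a rough illustration of how rates of change might be compared across the phases of a single-participant design such as the one described above, the following sketch fits a least-squares slope to each phase for one child. The weekly measurement schedule, the scores, and the function name `slope_per_week` are assumptions made for illustration only; this is not the statistical method used in the study mentioned, and a full multiple baseline design would also stagger the start of treatment across participants.

```python
from statistics import mean


def slope_per_week(scores):
    """Least-squares rate of change per week for equally spaced weekly scores."""
    weeks = list(range(len(scores)))
    mw, ms = mean(weeks), mean(scores)
    sxx = sum((w - mw) ** 2 for w in weeks)
    sxy = sum((w - mw) * (s - ms) for w, s in zip(weeks, scores))
    return sxy / sxx


# Hypothetical weekly developmental scores for ONE child across three
# 6-week phases; the values are invented and not data from any study.
phases = {
    "baseline": [20, 20, 21, 21, 22, 22],
    "treatment": [22, 24, 26, 28, 30, 32],
    "withdrawal": [32, 32, 33, 33, 34, 34],
}

rates = {name: slope_per_week(scores) for name, scores in phases.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} points per week")

# A treatment-phase rate that clearly exceeds both the baseline and the
# withdrawal rates is the within-child pattern that supports an effect.
treatment_exceeds_other_phases = (
    rates["treatment"] > rates["baseline"]
    and rates["treatment"] > rates["withdrawal"]
)
print("treatment rate exceeds other phases:", treatment_exceeds_other_phases)
```

Repeating the same comparison for each participant, rather than pooling scores, keeps the individual response visible, and the per-child rates can later feed a group analysis.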
At the other end of the spectrum are large-scale observational studies involving both clinicians and researchers that comprehensively characterize patients, interventions, and outcomes, and track these over time. Patient-oriented research designs, often referred to as comparative effectiveness or practice-based evidence studies,[30, 34] are designed specifically to broadly incorporate, rather than artificially restrict, the complexities of clinical care in heterogeneous populations with high treatment and outcome variability, as is often the reality in CP. Typically, all (consented) patients who are receiving care at the participating institutions within a specified time span would be enrolled in the study. Details of medical and surgical interventions would be extracted from the electronic medical record and entered into the database. The study group would decide a priori which patient factors have the potential to affect treatment outcomes, and each of these would also be captured as data points. Standardized clinical and functional measures would also be collected at designated time points, for example admission and discharge for in-patients, or at every clinic visit for outpatients. Other data not in the medical record, such as therapy details, would be collected using point-of-care documentation designed (with expert assistance) and implemented by the clinical team that is treating the patients. These designs require a large number of patients (one to several thousand) depending on the number of variables to be considered, but as most of the data are collected within the context of care, enrollment and retention are not a major impediment as they often are in randomized trials. The statistical analysis then tries to identify the patient, intervention, or patient by intervention, factors that have large positive or negative effects on outcomes that remain after the effects of other potentially confounding factors are removed. 
The results start to provide clear clinical guidelines not only on which treatments or treatment components seem to be the most effective, but more importantly, what works best for whom. Dahdah et al., who claim that there is a ‘dearth of rehabilitation-relevant CE studies’ except for one study in patients post stroke, recently evaluated treatment outcomes for patients with traumatic brain injury across centers that had strong reputations and claimed to provide equivalent amounts of therapy. They found widely divergent results across centers even when controlling for obvious factors such as patient severity at enrollment, citing differences in both actual amount and type of therapy conducted across centers as probable explanations. No comparative effectiveness or practice-based evidence study has yet been conducted in pediatric rehabilitation.
Other advances in the field, such as greater diagnostic precision for identifying the different disorders that are grouped under the term CP, and expansion and refinement of current classification schemes as a basis for evaluating multidimensional influences on outcomes, will also bring us closer to more individualized rehabilitation prescriptions. A final way forward is to think more expansively, for example to consider multiple genetic factors or epigenetic factors such as parental stress or children's basic personality types and how these interact with recovery from or adjustment to a brain injury, or to search creatively for other potential factors that have not yet been implicated. The ultimate aim is to transform intervention prescription from a one-size-fits-all approach to an evidence-based individualized care plan where every child and family can choose to participate in or receive only those interventions likely to provide maximal motor benefit, in accordance with their personal life goals and desires.
This work was funded by the Intramural Research Program at the National Institutes of Health Clinical Center.