Increasingly, attention is shifting towards delivering essential packages of care, often based on clinical practice guidelines, as a means to improve maternal, child and newborn survival in low-income settings. Cost effectiveness analysis (CEA), allied to the evaluation of less complex interventions, has become an increasingly important tool for priority setting. Arguably, such analyses should be extended to inform decisions around the deployment of more complex interventions. In the discussion, we illustrate some of the challenges facing the extension of CEA to this area. We suggest that there are both practical and methodological challenges to overcome when conducting economic evaluation of package of care interventions that incorporate clinical guidelines. Some might be overcome by developing specific guidance on approaches, for example clarity in identifying relevant costs. Some require consensus on methods. The greatest challenge, however, lies in how to incorporate process measures of service quality as measures of effectiveness. Questions on which measures to use, how multiple measures might be combined, how improvements in one area might be compared with those in another and what value is associated with improvement in health worker practices are yet to be answered.
Increasingly, attention is shifting towards delivering essential packages of care, adapted to different levels of the health system, as a means to improve maternal, child or newborn survival (Claeson & Waldman 2000; Victora et al. 2006). Effective provision of such packages, at scale, is felt to be critical in achieving Millennium Development Goals four and five (Bryce et al. 2006, 2008; Veneman 2006). A key component of such packages is sets of clinical practice guidelines (CPGs). CPGs have been defined as systematically developed statements to assist practitioner and patient decisions on appropriate health care for specific clinical circumstances (Institute of Medicine Committee on Clinical Practice Guidelines 1992). The value of CPGs has been demonstrated, for example, for pneumonia (Menendez et al. 2007) and hypertension (Milchak et al. 2004), and they have become an increasingly familiar part of clinical care to promote the use of beneficial interventions, to make care more consistent and, on occasion, less costly (Eccles & Mason 2001).
In developing countries, CPGs have been employed most obviously in child health as part of packages targeting the conditions causing the greatest burden of disease, such as Integrated Management of Childhood Illness (IMCI) (Bryce et al. 2005a,b; Arifeen et al. 2009). However, given that health systems have limited resources and that such interventions cost money, resources should be allocated across interventions so that health goals are best achieved with the available budget (Baltussen et al. 2005). Economic evaluation offers a framework for such priority setting and resource allocation (Gold et al. 1996; Baltussen et al. 2005), and it has been argued that reallocating 50% of the health budget from interventions that are less cost effective to those that are more cost effective could result in a 64% increase in years of life saved in the East African region (Bobadilla et al. 1994). Ideally, therefore, decisions on whether to support strategies to implement CPGs should be the subject of economic evaluations. As we shall outline below, however, package of care interventions are complex, and this presents a set of challenges to economic evaluation.
As simply providing printed CPGs alone yields little effect, multifaceted implementation approaches that may improve effectiveness, including training, feedback and supervision (Pariyo et al. 2005; Grimshaw et al. 2006), are often employed. IMCI, for example, is intended as a broad implementation strategy including CPGs, training, multiple health systems strengthening approaches and a community component (WHO 2003a). Evaluations of the strategy have included randomized controlled trials, quasi-RCTs, programmatic evaluations, specific cost effectiveness analyses and a series of studies exploring success in the delivery of the interventions (Amaral et al. 2004; Schellenberg et al. 2004a; Huicho et al. 2005; Arifeen et al. 2009). We will draw on this body of work and on our own work exploring a multifaceted intervention aimed at improving the quality of rural hospital care for children in Kenya (English et al. 2008; Irimu et al. 2008) to illustrate some of the challenges faced when attempting to answer the question, ‘how do these interventions compare with other child health interventions in terms of value for money?’ Our specific aim is to discuss the challenges associated with the economic evaluation of package of care interventions that incorporate CPGs for multiple conditions.
These characteristics present special challenges to evaluating the economic efficiency of such interventions (Shiell et al. 2008), challenges that are largely methodological with respect to determining costs but conceptual when considering outcomes.
Framework for the economic evaluation of package of care interventions
Useful insights can be gained from work on quality improvement approaches that incorporate CPGs, where the intervention is conceptualized as comprising two phases (Figure 1): the treatments considered to be ‘best practice’ and the strategies to achieve appropriate adoption of these ‘best practices’ (Freemantle et al. 1999; Mason et al. 2001; Severens 2003). Decisions about scaling up these interventions can be made sequentially or simultaneously (Hoomans et al. 2009). A sequential approach first establishes the cost effectiveness of alternative treatments and selects the most efficient guidelines; this is followed by an analysis of the efficiency of alternative implementation strategies for the selected guidelines (Freemantle et al. 1999; Grimshaw et al. 2004). An integral approach examines treatment options and alternative, feasible implementation strategies simultaneously, estimating value for money for alternative combinations in one step (Hoomans et al. 2009). It is immediately apparent that either approach requires estimates of the costs and effects of both the specific guideline treatments and their implementation strategies.
Challenges in economic evaluation
Defining the intervention
For an intervention to be appropriately costed and evaluated, it should be accurately and comprehensively described (WHO 2003b; Drummond et al. 2005). This description should include information on the setting where the intervention is delivered, the target population, the time frame, the intervention components, the frequency of delivery and the extent of coverage of the target population. For package of care interventions such as IMCI, the components and the relative intensities of their implementation often vary with every implementation exercise. Contextual characteristics of intervention and/or control settings are also rarely static. To address this problem in our own work, we developed a framework for documenting all intervention activities and contextual changes in intervention and control sites throughout the evaluation period (English et al. 2009).
Defining the counterfactual
Choosing relevant alternatives to compare with quality improvement interventions requires care (Freemantle et al. 1999). Possible options include ‘no care’, an alternative package of care or standard care (Freemantle et al. 1999). These alternatives are associated with different costs and effects and will hence yield different efficiency estimates when compared with the interventions under evaluation, but there is a lack of consensus on the most appropriate comparator. In the case of IMCI, the MCE studies used ‘standard care’ as a comparator (Schellenberg et al. 2004a). An alternative approach, which we employed in our own work in Kenya, is to compare a full implementation strategy with a partial strategy as a means of evaluating the added benefits of a more active approach (English et al. 2009; Nzinga et al. 2009).
Generally, guideline development and implementation involve three stages whose costs can be considered (Vale et al. 2007): development of the guideline, its implementation and the treatment costs that result from the intervention.
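As a minimal illustration of this three-stage cost structure, the sketch below tallies a hypothetical budget; all category names and figures are invented for the example and carry no empirical weight.

```python
# The three cost stages of guideline development and implementation,
# with purely illustrative figures (hypothetical currency units).
cost_stages = {
    "guideline_development": 15_000,  # e.g. panel meetings, evidence review
    "implementation": 40_000,         # e.g. training, supervision, printing
    "treatment": 25_000,              # e.g. drugs and supplies used as a result
}

total_cost = sum(cost_stages.values())
print(total_cost)  # 80000
```

Omitting any one stage, as some published evaluations do with development costs, would understate the total resource requirement accordingly.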
Lack of rigour and transparency
While not unique to package of care interventions, a lack of rigour and transparency has frequently been observed in reports of the costing of practice change or quality improvement interventions (Grimshaw et al. 2004). Rigour is certainly made more difficult by the breadth and complexity of interventions such as IMCI. One possible reason for the lack of rigour is that costing data requirements and sources are rarely considered or incorporated into evaluation designs a priori. Thus, cost data are often collected retrospectively from sources not designed for the purpose, a situation compounded by the poor quality of routine health information in many low-income settings. Evaluation studies should therefore build costing data collection into their designs from the outset to improve the quality of the cost data collected.
Variability in the range of costs
Systematic reviews of economic evaluations of guideline implementation interventions reveal variability in inclusion of different categories of costs (Vale et al. 2007; Prior et al. 2008). One notable feature is omission of development costs that include the opportunity cost of time spent on guideline development and stakeholder meetings, among others (Adam et al. 2004). Where new guidelines are developed, as in our study of hospital care, these costs are often significant. Their exclusion may thus underestimate the true resource requirements of the intervention. Consensus is needed on how development costs should be accounted for in such interventions.
Measuring change in health status
Traditionally, generic measures such as quality-adjusted life years (QALYs) and disability-adjusted life years (DALYs) have been recommended for inclusion in cost effectiveness analysis (Murray 1994; Weinstein et al. 1996). The QALY is a health outcome measure that combines quantity and quality of life (Sassi 2006). It assigns a quality weight between 0 (for death) and 1 (for full health) to each state of health and multiplies that weight by the number of years the health state lasts (Sassi 2006). The DALY is derived by adding the years of life lost because of disease (YLL) and the years of life lived with disability (YLD) (Murray 1994; Fox-Rushby & Hanson 2001). YLL is the difference between a person’s life expectancy assuming full health and the age at which the person dies prematurely because of disease (Murray 1994). YLD is obtained by assigning disability weights to health states between 0 (for full health) and 1 (for death) and multiplying these by the number of years the health state lasts (Murray 1994). QALYs and DALYs are hence composite measures of health outcomes (mortality and morbidity). These generic measures are preferred in economic evaluations because of their comparability across interventions.
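The DALY arithmetic just described can be sketched as follows. This is a deliberately simplified illustration: it omits the age weighting and discounting used in some Global Burden of Disease formulations, and all input figures are hypothetical.

```python
def yll(remaining_life_expectancy: float) -> float:
    """Years of life lost: the gap between full-health life expectancy
    and the age at premature death."""
    return remaining_life_expectancy


def yld(disability_weight: float, duration_years: float) -> float:
    """Years lived with disability: a weight in [0, 1] (0 = full health,
    1 = death) times the duration of the health state."""
    return disability_weight * duration_years


def daly(remaining_life_expectancy: float,
         disability_weight: float,
         duration_years: float) -> float:
    """DALY = YLL + YLD."""
    return yll(remaining_life_expectancy) + yld(disability_weight, duration_years)


# Hypothetical case: a child dies 62 years short of full-health life
# expectancy after 0.1 years of severe illness (disability weight 0.6).
print(daly(62, 0.6, 0.1))  # approximately 62.06 DALYs
```

The example makes the composite nature of the measure explicit: almost all of the burden here comes from the mortality (YLL) term, which is precisely why interventions whose benefits are mainly in process quality are poorly captured by it.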
It has been argued that these composite measures are unsuitable for evaluating complex interventions, with examples from palliative care (Normand 2009) and mental health (Chisholm et al. 1997). This is because complex interventions often have a range of outcomes that are inadequately captured by QALYs (Normand 2009) and DALYs (Sayers & Fliedner 1997). For example, a multifaceted quality improvement intervention such as IMCI would arguably result in changes in clinical case management because of the adoption of evidence-based practice, in health worker motivation because of training and skills improvement, and in health system improvements such as better availability of medicines and equipment, among others. Although these changes may result in better clinical outcomes, the relationship is non-linear and difficult to predict. How can the full range of these benefits be captured within a DALY or a QALY measured at the patient level?
More relevant to developing countries, calculation of the DALY requires estimates of changes in mortality and morbidity (Murray 1994; Anand & Hanson 1997). While this is possible in individually randomized controlled trials of specific interventions, there are challenges to be overcome in measuring these changes in complex interventions such as IMCI and efforts to improve hospital care, as described briefly below.
Outcomes vs. process measures
The promotion of clinical guidelines is premised on the supposition that evidence-based care is on the causal pathway to better ‘hard’ outcomes (English et al. 2008) such as mortality, disease status and functional ability (Davies & Crombie 1995). However, outcome measures such as these may be inadequate to interpret the effects of complex interventions which often have facilities, teams or even entire populations rather than the individual patient as the unit of intervention (Davies & Crombie 1995). For example, in our study of hospital care, outcomes assessed at the hospital level included improvements in resource availability and organization of care as well as improvement in specific case management practices at the patient level (English et al. 2008).
More specifically, hard health outcomes that appear easy to observe at hospital level, such as mortality, may be influenced by many factors in addition to the quality of care provided, which is the target of the intervention. These include inadequate or poorly applied definitions, data quality, patient case-mix, unrecognized contextual or temporal confounding and chance (Lilford et al. 2004), especially if outcomes are assessed through routine reporting systems. It is much harder to control for such possible confounders and effect modifiers than in classical individually randomized experiments. Indeed, the resources required for studies to demonstrate ‘statistically significant’ reductions in mortality that are credibly free from bias, residual or unrecognized confounding are often enormous (English et al. 2008). This makes it hard to base evaluations on hard outcomes, and as a result, there is still little evidence of the health impact of quality improvement initiatives (Schouten et al. 2008). These challenges perhaps also explain, in part, why the IMCI evaluation studies in Bangladesh (Arifeen et al. 2009) and Peru (Huicho et al. 2005) were unable to demonstrate significant reductions in mortality despite evidence of improvement in process measures.
In the absence of definitive clinical outcome data, increased use is being made of process measures as quality metrics to gauge intervention success (Grimshaw et al. 2006; Prior et al. 2008). Process measures are favoured over hard outcome measures because they can be measured more reliably and validly and are more sensitive to differences in one desirable endpoint, the quality of care (Davies & Crombie 1995; Mant 2001). Thus, if components of clinical guidelines such as the recommended drugs have been shown elsewhere to improve outcomes, then process measures that reflect the degree to which such best practice care is provided are themselves valid and appropriate end points (English et al. 2008). Despite the attractions of measuring process, these measures are associated with limitations, some of which are described below.
The challenge of multiple measures and comparison across programmes
Process measures have already found use as proxies for quality of care in the evaluation of child health interventions, including IMCI. An example is the index of integrated child management developed by the WHO and used in the MCE studies (Gouws et al. 2005; Bishai et al. 2008). It will be immediately clear, however, that a large number of process of care measures are possible when package of care interventions span multiple illnesses. Attempts to summarize multiple process measures into a common metric have employed panels of experts to group related process measures based on perceived face validity (Gouws et al. 2005). This approach, while useful, was found to lead to measures that meet face but not content or construct validity (Gouws et al. 2005). An alternative approach has been to employ statistical methods such as principal component analysis (PCA). PCA is a data reduction method that identifies coherent subsets of variables, known as principal components, that are relatively independent of each other (Kleinbaum et al. 1998); it has been used to assign indicators to groups that satisfy face, content and construct validity (Gouws et al. 2005).
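A minimal sketch of how PCA might group such indicators is given below, using NumPy on entirely synthetic data. The indicator names in the comments are hypothetical examples, not the WHO index, and the random matrix stands in for real facility survey results.

```python
import numpy as np

# Hypothetical facility-level process indicators: rows = facilities;
# columns might be correct malaria treatment, correct diarrhoea
# treatment, drug availability and supervision frequency.
rng = np.random.default_rng(42)
X = rng.random((30, 4))

# Standardize each indicator so no single measurement scale dominates.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecompose the correlation matrix: eigenvectors are the principal
# components, eigenvalues their explained variance.
corr = (Z.T @ Z) / len(Z)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]  # eigh returns ascending order
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The loadings (eigenvector entries) show which indicators cluster on
# each component; the component scores could then serve as grouped
# summary measures of care quality for each facility.
scores = Z @ eigenvectors
print(eigenvalues / eigenvalues.sum())  # proportion of variance per component
```

The scoring weights here are derived entirely from one dataset's correlation structure, which is exactly the transferability limitation discussed next.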
However, while PCA may provide an internally coherent approach to reporting from one study, the scoring approach derived from one study dataset cannot necessarily be applied to data from different contexts. Indeed, it is quite likely that the same approach applied to an alternative study would yield different components and thus calculate summary process measures differently even if exactly the same primary data are collected. More problematic is the fact that relevant process measures may also vary across and perhaps even within programmes. For example, IMCI is adapted prior to implementation to meet the needs of a specific country (Schellenberg et al. 2004b; Huicho et al. 2005). In some settings, care of children with malaria is of major interest, in others there is no malaria (Victora et al. 2005). Questions on which measures to use, how they might be combined, whether weighting for importance or prevalence is required and how to incorporate them into a generalisable summary measure useful for comparative economic evaluation are still to be answered.
Appropriate cost effectiveness ratios
We have highlighted problems with measuring the costs and effects of package of care interventions that incorporate CPGs for multiple conditions. There are clear implications for cost effectiveness analysis, where it is necessary to encompass costs fully and, ideally, to summarize all effects. Thus, while approaches have involved using disease-specific measures to obtain incremental cost effectiveness ratios (ICERs), such as the ‘cost per additional child receiving appropriate care’ (Rowe et al. 2009), this aggregation process may still fail to capture the full range of intervention benefits. It also rests on the contestable assumption that all process measures, for example correct treatment of malaria and correct treatment of diarrhoea, are equally important and therefore equally weighted.
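To make the disease-specific ratio concrete, the sketch below computes such an ICER with purely illustrative figures; the strategies and numbers are hypothetical and not drawn from any of the studies cited.

```python
def icer(cost_new: float, cost_comparator: float,
         effect_new: float, effect_comparator: float) -> float:
    """Incremental cost effectiveness ratio:
    extra cost divided by extra units of effect."""
    return (cost_new - cost_comparator) / (effect_new - effect_comparator)


# Hypothetical comparison: a full implementation strategy costing
# 120,000 vs. a partial strategy costing 80,000, with 900 vs. 500
# children receiving appropriate care.
print(icer(120_000, 80_000, 900, 500))  # 100.0 per additional child
```

The arithmetic is trivial; the difficulty lies entirely in the denominator, since 'a child receiving appropriate care' silently weights correct treatment of every condition equally.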
Overcoming scarcity of effectiveness data
Where trial estimates are lacking, a complementary approach to estimating the effects and cost effectiveness of package of care interventions is decision analytic modelling (Sculpher et al. 2006). There is, however, a dearth of effectiveness data on the separate and joint effects of components of package of care interventions in developing countries (Goodman & Mills 1999; Evans et al. 2005). One possible solution would be to encourage trials of these interventions in developing country settings. But new research is time-consuming, and where possible, evidence synthesis coupled with effectiveness modelling could prove useful before the needed data become available (Sculpher et al. 2006). This would allow all available evidence to be brought to bear, with extrapolation where data are not available. Assumptions in the modelling, however, should be explicit, and concerns about the validity and transferability of data addressed. One challenge in synthesizing effectiveness evidence for packages of care is that studies of component interventions are often based on varied endpoints (Edejer et al. 2005). Solutions could include structured methods such as the Delphi method, which allow experts to extrapolate study endpoints into a common metric such as the DALY or QALY. Experts would be asked, for example, how many deaths would be avoided by improving adherence to CPGs such as those for malaria or pneumonia.
A further modelling challenge is the lack of standardized methods for combining intervention effects when interventions are implemented jointly. Interventions implemented together often interact, and their effects are not always additive (WHO 2003b). Current approaches have included assuming an additive effect for interventions that affect different health outcomes and a multiplicative effect for interventions that affect the same outcome, an approach used recently when modelling the cost effectiveness of child health interventions (Wald & Law 2003; Evans et al. 2005).
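The additive and multiplicative conventions just described can be sketched as follows; the 20% effect sizes are hypothetical, chosen only to show how the two rules diverge.

```python
def combined_same_outcome(effects):
    """Multiplicative combination for interventions acting on the SAME
    outcome: the residual risk is the product of individual residual
    risks, so combined effect = 1 - prod(1 - e_i)."""
    residual = 1.0
    for e in effects:
        residual *= (1.0 - e)
    return 1.0 - residual


def combined_different_outcomes(effects):
    """Additive combination for interventions acting on DIFFERENT
    outcomes: effects are simply summed."""
    return sum(effects)


# Two interventions each reducing the same mortality by 20%:
print(combined_same_outcome([0.2, 0.2]))  # about 0.36, not the additive 0.4
```

The multiplicative rule prevents combined effects from exceeding 100%, but it still assumes no interaction between components, which, as noted above, is rarely guaranteed in practice.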
Alternatively, rather than attempting to translate measures of better care into health status outcomes, perhaps it would be more useful simply to accept that this is inappropriate and instead develop a common ‘quality scale’ onto which the scope and magnitude of various package of care intervention effects could be mapped. This might be achieved by identifying the range and extent of improvements likely from any package and eliciting preferences from decision-makers and other stakeholders in a health system based on what they value and are willing to pay for. The ‘improvement domains’ included could, and probably should, go beyond clinical effects to capture a broad range of benefits such as health system strengthening and health worker competence and motivation. This approach would also potentially allow greater national-level input into the development of scales that reflect local contexts and preferences.
The ideas presented assume that it is necessary or desirable to provide economic evaluations that allow alternative interventions to be compared on the same scale. Such aggregation into a single summary measure, while promoting comparability, risks oversimplification, obscuring important features of alternative interventions and their consequences. Perhaps economic evaluations of complex interventions should therefore be restricted to cost consequence analysis. Decision-making could then be left to policy makers employing a balance sheet approach in which costs and positive and negative consequences are simply stated in a table and used as the basis for deliberation (McIntosh et al. 1999; Severens 2003). This method, however, may not give clear insight into the question of efficiency.
Package of care interventions that rely heavily on CPGs are deemed to be effective in improving quality of care in low-income settings (Jones et al. 2003; Darmstadt et al. 2005; Ronsmans & Graham 2006). There is, however, a need for further work to develop appropriate methods for the economic evaluation of implementation strategies. In particular, there is a need for clearer standardization and thinking on intervention definition and choice of comparator. Costing methods should be more robust and transparent, and the lack of effectiveness estimates from developing country settings should be addressed. Future work should also focus on developing appropriate measures of effect for multiple conditions and methods for combining the effects of interventions delivered jointly.
Dr Edwine W. Barasa is supported by a Wellcome Trust strategic award (#084538). Dr Mike English is supported by a Wellcome Trust Senior Fellowship (#076827). The funders had no role in the design, conduct, writing or submission of this opinion piece. The authors are grateful to Dr Susan Cleary of the University of Cape Town health economics unit and Dr Sasha Shepherd of Oxford University’s health economics research unit for their useful comments on the manuscript. We are grateful to the KEMRI/Wellcome Trust Programme and the Director of KEMRI, with whose permission this work is published.