## 1. INTRODUCTION

This paper studies the effect of endogenous managed care insurance plans on expenditure for medical services within a Bayesian econometric framework. We define managed care in terms of three alternative insurance plans that are characterized by different degrees of use restrictions: health maintenance organization (HMO), preferred provider organization (PPO) and fee-for-service (FFS) plans. The HMO plans are the most restrictive, involving a gatekeeper physician and a preselected network of providers that supply the within-network coverage. The PPO plans also have a gatekeeper but do not impose most of the other HMO restrictions; for example, out-of-network providers are also covered, though only partially. The FFS plan, which offers the greatest flexibility of choice, has no gatekeepers and extends coverage to all available providers.

Salient features of health expenditure data specifically, and utilization data more generally, include, in addition to non-negativity of outcomes, a significant fraction of zero outcomes and non-normal empirical distributions characterized by positive skewness and excess kurtosis. The econometric modeling of the effect of insurance on health care expenditures faces two challenges. First, the outcome variable, medical expenditure *Y*, typically has a substantial proportion of zero values. For example, in the data used in this paper approximately 17% of the respondents have zero ambulatory medical expenditure and 94% of respondents have zero hospital expenditures. When the data have a finite probability mass at zero, no standard parametric distribution will suffice; instead, some kind of mixture model is needed to provide a good fit to the data. Empirical strategies for modeling such data, in the context of a regression with homoskedastic errors estimated by least squares, have been discussed by Duan *et al.* (1983) and Mullahy (1998), and more generally by Manning and Mullahy (2001). They have been surveyed extensively by Jones (2000), among others.

Because the two-part model (TPM), pioneered by the RAND researchers in their analysis of the expenditure data from the RAND Health Insurance Experiment, is a widely used example of such a mixture model, it provides a point of departure for this paper. The TPM, also known as a hurdle model, introduces modeling flexibility by allowing the zero and positive values of expenditure to be generated by two separate processes. A complementary question, whether a sample-selection model or the two-part model is the better framework for health expenditures, has generated lively debate in the health economics literature; see Maddala (1985), and Jones (2000, section 4) for a survey. Likelihood-based estimation of the Tobit model, which is often considered an alternative benchmark for data dominated by zeros, is well known to be inconsistent under non-normality or heteroskedasticity, and many studies have found it to be an inferior modeling strategy compared with the TPM; see, e.g., Melenberg and van Soest (1996).
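
As a compact illustration of the two-part structure, the mixture can be written as follows. The symbols $\pi(x)$ (the probability of positive expenditure given covariates $x$) and $g(\cdot)$ (the density of positive expenditure) are notation introduced here for exposition only; they are not the specification developed later in the paper.

$$
\begin{aligned}
\Pr(Y > 0 \mid x) &= \pi(x), \\
f(y \mid x) &= \bigl[1 - \pi(x)\bigr]^{\mathbf{1}(y = 0)} \, \bigl[\pi(x)\, g(y \mid x, y > 0)\bigr]^{\mathbf{1}(y > 0)}, \\
E[Y \mid x] &= \pi(x)\, E[Y \mid Y > 0, x].
\end{aligned}
$$

The first part governs whether the hurdle at zero is crossed, and the second governs the level of expenditure once it is. With roughly 17% zeros in ambulatory expenditure, for instance, the unconditional probability of crossing the hurdle is about 0.83, so the mean of *Y* is about 0.83 times the mean among those with positive expenditure.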

A second modeling challenge comes from the potential endogeneity of health insurance. The standard two-part model assumes exogeneity of the insurance variable, but when working with observational data it is important to allow for endogeneity of insurance. A widely held perception in the health economics literature on selectivity into insurance plans is that healthier individuals tend to select themselves into managed care plans with a gatekeeper and smaller premiums, whereas less healthy but more risk-averse individuals tend to select indemnity plans with higher premiums and more extensive coverage. As a result, the average expenditure for the healthier group should be lower. Some studies, e.g. Goldman (1995) and Mello *et al.* (2002), allow for endogeneity, but in these papers the endogenous treatment indicator is binary. In our model, as in most rich specifications of insurance status, the insurance indicator is multinomial.

We integrate the TPM and the multinomial selection model into a single framework that we call the extended (or endogenous) two-part model (ETPM). This contrasts with most of the existing studies, which, with some notable exceptions, assume exogeneity, an assumption that leads to considerable computational simplicity. However, ignoring selection effects means that we cannot separate the pure treatment effect from the effect that is due to self-selection. Individuals and households are likely to choose insurance based on personal characteristics such as overall health status, the existence and severity of chronic health conditions and physical limitations, preferences for risk, preferences over intensity of treatment, and so on. If all such variables could be introduced into the outcome equation, then one could control for the effects of selection. This is difficult because some of these factors are intrinsically unobservable. Hence it is unlikely that the observable variables included in the outcome equation will adequately control for the influence of these factors, and it seems more likely that some additional statistical controls for selection on unobservables will be required. If the assumption of exogeneity of insurance is invalid, the estimates of marginal effects and treatment effects obtained from the TPM would be inconsistent. Therefore, to identify the pure treatment effect, selection has to be modeled.

This paper offers two innovations, one substantive and the other methodological. The substantive focus of this paper is on the impact of managed care on total ambulatory and hospital health care expenditures. We use nationally representative data from the USA in the form of six repeated cross-sections, 1996–2001, from the Medical Expenditure Panel Survey (MEPS) and focus on two components of total medical expenditure. We model both inpatient expenditure, which includes all hospital treatments, and ambulatory expenditure, which includes the rest of total expenditure: office-based doctor visits, outpatient visits, emergency room visits, and prescribed medicines. The main reason for dividing total expenditure in this way is the general notion that managed care plans advocate cost containment measures whose savings derive largely from enrollees' decreased use of inpatient hospital services. We compute treatment effects for the two expenditure measures separately to evaluate the overall effect of insurance plans.

Our methodological contribution is a parametric estimation strategy: we develop and implement a Bayesian estimation framework based on the ETPM that respects the endogeneity and multinomial nature of insurance choice. We introduce unobserved heterogeneity through latent variables, correlated across the insurance choice, hurdle and expenditure equations, that can be handled by Bayesian data augmentation with Gibbs sampling. Compared with the alternative simulation-based maximum likelihood method, our Bayesian approach is computationally efficient (Geweke *et al.*, 2003, p. 1218). Our algorithm builds on and extends the previous work by Tanner and Wong (1987), Albert and Chib (1993), McCulloch *et al.* (2000), Munkin and Trivedi (2003), and Geweke *et al.* (2003).
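
To give a sense of how data augmentation with Gibbs sampling works, the following is a minimal sketch for a single binary probit hurdle equation in the spirit of Albert and Chib (1993). It is not the ETPM algorithm of this paper, which additionally involves multinomial choice and latent variables correlated across equations. The function names, the single-covariate design, the prior variance and the simulated data are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0.

    Uses inverse-cdf sampling, which is adequate for moderate |mean| only.
    """
    p_nonpos = STD_NORMAL.cdf(-mean)  # Pr(z <= 0)
    if positive:
        u = p_nonpos + rng.random() * (1.0 - p_nonpos)
    else:
        u = rng.random() * p_nonpos
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)


def simulate_hurdle(n, beta0, beta1, seed=1):
    """Hypothetical data: d = 1 if the latent index beta0 + beta1*x + e > 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [1 if beta0 + beta1 * xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, d


def gibbs_probit(x, d, n_iter=1000, burn_in=300, prior_var=100.0, seed=0):
    """Gibbs sampler for probit coefficients with a N(0, prior_var * I) prior."""
    rng = random.Random(seed)
    n = len(x)
    # Posterior covariance V = (X'X + I / prior_var)^{-1} for X = [1, x].
    a = n + 1.0 / prior_var
    b = sum(x)
    c = sum(xi * xi for xi in x) + 1.0 / prior_var
    det = a * c - b * b
    v00, v01, v11 = c / det, -b / det, a / det
    # Cholesky factor of V, used to draw correlated normal coefficients.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    beta0, beta1 = 0.0, 0.0
    draws = []
    for it in range(n_iter):
        # Step 1 (augmentation): impute latent indices given coefficients.
        z = [draw_truncated_normal(beta0 + beta1 * xi, di == 1, rng)
             for xi, di in zip(x, d)]
        # Step 2: draw coefficients given latent data, beta ~ N(V X'z, V).
        s0 = sum(z)
        s1 = sum(xi * zi for xi, zi in zip(x, z))
        m0 = v00 * s0 + v01 * s1
        m1 = v01 * s0 + v11 * s1
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta0 = m0 + l00 * e0
        beta1 = m1 + l10 * e0 + l11 * e1
        if it >= burn_in:
            draws.append((beta0, beta1))
    return draws


if __name__ == "__main__":
    x, d = simulate_hurdle(300, 1.0, 0.5)
    draws = gibbs_probit(x, d, n_iter=500, burn_in=200)
    print("posterior mean of intercept:", sum(b[0] for b in draws) / len(draws))
    print("posterior mean of slope:", sum(b[1] for b in draws) / len(draws))
```

The key design point is that, conditional on the imputed latent indices, the coefficient update is an ordinary normal linear regression draw, which is what makes each Gibbs step simple even though the observed outcome is discrete.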

The rest of the paper is organized as follows. Section 2 develops the two-part model with endogeneity and specifies the prior distributions of the parameters. Section 3 presents the MCMC estimation of the model. Section 4 deals with hypothesis testing of exogeneity and Section 5 calculates treatment effects. Section 6 presents an empirical application. Section 7 concludes.