## 1. INTRODUCTION

The last 30 years have witnessed the introduction of several estimators for the semiparametric binary response model under minimal distributional assumptions on the disturbance terms (see, for example, Manski, 1975, 1985; Horowitz, 1992; Powell *et al.*, 1989; Ichimura, 1993; Klein and Spady, 1993). Much of the focus on relaxing distributional assumptions in the binary response model was motivated by the fact that maximum likelihood estimation of discrete choice models would generally lead to inconsistent estimates if the underlying distribution was incorrectly chosen.

Beyond its ‘shape’, the error distribution may also be misspecified in the manner in which it depends on the explanatory variables. For example, if the error exhibits multiplicative heteroscedasticity that is not a function of the ‘mean’ response, then only the above-mentioned estimators of Manski and Horowitz are consistent. However, these estimators will not recover binary response probabilities or marginal effects. By estimating binary quantile models, Kordas (2006) obtains interval estimates of the probabilities under general conditions. Kahn obtains marginal effects for a more general model than that considered here, but the estimator may be subject to the ‘curse of dimensionality’ when the model contains many explanatory variables. One of the main objectives of the present paper is to obtain probabilities and associated marginal effects that are reasonably estimated even when the dimension of the explanatory variables is large. Accordingly, we model a binary response probability as depending on two indices, where the distribution of the error may depend on the explanatory variables through one or both of the indices. This specification allows for, but is not restricted to, multiplicative heteroscedasticity that depends on one or both indices.
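As an illustration only, one special case of such a double-index specification can be written as follows. The symbols $V_1$, $V_2$, $\beta_1$, $\beta_2$, $u$, $F$ and $S(\cdot)$ are notation assumed here for the sketch and need not match the notation of the formal model in the next section.

```latex
% Illustrative special case: multiplicative heteroscedasticity driven by a second index.
% Assumed notation: V_1 = X\beta_1, V_2 = X\beta_2, u independent of X with c.d.f. F, S(.) > 0.
\begin{align*}
  Y &= 1\{\, V_1 > S(V_2)\, u \,\}, \qquad V_1 = X\beta_1, \quad V_2 = X\beta_2, \\
  \Pr(Y = 1 \mid X) &= F\!\left( \frac{V_1}{S(V_2)} \right) = \Pr(Y = 1 \mid V_1, V_2).
\end{align*}
```

In this sketch the response probability depends on *X* only through the two indices, even though neither $F$ nor $S$ is specified parametrically.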

To estimate the binary response model described above, we extend the estimator in Klein and Spady (1993). The estimator in Klein and Spady depends on a single-index assumption, which in the present context would imply that it can handle heteroscedasticity only if the ‘error’ distribution depends on the same index that determines the ‘mean response’. Here we allow a double-index formulation in which the index underlying the ‘mean response’ may differ from that upon which heteroscedasticity depends. Such an index formulation is particularly important in view of a result due to Chen and Khan (2003). They consider a binary response model where the heteroscedasticity depends on an unknown function of the explanatory variables and does not have an index structure. In this case, they show there does not exist a $\sqrt{N}$-consistent estimator for the model's parameters. Here, we will obtain a $\sqrt{N}$-consistent estimator under an index specification. As an extension of Klein and Spady (1993), we conjecture that when the error in the binary response model is independent of the explanatory variables, the resulting estimator is efficient in a general class of models that satisfy a double-index restriction.1

It should be emphasized that the estimator developed here depends on density estimators obtained under estimated local smoothing, where underlying density estimators are based on windows that vary for each observation in the sample. This is analogous to characterizing a distribution with a histogram in which the bin interval is allowed to vary depending on whether one is in the tails of the unknown density (where observations are sparse) or in regions where the true density is ‘high’. With such local smoothing, the proofs for the asymptotic properties of the estimator formulated here substantially differ from those in the literature that employ bias-reducing kernels. We pursue this strategy first because density estimators under local smoothing have mean-squared-error optimal properties (Abramson, 1982). Second, and most importantly, in the present context we have found that the finite sample performance of the estimator for the binary response model is much improved under local smoothing in contrast to bias-reducing kernels. We also found further improvements in the finite sample performance of the estimators by employing dependent kernels that depend on an estimated sample covariance matrix as advocated by Fukunaga (1972). Accordingly, all proofs in this paper are for estimation under local smoothing and dependent bivariate kernels.
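The idea of observation-specific windows can be sketched in one dimension as follows. This is a minimal illustration of Abramson-style square-root-law smoothing, not the bivariate estimator developed in this paper: the Gaussian kernel, the rule-of-thumb pilot bandwidth, the geometric-mean normalization and all function names are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def pilot_density(x, data, h):
    """Fixed-bandwidth kernel density estimate evaluated at the points x."""
    u = (x[:, None] - data[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def local_bandwidths(data, h):
    """Observation-specific windows h_i proportional to pilot(x_i)^(-1/2).

    Dividing by the geometric mean of the pilot keeps the h_i on the
    same overall scale as the global bandwidth h.
    """
    pilot = pilot_density(data, data, h)
    geo_mean = np.exp(np.mean(np.log(pilot)))
    return h * (pilot / geo_mean) ** -0.5

def adaptive_density(x, data, h):
    """Density estimate in which each observation carries its own window."""
    h_i = local_bandwidths(data, h)
    u = (x[:, None] - data[None, :]) / h_i[None, :]
    return (gaussian_kernel(u) / h_i[None, :]).mean(axis=1)

# Usage on simulated data (hypothetical sample, not from the paper).
rng = np.random.default_rng(0)
data = rng.standard_normal(500)
h = 1.06 * data.std() * len(data) ** (-1 / 5)  # rule-of-thumb pilot bandwidth
h_i = local_bandwidths(data, h)
grid = np.linspace(-15.0, 15.0, 2001)
density = adaptive_density(grid, data, h)
```

Windows are wide where the pilot density is low (the tails) and narrow where it is high, which mirrors the variable-bin histogram analogy above.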

In adopting the above smoothing strategy, we have found it necessary to employ a property of the derivative of the semiparametric probability function due to Whitney Newey. Namely, when this derivative is taken with respect to index parameters and then evaluated at the true parameter values, it coincides with the corresponding parametric derivative minus its conditional expectation (conditioned on the indices). This ‘residual-type’ property of the derivative is important below in controlling the bias in gradient terms in the asymptotic normality argument. As is typical for many semiparametric estimators, we will need to downweight (trim) observations where density denominators become ‘too small’. To exploit the residual property of the semiparametric derivative, we will employ a trimming strategy that depends on estimated indices as opposed to the explanatory variables.
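For intuition, the residual property can be written out in the simpler single-index case. The notation below ($G$, $\theta$, $\theta_0$, $V_0$) is assumed for this sketch, with *X* understood as the regressors whose coefficients are free after normalization; the double-index version used in the paper has the same structure, with conditioning on both indices.

```latex
% Single-index illustration (assumed notation): E[Y | X] = G(X\theta_0), V_0 = X\theta_0.
\begin{align*}
  P(x;\theta) &\equiv E\left[\, Y \mid X\theta = x\theta \,\right], \\
  \left. \frac{\partial P(x;\theta)}{\partial \theta} \right|_{\theta = \theta_0}
    &= G'(x\theta_0)\, x \;-\; E\left[\, G'(X\theta_0)\, X \mid V_0 = x\theta_0 \,\right] \\
    &= G'(x\theta_0)\left( x - E\left[\, X \mid V_0 = x\theta_0 \,\right] \right).
\end{align*}
% Consequence: the derivative has conditional mean zero given the index,
% E[ \partial P(X;\theta_0)/\partial\theta \mid V_0 ] = 0.
```

The first term is the derivative of the parametric probability $G(x\theta)$ and the second is its expectation conditional on the index, so the semiparametric derivative behaves like a residual with conditional mean zero.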

The estimator developed here for the binary response model is also related to those of Ichimura and Lee (1991) and Lee (1995), who examine alternative multiple-index models. While the present paper makes use of several key identification results of the Ichimura and Lee paper, it differs from both in several important respects. First, and most important, we have formulated the estimator and all proofs for the case of estimated local smoothing rather than bias-reducing kernels. Second, we make use of identification results in Ichimura and Lee without imposing exclusion restrictions on the indices. We emphasize that we are not concerned here with recovering the original parameters in the binary response model (which even in the presence of exclusion restrictions are still only obtained up to location and scale). Rather, we are interested in estimating those identifiable functions of the parameters that suffice to identify the semiparametric probability function. It can be argued that with binary response models one is generally not concerned with the parameters themselves but rather with the response probability and marginal effects. Such marginal effects, which examine how the probability function changes as the explanatory variables change, are identified once the probability function is identified. Moreover, while the entire probability function converges pointwise and uniformly to the true function at a rate below the parametric rate of $\sqrt{N}$, averaged marginal effects converge at the parametric rate. The original parameter values of the model are not required for such identification. Partly for this reason, we focus on identifying the probability function itself rather than index parameters.

While one of our primary objectives is to provide an estimator for this double-index binary choice model,2 we note that applied researchers have become increasingly interested in larger systems in which the choice appears in another equation as an endogenous regressor. This type of model, frequently referred to as an endogenous binary treatment model, is at best poorly identified without an exclusion restriction. The well-known problem here is that the treatment probability, which would serve as an instrument for estimating the continuous outcomes equation, is often approximately linear in its argument. In the absence of an exclusion restriction on the continuous outcome equation, the instrument is then very close to being linearly related to the same exogenous variables in the continuous equation of interest. To resolve this problem here, we consider the case of multiplicative heteroscedasticity in the binary response equation, which is some function of the explanatory variables *X*. Write this function as *S*(*X*). In the next section we show that such heteroscedasticity may be viewed as inducing exclusion restrictions on the continuous outcomes equation. With no parametric assumptions on *S*(*X*) (other than that it depends on one or two indices) and with no parametric assumptions on the distribution of the error term in the binary response model, below we will develop an estimator that exploits such identifying information. We will then show that such information is useful both in theory and in practice (as indicated in a series of Monte Carlo experiments and in an empirical application).
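The near-collinearity problem, and the way heteroscedasticity can relieve it, can be illustrated numerically. The sketch below is purely hypothetical: the normal error distribution, the index coefficients and the form chosen for *S*(*X*) are assumptions made for the example and are not taken from the paper. It compares how well a linear function of *X* explains the treatment probability with and without multiplicative heteroscedasticity.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal c.d.f. applied elementwise (assumed error distribution)."""
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))

def r_squared_on_x(p, X):
    """R-squared from a least-squares regression of p on a constant and X."""
    Z = np.column_stack([np.ones(len(p)), X])
    coef, *_ = np.linalg.lstsq(Z, p, rcond=None)
    resid = p - Z @ coef
    return 1.0 - resid.var() / p.var()

# Hypothetical design: two exogenous regressors on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))

index = 0.5 * X[:, 0] + 0.5 * X[:, 1]   # 'mean response' index (assumed)
scale = np.exp(X[:, 0] - X[:, 1])       # hypothetical S(X), driven by a second index

p_homo = normal_cdf(index)              # homoscedastic treatment probability
p_hetero = normal_cdf(index / scale)    # heteroscedastic treatment probability

r2_homo = r_squared_on_x(p_homo, X)
r2_hetero = r_squared_on_x(p_hetero, X)
print(f"R^2 of P on X, homoscedastic:   {r2_homo:.3f}")
print(f"R^2 of P on X, heteroscedastic: {r2_hetero:.3f}")
```

In the homoscedastic design the probability is almost an exact linear function of *X*, so it adds little as an instrument once *X* is controlled for. In the heteroscedastic design a sizeable part of its variation is not spanned by *X* linearly, which is the sense in which *S*(*X*) can act like an exclusion restriction.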

For continuous simultaneous equations models, other authors have exploited heteroscedasticity as an identification strategy. For example, in a semiparametric formulation, Klein and Vella exploit such information to identify and estimate triangular simultaneous equations models without exclusion restrictions. In parametric formulations, Rummery *et al.* (1999) and Rigobon (2003) also exploit heteroscedasticity as an identification strategy for simultaneous equations. From the structure of the problem considered here, there is information in higher-order powers of the *X*'s that could be exploited to construct instruments for the outcomes equation. Dagenais and Dagenais (1997) and Lewbel (1997) exploit such information in models with measurement error. In this paper, since the nature of the heteroscedastic function in the treatment equation is unknown, it is unclear which higher orders of the *X*'s should be used as instruments. Consequently, we pursue an alternative strategy here that involves direct estimation of a double-index binary response model. One could attempt to bypass estimation of this equation and determine the appropriate higher orders of the *X*'s to use as instruments by extending Donald and Newey (2001) to the model considered here. However, as the treatment probability is itself of direct interest, we pursue an alternative strategy that employs the estimated treatment probability in estimating the continuous outcomes equation. In the present context, the conditional treatment probability is an optimal instrument (Amemiya, 1975).

The next section outlines the model and the estimation methods. In Section 3 we provide and discuss the assumptions required to establish asymptotic results. When estimating the treatment effect, we note that our procedure is of particular value when there are no exclusion restrictions which provide instruments. Accordingly, we focus on identification in the absence of conventional exclusion restrictions. In Section 4 we establish the asymptotic properties of the estimators for both the binary response and outcome models. In so doing, we sketch out the proofs, and provide complete technical details in the Appendix, which is available online from Wiley Interscience. The proof strategy differs from other arguments in the literature as it relies on estimated local smoothing. Section 5 provides simulation evidence. In Section 6 we provide an empirical application where an individual's total education level (the outcome) depends in part on whether or not the individual attended a state-financed high school in Australia (the treatment). Section 7 concludes.