Keywords:

  • nonparametric maximum likelihood estimation;
  • penalised regression spline;
  • recursive bivariate probit model;
  • unobserved confounding

Summary

  1. Top of page
  2. Summary
  3. 1. Introduction
  4. 2. Model specification
  5. 3. Methods
  6. 4. Simulation study
  7. 5. Empirical illustration
  8. 6. Discussion
  9. Appendix A:: Observed information matrix
  10. Appendix B:: Additional simulation results
  11. Acknowledgements
  12. References

We consider an extension of the recursive bivariate probit model for estimating the effect of a binary variable on a binary outcome in the presence of unobserved confounders, nonlinear covariate effects and overdispersion. Specifically, the model consists of a system of two binary outcomes with a binary endogenous regressor which includes smooth functions of covariates, hence allowing for flexible functional dependence of the responses on the continuous regressors, and arbitrary random intercepts to deal with overdispersion arising from correlated observations on clusters or from the omission of non-confounding covariates. We fit the model by maximizing a penalized likelihood using an Expectation-Maximisation algorithm. The issues of automatic multiple smoothing parameter selection and inference are also addressed. The empirical properties of the proposed algorithm are examined in a simulation study. The method is then illustrated using data from a survey on health, aging and wealth.

1. Introduction

Quantifying the effect of a predictor of interest (also referred to as treatment) on a particular response variable is a challenging task in observational studies. This is because it is often the case that confounders which are associated with both treatment and response are either unknown or not readily quantifiable (this problem is known in econometrics as endogeneity of the variable of interest). Moreover, covariate-response relationships can exhibit nonlinear patterns and observations may be overdispersed. In such a context, the use of standard estimators neglecting the aforementioned issues yields inconsistent estimates. In this article, we consider the case in which the researcher is interested in estimating the effect of a binary endogenous variable on a binary outcome in the presence of unobserved confounders, nonlinear covariate-response relationships and overdispersion resulting from either correlations among observations on the same clusters or from the omission of non-confounding covariates.

Instrumental variable techniques are widely used for isolating the effect of a given predictor in the presence of unobserved confounding (e.g. Wooldridge 2010; Marra & Radice 2011b and references therein), and are increasingly used in epidemiological and medical studies (e.g. Goldman et al. 2001 and references therein). In the context of binary responses, it is well known, from both theoretical and empirical results, that bivariate likelihood estimation methods are superior to conventional two-stage instrumental variable procedures (e.g. Bhattacharya et al. 2006; Wooldridge 2010). First introduced by Heckman (1978), the recursive bivariate probit model represents an effective way to estimate the effect a binary regressor has on a binary outcome in the presence of unobservables. The semiparametric version of Heckman's model is an important extension since undetected nonlinearity can have severe consequences on the estimation of covariate effects (e.g. Marra & Radice 2011a). Chib & Greenberg (2007) proposed two Bayesian fitting procedures for the class of instrumental variable models including the semiparametric recursive bivariate probit model. However, as the authors point out, very large sample sizes are required to obtain reasonable estimates of the binary treatment effect, hence undermining the utility of the method for practical modeling. Marra & Radice (2011a) considered the same model and introduced a penalized likelihood based procedure which permits reliable estimation of the model coefficients at reasonably small sample sizes.

Neglecting the possible presence of overdispersion may have a detrimental impact on the estimation of the effect of an endogenous variable. This issue is dealt with by generalising the method of Marra & Radice (2011a) to include random effects, which are generated by unknown densities. The usual parametric approach, which assumes that random effects are generated by a bivariate normal density (Greene 2012), is avoided here as restrictive. Consequences of parametric assumptions have been studied extensively within the class of generalised linear mixed models (GLMMs). Several authors have shown that misspecification of the random effects distribution can negatively affect the estimation of regression parameters; see for instance Neuhaus et al. (1992), Heagerty & Kurland (2001), Chen et al. (2002), and Agresti et al. (2004). In addition, the assumed distribution is a very important factor for the prediction of the random effects themselves. In fact, the shape of the distribution of the empirical Bayes estimates tends to have features that are similar to the assumed random effects distribution, even if the assumed and true distributions are in reality not close together (Verbeke & Lesaffre 1996; Papageorgiou & Hinde 2012). With a nonparametric approach such pitfalls are avoided. The results of Laird (1978) and Lindsay (1983) have shown that the nonparametric maximum likelihood estimate of a mixing distribution is a discrete distribution. General fitting algorithms have been provided by Laird (1978), Lindsay (1983), Follmann & Lambert (1989) and Lesperance & Kalbfleisch (1992).

The proposed model is fitted by maximizing a penalised likelihood using an Expectation-Maximisation algorithm, where the issues of automatic multiple smoothing parameter selection and inference are also addressed. The empirical properties of the proposed algorithm are examined in a simulation study. The method is then illustrated using data from a survey on health, aging and wealth. Specifically, the aim is to estimate the effect of private health insurance on private medical care utilization. In such data, endogeneity is likely to arise because insurance coverage is not randomly assigned but rather is the result of supply and demand. Moreover, estimation of the effect of private health insurance on private medical care utilization may be adversely affected by overdispersion resulting from the heterogeneity present in the observations due to unobserved covariates related to either the response or the treatment variable. Buchmueller et al. (2005) provide an excellent review of these issues, which, if neglected, can lead to a biased estimate of the relationship of interest.

2. Model specification

The recursive bivariate probit model consists of a reduced form or treatment equation for the potentially endogenous binary variable and a second structural form or outcome equation for the binary response variable. The mixed effects semiparametric version of this model takes the form

  • display math(1)

where inline image denotes the number of clusters, inline image is the number of observations within the inline image cluster, and inline image is a latent continuous variable which determines its observable counterpart inline image through the rule inline image, for inline image, where inline image is the indicator function; inline image is the coefficient of the endogenous binary variable inline image; vector inline image contains inline image parametric model components (such as dummy and categorical observed confounders, but not intercepts as we do not impose a zero mean on the random effects), with corresponding parameter vector inline image. The inline image are unknown smooth functions of the inline image continuous observed confounders inline image. Varying coefficients models can be obtained by multiplying a smooth term by some predictor (Hastie & Tibshirani 1993). Smooth functions of two covariates such as inline image can also be implemented (e.g. Wood 2006, pp. 154–167). Similarly, inline image is a vector of dimension inline image with associated parameter vector inline image, the inline image are unknown smooth terms of the inline image continuous observed confounders inline image. For identification purposes, the smooth functions are subject to the centering constraint inline image for all terms (Wood 2006 pp. 167–168). The pair of random effects inline image is cluster specific, hence it induces correlation among multiple observations on the same cluster or can be used to handle overdispersion in case of independent observations, i.e. inline image for all inline image. For instance, a large value of inline image will tend to make inline image large for all inline image observations within the inline imageth cluster. Similar comments hold for inline image. As in Chib & Greenberg (2007) and Marra & Radice (2011a), we make the assumption that unobserved confounders have a linear impact on the response. 
That is, the error terms inline image are assumed to follow the bivariate distribution

  • display math

where inline image is a correlation coefficient and the error variances are set to 1 as the model parameters can only be identified up to a scale coefficient (e.g. Greene 2012). Parameter inline image accounts for the correlation between the responses not accounted for by the pair inline image. As in Greene (2012), inline image and inline image can be correlated. Further, it is assumed that the error terms and random effects are independent.

The recursive structure of (1) follows from the condition of logical consistency. It states that only one observed endogenous variable is allowed on the right-hand side of system (1). This is because the probabilities for the different possible value combinations of the two binary variables have to sum to one (e.g. Maddala 1983, p. 118). To identify the model parameters, it is typically assumed that the exclusion restriction on the exogenous variables holds (e.g. Maddala 1983, p. 122). That is, the exogenous covariates in the first equation of (1) should contain at least one regressor not included in the second equation. Such covariates are regarded as instrumental variables which induce variation in the treatment, do not directly affect the outcome, and are independent of the error terms given the covariates (e.g. Chib & Greenberg 2007). However, under correct model specification, this restriction may not be strictly necessary as pointed out by Wilde (2000) and Marra & Radice (2011a).

The smooth functions are represented using regression splines. The key idea is to approximate a generic function inline image by a linear combination of known spline basis functions, inline image, and regression parameters, inline image,

  • display math

where inline image is the number of bases (hence regression coefficients) used to represent inline image, inline image is a vector containing inline image basis functions evaluated at observation inline image, i.e. inline image, and inline image is the corresponding parameter vector. Basis functions should be chosen to have convenient mathematical properties and good numerical stability. Many choices are possible within the framework adopted in this article. These include B-splines, cubic and thin plate regression splines (see, e.g. Ruppert et al. 2003; Wood 2006 for a more detailed introduction); we opt for the latter. Based on the above regression spline representation, model (1) is written as

  • display math

where inline image, inline image, for inline image, and the linear predictors, inline image, have the obvious definitions.

In the current context, the effect of inline image is of primary interest. This is typically calculated using the average treatment effect (ATE). Given estimates for the random effects, parametric and smooth function components, the ATE can be estimated as follows

  • display math

where inline image and inline image are the distribution functions of a standardized univariate normal and a standardized bivariate normal with correlation inline image, and inline image indicates the linear predictor evaluated at inline image equal to 1 or 0. Coefficient inline image is also of interest as it can be used to ascertain the presence of unobserved confounding (endogeneity). It can be interpreted as the correlation between the unobserved confounders in the two equations (e.g. Monfardini & Radice 2008). If inline image then inline image and inline image are uncorrelated and hence there is not a problem of endogeneity. Because model (1) can capture, and hence separate, two different sources of variability (represented by inline image and inline image), estimation of inline image will be done more reliably by model (1) than by a model which does not account for overdispersion (e.g. Greene 2012).
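
The ATE calculation can be sketched in code. Because the display formula is not reproduced in this copy, the snippet below assumes one form that matches the description above: the average difference between P(y2 = 1 | y1 = 1) and P(y2 = 1 | y1 = 0), each written in terms of the univariate and bivariate standard normal distribution functions. The names `ate`, `eta1`, `eta2_treated`, `eta2_untreated` and the Simpson-rule helper `bvn_cdf` are illustrative assumptions, not part of any package used in the article.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, steps=400):
    """P(X <= a, Y <= b) for a standard bivariate normal with |rho| < 1.

    Integrates phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) over [-8, a]
    with Simpson's rule (steps must be even).
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0

def ate(eta1, eta2_treated, eta2_untreated, rho):
    """Average of P(y2=1 | y1=1) - P(y2=1 | y1=0) over observations (assumed form)."""
    diffs = []
    for e1, e2t, e2u in zip(eta1, eta2_treated, eta2_untreated):
        p_treated = bvn_cdf(e1, e2t, rho) / norm_cdf(e1)
        p_untreated = bvn_cdf(-e1, e2u, -rho) / norm_cdf(-e1)
        diffs.append(p_treated - p_untreated)
    return sum(diffs) / len(diffs)

# Hypothetical linear predictors for two observations.
print(ate([0.3, -0.5], [0.8, 0.1], [0.2, -0.4], rho=0.4))
```

When the correlation is zero the two conditional probabilities reduce to univariate normal probabilities of the outcome linear predictor, which gives a quick sanity check of the sketch.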

3. Methods

3.1. Estimation approach

Recall that the error terms inline image are assumed to follow a bivariate normal distribution. Define the parameter vector inline image, and pairs of random effects inline image. Vector inline image contains the parameters pertaining to the random effects distribution (see next section). In the current context, the data identify four possible events, inline image with inline image for inline image, with the following conditional probabilities

  • display math(2)
  • display math(3)
  • display math(4)
  • display math(5)
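
As a minimal sketch of how the four event probabilities of a recursive bivariate probit can be computed, the snippet below uses the standard bivariate normal identities, conditional on given values of the linear predictors. The exact notation of (2)–(5) is not recoverable from this copy, so the function and argument names (`cell_probabilities`, `eta2_treated`, `eta2_untreated`) and the numerical integration in `bvn_cdf` are assumptions made for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, steps=400):
    """Standard bivariate normal CDF via Simpson's rule on [-8, a], |rho| < 1."""
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def integrand(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0

def cell_probabilities(eta1, eta2_treated, eta2_untreated, rho):
    """Probabilities of (y1, y2) in {0, 1}^2.

    eta2_treated / eta2_untreated are the outcome linear predictors with
    the endogenous variable set to 1 and 0, respectively.
    """
    p11 = bvn_cdf(eta1, eta2_treated, rho)
    p10 = norm_cdf(eta1) - p11
    p01 = bvn_cdf(-eta1, eta2_untreated, -rho)
    p00 = norm_cdf(-eta1) - p01
    return {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}

# Hypothetical predictor values; the four probabilities sum to one.
probs = cell_probabilities(0.3, 0.8, 0.2, rho=0.4)
print(probs, sum(probs.values()))
```

The four probabilities sum to one by construction, which is the logical consistency requirement behind the recursive structure.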

The penalised log-likelihood function of the observed data inline image, where inline image, is

  • display math(6)

where inline image, inline image, inline image, and the inline image are positive semi-definite known square matrices measuring the (second-order, here) roughness of the smooth terms in the model, that is inline image. The inline image are smoothing parameters controlling the trade-off between fit and smoothness.

Note that because of the presence of smooth components in the model, unpenalised estimation would yield exceedingly wiggly curve estimates which can have a detrimental impact on the estimation of the ATE (Marra & Radice 2011a). This is why the log-likelihood is augmented by a penalty term. In addition, because inline image is bounded in inline image, we use the common transform for correlation inline image, so that inline image is mapped to the real line.
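
The transform itself is not shown in this copy; a common choice for mapping a correlation to the real line, and presumably the one meant here, is the inverse hyperbolic tangent (Fisher z) transform. The following sketch assumes that choice, and the function names are illustrative.

```python
import math

def to_real_line(rho):
    """Map a correlation in (-1, 1) to the real line: 0.5 * log((1 + rho) / (1 - rho))."""
    return math.atanh(rho)

def to_correlation(rho_star):
    """Inverse map: any real value is sent back to a correlation."""
    return math.tanh(rho_star)

# An unconstrained optimiser can work with rho_star and report rho afterwards.
rho_star = to_real_line(0.9)
print(rho_star, to_correlation(rho_star))
```

Optimising over the transformed parameter means no bound constraints are needed during the Newton-type iterations.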

3.1.1. EM penalised log-likelihood maximisation

We make no assumptions about the form of the density that gives rise to the model's random effects inline image. The nonparametric maximum likelihood estimate of a mixing distribution is discrete (Laird 1978; Lindsay 1983) and thus the density of inline image can be represented by inline image bivariate mass points, inline image, with corresponding probabilities, inline image, where inline image. Hence the parameter vector inline image, first introduced in Section 'Estimation approach', consists of inline image. We will treat inline image as a tuning constant.

An EM algorithm (Dempster et al. 1977) is employed for maximising (6). We consider inline image to be the complete data and indirectly maximise inline image by iteratively maximising the expectation of the penalized log-likelihood of the complete data, where the expectation is taken with respect to the conditional distribution of the missing given the observed data

  • display math

where inline image is the current value of the parameter vector. Let inline image be the parameter vector that maximises inline image. Under regularity conditions, at convergence inline image maximises both the complete and the observed data log-likelihoods.

Conditionally on the data and current parameter estimates, the distribution of inline image is discrete with points inline image, and probability masses given by

  • display math

Given the inline image, we have the following expression for the penalised complete data log-likelihood

  • display math(7)

where the inline image are given in (2)–(5).

Note that in (7) the parameter vector inline image separates into two independent subvectors, namely the vector inline image that appears in the triple sum and penalty term and the vector inline image that appears in the double sum. Consequently, maximisation of inline image is achieved in two steps. First, the triple sum has summands that, ignoring the fixed inline image, are exactly the same as those that would have been obtained by assuming a model without random effects. It follows that the triple sum with penalty term can be maximized using the algorithm presented in Marra & Radice (2011a). Second, the double sum is used to update the masses of the random effects distribution, resulting in closed-form formulas inline image.
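
The posterior weights and the closed-form mass update can be illustrated on a deliberately simplified model. The sketch below assumes a single binary response with a cluster-specific random intercept and no covariates, so that the probability of success at mass point u is Φ(u). It is a stand-in for the bivariate model of the article, and the data, mass points and function names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def e_step(clusters, points, masses):
    """Posterior probability that cluster i belongs to mass point l."""
    weights = []
    for ys in clusters:
        joint = []
        for u, mass in zip(points, masses):
            p = norm_cdf(u)
            lik = 1.0
            for y in ys:
                lik *= p if y == 1 else 1.0 - p
            joint.append(mass * lik)
        total = sum(joint)
        weights.append([v / total for v in joint])
    return weights

def update_masses(weights):
    """Closed-form update: average posterior weight of each mass point."""
    n = len(weights)
    return [sum(w[l] for w in weights) / n for l in range(len(weights[0]))]

# Toy data: four clusters of binary responses, two mass points.
clusters = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 1, 0]]
points = [-1.0, 1.0]
masses = [0.5, 0.5]

weights = e_step(clusters, points, masses)
new_masses = update_masses(weights)
print(weights)
print(new_masses)
```

In the full algorithm the weights are held fixed while the regression and spline coefficients (and the mass point locations) are updated by the penalised fit; only the mass update has the simple averaging form shown here.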

3.1.2. Smoothing parameter selection

If the model has more than two or three smooth terms, then it becomes crucial to estimate the smoothing parameters using an automatic, quick and reliable procedure. There are several techniques for automatic multiple smoothing parameter selection for univariate models (see Ruppert et al. 2003; Wood 2006 for a detailed overview). These include the performance-oriented iteration method first introduced by Gu (1992) which consists of applying the generalized cross validation or unbiased risk estimator (UBRE, Craven & Wahba 1979) to each working linear model of the penalized iteratively re-weighted least squares scheme used to fit the model. In what follows, we employ an adaptation of Gu's approach. Also, we suppress the superscript inline image to avoid clutter.

Given values for inline image and inline image, an estimate for inline image can be obtained by minimisation of

  • display math(8)

where

  • display math

inline image is an overall blockdiagonal penalty matrix made up of the inline image and zero vectors corresponding to the model parameters which are not penalised, and inline image is a vector containing the masses as defined in the previous section. Assuming, for simplicity and without loss of generality, that inline image and inline image for each inline image so that the total number of observations is inline image, inline image is a 3-dimensional vector given by inline image, inline image, inline image, inline image is a inline image matrix with inline imageth element

  • display math(9)

inline image is a inline image block diagonal matrix, i.e. inline image, where each of the vectors inline image and inline image contain inline image zero elements but the inline imageth which is set to 1, and the definitions of the linear predictors in (9) follow from the definition of inline image. The square root and inverse of inline image are obtained via eigendecomposition.

The smoothing parameter vector inline image is selected so that the estimated smooth terms are as close as possible to the true functions (Craven & Wahba 1979). Given an estimate for inline image, multiple smoothing parameter estimation for problem (8) can be achieved by minimization of the approximate UBRE score

  • display math(10)

where the working linear model quantities are calculated using the parameter estimates from the optimisation step mentioned in Section 'EM penalised log-likelihood maximisation', inline image, inline image is the hat matrix, and inline image the estimated degrees of freedom of the penalised model. For each working linear model of iteration (8), inline image is minimized with respect to inline image. In practice, this is implemented employing the approach by Wood (2004), which is based on the Newton-Raphson method. In evaluating score (10) and its derivatives, efficiency and stability are achieved using a combination of pivoted QR and singular value decompositions (see Wood 2004 for full details). Note that because each of the inline image is a non-diagonal matrix of dimension inline image, computation can quickly become prohibitive; hence their sparse structure is exploited in the implementation.
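
To make the role of the UBRE criterion concrete, the sketch below evaluates the usual unit-scale form, residual sum of squares over n, minus 1, plus twice the effective degrees of freedom over n, on a toy penalised problem. It assumes a working response already rotated into an orthonormal basis in which the penalty is diagonal, so the hat matrix has eigenvalues 1 / (1 + λ d_k); a simple grid search replaces the Newton-type optimiser of Wood (2004). The values in `z` and `d` are made up for illustration.

```python
def ubre(z, d, lam):
    """Unit-scale UBRE score for a diagonalised penalised least squares fit.

    z   : working response expressed in an orthonormal basis
    d   : penalty eigenvalues in that basis (0 means unpenalised)
    lam : smoothing parameter
    """
    n = len(z)
    shrink = [1.0 / (1.0 + lam * dk) for dk in d]  # hat matrix eigenvalues
    rss = sum((zk - s * zk) ** 2 for zk, s in zip(z, shrink))
    edf = sum(shrink)  # trace of the hat matrix
    return rss / n - 1.0 + 2.0 * edf / n

# Hypothetical rotated working response and penalty eigenvalues.
z = [3.0, -2.5, 1.8, 0.4, -0.3, 0.2, -0.1, 0.05]
d = [0.0, 0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5]

grid = [10.0 ** k for k in range(-4, 5)]
best_lam = min(grid, key=lambda lam: ubre(z, d, lam))
print(best_lam, ubre(z, d, best_lam))
```

With several smooth terms there is one smoothing parameter per penalty, and the search is carried out over the whole vector rather than a one-dimensional grid.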

3.2. Inference

Inference in penalised models is complicated by the presence of smoothing penalties which undermines the usefulness of classic frequentist results for practical modelling. Solutions to this problem have been introduced in the literature (see, e.g., Gu 2002; Wood 2006 for an overview). Here we show how to construct pointwise confidence intervals for the terms of a mixed effects semi-parametric bivariate model by adapting the well known Bayesian confidence intervals, originally proposed by Wahba (1983) and Silverman (1985). An appealing feature of these intervals is that they have close to nominal ‘across-the-function’ frequentist coverage probabilities (Marra & Wood 2012). This is because the Wahba/Silverman type intervals include both a bias and variance component. Moreover, their empirical performance has little sensitivity to the neglect of smoothing parameter uncertainty. For a generic term inline image intervals can be constructed by seeking constants inline image and inline image, such that

  • display math(11)

where ‘ACP’ denotes ‘Average Coverage Probability’, inline image is a constant between 0 and 1, and inline image is the inline image critical point from a standard normal distribution. Defining inline image and inline image, so that inline image, and inline image to be a random variable uniformly distributed on inline image, we have that inline image, where inline image and inline image. It is then necessary to find the distribution of inline image and values for inline image and inline image so that requirement (11) is met. As shown in Marra & Wood (2012), in the context of non-Gaussian response models involving several smooth components, such a requirement is approximately met when confidence intervals for the inline image are constructed using the distribution

  • display math(12)

where, in our context, inline image refers to the binary response vectors, inline image is an estimate of inline image, and inline image where inline image is the information matrix. Specifically,

  • display math

where inline image and inline image are the submatrix of inline image and the basis functions corresponding to the regression spline parameters associated with inline image. In addition, intervals for non-linear functions of the model parameters, such as the ATE, can be conveniently obtained by simulation from (12).
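
Interval construction by simulation can be sketched as follows: draw coefficient vectors from the approximate normal distribution, evaluate the nonlinear function of interest at each draw, and take empirical quantiles. The example assumes a two-parameter toy model whose effect of interest is Φ(β0 + β1) − Φ(β0); the estimate, covariance matrix and function names are hypothetical, and a hand-written 2×2 Cholesky factor is used to stay within the standard library.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def draw_bivariate_normal(mean, cov, rng):
    """One draw from N(mean, cov) for a 2x2 covariance via its Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)

def effect(beta):
    """Toy nonlinear function of the parameters (difference of two probabilities)."""
    return norm_cdf(beta[0] + beta[1]) - norm_cdf(beta[0])

def simulated_interval(mean, cov, func, draws=2000, level=0.95, seed=0):
    rng = random.Random(seed)
    values = sorted(func(draw_bivariate_normal(mean, cov, rng)) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    return values[int(tail * draws)], values[int((1.0 - tail) * draws) - 1]

# Hypothetical estimate and covariance matrix.
beta_hat = (-0.2, 0.6)
cov_hat = [[0.04, -0.01], [-0.01, 0.09]]

point_estimate = effect(beta_hat)
lower, upper = simulated_interval(beta_hat, cov_hat, effect)
print(point_estimate, (lower, upper))
```

The same recipe applies to the ATE: each simulated coefficient vector gives one ATE value, and the interval is read off the simulated values.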

In practice, inline image can be replaced by its observed version inline image. In the present context, however, inline image cannot be obtained as a byproduct of the estimation procedure and of the second order derivatives of inline image, in (7), used therein. Second derivatives of the log-likelihood of the ‘complete’ data inline image would overestimate the information about the model parameters in the sample. Ultimately, this is attributed to treating the weights inline image that appear in inline image as fixed. We therefore find the observed information using the method of Louis (1982), by which the observed information matrix is expressed as

  • display math

which makes clear how the second derivative of the complete data likelihood inline image is adjusted for the unobserved data inline image. Details about the approach, including the definition of inline image, are provided in Appendix A. It is important to stress that there is no contradiction in fitting the model using the method of Section 'Estimation approach' and then constructing intervals following a Bayesian result, and such an approach has been employed many times in the literature (e.g. Gu 2002; Wood 2006 and references therein).

3.3. Algorithm

As indicated previously, we treat inline image, the number of mass points of the nonparametric mixing distribution, as a tuning parameter. It is common practice to choose the value that minimizes Akaike's information criterion (AIC). This, for a given value of inline image, takes the following form: inline image, where dim(inline image) denotes the effective dimension of inline image for a fixed value of inline image, and the log-likelihood is obtained from Equation (6), in which we express inline image, as also shown in Appendix A.
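
Selecting the number of mass points by AIC amounts to a small search over candidate values. The sketch below assumes the usual form, minus twice the log-likelihood plus twice the effective dimension; the log-likelihoods and effective dimensions in `fits` are invented purely to show the mechanics.

```python
def aic(loglik, effective_dim):
    """Akaike's information criterion in its usual form (assumed here)."""
    return -2.0 * loglik + 2.0 * effective_dim

def select_num_points(fits):
    """Return the number of mass points with the smallest AIC.

    fits maps a candidate number of mass points to (log-likelihood, effective dimension).
    """
    return min(fits, key=lambda num_points: aic(*fits[num_points]))

# Hypothetical fitted values for 2, 3 and 4 mass points.
fits = {2: (-1250.4, 14.2), 3: (-1238.9, 17.1), 4: (-1238.1, 20.3)}
print(select_num_points(fits))
```

In practice each entry of `fits` would come from running the full fitting algorithm to convergence at that number of mass points.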

Having fixed a value of inline image, we need to choose starting values, inline image, for the model parameters. Starting values for inline image are chosen as the maximum likelihood estimates of the corresponding fixed effects model, i.e. the model assuming inline image, fitted using the method of Marra & Radice (2011a). Starting values for the masses, inline image, are all set to inline image, while starting values for the mass points, inline image, are set to a multiple (here, the square root of two) of the Gauss-Hermite quadrature nodes.

Given inline image and inline image, parameter estimates are found using an iterative algorithm. Iteration inline image consists of finding the maximizer of inline image using the algorithm described after (7). For a given estimate of inline image, smoothing parameter selection is achieved by minimisation of (10), as described in Section 'Smoothing parameter selection'. The two main steps, one for inline image the other for inline image, are iterated until convergence. The rule that we follow for stopping the iterative algorithm is that the maximum absolute change in the parameter estimates from successive iterations is less than inline image.

At convergence, we calculate the log-likelihood and AIC inline image to guide model choice, the standard errors of the estimates by inverting the observed information obtained as described in the previous section, and the random effects predictions using (14), as these are needed for estimating the ATE.

4. Simulation study

To gain insight into the empirical effectiveness of the proposed method, a Monte Carlo simulation study was conducted. All computations were performed in the R environment (R Development Core Team 2013) using the package SemiParBIVProbit (Marra & Radice 2013).

4.1. Design and model fitting details

The sampling experiments were based on the model

  • display math(13)

where the binary outcomes inline image and inline image were determined as described in Section 'Model specification'. The test functions used were inline image, inline image and inline image (see Fig. 1). Covariates inline image, inline image and inline image were generated as binary, uniform and normal correlated predictors, respectively. This was achieved by drawing standardised multivariate normal random variables with correlation 0.5 (using rmvnorm() in the package mvtnorm) and then transforming the first two of them with round() and pnorm() (e.g. Marra & Radice 2011a). Bivariate normal errors with zero means, standard deviations equal to one, and correlations inline image were considered. Sample sizes were set to 2000 and 6000 in the following two ways. In the first case, inline image was set to a randomly chosen number between 9 and 11 and inline image was set to 200 and 600. In the second case, inline image and inline image was set to 2000 and 6000. The pairs of random effects inline image were generated according to three scenarios: bivariate normal variates with mean vector (0,2), standard deviations inline image and correlation inline image; mixture of two equally weighted bivariate normals with mean vectors inline image and inline image, and with the remaining parameters as above; bivariate gamma variates with shape and scale parameters equal to 0.5. This last was achieved via a normal copula with inline image using mvdc() in the package copula. Each scenario was replicated inline image times and the quantities of interest, estimated ATE and inline image (see final paragraph of Section 'Model specification'), recorded.
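
The covariate generation step can be sketched without the R packages mentioned above. The snippet assumes equicorrelated standard normals (pairwise correlation 0.5, obtained here from a shared factor), with the first variable passed through the normal distribution function and rounded to give a binary covariate, the second passed through the normal distribution function to give a uniform covariate, and the third left as normal. The function name and the one-factor construction are illustrative choices, not the article's code.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def draw_covariates(n, rng):
    """Return n triples (binary, uniform, normal) built from correlated normals."""
    rows = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        # Each x has unit variance and pairwise correlation 0.5 with the others.
        x = [math.sqrt(0.5) * common + math.sqrt(0.5) * rng.gauss(0.0, 1.0)
             for _ in range(3)]
        binary = round(norm_cdf(x[0]))   # 1 when x[0] > 0, otherwise 0
        uniform = norm_cdf(x[1])         # uniform on (0, 1)
        rows.append((binary, uniform, x[2]))
    return rows

sample = draw_covariates(5, random.Random(42))
for row in sample:
    print(row)
```

Because the transforms are monotone, the binary and uniform covariates inherit positive dependence with the normal covariate, although their correlations are no longer exactly 0.5.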

The smooth components of continuous covariates in the model were represented using penalized thin plate regression splines with basis dimensions equal to inline image and penalties based on second-order derivatives (Wood 2006, pp. 154–160). The spline basis representation used here is a low rank eigen-approximation version of the full rank version introduced by Duchon (1977). It represents a general solution to the problem of estimating efficiently, and without having to choose knot locations, a smooth function of multiple predictors from noisy observations of the function. Smoothing parameters were chosen by approximate UBRE as described in Section 'Smoothing parameter selection'. The tuning constant inline image was identified to be 3; further increasing the value of this parameter did not change the results reported in the next section.

True values for the ATEs, under the scenarios detailed above, were obtained via simulation. Specifically, 10000 replicate datasets were generated according to model (13) and ATEs calculated based on the true linear predictors. The simulated average true ATEs for the normal, mixture of normals and gamma cases are -0.43, -0.15 and -0.45, respectively.

Estimates of the ATE were obtained using the proposed mixed model and, for the sake of comparison, the semiparametric bivariate model of Marra & Radice (2011a) which neglects the presence of random effects (henceforth, these two models will be referred to as mixed SRBP and SRBP, respectively). The calculation of the ATE for mixed SRBP requires an estimate of the random effects distribution. This was obtained, using empirical Bayes, as weighted averages of the estimated mass points, inline image, with respective weights inline image. That is, for each inline image,

  • display math(14)
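
The weighted-average prediction described before (14) is straightforward to compute once the posterior weights are available. In the sketch below, `weights[i][l]` stands for the posterior weight of mass point l for cluster i and `points[l]` for the estimated pair of mass points; both names and the numbers are hypothetical.

```python
def predict_random_effects(weights, points):
    """Empirical Bayes predictions: posterior-weighted averages of the mass points."""
    predictions = []
    for cluster_weights in weights:
        u1 = sum(w * p[0] for w, p in zip(cluster_weights, points))
        u2 = sum(w * p[1] for w, p in zip(cluster_weights, points))
        predictions.append((u1, u2))
    return predictions

# Two clusters, two bivariate mass points (made-up values).
weights = [[1.0, 0.0], [0.25, 0.75]]
points = [(-1.0, 0.5), (1.0, -0.5)]
predictions = predict_random_effects(weights, points)
print(predictions)
```

A cluster whose weight is concentrated on one mass point is predicted to sit at that point, while clusters with diffuse weights are shrunk towards the average of the points.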

4.2. Results

Tables 1 and 2 display the percentage biases and the root mean squared errors (RMSEs) of the estimated ATEs and inline image's obtained using SRBP and mixed SRBP, when inline image is a randomly chosen number between 9 and 11 and inline image is set to inline image and inline image, and random effects are generated using bivariate normal (N), mixture of normals (MN) and gamma (G) distributions. Tables 4 and 5, reported in Appendix B, provide the same information for the case in which inline image is set to 1 and inline image is set to inline image and inline image.
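
The two summary measures can be computed from the replicate estimates as sketched below. The definitions are the usual ones and are assumed rather than quoted from the article: percentage bias is the mean estimate minus the true value, relative to the true value, and RMSE is the square root of the mean squared deviation from the true value. The input values in the example are hypothetical.

```python
import math

def percent_bias(estimates, truth):
    """100 * (mean estimate - truth) / truth."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - truth) / truth

def rmse(estimates, truth):
    """Root mean squared error of the estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

# Made-up replicate estimates of a quantity whose true value is -0.43.
replicates = [-0.41, -0.45, -0.44, -0.40]
print(percent_bias(replicates, -0.43), rmse(replicates, -0.43))
```

Note that with a negative true value, as for the ATEs here, a negative percentage bias corresponds to a mean estimate that lies above the truth.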

Table 1. Percentage biases and root mean squared errors of the estimated average treatment effects (ATEs) obtained using the semiparametric recursive bivariate probit model without and with random effects (SRBP and mixed SRBP, respectively)

Notes: Letters N, MN and G stand for bivariate normal, mixture of normals and gamma variates for the random effects. Corr. is the correlation between the model errors. True ATE values are -0.43, -0.15 and -0.45 for the N, MN and G cases. The cluster size was set to a randomly chosen number between 9 and 11, and the number of clusters was set to 200 and 600; these produced sample sizes approximately equal to 2000 and 6000. Results are based on 200 replications. See Section 4.1 for further details.

                 200 clusters                                 600 clusters
                        N             MN              G              N             MN              G
Corr.  Method    Bias (%)   RMSEBias (%)   RMSEBias (%)   RMSEBias (%)   RMSEBias (%)   RMSEBias (%)   RMSE
  0.1  SRBP         -2.65  0.039  -64.68  0.107   -1.65  0.040   -2.58  0.025  -66.57  0.103   -2.01  0.026
       mixed SRBP   -0.47  0.037    0.13  0.021   -1.58  0.039   -0.21  0.023    1.84  0.012   -0.53  0.026
  0.5  SRBP         -2.74  0.036  -49.23  0.083   -1.49  0.036   -2.38  0.023  -49.76  0.078   -1.62  0.024
       mixed SRBP   -0.43  0.033   -0.58  0.025   -0.04  0.035    0.03  0.021    2.13  0.011   -0.05  0.021
  0.9  SRBP         -1.78  0.029  -30.49  0.055   -1.30  0.028   -2.22  0.019  -32.76  0.052   -1.44  0.020
       mixed SRBP   -0.01  0.027   -4.41  0.029    0.29  0.028   -0.36  0.016   -3.83  0.020    0.18  0.019
 -0.1  SRBP         -2.77  0.041  -72.95  0.121   -1.46  0.039   -2.68  0.026  -75.50  0.117   -1.79  0.024
       mixed SRBP   -0.34  0.042   -1.26  0.027   -2.58  0.047   -0.55  0.025    1.39  0.013   -0.10  0.026
 -0.5  SRBP         -2.64  0.041  -90.54  0.147   -1.31  0.037   -2.67  0.026  -94.11  0.145   -1.45  0.023
       mixed SRBP   -0.63  0.038   -4.15  0.039   -0.61  0.037   -0.77  0.024    0.41  0.015   -0.52  0.023
 -0.9  SRBP         -3.33  0.040 -115.13  0.183   -1.94  0.037   -3.18  0.027 -115.15  0.177   -0.50  0.022
       mixed SRBP   -0.53  0.034    3.67  0.028   -1.31  0.035    0.01  0.021    1.00  0.023   -0.41  0.022

Table 2. Percentage biases and root mean squared errors (RMSEs) of the estimated correlations between the model errors (inline image) obtained using the semiparametric recursive bivariate probit model without and with random effects (SRBP and mixed SRBP, respectively)

                           Sample size ≈ 2000                              Sample size ≈ 6000
                           N               MN              G               N               MN              G
inline image  Method       Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE
0.1           SRBP            84.79  0.125   753.90  0.755    39.37  0.102    83.23  0.097   750.04  0.750   -38.40  0.099
0.1           mixed SRBP      -7.93  0.104     9.41  0.201    -5.34  0.242    -8.18  0.065     0.78  0.110    -4.89  0.111
0.5           SRBP             1.18  0.075    81.38  0.408    -0.71  0.078     1.08  0.045    81.03  0.405    -0.75  0.070
0.5           mixed SRBP      -1.37  0.084    -5.20  0.180    -0.82  0.067    -1.73  0.056    -7.44  0.090    -0.63  0.058
0.9           SRBP            -7.98  0.086     6.88  0.064    -5.23  0.065    -8.49  0.080     6.63  0.060    -5.01  0.062
0.9           mixed SRBP      -1.77  0.050    -3.12  0.052    -0.28  0.042    -2.66  0.033    -2.63  0.044    -0.31  0.039
-0.1          SRBP          -123.51  0.155  -926.96  0.928   -63.31  0.111  -122.16  0.133  -922.60  0.923   -61.69  0.081
-0.1          mixed SRBP      10.10  0.121   -13.36  0.246    -1.00  0.104    18.08  0.065   -15.55  0.125     8.09  0.062
-0.5          SRBP           -40.77  0.219  -254.41  1.274   -22.88  0.135   -40.28  0.207  -253.49  1.268   -22.23  0.120
-0.5          mixed SRBP       0.82  0.079    -2.31  0.294     0.18  0.073     2.00  0.047    -8.74  0.157    -0.01  0.044
-0.9          SRBP           -30.81  0.283  -178.98  1.613   -18.77  0.180   -30.81  0.280  -179.10  1.612   -19.73  0.182
-0.9          mixed SRBP      -2.52  0.046    -7.50  0.231    -0.52  0.038    -2.79  0.033    -8.13  0.145    -1.24  0.023

Notes: Letters N, MN and G stand for bivariate normal, mixture of normals and gamma variates for the random effects. See the caption of Table 1 for further details.

The main results can be summarized as follows:

  • Table 1 shows that, under the N and G scenarios, mixed SRBP is only slightly better than SRBP in terms of accuracy and precision of the estimated ATEs. This suggests that, under the N and G cases, the model neglecting cluster-specific random effects can still yield good estimates of the average treatment effect. A likely explanation is that the parameter that links the two equations of the bivariate model (i.e. inline image) captures correlations due to both unobserved confounders and cluster or ‘litter’ effects. However, this is not true when the bivariate random effects distribution is not unimodal, in which case mixed SRBP considerably outperforms SRBP. These conclusions are in agreement with previously reported findings on the impact of misspecification of the random effects distribution on parameter estimation within the class of GLMMs; see Heagerty (1999), Chen et al. (2002) and Agresti et al. (2004).
  • Table 2 shows that, under all random effects distribution scenarios, mixed SRBP performs considerably better than SRBP in terms of accuracy and precision of the estimated inline images. The unsatisfactory performance of SRBP can be attributed to the fact that a model neglecting the presence of overdispersion will not be able to disentangle different sources of variability (in this case, one due to endogeneity and the other due to overdispersion). This finding is important because the parameter linking the two model equations is useful for ascertaining the presence of endogeneity, and the estimates produced using SRBP can clearly lead to erroneous conclusions.
  • The findings for the more computationally challenging scenarios, in which inline image and inline image, are essentially the same as those reported above, except that, as expected, the estimates obtained using mixed SRBP are more variable. These results can be found in Tables 4 and 5 in Appendix B.

Figure 1 provides an example of estimated smooths with corresponding inline image Bayesian pointwise confidence intervals obtained using the mixed SRBP model. The function estimates recover the true functions reasonably well. This is a good result given the complexity of the model.

5. Empirical illustration

The modeling framework described in this article is illustrated using data from an Italian population-based survey. The aim of this study is to estimate the causal effect of private health insurance on private medical care utilization in the presence of unobserved confounding and overdispersion. The problem of unobserved confounding arises in such data because insurance coverage is not randomly assigned as in a controlled trial but rather is the result of supply and demand, including individual preferences and health status. As a consequence, differences in outcomes for insured and uninsured individuals might be due not only to the effect of health insurance but also to the effect of unobserved characteristics that are associated with insurance coverage and medical care utilization. If we do not account for the endogeneity of insurance coverage then the estimated effect will be biased, hence leading to distorted assessments of health policy implications. Overdispersion, which in this study can result from unobserved predictors of either private health insurance or private medical care utilization, can also bias the effect of interest. Buchmueller et al. (2005) provide an excellent review of these issues. The direction of the bias due to unobserved confounding is unclear a priori. Specifically, standard economic models of insurance markets point to the problem of adverse selection: individuals with a greater demand for medical care, because of poor health for instance, are expected to have a greater demand for insurance. In this case, adverse selection would impart a positive bias on the estimate of the insurance effect on medical care utilization. On the other hand, there could be a problem of moral hazard: once insured, individuals consume more care than is optimal. Here, moral hazard would contribute to bias in the opposite direction.

5.1. Data

We used data from the Survey on Health, Aging and Wealth (SHAW; Brugiavini et al. 2002), which was conducted by the leading Italian polling agency DOXA in 2001. The SHAW sample consists of 1068 households whose head is over 50 years old, and it mainly provides information about individual health status, utilization of health services, types of insurance coverage, as well as socio-economic features. The response is utilization of private health care (util): an indicator variable that takes value 1 if the subject has private examinations and 0 otherwise. The treatment variable is private health insurance (ins): a dummy variable with value 1 if the respondent has private insurance coverage and 0 otherwise. The observed confounders are the continuous covariates age (age), income (inc) and body mass index (bmi), the binary variables indicating whether the individual is a male (male), is unmarried or widowed (single), is unemployed (unemp), suffers from chronic conditions (cond), has a condition that limits activities of daily life (lim), suffers from hearing and/or eyesight troubles (heey), has ever smoked (smoke), and a factor indicating self-reported health status (poor, good and excellent self-perceived health: poor, good and exc, respectively).

Figure 1. Test functions used in the simulation study (dotted lines) and a realization of estimated smooths (black lines) with corresponding 95% pointwise confidence intervals (shaded regions). The smooth estimates were obtained by applying the proposed method to data from a typical sample with inline image and inline image, and bivariate gamma random variates for the random components. The estimates are on the scale of the respective linear predictors. Due to identifiability constraints, the curves are centered around zero. The numbers in brackets in the y-axis captions are the estimated degrees of freedom of the smooth curves, while the rug plot, at the bottom of each graph, shows the observed covariate values.

5.2. Health care modeling

The methodology presented here is suitable to tackle both endogeneity and overdispersion; the bivariate model allows us to account for unobserved confounding and for the source of variation due to the heterogeneity in the households. Following previous work on the subject (e.g. Holly et al. 1998; Fabbri & Monfardini 2003; Marra & Radice 2011b), we specified a mixed SRBP model with main terms only. Specifically, the equations for ins and util are:

  • display math

The parameters in the model have the obvious definitions, and thin plate regression splines of the continuous covariates were employed with the same settings as those used for the simulation study. The optimal value for the tuning parameter inline image was identified to be inline image; that is, the random effects distribution is represented by a two-point discrete distribution. The non-linear specification for age, inc and bmi arises from the fact that these covariates embody productivity and life-cycle effects that are likely to affect ins and util non-linearly. In fact, Holly et al. (1998) and Fabbri & Monfardini (2003) considered a model for health care utilization that contains linear and quadratic terms in age, inc and bmi, whereas Marra & Radice (2011b) specified a model containing smooth functions of them. For comparison purposes, we also employ the SRBP model and a classic univariate probit model using the same functional form specification. Mixed SRBP can account simultaneously for unobserved confounding and overdispersion, SRBP accounts for unobserved confounding only, whereas the probit model cannot account for either of these issues.

Results are displayed in Table 3. Bayesian confidence intervals for the ATE and correlation coefficient were obtained using 1000 coefficient vectors simulated from the posterior distribution of the estimated model parameters (see Section 'Inference').
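
The posterior simulation step can be sketched as follows. The sketch assumes a multivariate normal approximation to the posterior, centred at the estimates with the estimated covariance matrix, which is a common choice for penalised spline models; the section on inference is not reproduced here, so this may differ from the authors' procedure. The coefficient values, covariance matrix and function names are hypothetical, and the ATE is computed for a single covariate profile rather than averaged over a sample.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_coefficients(mean, lower, rng):
    """One draw from N(mean, L L^T), where L is the Cholesky factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def simulated_interval(mean, cov, quantity, n_draws=1000, level=0.95, seed=1):
    """Equal-tailed interval for quantity(beta) from simulated coefficient vectors."""
    rng = random.Random(seed)
    lower = cholesky(cov)
    values = sorted(quantity(draw_coefficients(mean, lower, rng))
                    for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    lo = values[int(math.floor(tail * (n_draws - 1)))]
    hi = values[int(math.ceil((1.0 - tail) * (n_draws - 1)))]
    return lo, hi


# Hypothetical estimates: intercept and treatment coefficient of the outcome equation.
beta_hat = [-0.2, 0.9]
cov_hat = [[0.04, -0.01], [-0.01, 0.09]]


def ate(beta):
    """Effect of switching the treatment on for one covariate profile."""
    return std_normal_cdf(beta[0] + beta[1]) - std_normal_cdf(beta[0])


ate_interval = simulated_interval(beta_hat, cov_hat, ate)
```

Simulating the coefficients and transforming each draw avoids a delta-method approximation for nonlinear quantities such as the ATE, at the cost of some Monte Carlo error in the interval endpoints.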

Table 3. Estimates of the ATE and inline image in the health care study obtained using the univariate probit model, and semiparametric recursive bivariate probit without and with nonparametric random effects (SRBP and mixed SRBP, respectively)

Model        ATE                   Correlation coefficient
Probit       0.07 (−0.05, 0.19)
SRBP         0.25 (0.13, 0.36)     −0.26 (−0.44, −0.07)
mixed SRBP   0.38 (0.18, 0.58)     −0.46 (−0.68, −0.24)

Notes: Bayesian confidence intervals for ATE and inline image were calculated using 1000 coefficient vectors simulated from the posterior distribution of the estimated model parameters.

For the mixed SRBP model, the estimated bivariate mass points are inline image and inline image, with probabilities inline image and inline image, suggesting the absence of relevant predictors of private health insurance. The estimates of inline image are both negative and statistically significant, suggesting the presence of endogeneity. Specifically, the point estimate obtained with mixed SRBP is larger in magnitude than that of SRBP, although their intervals overlap. This confirms the finding by Holly et al. (1998), which is consistent with the interpretation that unobserved confounders are present and have opposite, significant effects on ins and util.

Moving on to the estimated ATE, the result obtained with the univariate probit model suggests that the effect of private health care insurance is not significant. However, this estimate may be biased due to the unmodelled effects. If we look at the results obtained with the SRBP models, that is models which account for unobserved confounding, private health care insurance has a significant positive impact on the probability of using private health care services. Specifically, the mixed SRBP estimate suggests that the probability of using private medical services increases by 0.38 points for an individual with private health coverage as compared to an individual without private insurance. The point estimate obtained with mixed SRBP is larger than that obtained using SRBP, although their intervals overlap. Results for the other parametric coefficients (not reported here) are in agreement with those found in the literature. The change in the correlation coefficient and ATE of mixed SRBP suggests that decomposing the disturbance in the model into a part attributed to endogeneity and another attributed to overdispersion might have led to a more accurate estimate of the effect of interest. Figure 2 shows the impacts of age, inc and bmi for the treatment and outcome equations obtained using the mixed SRBP model. These results support the presence of nonlinear effects in the outcome equation.

Figure 2. Function estimates in the health care study, on the scale of the respective linear predictors, obtained using a mixed SRBP model. Dashed lines represent 95% Bayesian confidence intervals. The plots in the two panels show the estimated smooth terms of age, inc and bmi for the treatment and outcome equations, respectively. The numbers in brackets in the y-axis captions are the estimated degrees of freedom of the smooth curves.

In summary, if we employ a univariate probit model to estimate the effect of private health insurance, the impact appears not to be statistically significant. However, this result is likely to be biased by the presence of unobserved confounding and overdispersion. The estimates obtained with the SRBP models, which account for unobserved confounding, are likely to be more realistic, with mixed SRBP also accounting for overdispersion.

6. Discussion

In this paper, we introduce an algorithm for the simultaneous estimation of the equations of a semiparametric recursive bivariate probit model with nonparametric mixing. Estimation is carried out by maximising a penalised likelihood function using an Expectation-Maximisation algorithm. We also address the issues of automatic multiple smoothing parameter selection and inference. Results from our simulation study suggest that the approach is effective for estimating the effect of an endogenous binary predictor on a binary outcome. Interestingly, the model neglecting overdispersion yields average treatment effect estimates exhibiting substantial bias only for the case of bimodal random effects densities. However, this is not true for the estimation of the parameter linking the two model equations (which is important for ascertaining the presence of endogeneity), where substantial bias is observed in all simulation settings. The methodology was illustrated using data from a survey on private medical care utilization. For this application, differences in the point estimates of the average treatment effects were found between the models with and without random effects, and a classic univariate probit model.

Maximum likelihood estimators are typically sensitive to model error misspecifications. This creates a need for considering different joint distributions of the model errors. A copula approach can be used to that end (e.g. Nelsen 2006). As for the nonparametric approach to the estimation of the random effects distribution, although it yields reasonably efficient parameter estimates, it has several drawbacks. For instance, the resulting discrete estimate of the distribution is not satisfactory as it is more likely to be continuous than discrete. A more relevant drawback is the amount of information required to obtain an accurate estimate of the nonparametric mixing distribution (Carroll & Hall 1988), which can ultimately affect the precision of the effect of interest. We plan on extending the approach presented here in order to include random effects generated by flexible densities that avoid the restrictive assumption of normality but also allow for smooth estimates of the random effects densities. Such densities can, for instance, be represented by mixtures of Gaussians.

Appendix A: Observed information matrix

We briefly describe the method we used to obtain the observed information matrix. First, the log-likelihood, inline image, of the hierarchical model is written in terms of both the observed data and random effects, as inline image, which for the sake of notational convenience is expressed as inline image. From this, we obtain the score function, inline image as

  • display math

where inline image and inline image have the obvious definitions. Note that the above score function and the one obtained by differentiating (7) are exactly the same (except for the penalty term). Now, the observed information matrix inline image, is obtained as

  • display math

with expressions involving model parameters being evaluated at parameter estimates obtained at convergence of the fitting algorithm.
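
One standard way to obtain an observed information matrix from complete-data quantities is Louis's (1982) identity: the conditional expectation of the complete-data information minus the conditional variance of the complete-data score, both given the observed data. The display equations of this appendix are not recoverable from this copy, so whether they match this form term by term cannot be confirmed. The sketch below illustrates the identity on a toy problem only (a two-component normal mixture with known mixing probability and one unknown mean, not the bivariate probit model) and checks it against a numerical second derivative; all names and values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def membership_prob(y, mu, pi):
    """P(z = 1 | y): posterior probability that y comes from N(mu, 1)."""
    a = pi * normal_pdf(y, mu)
    b = (1.0 - pi) * normal_pdf(y, 0.0)
    return a / (a + b)


def observed_loglik(data, mu, pi):
    return sum(math.log(pi * normal_pdf(y, mu) + (1.0 - pi) * normal_pdf(y, 0.0))
               for y in data)


def louis_information(data, mu, pi):
    """Observed information for mu: E[complete info | y] - Var[complete score | y]."""
    info = 0.0
    for y in data:
        w = membership_prob(y, mu, pi)
        expected_complete_info = w                      # complete-data information is z
        score_variance = w * (1.0 - w) * (y - mu) ** 2  # variance of z * (y - mu) given y
        info += expected_complete_info - score_variance
    return info


def numerical_information(data, mu, pi, step=1e-4):
    """Minus the central-difference second derivative of the observed log-likelihood."""
    f_plus = observed_loglik(data, mu + step, pi)
    f_mid = observed_loglik(data, mu, pi)
    f_minus = observed_loglik(data, mu - step, pi)
    return -(f_plus - 2.0 * f_mid + f_minus) / step ** 2


rng = random.Random(7)
data = [rng.gauss(2.0, 1.0) if rng.random() < 0.3 else rng.gauss(0.0, 1.0)
        for _ in range(300)]
info_louis = louis_information(data, mu=2.0, pi=0.3)
info_numeric = numerical_information(data, mu=2.0, pi=0.3)
```

The appeal of this route in an EM setting is that the posterior membership probabilities are already available from the E-step, so no differentiation of the marginal likelihood is needed.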

Appendix B: Additional simulation results

Table 4. Percentage biases and root mean squared errors (RMSEs) of the estimated average treatment effects (ATEs) obtained using the semiparametric recursive bivariate probit model without and with random effects (SRBP and mixed SRBP, respectively)

                           Sample size = 2000                              Sample size = 6000
                           N               MN              G               N               MN              G
inline image  Method       Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE
0.1           SRBP            -2.89  0.040   -62.67  0.104    -3.90  0.041    -2.69  0.028   -64.71  0.099    -3.94  0.033
0.1           mixed SRBP       1.76  0.080     7.89  0.117     1.96  0.080     0.87  0.078     5.35  0.109     0.99  0.076
0.5           SRBP            -2.32  0.035   -46.41  0.081    -3.51  0.037    -2.29  0.028   -46.92  0.072    -3.49  0.029
0.5           mixed SRBP       2.02  0.075     2.31  0.093     1.54  0.061     1.56  0.076     1.97  0.088     1.23  0.056
0.9           SRBP            -2.18  0.027   -30.44  0.057    -4.07  0.033    -2.21  0.019   -31.55  0.043    -5.11  0.028
0.9           mixed SRBP       1.47  0.046     3.20  0.056     0.31  0.045     1.22  0.037     2.82  0.052     0.44  0.039
-0.1          SRBP            -3.08  0.042   -71.31  0.118    -3.88  0.042    -3.12  0.037   -73.85  0.094    -3.68  0.037
-0.1          mixed SRBP       2.81  0.090     3.70  0.121     2.10  0.080     2.53  0.085     3.17  0.108     2.00  0.081
-0.5          SRBP            -3.14  0.042   -91.01  0.147    -4.33  0.045    -3.47  0.036   -98.57  0.136    -4.95  0.036
-0.5          mixed SRBP       2.88  0.089     4.21  0.140     1.46  0.077     1.96  0.081     3.75  0.128     1.41  0.069
-0.9          SRBP            -4.22  0.043  -110.13  0.177    -3.32  0.040    -5.56  0.038   -99.78  0.152    -4.41  0.034
-0.9          mixed SRBP       2.81  0.088    -3.18  0.152    -0.65  0.064     2.08  0.079    -2.25  0.143    -0.71  0.062

Notes: Letters N, MN and G stand for bivariate normal, mixture of normals and gamma variates for the random effects. True ATE values are -0.43, -0.15 and -0.45 for the N, MN and G cases. inline image was set to 1 and inline image was set to 2000 and 6000; these produced sample sizes equal to 2000 and 6000. Results are based on 200 replications. See Section 4.1 for further details.

Table 5. Percentage biases and root mean squared errors (RMSEs) of the estimated correlations between the model errors (inline image) obtained using the semiparametric recursive bivariate probit model without and with random effects (SRBP and mixed SRBP, respectively)

                           Sample size = 2000                              Sample size = 6000
                           N               MN              G               N               MN              G
inline image  Method       Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE Bias (%)   RMSE
0.1           SRBP            78.82  0.117   755.05  0.756    34.60  0.094    73.45  0.087   747.05  0.686    35.60  0.082
0.1           mixed SRBP      -7.94  0.364     8.01  0.421    -8.41  0.329    -6.13  0.299     6.57  0.412    -5.41  0.314
0.5           SRBP             1.43  0.074    81.95  0.411    -0.44  0.072     2.02  0.068    79.83  0.385    -2.75  0.068
0.5           mixed SRBP       0.95  0.279     8.97  0.308    -0.02  0.316     0.83  0.267     6.83  0.299    -0.45  0.286
0.9           SRBP            -8.07  0.084     6.92  0.064    -5.74  0.068    -8.45  0.069     6.57  0.059    -6.14  0.054
0.9           mixed SRBP      -2.23  0.104    -0.62  0.167    -1.11  0.108    -2.13  0.094    -0.74  0.151    -0.58  0.087
-0.1          SRBP          -116.27  0.145  -927.66  0.929   -57.40  0.108  -108.54  0.129  -947.78  0.867   -54.73  0.096
-0.1          mixed SRBP       9.15  0.396   -11.28  0.433    13.06  0.388     7.45  0.361    -9.57  0.445    11.31  0.347
-0.5          SRBP           -39.06  0.210  -254.34  1.273   -20.06  0.129   -41.27  0.187  -275.84  1.135   -19.65  0.109
-0.5          mixed SRBP       3.97  0.289   -12.95  0.439     9.75  0.244     2.67  0.263   -10.65  0.411     9.24  0.219
-0.9          SRBP           -29.87  0.276  -179.84  1.620   -18.69  0.177   -31.25  0.263  -187.65  1.434   -16.65  0.169
-0.9          mixed SRBP     -11.17  0.189   -14.35  0.408    -1.28  0.065    -9.38  0.177   -10.24  0.396    -1.54  0.066

Notes: Letters N, MN and G stand for bivariate normal, mixture of normals and gamma variates for the random effects. See the caption of Table 4 for further details.

References

  • Agresti, A., Caffo, B. & Ohman-Strickland, P. (2004). Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput. Statist. Data Anal. 47, 639–653.
  • Bhattacharya, J., Goldman, D. & McCaffrey, D. (2006). Estimating probit models with self-selected treatments. Statist. Med. 25, 389–413.
  • Brugiavini, A., Jappelli, T. & Weber, G. (2002). The survey on health, aging and wealth. CSEF Working Papers 86, Centre for Studies in Economics and Finance (CSEF), University of Naples, Italy. Available from URL: http://ideas.repec.org/p/sef/csefwp/86.html [Last accessed 24 August 2013].
  • Buchmueller, T., Grumbach, K., Kronick, R. & Kahn, J. (2005). The effect of health insurance on medical care utilization and implications for insurance expansion: a review of the literature. Med. Care Res. Rev. 62, 3–30.
  • Carroll, R.J. & Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Amer. Statist. Assoc. 83, 1184–1186.
  • Chen, J., Zhang, D. & Davidian, M. (2002). A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics 3, 347–360.
  • Chib, S. & Greenberg, E. (2007). Semiparametric modeling and estimation of instrumental variable models. J. Comput. Graph. Statist. 16, 86–114.
  • Craven, P. & Wahba, G. (1979). Smoothing noisy data with spline functions. Numer. Math. 31, 377–403.
  • Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. Ser. B Statist. Methodol. 39, 1–22.
  • Duchon, J. (1977). Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In Constructive Theory of Functions of Several Variables, eds. W. Schempp & K. Zeller, pp. 85–100. Berlin: Springer.
  • Fabbri, D. & Monfardini, C. (2003). Public vs. private health care services demand in Italy. Giornale degli Economisti 62, 93–123.
  • Follmann, D.A. & Lambert, D. (1989). Generalizing logistic regression by nonparametric mixing. J. Amer. Statist. Assoc. 84, 295–300.
  • Goldman, D., Bhattacharya, J., McCaffrey, D., Duan, N., Leibowitz, A., Joyce, G. & Morton, S. (2001). Effect of insurance on mortality in an HIV-positive population in care. J. Amer. Statist. Assoc. 96, 883–894.
  • Greene, W.H. (2012). Econometric Analysis. New York: Prentice Hall.
  • Gu, C. (1992). Cross validating non-Gaussian data. J. Comput. Graph. Statist. 1, 169–179.
  • Gu, C. (2002). Smoothing Spline ANOVA Models. London: Springer-Verlag.
  • Hastie, T. & Tibshirani, R. (1993). Varying-coefficient models. J. R. Statist. Soc. Ser. B Statist. Methodol. 55, 757–796.
  • Heagerty, P.J. (1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics 55, 688–698.
  • Heagerty, P.J. & Kurland, B.F. (2001). Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika 88, 973–985.
  • Heckman, J. (1978). Dummy endogenous variables in a simultaneous equation system. Econometrica 46, 931–959.
  • Holly, A., Gardiol, L., Domenighetti, G. & Bisig, B. (1998). An econometric model of health care utilization and health insurance in Switzerland. Eur. Econ. Rev. 42, 513–522.
  • Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. J. Amer. Statist. Assoc. 73, 805–811.
  • Lesperance, M.L. & Kalbfleisch, J.D. (1992). An algorithm for computing the nonparametric MLE of a mixing distribution. J. Amer. Statist. Assoc. 87, 120–126.
  • Lindsay, B.G. (1983). The geometry of mixture likelihoods, Part II: the exponential family. Annals Statist. 11, 783–792.
  • Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. Ser. B Statist. Methodol. 44, 226–233.
  • Maddala, G.S. (1983). Limited Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
  • Marra, G. & Radice, R. (2011a). Estimation of a semiparametric recursive bivariate probit model in the presence of endogeneity. Canad. J. Statist. 39, 259–279.
  • Marra, G. & Radice, R. (2011b). A flexible instrumental variable approach. Statist. Model. 11, 581279.
  • Marra, G. & Radice, R. (2013). SemiParBIVProbit: Semiparametric Bivariate Probit Modelling. R package version 3.2-8. Available from URL: http://CRAN.R-project.org/package=SemiParBIVProbit [Last accessed 24 August 2013].
  • Marra, G. & Wood, S.N. (2012). Coverage properties of confidence intervals for generalized additive model components. Scand. J. Statist. 39, 53–74.
  • Monfardini, C. & Radice, R. (2008). Testing exogeneity in the bivariate probit model: a Monte Carlo study. Oxford B. Econ. Statist. 70, 271–282.
  • Nelsen, R. (2006). An Introduction to Copulas. New York: Springer.
  • Neuhaus, J.M., Hauck, W.W. & Kalbfleisch, J.D. (1992). The effects of mixture distribution misspecification when fitting mixed-effects logistic models. Biometrika 79, 755–762.
  • Papageorgiou, G. & Hinde, J. (2012). Multivariate generalized linear mixed models with semi-nonparametric and smooth nonparametric random effects densities. Statist. Comput. 22, 79–92.
  • R Development Core Team. (2013). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. Available from URL: http://www.R-project.org [Last accessed 24 August 2013].
  • Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). Semiparametric Regression. New York: Cambridge University Press.
  • Silverman, B. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. R. Statist. Soc. Ser. B Statist. Methodol. 47, 1–52.
  • Verbeke, G. & Lesaffre, E. (1996). A linear mixed-effects model with heterogeneity in the random-effects population. J. Amer. Statist. Assoc. 91, 217–221.
  • Wahba, G. (1983). Bayesian ‘confidence intervals’ for the cross-validated smoothing spline. J. R. Statist. Soc. Ser. B Statist. Methodol. 45, 133–150.
  • Wilde, J. (2000). Identification of multiple equation probit models with endogenous dummy regressors. Econom. Lett. 69, 309–312.
  • Wood, S.N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Assoc. 99, 673–686.
  • Wood, S.N. (2006). Generalized Additive Models: An Introduction with R. London: Chapman & Hall/CRC.
  • Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.