Keywords:

  • EM algorithm;
  • multiple imputation;
  • non-ignorable missing mechanism

Summary

Incomplete data subject to non-ignorable non-response are often encountered in practice and suffer from a non-identifiability problem. A follow-up sample, randomly selected from the set of non-respondents, can be used to avoid the non-identifiability problem and to obtain complete responses. Glynn, Laird & Rubin (1993) analysed non-ignorable missing data with a follow-up sample under a pattern mixture model. In this article, maximum likelihood estimation of the parameters of categorical data with missing values is considered with a follow-up sample under a selection model. To estimate the parameters with non-ignorable missing data, the EM algorithm with weighting, proposed by Ibrahim (1990), is used. That is, in the E-step, the weighted mean is calculated using the fractional weights for the imputed data. Variances are estimated using the approximated jackknife method. Simulation results are presented to compare the proposed method with existing methods.


1. Introduction

Missing data often arise in sample surveys. When non-response is not related directly to the missing values, the mechanism is called missing at random (MAR), as defined by Rubin (1976). In practice, however, non-response is often directly related to the values of the missing variable, even after adjusting for the auxiliary variables. For example, in exit polls, people are less likely to respond to questions about the party that they voted for if the party is not very popular. The missing data mechanism is considered non-ignorable when the non-response is directly related to the values of the missing variable.

Under MAR, Chen & Fienberg (1974) suggested a maximum likelihood method of parameter estimation for two-dimensional categorical data. Fuchs (1982) focused on the maximum likelihood estimation of log-linear models using the expectation-maximisation (EM) algorithm proposed by Dempster, Laird & Rubin (1977). Little & Schluchter (1985) suggested a maximum likelihood estimation method for mixed continuous and categorical data, using the EM algorithm to estimate the regression coefficients in the generalised linear model. Schafer (1987) analysed data whose covariates were measured with error, again using the EM algorithm to estimate the regression coefficients in the generalised linear model. Ibrahim (1990) proposed the EM algorithm by the method of weights for generalised linear models with missing categorical covariates. For non-ignorable missing data, however, some parameters may not be identifiable. Nordheim (1984) allowed the probability of uncertain classification to depend on category identity and analysed data obtained from a genetic study on Turner's syndrome. Baker & Laird (1988) examined incomplete contingency tables under non-ignorable non-response and fitted them with log-linear models. Little (1993) considered pattern mixture models and analysed non-ignorable missing data under some restrictions, or by using prior information. Glynn, Laird & Rubin (1993) used a follow-up sample to avoid the non-identifiability problem and analysed non-ignorable missing data using the pattern mixture model. Park & Brown (1997) examined non-ignorable missing categorical data under a log-linear model; to avoid the non-identifiability problem, they restricted the boundary of the parameters and used maximum likelihood estimation with constraints. Ibrahim & Lipsitz (1999) considered the generalised linear regression problem with non-ignorable missing covariates using the weighting scheme proposed by Ibrahim (1990).

In this paper, to overcome the non-identifiability problem for non-ignorable missing data, a follow-up sample randomly selected from the non-respondents is used. To estimate the parameters, a mean score approach using fractional weights for the imputed data is proposed, and the maximum likelihood estimators are obtained via the EM algorithm using the method of weights proposed by Ibrahim (1990). For variance estimation, a one-step jackknife method is used. This paper is organised as follows. In Section 2, the basic setup is introduced. In Section 3, we propose the parameter estimation method for non-ignorable categorical missing data with a follow-up sample under the selection model. In Section 4, the variance estimation method is discussed. In Section 5, some simulation results are presented.

2. Basic setup

Suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are independent and identically distributed (i.i.d.) realisations of p-dimensional auxiliary variables that are always observed, and that $y_1, \ldots, y_n$ are i.i.d. realisations of the univariate categorical random variable y from a parametric distribution with density $f(y \mid \mathbf{x}; \theta)$, parameterised by a q-dimensional unknown parameter $\theta$. Without loss of generality, it may be assumed that y is scalar; the results extend directly to the case of a vector y of categorical variables. The parameter of interest is $\theta$, and under complete response, the likelihood function for $\theta$ is

$$L_{\mathrm{com}}(\theta) = \prod_{i=1}^{n} f(y_i \mid \mathbf{x}_i; \theta).$$

When missing data occur, the sample $A = \{1, \ldots, n\}$ is decomposed as $A = A_R \cup A_M$, where $A_R$ is the set of respondents and $A_M$ is the set of non-respondents. The response indicator variable r is defined as

$$r_i = \begin{cases} 1 & \text{if } y_i \text{ is observed,} \\ 0 & \text{otherwise,} \end{cases}$$

for $i = 1, \ldots, n$. Let the density of $r_i$ be

$$g(r_i \mid \mathbf{x}_i, y_i; \phi).$$

Specifically, it is assumed that $g$ is the density function of a Bernoulli distribution parameterised by $\pi_i = \pi(\mathbf{x}_i, y_i; \phi)$. That is,

$$g(r_i \mid \mathbf{x}_i, y_i; \phi) = \pi_i^{\,r_i} (1 - \pi_i)^{1 - r_i},$$

where $\pi(\mathbf{x}, y; \phi)$ is a known function up to $\phi$. Thus, the response probability is allowed to depend on y as well as $\mathbf{x}$, so that the non-response mechanism is non-ignorable. The observed likelihood function is then

$$L_{\mathrm{obs}}(\theta, \phi) = \prod_{i \in A_R} f(y_i \mid \mathbf{x}_i; \theta)\, \pi(\mathbf{x}_i, y_i; \phi) \prod_{i \in A_M} \sum_{y} f(y \mid \mathbf{x}_i; \theta) \{ 1 - \pi(\mathbf{x}_i, y; \phi) \}. \tag{1}$$

Our purpose is to estimate $\theta$. Note that the observed likelihood function depends on the missing mechanism. When the missing mechanism is MAR, so that $\pi(\mathbf{x}, y; \phi) = \pi(\mathbf{x}; \phi)$, the observed likelihood function in (1) is factorised as

$$L_{\mathrm{obs}}(\theta, \phi) = L_1(\theta)\, L_2(\phi), \tag{2}$$

where

$$L_1(\theta) = \prod_{i \in A_R} f(y_i \mid \mathbf{x}_i; \theta)$$

and

$$L_2(\phi) = \prod_{i \in A_R} \pi(\mathbf{x}_i; \phi) \prod_{i \in A_M} \{ 1 - \pi(\mathbf{x}_i; \phi) \}.$$

Thus, under MAR, $\phi$ need not be estimated in order to estimate $\theta$. However, under the non-ignorable missing mechanism, the observed likelihood function cannot be expressed as (2), and $\theta$ and $\phi$ must be estimated simultaneously.
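As a toy numerical check of this factorisation, the sketch below assumes a Bernoulli(p) outcome model with no auxiliary variable and a constant response probability (an illustrative MAR mechanism, not one of the models used later in this paper): the value of p that maximises the observed likelihood does not depend on the value plugged in for the response probability.

```python
import math

# Toy MAR setup: y ~ Bernoulli(p); the response probability pi is constant,
# so the observed log-likelihood separates as l_1(p) + l_2(pi).
y_obs = [1, 0, 1, 1, 0, 1, 0, 1]   # y-values of the respondents (A_R)
n_mis = 4                          # number of non-respondents (A_M)

def obs_loglik(p, pi):
    l1 = sum(math.log(p) if y == 1 else math.log(1 - p) for y in y_obs)
    l2 = len(y_obs) * math.log(pi) + n_mis * math.log(1 - pi)
    return l1 + l2

# Maximising over p on a grid gives the same p-hat for any fixed pi:
grid = [i / 100 for i in range(1, 100)]
p_hat_a = max(grid, key=lambda p: obs_loglik(p, 0.30))
p_hat_b = max(grid, key=lambda p: obs_loglik(p, 0.90))
```

Here the maximiser is (up to grid resolution) the respondents' sample proportion 5/8, whichever response probability is used, illustrating why the response model can be ignored under MAR.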

Nonetheless, under non-ignorable missing data, the parameters in (1) may not be identified. To illustrate the non-identifiability problem, suppose that y is a dichotomous response variable with values 0 or 1, x is the auxiliary variable which is fully observed, and r is the response indicator random variable. Define

$$p_j(x) = P(y = j \mid x) \quad \text{and} \quad \pi_j(x) = P(r = 1 \mid y = j, x),$$

where $p_0(x) + p_1(x) = 1$, for $j = 0, 1$. Then,

$$P(r = 1, y = j \mid x) = p_j(x)\, \pi_j(x) \quad \text{and} \quad P(r = 0 \mid x) = \sum_{j=0}^{1} p_j(x) \{ 1 - \pi_j(x) \}.$$

Define also $u_i = 1$ if $y_i = 0$ and $v_i = 1$ if $y_i = 1$. The observed likelihood function of $(p, \pi)$ is then

$$L_{\mathrm{obs}} = \prod_{i \in A_R} \{ p_0(x_i)\, \pi_0(x_i) \}^{u_i} \{ p_1(x_i)\, \pi_1(x_i) \}^{v_i} \prod_{i \in A_M} \Big[ \sum_{j=0}^{1} p_j(x_i) \{ 1 - \pi_j(x_i) \} \Big].$$

Under the logistic regression model,

$$\operatorname{logit} \pi(x, y; \phi) = \phi_0 + \phi_1 x + \phi_2 y, \tag{3}$$

where $\phi = (\phi_0, \phi_1, \phi_2)$. In (3), there are more parameters than the number of sufficient statistics, so the parameters are not identifiable.

To overcome the non-identifiability problem, further assumptions are made on the parameters. Glynn et al. (1993) suggested estimating the parameters using a follow-up sample under a pattern mixture model based on the multiple imputation method. We discuss the parametric fractional imputation method with follow-up data under a selection model in the next section.

3. Fractional imputation with a follow-up

Fractional imputation, proposed by Kalton & Kish (1984) and further developed by Kim & Fuller (2004), is extended here to make inference under non-ignorable missing data. In fractional imputation, more than one imputed value is created for each missing item, and a fractional weight is assigned to each imputed value. Let $y_i^{*(1)}, \ldots, y_i^{*(M)}$ be the M possible values of $y_i$. In fractional imputation, $y_i^{*(j)}$ is the jth imputed value for the missing $y_i$, and $w_{ij}^*$ is a fractional weight assigned to $y_i^{*(j)}$ satisfying $\sum_{j=1}^{M} w_{ij}^* = 1$.

To compute the maximum likelihood estimator, we use a follow-up sample to observe $y_i$, and assume that there is no non-response in the follow-up sample. Let $A_V \subset A_M$ be the index set of the follow-up data. Assume that $A_V$ is randomly selected from the non-respondents by Bernoulli sampling. Define $A_U = A_M \setminus A_V$. The sample inclusion indicator random variable $\delta_i$ for the follow-up sample is defined by

$$\delta_i = \begin{cases} 1 & \text{if } i \in A_V, \\ 0 & \text{otherwise.} \end{cases}$$

For $i \in A_R$, we observe $(\mathbf{x}_i, y_i, r_i = 1)$. For $i \in A_V$, we observe $(\mathbf{x}_i, y_i, r_i = 0, \delta_i = 1)$. We observe $(\mathbf{x}_i, r_i = 0, \delta_i = 0)$ for $i \in A_U$. Assume that

$$P(r_i = 1 \mid \mathbf{x}_i, y_i) = \pi(\mathbf{x}_i, y_i; \phi)$$

for some function $\pi(\cdot)$ known up to $\phi$. For the follow-up sample mechanism, we have

$$P(\delta_i = 1 \mid \mathbf{x}_i, y_i, r_i = 0) = \nu,$$

where $\nu$ is a known constant predetermined at the design stage of the follow-up survey. Because the follow-up sample is selected by Bernoulli sampling, $\delta_i$ is independent of $y_i$ given $r_i = 0$, so the conditional distribution of the missing $y_i$ in (4) below is the same whether or not unit i is followed up.

In this setup, since the follow-up sampling rate $\nu$ is a known constant, the factors involving $\delta_i$ can be dropped, and the observed likelihood function may be written as

$$L_{\mathrm{obs}} = \prod_{i \in A_R} f(y_i \mid \mathbf{x}_i; \theta)\, \pi(\mathbf{x}_i, y_i; \phi) \prod_{i \in A_M} \Big[ \sum_{y} f(y \mid \mathbf{x}_i; \theta) \{ 1 - \pi(\mathbf{x}_i, y; \phi) \} \Big] \prod_{i \in A_V} f_M(y_i \mid \mathbf{x}_i; \eta),$$

where

$$f_M(y_i \mid \mathbf{x}_i; \eta) = \frac{ f(y_i \mid \mathbf{x}_i; \theta) \{ 1 - \pi(\mathbf{x}_i, y_i; \phi) \} }{ \sum_{y} f(y \mid \mathbf{x}_i; \theta) \{ 1 - \pi(\mathbf{x}_i, y; \phi) \} } \tag{4}$$

and $\eta = (\theta, \phi)$. In (4), $f_M(y_i \mid \mathbf{x}_i; \eta)$ denotes the conditional distribution of $y_i$ given $(\mathbf{x}_i, r_i = 0)$, and $\eta$ is the vector of parameters that specifies this conditional distribution. Writing $\eta = (\theta, \phi)$, the observed likelihood function for $\eta$ simplifies to

$$L_{\mathrm{obs}}(\eta) = \prod_{i \in A_R \cup A_V} f(y_i \mid \mathbf{x}_i; \theta)\, g(r_i \mid \mathbf{x}_i, y_i; \phi) \prod_{i \in A_U} \sum_{y} f(y \mid \mathbf{x}_i; \theta) \{ 1 - \pi(\mathbf{x}_i, y; \phi) \}.$$

To find the maximum likelihood estimator $\hat\eta$ that maximises the observed likelihood $L_{\mathrm{obs}}(\eta)$, we have to solve

$$\frac{\partial}{\partial \eta} \log L_{\mathrm{obs}}(\eta) = 0. \tag{5}$$

Instead of using (5), we use the mean score equation, defined by

$$\bar S(\eta) \equiv E\{ S_{\mathrm{com}}(\eta) \mid \mathbf{y}_{\mathrm{obs}} \} = 0, \tag{6}$$

where $S_{\mathrm{com}}(\eta)$ is the score function of $\eta$ under complete response, $\mathbf{y}_{\mathrm{obs}}$ is the observed part of $\mathbf{y} = (y_1, \ldots, y_n)$, and the conditional expectation is taken with respect to the conditional distribution of the missing data given the observed data. Louis (1982) proved the equivalence of (5) and (6).

The full sample score function is defined as

$$S_{\mathrm{com}}(\eta) = \sum_{i=1}^{n} \{ s_1(\theta; \mathbf{x}_i, y_i) + s_2(\phi; \mathbf{x}_i, y_i, r_i) \},$$

where

$$s_1(\theta; \mathbf{x}_i, y_i) = \frac{\partial}{\partial \theta} \log f(y_i \mid \mathbf{x}_i; \theta) \tag{7}$$

and

$$s_2(\phi; \mathbf{x}_i, y_i, r_i) = \frac{\partial}{\partial \phi} \log g(r_i \mid \mathbf{x}_i, y_i; \phi). \tag{8}$$

Note that $\bar S(\eta)$ is a function of the unknown parameter set $\eta = (\theta, \phi)$. In the EM algorithm, the mean score equation in (6) can be solved iteratively by

$$\bar S(\eta \mid \hat\eta_{(t)}) \equiv E\{ S_{\mathrm{com}}(\eta) \mid \mathbf{y}_{\mathrm{obs}}; \hat\eta_{(t)} \} = 0, \tag{9}$$

where $\hat\eta_{(t)}$ is the parameter estimate at iteration t. With categorical data, as noted by Ibrahim (1990), the conditional expectation in (9) can be viewed as a weighted mean. Thus, the mean score function in (9) is called the weighted mean score function using fractional imputation. The weighted mean score function in (9) can be partitioned as

$$\bar S(\eta \mid \hat\eta_{(t)}) = \bar S_1(\theta \mid \hat\eta_{(t)}) + \bar S_2(\phi \mid \hat\eta_{(t)}),$$

where

$$\bar S_1(\theta \mid \hat\eta_{(t)}) = \sum_{i \in A_R \cup A_V} s_1(\theta; \mathbf{x}_i, y_i) + \sum_{i \in A_U} \sum_{j=1}^{M} w_{ij(t)}^* \, s_1(\theta; \mathbf{x}_i, y_i^{*(j)}),$$

with $s_1$ defined in (7), and

$$\bar S_2(\phi \mid \hat\eta_{(t)}) = \sum_{i \in A_R \cup A_V} s_2(\phi; \mathbf{x}_i, y_i, r_i) + \sum_{i \in A_U} \sum_{j=1}^{M} w_{ij(t)}^* \, s_2(\phi; \mathbf{x}_i, y_i^{*(j)}, 0),$$

with $s_2$ defined in (8). Here, $w_{ij(t)}^*$ is used as the fractional weight assigned to $y_i^{*(j)}$ in fractional imputation. In the M-step, we can update the parameters by solving

$$\bar S_1(\theta \mid \hat\eta_{(t)}) = 0 \quad \text{and} \quad \bar S_2(\phi \mid \hat\eta_{(t)}) = 0.$$

The parametric fractional imputation algorithm for missing categorical data using follow-up data is described as follows.

Step 1 Let $\hat\eta_{(t)} = (\hat\theta_{(t)}, \hat\phi_{(t)})$ be the current parameter estimate values. For each $i \in A_U$, denote by $w_{ij(t)}^*$ the fractional weight assigned to the jth imputed value $y_i^{*(j)}$. By Bayes' rule, we compute the fractional weight as

$$w_{ij(t)}^* = \frac{ f(y_i^{*(j)} \mid \mathbf{x}_i; \hat\theta_{(t)}) \{ 1 - \pi(\mathbf{x}_i, y_i^{*(j)}; \hat\phi_{(t)}) \} }{ \sum_{l=1}^{M} f(y_i^{*(l)} \mid \mathbf{x}_i; \hat\theta_{(t)}) \{ 1 - \pi(\mathbf{x}_i, y_i^{*(l)}; \hat\phi_{(t)}) \} }. \tag{10}$$

Step 2 Solve the weighted mean score equations using the fractional weights obtained in Step 1. That is, obtain $\hat\eta_{(t+1)} = (\hat\theta_{(t+1)}, \hat\phi_{(t+1)})$ by solving

$$\bar S_1(\theta \mid \hat\eta_{(t)}) = 0$$

and

$$\bar S_2(\phi \mid \hat\eta_{(t)}) = 0.$$

Step 3 Go to Step 1 until $\hat\eta_{(t)}$ converges.
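To make Steps 1-3 concrete, the sketch below runs the EM iteration for a deliberately simplified selection model with no auxiliary variable: $y \sim \mathrm{Bernoulli}(p)$, $P(r = 1 \mid y = j) = \pi_j$, and a known follow-up rate. The model, the parameter values and the cell counts are illustrative assumptions, not the simulation setup of Section 5; with M = 2 imputed values (y* = 0, 1), both the E-step fractional weights (Step 1) and the weighted mean score equations (Step 2) have closed forms.

```python
def em_fractional(n_R0, n_R1, n_V0, n_V1, n_U, tol=1e-10, max_iter=500):
    """EM with fractional weights for a no-covariate selection model.
    n_R0/n_R1: respondents with y=0/1; n_V0/n_V1: follow-up cases with
    y=0/1; n_U: non-respondents without follow-up (y missing)."""
    n = n_R0 + n_R1 + n_V0 + n_V1 + n_U
    p, pi0, pi1 = 0.5, 0.5, 0.5                 # starting values
    for _ in range(max_iter):
        # E-step: fractional weights for the M = 2 imputed values,
        # w_j proportional to P(y = j) P(r = 0 | y = j), as in (10).
        w1 = p * (1 - pi1) / (p * (1 - pi1) + (1 - p) * (1 - pi0))
        w0 = 1.0 - w1
        # M-step: closed-form solutions of the weighted score equations.
        p_new = (n_R1 + n_V1 + n_U * w1) / n
        pi1_new = n_R1 / (n_R1 + n_V1 + n_U * w1)
        pi0_new = n_R0 / (n_R0 + n_V0 + n_U * w0)
        done = max(abs(p_new - p), abs(pi0_new - pi0),
                   abs(pi1_new - pi1)) < tol
        p, pi0, pi1 = p_new, pi0_new, pi1_new
        if done:
            break
    return p, pi0, pi1

# Expected cell fractions built from the (hypothetical) true values
# p = 0.45, pi0 = 0.7, pi1 = 0.5 and follow-up rate 0.26:
p_hat, pi0_hat, pi1_hat = em_fractional(
    n_R0=0.385, n_R1=0.225, n_V0=0.0429, n_V1=0.0585, n_U=0.2886)
```

Run on these expected counts, the iteration recovers the values used to build them, which is a useful correctness check for any implementation of the algorithm.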

The approach using the weighted mean score equation based on fractional imputation is computationally attractive. In fractional imputation with categorical data, we only need to impute a finite number of values, and the number of imputed values equals the number of categories. To each imputed value, we assign the fractional weight computed from the estimated conditional probability of obtaining that imputed value given the other observed information. That is, if the jth imputed value of $y_i$ is $y_i^{*(j)}$, the fractional weight assigned to $y_i^{*(j)}$ is

$$\hat w_{ij}^* = \frac{ f(y_i^{*(j)} \mid \mathbf{x}_i; \hat\theta) \{ 1 - \pi(\mathbf{x}_i, y_i^{*(j)}; \hat\phi) \} }{ \sum_{l=1}^{M} f(y_i^{*(l)} \mid \mathbf{x}_i; \hat\theta) \{ 1 - \pi(\mathbf{x}_i, y_i^{*(l)}; \hat\phi) \} },$$

where $\hat\eta = (\hat\theta, \hat\phi)$ is the maximum likelihood estimator of $\eta$ that maximises the observed likelihood function in (1). The final fractional weights are assigned to the fractionally imputed data values, and the augmented data can then be used just as complete data.

Once the augmented data and the fractional weights are obtained, various parameters, such as domain means and proportions, can be estimated. Because the augmented data contain complete data values together with their weights, standard complete-data methods can be applied directly with the fractional weights to estimate the parameters.

4. Variance estimation

Variance estimation can be performed by a replication method. Under complete response, let $w_i^{(k)}$ be the kth replication weight for unit i. Assume that the replicate variance estimator

$$\hat V_{\mathrm{rep}} = \sum_{k=1}^{L} c_k \big( \hat\eta^{(k)} - \hat\eta \big)^{2},$$

where $\hat\eta^{(k)}$ is the kth replicate of $\hat\eta$ computed with the kth replication weights and $c_k$ is a factor associated with replicate k, is consistent for the variance of $\hat\eta$. For example, for the delete-one jackknife, the jackknife replication weight is defined as

$$w_i^{(k)} = \begin{cases} \dfrac{n}{n-1}\, w_i & \text{if } i \neq k, \\[4pt] 0 & \text{if } i = k, \end{cases}$$

and $c_k = (n-1)/n$. Now, consider variance estimation of the parameter estimates from the fractional imputation method described in Section 3. To use the replication method for variance estimation, we need the replicated fractional weights. Thus $\hat\eta^{(k)}$, the kth replicate of $\hat\eta$, is required to compute the replicated fractional weights. We let $\hat\eta = (\hat\theta, \hat\phi)$ denote the point estimators of $\eta$. Let

$$\bar S^{(k)}(\eta) = \sum_{i \in A_R \cup A_V} w_i^{(k)} \, s(\eta; \mathbf{x}_i, y_i, r_i) + \sum_{i \in A_U} w_i^{(k)} \sum_{j=1}^{M} w_{ij}^{*(k)} \, s(\eta; \mathbf{x}_i, y_i^{*(j)}, 0), \tag{11}$$

where $s(\eta; \cdot) = s_1(\theta; \cdot) + s_2(\phi; \cdot)$, $w_i^{(k)}$ is the replicated weight for unit i, and $w_{ij}^{*(k)}$ is the fractional weight for the jth imputed data value shown in (10) when $\eta = \hat\eta^{(k)}$. To compute $\hat\eta^{(k)}$, we need to solve the replicated score equation $\bar S^{(k)}(\eta) = 0$. But this is computationally heavy, because the score functions in (11) are usually nonlinear and have to be solved iteratively. So, we consider a one-step approximation method using Taylor linearisation to obtain $\hat\eta^{(k)}$. For the score functions in (11), we have

$$\bar S^{(k)}\big( \hat\eta^{(k)} \big) = 0. \tag{12}$$

Taylor expansion of (12) gives

$$0 = \bar S^{(k)}\big( \hat\eta^{(k)} \big) \cong \bar S^{(k)}(\hat\eta) + \frac{\partial \bar S^{(k)}(\hat\eta)}{\partial \eta} \big( \hat\eta^{(k)} - \hat\eta \big). \tag{13}$$

By (13), we can get

$$\hat\eta^{(k)} \cong \hat\eta + \big\{ \hat I^{(k)}(\hat\eta) \big\}^{-1} \bar S^{(k)}(\hat\eta), \tag{14}$$

where

$$\hat I^{(k)}(\hat\eta) = - \frac{\partial \bar S^{(k)}(\hat\eta)}{\partial \eta}.$$

The approximation formula (14) can be implemented as

$$\hat\eta^{(k)} = \hat\eta + \big\{ \hat I_{\mathrm{obs}}^{(k)}(\hat\eta) \big\}^{-1} \bar S^{(k)}(\hat\eta),$$

where $\hat I_{\mathrm{obs}}^{(k)}(\hat\eta)$ is the estimator of the kth observed information matrix for $\eta$. Louis (1982) showed that the observed information matrix for $\eta$ is expressed as

$$I_{\mathrm{obs}}(\eta) = E\{ I_{\mathrm{com}}(\eta) \mid \mathbf{y}_{\mathrm{obs}} \} - E\big\{ S_{\mathrm{com}}(\eta)\, S_{\mathrm{com}}(\eta)^{\top} \mid \mathbf{y}_{\mathrm{obs}} \big\} + \bar S(\eta)\, \bar S(\eta)^{\top}, \tag{15}$$

where $I_{\mathrm{com}}(\eta) = - \partial^2 \log L_{\mathrm{com}}(\eta) / \partial \eta\, \partial \eta^{\top}$ is the information matrix of $\eta$ under complete response and $\bar S(\eta) = E\{ S_{\mathrm{com}}(\eta) \mid \mathbf{y}_{\mathrm{obs}} \}$. Using (15), $\hat I_{\mathrm{obs}}^{(k)}(\hat\eta)$ is computed by replacing the conditional expectations in (15) with fractionally weighted sums using the replication weights, that is,

$$\hat I_{\mathrm{obs}}^{(k)}(\hat\eta) = \sum_{i=1}^{n} w_i^{(k)} \Big[ \sum_{j} w_{ij}^{*}\, I_{\mathrm{com},i}^{*(j)} - \sum_{j} w_{ij}^{*}\, s_i^{*(j)} s_i^{*(j)\top} + \bar s_i\, \bar s_i^{\top} \Big],$$

where $I_{\mathrm{com},i}^{*(j)}$ and $s_i^{*(j)}$ are the complete-data information matrix and score contributions of unit i evaluated at the jth imputed value, and $\bar s_i = \sum_j w_{ij}^{*} s_i^{*(j)}$. The consistency of the variance estimator follows by the argument of Rao & Tausi (2004).
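As a small self-contained illustration of the replication variance formula (complete-response case, with the sample mean standing in for the point estimator and simulated data that are not from this paper), the delete-one jackknife with $c_k = (n-1)/n$ reproduces the usual variance estimator of the mean:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(size=300)
n = len(y)
theta_hat = y.mean()                 # full-sample estimate

# k-th replicate: weight n/(n-1) * (1/n) = 1/(n-1) on units i != k, 0 on i = k
reps = np.empty(n)
for k in range(n):
    w = np.full(n, 1.0 / (n - 1))
    w[k] = 0.0
    reps[k] = np.sum(w * y)          # replicate estimate theta_hat^(k)

c_k = (n - 1) / n
v_jk = c_k * np.sum((reps - theta_hat) ** 2)
```

For the mean, `v_jk` coincides with the textbook estimator $s^2/n$; for the fractional imputation estimator, each replicate would additionally recompute the fractional weights, which is exactly the cost the one-step approximation in (14) avoids.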

5. Simulation study

To compare parametric fractional imputation with other existing methods, we have performed a small simulation study.

5.1. Simulation setup

In the simulation study, we used B = 3000 Monte Carlo samples of size n = 300 from

$$y_i \mid x_i \sim \mathrm{Bernoulli}(p_i),$$

where $\mathrm{logit}(p_i) = 0.5 + x_i$, and the response indicator variable $r_i$ is distributed as

$$r_i \mid (x_i, y_i) \sim \mathrm{Bernoulli}(\pi_i),$$

where $\mathrm{logit}(\pi_i) = \phi_0 + \phi_1 x_i + \phi_2 y_i$ for $i = 1, \ldots, n$. The average response rate is about 57%. Let $\delta_i$ be the sample inclusion indicator for the follow-up survey with

$$\delta_i \mid (r_i = 0) \sim \mathrm{Bernoulli}(\nu),$$

where $\nu = 0.26$.

The parameters of interest are:

  • p, the probability of y = 1; and
  • $(\phi_0, \phi_1, \phi_2)$, the regression coefficients for the logistic regression of r on (1, x, y).

For each parameter, we have computed four estimators:

  • the complete sample estimator;
  • the fractional imputation estimator (FIE); and
  • the multiple imputation estimator (MIE), with M = 10 and with M = 100 imputations for each missing value.

In fractional imputation, as y is a dichotomous random variable, only two values are imputed, y = 0 and y = 1. Fractional weights are assigned by formula (10). The parameters p and $(\phi_0, \phi_1, \phi_2)$ are specified in the selection model. For fractional imputation, the one-step jackknife method is used for variance estimation. For multiple imputation, the variance estimator is

$$\hat V_{\mathrm{MI}} = W_M + \Big( 1 + \frac{1}{M} \Big) B_M,$$

where

$$W_M = \frac{1}{M} \sum_{k=1}^{M} \hat V^{(k)}, \qquad B_M = \frac{1}{M-1} \sum_{k=1}^{M} \big( \hat\eta^{(k)} - \bar\eta_M \big)^{2},$$

$\hat V^{(k)}$ is the variance estimator under the kth imputed sample, and $\bar\eta_M = M^{-1} \sum_{k=1}^{M} \hat\eta^{(k)}$.
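The multiple imputation variance estimator above has the form of Rubin's combining rule; a minimal sketch, with made-up numbers standing in for the M = 3 imputed-data estimates and their variance estimates, is:

```python
def mi_variance(estimates, variances):
    """Rubin's combining rules: combined point estimate and total
    variance W_M + (1 + 1/M) B_M from M imputed-data analyses."""
    M = len(estimates)
    theta_bar = sum(estimates) / M                       # combined estimate
    W = sum(variances) / M                               # within-imputation
    B = sum((e - theta_bar) ** 2 for e in estimates) / (M - 1)   # between
    return theta_bar, W + (1.0 + 1.0 / M) * B

# Illustrative values only (not the simulation results of this section):
theta_bar, v_mi = mi_variance([0.44, 0.46, 0.45], [0.0016, 0.0017, 0.0016])
```

The $(1 + 1/M)$ factor inflates the between-imputation component to account for using a finite number of imputations, which is why the imputation variance of the MIE is of order $M^{-1}$.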

5.2. Simulation results

In the simulation study, we computed the Monte Carlo (MC) means and variances of the point estimators. Table 1 presents the MC means, the MC variances, and the MC standardised variances of the point estimators with 26% follow-up. The MC standardised variance is computed by dividing the MC variance of the corresponding point estimator by that of the complete sample estimator.

Table 1. Mean, variance and standardised variance of the point estimators with 26% follow-up. (FIE: fractional imputation estimator; MIE: multiple imputation estimator.)

Parameter   Method             Mean     Variance   Std var
p           Complete sample     0.45    0.00083    100
            FIE                 0.45    0.00166    200
            MIE (M=10)          0.45    0.00174    211
            MIE (M=100)         0.45    0.00169    204
$\phi_0$    Complete sample    −1.53    0.18854    100
            FIE                −1.53    0.20674    104
            MIE (M=10)         −1.50    0.19564    101
            MIE (M=100)        −1.50    0.19469    100
$\phi_1$    Complete sample     0.51    0.01780    100
            FIE                 0.51    0.01816    101
            MIE (M=10)          0.50    0.01758     99
            MIE (M=100)         0.50    0.01753     99
$\phi_2$    Complete sample     0.71    0.06158    100
            FIE                 0.73    0.15928    258
            MIE (M=10)          0.72    0.16993    276
            MIE (M=100)         0.72    0.16250    264

Table 1 shows that the point estimators are all approximately unbiased. The theoretical variance of the complete sample estimator of p is p(1 − p)/n = 0.45 × 0.55/300 ≈ 0.00083, which is consistent with the simulation result in Table 1; the variance of the fractional imputation estimator of p is about twice as large. All methods show a much greater variance for $\hat\phi_2$, which is partly explained by the large variance in the estimation of $\phi_2$: among the non-respondents, only the follow-up subsample provides direct observations of y. In Table 1, the FIE shows noticeably smaller variances than the MIEs for p and $\phi_2$, and comparable variances for $\phi_0$ and $\phi_1$.

The FIE is more efficient than the MIE because the FIE is a deterministic imputation method, whereas the MIE is a stochastic imputation method. Being stochastic, the MIE is subject to an imputation variance of order $M^{-1}$; thus, letting $M \to \infty$ would remove the imputation variance of the MIE. For variance estimation, the relative bias is computed by dividing the difference between the expected value of the variance estimator and the variance of the point estimator by the variance of the point estimator. Table 2 shows that the t-statistics are not significant at the 5% level, so the variance estimators can be regarded as approximately unbiased.

Table 2. Monte Carlo relative biases and t-statistics of the variance estimators for the imputation estimators with 26% follow-up. (SM: selection model; App. JK: approximated jackknife estimation method; FIE: fractional imputation estimator; MIE: multiple imputation estimator.)

Model   Parameter   Method              Relative bias (%)   t-statistic
SM      p           App. JK (for FIE)        −2.35             −0.83
SM      p           MIE (M = 100)            −3.43             −1.32
