Multiple imputation for discrete data: Evaluation of the joint latent normal model

Abstract Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS‐MI is pragmatically appealing because of its flexibility in handling different types of variables. JM‐MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four parts, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single-level and multilevel multiple imputation.

As discussed in Carpenter and Kenward (2013) ch. 3-5, there are a number of different ways in which we can impute the missing data. In most situations, data are missing in multiple variables, i.e. in both outcome and covariates in the substantive scientific model. In these cases, there are two main strategies for imputing the missing values: full conditional specification multiple imputation (FCS-MI) and joint modelling multiple imputation (JM-MI). FCS-MI (van Buuren, Brand, Groothuis-Oudshoorn, & Rubin, 2006), also known as imputation by chained equations (ICE), builds on the idea of imputing the missing values for each partially observed variable by setting up a univariate model fully conditional on all the other variables. By contrast, JM uses a joint multivariate model for all the partially observed data. Once the joint distribution has been defined, a Gibbs sampler is used to update the parameters of this model and the missing data are imputed from the proper conditional distribution given the observed data and the current values of the parameters.
As long as the joint imputation model is (reasonably) compatible with the analysis model, JM imputation is the theoretically best way of imputing the data and it leads to unbiased inference under MAR. When FCS is used, the joint model for the data is only defined implicitly. Care needs to be taken to ensure the various conditional specifications are not inconsistent. Further, establishing conditions for the validity of the method is more difficult, although the method has been shown to perform well in practice. Recently, conditions for equivalence of FCS to JM have been explored in two different studies (Hughes et al., 2014; Liu, Gelman, Hill, Su, & Kropko, 2013), which broadly concluded that:
• the two methods are equivalent when the joint model is multivariate normal;
• the two methods are equivalent when the so-called "noninformative margins" condition holds, otherwise an order effect occurs; and
• in real data the magnitude of this order effect is usually small enough to be considered negligible.
Therefore, the two methods seem to be interchangeable in most situations. However, FCS has become more popular than JM, and the main reasons for its success are:
• its relative flexibility in accommodating different kinds of variables: e.g. linear regression can be used as the univariate conditional model to impute continuous variables, a logistic model for binary data, a Poisson model for count data, and so on; and
• the availability of well-maintained and accessible software packages, for example ICE (now mi impute chained, Stata) and mice (R; van Buuren & Groothuis-Oudshoorn, 2011).
Finding an appropriate joint model with noncontinuous variables, for example binary or categorical variables, is more challenging. Lee and Mitra (2016) proposed a method based on sequential generalized linear models, while Schafer (1997) proposed a strategy based on a general location model. On the software front, some good packages for JM-MI are available, but to the best of our knowledge none of them can handle a mix of partially observed continuous and categorical variables in a way that naturally extends to a multilevel setting. One approach that has been suggested is to simply treat binary variables as continuous in the imputation model (see the discussion in Carpenter & Kenward, 2013, ch. 4-5). This can be a reasonable approach when the partially observed binary variables are included as covariates in the substantive analysis model. However, a proportion of the imputed values will fall outside the {0, 1} set of values. Rounding can be used, but its use has been discouraged and shown to be prone to bias in certain settings (Horton, Lipsitz, & Parzen, 2003).
A more theoretically sound approach is to make use of latent normal variables to handle noncontinuous data, as suggested in Goldstein, Carpenter, Kenward, and Levin (2009). We developed the R package jomo (Quartagno & Carpenter, 2014), which allows for the imputation of a mix of continuous, binary or nominal data under this model. The aim of this paper is to explore, via a comprehensive set of simulations, the validity and practical utility of the latent normal model for imputing a mix of partially observed continuous and categorical data.
The article is structured as follows. In Section 2, we describe the two approaches we compare, namely imputation by joint modeling and full conditional specification. In Section 3, we report the results of a series of increasingly challenging simulation studies, designed to establish the validity of the latent normal approach to categorical variables in increasingly complex models. Section 4 illustrates the approaches using data derived from a cohort study from the German Breast Cancer Study Group. We conclude with a discussion in Section 5. All the simulations reported here were carried out with the freely available R-package jomo (Quartagno, Grund, & Carpenter, 2018).

METHODS
We now give an overview of Joint Modelling Multiple Imputation (JM-MI), presenting the general form of the imputation model which we evaluate in the subsequent simulation studies.
Suppose we intend to collect observations on variables Y_1, …, Y_p, but we end up with missing values in each (or at least some) of these variables. To use JM-MI to impute these missing data, the first step is to set up the joint model for the partially observed data. If the data are plausibly multivariate normal, we use the multivariate normal model:

Y_{i,j} = β_{0,j} + ε_{i,j},  (ε_{i,1}, …, ε_{i,p}) ∼ N(0, Ω),  (1)

where the coefficients β_{0,j} are the fixed effect parameters, the ε_{i,j} are the error terms, and Ω is the unstructured variance-covariance matrix of the residuals. When some of the variables are binary or categorical, Goldstein, Carpenter, Kenward, and Levin (2009) proposed a natural extension via latent normal variables, based on a previous proposal by Albert and Chib (1993). To understand the approach, suppose that Y_{i,j} is a binary variable, and that we include in the model not Y_{i,j} itself, but instead two latent normal variables, one for each level of Y_{i,j}. Denote these two normal variables by Y*_{i,j,1} and Y*_{i,j,2}, where Y_{i,j} = 1 if Y*_{i,j,2} > Y*_{i,j,1} and 0 otherwise. Model (1) becomes:

Y*_{i,j,k} = β_{0,j,k} + ε_{i,j,k},  k = 1, 2.  (2)

Unfortunately, this model is nonidentifiable (because we have replaced a binary variable, with one parameter, by two latent normals whose joint distribution has five parameters). However, this can be addressed with two simple tweaks: (i) we fix the variance of each latent normal to an arbitrary value, say for example 0.5, and (ii) we subtract the equation for Y*_{i,j,1} from the one for Y*_{i,j,2}:

Y*_{i,j} = Y*_{i,j,2} − Y*_{i,j,1} = β_{0,j} + ε_{i,j},  (3)

where the variance of the remaining latent normal is now 1, because it is obtained as the difference of two independent normals each with variance 0.5. In this new formulation, it follows that binary Y_{i,j} = 1 if Y*_{i,j} > 0 and 0 otherwise. The same reasoning extends naturally to K-level unordered categorical variables. In the resulting model, we have K − 1 latent normals, each of which has a fixed variance of 1 and a covariance of 0.5 with the other latent normals for the same variable.
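The identification argument above is easy to check numerically. The sketch below (in Python with numpy, purely for illustration; the paper's own software is the R package jomo, and the means used here are arbitrary) simulates the two variance-0.5 latent normals, takes their difference, and confirms that the resulting single latent has variance 1 and implies a probit link:

```python
import math
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
mu1, mu2 = 0.3, 0.8  # hypothetical fixed effects for the two latent normals

# Two latent normals, one per level of the binary variable, each with
# variance fixed at 0.5 (identifiability constraint (i) in the text).
z1 = rng.normal(mu1, math.sqrt(0.5), n)
z2 = rng.normal(mu2, math.sqrt(0.5), n)

# Constraint (ii): subtract the equations; the remaining latent normal
# has variance 0.5 + 0.5 = 1, and Y = 1 whenever it is positive.
z = z2 - z1
y = (z > 0).astype(int)

# The implied link is probit: P(Y = 1) = Phi(mu2 - mu1).
phi = 0.5 * (1.0 + math.erf((mu2 - mu1) / math.sqrt(2.0)))
print(round(z.var(), 2), round(y.mean(), 2), round(phi, 2))
```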
For example, in a situation where we have a continuous variable Y_1, a 4-level categorical variable Y_2, and a binary variable Y_3, the latent normal model is:

Y_{i,1} = β_{0,1} + ε_{i,1}
Y*_{i,2,1} = β_{0,2,1} + ε_{i,2,1}
Y*_{i,2,2} = β_{0,2,2} + ε_{i,2,2}
Y*_{i,2,3} = β_{0,2,3} + ε_{i,2,3}
Y*_{i,3} = β_{0,3} + ε_{i,3},  (4)

with the vector of residuals distributed as N(0, Ω). This model has two sets of parameters: the fixed effect parameters β = (β_{0,1}, β_{0,2,1}, β_{0,2,2}, β_{0,2,3}, β_{0,3}) and the covariance matrix Ω. Fitting this model in the Bayesian framework using MCMC allows imputation of any missing values of the variables under the Missing At Random (MAR) assumption using the data-augmentation approach (Tanner & Wong, 1987). We:
1. draw values of the parameters, given the data and the priors, using MCMC (Gibbs sampling where possible); then
2. draw the missing values given the observed data and current parameter draws.
As usual with MI, we run (update) this algorithm until it has stochastically converged, then keep the current draws of the missing values. Together with the observed data these make the first imputed dataset. Then, after a further set of updates, the current draws of the missing values are retained with the observed data to make the second imputed dataset, and so on. We choose the number of updates so that, conditional on the observed data, draws of the missing data in successive imputed datasets are independent.
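The imputation schedule just described can be sketched with a deliberately minimal toy example: a single normal variable with known variance 1, a flat prior on its mean, and 30% of values missing. All numbers here are illustrative (this is not jomo's sampler); the point is the alternation of parameter draws and missing-data draws, with a burn-in followed by widely spaced retained datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 observations from N(2, 1), with ~30% made missing.
y = rng.normal(2.0, 1.0, 100)
miss = rng.random(100) < 0.3
y_obs = y[~miss]

def update(y_current):
    # Step 1: draw the parameter given the current completed data
    # (posterior of the mean under a flat prior, known variance 1).
    mu = rng.normal(y_current.mean(), 1.0 / np.sqrt(len(y_current)))
    # Step 2: draw the missing values given the parameter draw.
    y_new = y_current.copy()
    y_new[miss] = rng.normal(mu, 1.0, miss.sum())
    return y_new

# Initialise missing values naively, then run the sampler.
y_curr = y.copy()
y_curr[miss] = y_obs.mean()

burn_in, between, m = 500, 500, 5
imputed_datasets = []
for it in range(burn_in + between * m):
    y_curr = update(y_curr)
    # Keep a completed dataset after burn-in, then every `between` updates.
    if it >= burn_in and (it - burn_in + 1) % between == 0:
        imputed_datasets.append(y_curr.copy())

print(len(imputed_datasets))
```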
We use uninformative priors (flat priors for the fixed parameters β) to give the greatest weight to the data. At each step of the algorithm, we need to know the proper conditional distributions from which to draw the new parameter values. Unfortunately, because of the constraints in the covariance matrix, we cannot draw a new value for the covariance matrix from a known distribution. Therefore, we rely on a Metropolis-Hastings step to update it element-wise, following Browne (2006).
Additionally, with the latent normal model, at each step we need to draw new values for the latent normal variables, here Y*_{i,2,k} and Y*_{i,3}. This is done using a rejection sampling step: for each categorical variable in turn, for each individual, proposed values of the associated latent normal variables are drawn from the corresponding conditional normal distribution given the other observed data and draws of the missing data (or associated latent normals). This is repeated until the draws of the latent normals are consistent with the observed categorical value. For example, in the model above, if the 4-level categorical variable Y_{i,2} = 4, we draw the latent normal triple (Y*_{i,2,1}, Y*_{i,2,2}, Y*_{i,2,3}) from the conditional normal distribution given (Y_{i,1}, Y*_{i,3}) (at the current parameter values) until all three of (Y*_{i,2,1}, Y*_{i,2,2}, Y*_{i,2,3}) are less than zero; we then accept this draw and move on. A detailed explanation and worked example is given in Carpenter and Kenward (2013, §5.2).
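As an illustration of the rejection step (a sketch under assumed conditional means, not jomo's internal code), the following draws the latent triple for a 4-level categorical variable until it is consistent with the observed category, using the fixed variance-1 / covariance-0.5 structure described above:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical conditional means for the three latent normals of a
# 4-level categorical variable (level 4 is the reference category).
mu = np.array([-0.2, 0.1, -0.5])
# Fixed structure from the text: variances 1, covariances 0.5.
omega = np.full((3, 3), 0.5) + 0.5 * np.eye(3)

def draw_latents_given_category(category, max_tries=10_000):
    """Rejection step: redraw until the latents match the observed level."""
    for _ in range(max_tries):
        z = rng.multivariate_normal(mu, omega)
        if category == 4:
            if np.all(z < 0):          # reference level: all latents negative
                return z
        elif z[category - 1] == z.max() and z[category - 1] > 0:
            return z                   # level k: k-th latent largest and positive
    raise RuntimeError("rejection sampler failed to accept a draw")

z4 = draw_latents_given_category(4)
z2 = draw_latents_given_category(2)
print(np.all(z4 < 0), z2[1] > 0)
```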
The algorithm presented in this section was developed to handle unordered categorical variables. A similar algorithm, based on a single latent normal variable with appropriate thresholds between the categories, is again described in Goldstein et al. (2009), and it is implemented, for example, in the software packages REALCOM (Carpenter, Goldstein, & Kenward, 2011) and Mplus (Muthén & Muthén, 2017). This should be more efficient when dealing with ordinal data, as it explicitly reflects the ordering of the categories. However, it is not implemented in the R package jomo. One of the goals of this paper is to establish whether using the general algorithm for unordered data when imputing ordinal variables is suitable, or whether the loss of efficiency is large.

SIMULATION STUDY
In this section, we present the results of a series of simulation studies designed to evaluate the use of latent-normal JM-MI with a mix of partially observed binary, categorical, ordinal, count, and continuous variables. We begin in Subsection 3.1 with a simple data generating model that matches (4). Then, we investigate what happens with different data generating mechanisms: in Subsection 3.2 we consider binary data, in Subsection 3.3 we consider categorical and ordinal data and lastly in Subsection 3.4 we consider count data.
Across all of the simulation scenarios, we present the results of the analyses with the different methods in terms of mean estimates, mean standard error estimated from the models (mSE), empirical standard error (eSE, i.e. the standard deviation of the simulation estimates), and coverage level. A valid method should yield unbiased results, similar model and empirical standard errors, and coverage levels close to 95%.

Matching latent normal model for data generation and imputation
Suppose we intend to collect i = 1, …, n = 1000 observations on three variables: continuous Y_i, 4-level categorical X_{i,1}, and binary X_{i,2}. The data generating model is of the form (4), with residual covariance matrix:

Ω =
( 2    0.5  0.5  0.5  0.5 )
( 0.5  1    0.5  0.5  0.5 )
( 0.5  0.5  1    0.5  0.5 )
( 0.5  0.5  0.5  1    0.5 )
( 0.5  0.5  0.5  0.5  1   ),  (5)

where (X*_{i,1,1}, X*_{i,1,2}, X*_{i,1,3}) is the latent normal triple corresponding to the 4-level categorical variable X_{i,1}, and X*_{i,2} is the latent normal corresponding to binary X_{i,2}. Next, suppose that the substantive analysis model is the following linear model:

TABLE 1 Results for Subsection 3.1. Data are generated from (5), and made missing using the MCAR and MAR mechanisms described in the text. Mean, model, and empirical SE and coverage level are reported for the five model parameters in (6).

Y_i = β_0 + β_1·1[X_{i,1} = 2] + β_2·1[X_{i,1} = 3] + β_3·1[X_{i,1} = 4] + β_4·X_{i,2} + e_i,  (6)

where 1[·] is an indicator for the event in brackets. We (i) simulate data from (5); (ii) make some values missing; (iii) impute using (4) and fit the substantive model (6) to the imputed data; and (iv) combine the results for inference using Rubin's rules.
Missing values are generated as follows. Values of Y are made missing completely at random (MCAR) with probability 0.2. Independently, values of X_2 are made MCAR with probability 0.2. For X_1, we explore two different missingness mechanisms:
1. MCAR with probability 0.4;
2. for individuals with Y observed, the probability of X_1 being missing is given by (1 + exp(3 − Y_i))^{−1}, leading to around 35% of X_1 missing at random (MAR) given Y.
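Mechanism 2 can be written as a small function; the check below also shows the overall missingness rate it induces for one assumed distribution of Y (the N(2, √2) used here is an illustrative choice, not necessarily the one in the simulation):

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def p_missing(y):
    """MAR mechanism 2 from the text: P(X1 missing | Y = y)."""
    return 1.0 / (1.0 + math.exp(3.0 - y))

# Sanity check: the probability is exactly 0.5 when y = 3.
print(round(p_missing(3.0), 2))

# Overall missingness rate under an assumed outcome distribution
# Y ~ N(2, sqrt(2)); the rate lands at roughly a third of records.
y = rng.normal(2.0, math.sqrt(2.0), 100_000)
miss = rng.random(100_000) < 1.0 / (1.0 + np.exp(3.0 - y))
print(round(miss.mean(), 2))
```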
Mechanism (1) is chosen to explore the extent to which JM-MI can recover information relative to a complete records (CR) analysis, given that both methods lead to valid inference under MCAR. The second mechanism involves the outcome, and hence CR estimates are expected to be biased; we explore whether JM-MI can successfully remove this bias. The simulation study uses 1000 replications. For each replication, after generating the data and making values missing, we apply JM-MI using the R package jomo (Quartagno & Carpenter, 2014) with imputation model (4), a burn-in of 500 updates, and 500 between-imputation updates, to generate 20 imputed datasets. Then we fit the substantive model (6) to (a) the remaining complete records after making data missing, and to (b) each of the 20 imputed datasets, combining the results using Rubin's rules.
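The final pooling step uses Rubin's rules, which combine within- and between-imputation variability. A minimal sketch, with made-up point estimates and squared standard errors:

```python
import math

def rubins_rules(estimates, variances):
    """Pool M point estimates and their squared SEs across imputed datasets."""
    m = len(estimates)
    qbar = sum(estimates) / m                               # pooled estimate
    ubar = sum(variances) / m                               # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = ubar + (1 + 1 / m) * b                          # total variance
    return qbar, math.sqrt(total)

# Illustrative numbers only: four imputations of one coefficient.
est, se = rubins_rules([1.02, 0.97, 1.05, 0.99], [0.04, 0.05, 0.04, 0.05])
print(est, se)
```

The pooled SE is always at least as large as the average within-imputation SE, reflecting the extra uncertainty due to the missing data.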
For comparison, we also use standard FCS-MI through the R package mice, using a linear regression model for the continuous variable, a logistic model for the binary variable, and a multinomial logistic model for the categorical variable. Table 1 shows the results. When data are MCAR, complete records analysis is valid and therefore, as expected, it gives consistent estimates of the parameters. However, the SEs show that MI recovers some information. Results are similar when using either JM-MI or FCS-MI, although with FCS there is a suggestion of undercoverage for β_1 and β_3.

Results
In the MAR scenario, the complete records estimates of β_0, β_3, and β_4 are biased, as expected when data are MAR dependent on the outcome of the analysis model. On the other hand, MI is valid under MAR, and therefore estimates should be unbiased. We see that JM-MI gives good results for all parameters; for FCS, however, the MI standard error (mSE) is slightly underestimated for β_0 through β_3, and in consequence the confidence interval coverage is slightly reduced.

Binary data
In the previous subsection, we generated data from a multivariate normal distribution, with latent normals used to determine values of the categorical and binary variables. From now on, we explore the performance of latent normal JM-MI (with its implicit probit link functions for categorical variables) when data come from different data generating models, to give some insight into how the approach will perform in actual applications. We begin with binary data.

Logistic regression
We generate data from a logistic model:

logit P(Y_i = 1) = β_0 + β_1 X_{i,1} + β_2 X_{i,2}.  (7)

Using a logistic link means this model is not strictly equivalent to the latent normal JM-MI model, which implicitly uses a probit link, although the differences between the two links are small. We generate data under three different sets of parameter values, corresponding to small, medium, and large covariate effects. In all three scenarios, data are made missing independently in all three variables with probability ≈ 0.2, so that about 0.8³ ≈ 0.51 of the records are complete. In particular, for Y and X_2 we consider an MCAR mechanism, while for X_1 we assume an MAR mechanism dependent on the outcome; this invalidates the complete records estimates. We generate 1000 replications, each with n = 300 observations, and compare CR, JM-MI (imputation model with all variables as outcomes, i.e. similar to (4), with the same burn-in and between-imputation updates as before) and FCS-MI (using a logistic model to impute the outcome and linear regression models to impute the covariates). We create 20 imputed datasets for each of the two imputation methods.
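The remark that the logistic and probit links differ only slightly can be made concrete: rescaling the argument of the standard normal CDF by a factor of about 1.7 (a standard approximation, not a quantity from this paper) makes it track the logistic CDF to within about 0.01 over the whole real line:

```python
import math

def expit(x):
    """Logistic (inverse logit) link."""
    return 1.0 / (1.0 + math.exp(-x))

def probit_cdf(x):
    """Standard normal CDF, i.e. the inverse probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Largest absolute gap between the logistic curve and the rescaled
# probit curve over a grid from -5 to 5.
max_gap = max(abs(expit(x) - probit_cdf(x / 1.7))
              for x in [i / 100 for i in range(-500, 501)])
print(round(max_gap, 3))
```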

Results
From Table 2, we can see that, in the small and medium effect scenarios, the estimates of the fixed effects after imputation are almost perfect, with negligible bias (< 1%) and standard errors always smaller than in complete records analysis, irrespective of the imputation method.
When the effects are larger, even the full data estimates suffer from some small bias (as is typical with logistic regression in moderate samples). JM-MI nevertheless gives similar results to the full data analysis and recovers information compared to CR, although the MI standard errors (mSE) are slightly smaller than the empirical standard errors (eSE). In contrast, FCS-MI suffers from bias that causes the coverage level to fall short of the 95% nominal level, possibly because the noninformative margins condition does not hold here. Absolute bias and coverage levels with the different methods are compared in the top panels of Figure 1.

Binary covariate of regression model
Another situation where we need to impute a mix of binary and normal data is when we have some partially observed binary variables that we wish to include as covariates in a linear regression. For example, suppose we observe continuous variables Y and X_2 and binary X_1, and that the substantive analysis model is:

Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + e_i,  (8)

and that, consistent with (8), the data generation process is:

X_1 ∼ Bernoulli(0.5)
X_2 ∼ N(0, 1)
Y ∼ N(β_0 + β_1 X_1 + β_2 X_2, 1).  (9)
As before, we simulate 1000 datasets with 300 observations each, and then independently generate missing data in each of the three variables with probability ≈ 0.2 (MCAR for Y and X_2, MAR given the outcome for X_1). We create 20 imputations with JM-MI, with a model similar to (4) with all the variables as outcomes, and with FCS-MI. Again, we explore three different scenarios with increasing effect magnitude.

TABLE 2 Results from data generating mechanisms (7) and (9).

Results
The results are summarized in the bottom half of Table 2 and Figure 1. Again, in the two examples with smaller effect sizes, both MI methods behave well, giving unbiased fixed effect estimates and standard errors smaller than with complete records, reflecting the amount of information recovered by imputing the missing data. In the larger effect example, the conditional distribution of X_1 given Y and X_2 obtained from the joint model assumed by JM-MI is quite different from the one in the data generating mechanism, and hence results are not perfect. Nevertheless, bias is minimal, and the main consequence of the misspecification is an increase in the SEs. In this example, this means there is no information recovery relative to the CR analysis, but model and empirical SEs agree well and hence coverage remains good.

FIGURE 1 Top panels: absolute bias and coverage in the logistic outcome (7) scenarios. Bottom panels: absolute bias (left) and model SE (right) in the estimation of β_1 in the binary covariate (9) scenarios.

Categorical and ordinal data
In this subsection, we explore various scenarios in which we have at least one categorical variable, with K ≥ 2 categories, among the partially observed variables. The levels of this type of variable can either be ordinal (e.g. items of a Likert scale: "good," "decent," "poor") or unordered categorical (e.g. ethnicity: "Asian," "black," "white"). Goldstein et al. (2009) proposed using a proportional probit model for imputing ordinal data, i.e. one based on a single latent normal with thresholds defining the different levels. This is implemented in the standalone software REALCOM (Carpenter et al., 2011), but not in jomo. Therefore, at the end of this subsection, we evaluate imputing ordered categorical data using the unordered latent normal imputation model.

Results
The results, shown in the top box of Table 3, show that the latent normal variable approach performs well even with four categories: all nine parameter estimates are unbiased, coverage levels achieve the 95% nominal level, and mSEs and eSEs are similar and always smaller than with CR.

Categorical covariate of regression model
Here, we generate a 4-level categorical covariate X_1, which enters the substantive linear model through indicator variables:

Y_i = β_0 + β_1·1[X_{i,1} = 2] + β_2·1[X_{i,1} = 3] + β_3·1[X_{i,1} = 4] + β_4 X_{i,2} + e_i,

where category 1 is used as the reference and 1[·] is an indicator for the event in brackets. We set (β_0, β_1, β_2, β_3, β_4) = (0.1, 0.1, −0.2, 0.05, 0.1). Once again, we simulate 1000 datasets of 300 observations, make data missing independently in each variable with probability ≈ 0.2 (MAR given the outcome for X_1), and compare CR, FCS-MI, and JM-MI. For FCS-MI, we use multinomial logistic regression for categorical variables and linear regression for normal variables.

Results
From Table 3 (second box), and the left panel of Figure 2, we see both JM-MI and FCS-MI give excellent results.

Results
The bottom box of Table 3 and the right panel of Figure 2 give the results. Once again, the latent normal JM-MI algorithm performs well, with negligible information loss when the data are truly ordinal. The results therefore suggest that the generic latent normal categorical algorithm can be used with confidence even when the data are truly ordinal. Moreover, using the generic algorithm avoids needing to first check whether, in fact, the ordinal model is appropriate.

Count data
For our final scenario, we consider imputing count data; as before, first when the count is an outcome and second when it is a covariate.
While some promising approaches specifically tailored to the imputation of Poisson data within the JM-MI framework have been developed (Goldstein & Kounali, 2009), they are not currently implemented in general software. Instead, count variables can either be included in the imputation model as categorical variables, or as continuous. The first option is only viable when the mean of the underlying Poisson distribution is quite low, so that a small number of distinct, low counts are observed. With large counts, the categorical approach becomes infeasible, and instead we have to treat them as continuous in the JM-MI framework. It is generally thought that Poisson distributions with mean λ > 20 can be well approximated by a normal distribution. However, since in the Poisson distribution the variance equals the mean, the variance-stabilizing square-root transformation could be helpful. Therefore, we compare the behavior of JM-MI including our count variable in the model either untransformed or square rooted. Additionally, since the link function for Poisson regression models is the logarithm, we explore the behavior of the log transformation.
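The variance-stabilizing property of the square root can be checked directly: for Poisson counts the variance of the raw counts grows with the mean, while the variance of the square-rooted counts stays near 1/4 regardless of the mean. A quick numerical check (illustrative means only):

```python
import numpy as np

rng = np.random.default_rng(11)

# For Poisson(lam), Var(X) = lam grows with the mean, but
# Var(sqrt(X)) is roughly 1/4 once lam is moderately large:
# this is what makes sqrt a variance-stabilizing transformation.
raw_var, sqrt_var = {}, {}
for lam in (5.0, 20.0, 80.0):
    x = rng.poisson(lam, 200_000)
    raw_var[lam] = x.var()
    sqrt_var[lam] = np.sqrt(x).var()
    print(lam, round(raw_var[lam], 1), round(sqrt_var[lam], 3))
```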
As usual, we divide our analysis in two parts, first when the count variable is the dependent variable in the substantive model and second when it is a covariate.
We generate count outcome data from a Poisson log-linear model:

Y_i ∼ Poisson(λ_i), log λ_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2}.  (14)

We investigate first a situation in which β_0 = 3, leading to a quite large average λ ≈ 20, and second one where β_0 = −0.5, leading to an average λ ≈ 0.7. For the first situation we consider (i) a small covariate effect (β_1 = β_2 = 0.1) and (ii) a larger effect (β_1 = β_2 = 0.3). With the larger covariate effect, for each unit increase in X_1 or X_2, λ increases by around 7.
We generate 1000 datasets each of size 300, introduce 20% MCAR data independently in Y and X_2, and MAR data in X_1, and compare the results of full data and complete records analyses with those after MI. For FCS-MI, we use predictive mean matching (PMM) to impute count variables, as Poisson imputation is not implemented in the R package mice, although an add-on package for imputing Poisson data, countimp (Kleinke & Reinecke, 2013), has recently been published. For JM-MI, we compare the results including the count variable in the model either untransformed, square rooted, or log-transformed. In the scenario with small mean, we also treat the count variable as categorical.

Results
The top three boxes of Table 4 show the results. In the scenarios with larger λ, all JM-MI methods seem to behave well when the covariate effects are small. However, with larger effects, the variance-stabilizing transformation leads to better parameter estimates, while the untransformed method leads to undercoverage for the two slope parameters, primarily because of small bias in the parameter estimates (Figure 3). Even here, though, JM-MI outperforms FCS. The log transformation leads to accurate results, although the standard errors become inflated for increasing values of β_1 and β_2 (results not shown).
With small λ, the method including Y in the imputation model as categorical leads to similarly good results. In summary, when including a count outcome variable in the imputation model with JM-MI: if the mean is small, the categorical approach is viable and effective, while with larger means the count can be included in the model as continuous, possibly after a log or square-root transformation.

Count covariate of a regression model
Although less common, a count variable may be included in a regression model as a covariate. Here, we explore this by simulating 1000 datasets of size 300 from the following distribution, in which the count covariate enters the outcome model linearly:

X_1 ∼ Poisson(λ)
X_2 ∼ N(0, 1)
Y ∼ N(β_0 + β_1 X_1 + β_2 X_2, 1).  (15)

Results
When the partially observed count variable is included as a covariate in the substantive analysis model, the results of these simulations clearly show that the log transformation is a poor option, because it is inconsistent with the linear relationship in (15). This is also the reason why the variance-stabilizing square-root transformation struggles. Instead, including the count covariate in the imputation model untransformed is the best option, as it does not alter the linear association of the variable with the outcome (see e.g. Lee & Carlin, 2017).

A REAL DATA EXAMPLE: THE GERMAN BREAST CANCER STUDY GROUP
Here, we evaluate the latent normal JM-MI approach for the imputation of real, rather than simulated, data, again comparing the results with those obtained using FCS-MI. We use data from a prospective study on node-positive breast cancer patients of the German Breast Cancer Study Group (Sauerbrei, Royston, Bojar, Schmoor, & Schumacher, 1999; Schumacher et al., 1994). The study was a Comprehensive Cohort Study, where patients satisfying eligibility criteria were asked whether they consented to be randomized; otherwise, physicians chose their preferred treatment. Table 5 gives a summary of the baseline characteristics for the 686 patients included in the study. Suppose we are interested in the effect of hormone therapy on survival. For this purpose, we want to fit a Cox proportional hazards model with hormone therapy as the main exposure, adjusting for age, tumor size and grade, number of active nodes, and menopausal state.
TABLE 4 Count data simulation results, with Poisson dependent data generated from models (14).

Following Sauerbrei et al. (1999), the number of nodes is exponentially transformed to reflect medical knowledge about its effect on survival.
In order to evaluate the imputation methods, we sample with replacement from the original data 1000 datasets with the same sample size, and in each of these we introduce missing data in three variables, which we later impute with either FCS-MI or JM-MI:
1. Tumor grade: this is a 3-level categorical variable, and we make ∼20% of values missing with an MAR mechanism dependent on the outcome (survival time). Being an ordered variable, it is imputed with an ordinal regression model within FCS-MI, while in JM-MI we handle it with two latent normal variables. Given the results of the simulation section, we expect this choice not to lead to excessive loss of efficiency compared to a model where the order between categories is acknowledged;
2. Menopausal state: this variable is binary, and we make ∼20% of observations missing with an MAR mechanism dependent on age at recruitment. We include it in the imputation model for JM-MI with a latent normal variable, while for FCS-MI we use a simple logistic regression model;
3. Number of active nodes: this is a count variable, and we introduce 20% missing values with an MCAR mechanism. We use PMM for FCS-MI and, given the results of the simulation section, we do not transform the variable before including it in the JM-MI imputation model.
Given that the analysis model is a Cox regression model, we follow the suggestion in White and Royston (2009) and include the Nelson-Aalen estimator and the event indicator in the imputation model. For each sample, we create 20 imputed datasets. Figure 4 compares coverage levels for the set of fixed effect parameter estimates from full data analysis, and handling missing data with either complete records, FCS-MI or JM-MI. Coverage levels for both imputation methods are close to 95%, although on average slightly below the nominal level because of small biases in some parameter estimates. Hence, a comparison between the two imputation strategies is consistent with the results from the simulation studies, and confirms that the two methods perform very similarly even with real data and a larger number of variables.
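For reference, the Nelson-Aalen estimator recommended by White and Royston is the cumulative sum of d_j / n_j over event times (events divided by numbers at risk), evaluated at each subject's own time. A minimal sketch, assuming distinct event times (handling ties would need a small extension):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each subject's own time,
    as added to the imputation model alongside the event indicator.
    Assumes distinct event times for simplicity."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    h = np.zeros(len(times))
    cum = 0.0
    at_risk = len(times)
    for idx in order:          # walk through subjects in time order
        if events[idx] == 1:
            cum += 1.0 / at_risk   # increment hazard at each event
        h[idx] = cum
        at_risk -= 1               # subject leaves the risk set
    return h

# Tiny worked example: events at t = 2, 3, 8; censoring at t = 5.
h = nelson_aalen([2, 5, 3, 8], [1, 0, 1, 1])
print(h.tolist())
```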

DISCUSSION
In this paper, we reported the results of a comprehensive series of simulation studies to investigate the validity of the latent normal variables approach to including binary, categorical, ordinal, and count variables in the joint modelling imputation framework. We explored cases where such data were included as outcomes or as covariates in our substantive analysis models. Our results show that, provided we choose an appropriate imputation model, at least approximately congenial with the analysis model of interest, this approach gives valid results across a range of scenarios, and sometimes outperforms FCS. Starting with binary and categorical outcome data, the results were uniformly good. With categorical covariate data, in most situations results were very good, with only a single scenario where, with very strong covariate effects, inference after JM-MI led to slight overestimation of the standard errors. The underlying cause of this is that the data generating mechanism (from the general location model) was not fully compatible with the latent normal model. While analysts need to be aware this may be an issue, the parameters used in that scenario were purposely quite extreme, and hence we believe that in the large majority of settings this is of negligible consequence in practice.
Turning to ordinal data, our results showed that the general latent normal variables algorithm for dealing with categorical variables works very well for ordinal data, with negligible loss of efficiency. Thus automatic use of the general categorical model is a sensible approach in applications, as it also avoids the need to first establish whether the simpler proportional probit/logit model is plausible.
Lastly, we considered count data. When the count is the dependent variable and the mean small, best results were achieved by treating it as categorical. When the mean is larger, best results were achieved with a log transformation (consistent with the log-linear relationship in the substantive model) although square root and untransformed approaches also performed reasonably. However, when the count is a covariate, the log or square root transformation should not be used, because they are both markedly inconsistent with the substantive model. Instead, including the count variable untransformed was the best approach. Note that, across all the simulations, we did not round imputed values of count outcome variables, but only truncated them at 0 to avoid negative counts.
We compared JM-MI with FCS-MI in each scenario, finding that the two methods performed similarly in most situations, although as detailed in the results, in some settings FCS-MI was slightly worse than JM-MI and vice versa. Therefore, in practice we believe that users can decide to use either of the two methods interchangeably. At least compared to the current implementation of jomo, FCS-MI potentially has an advantage compared to JM-MI in terms of computational time with large datasets, particularly with a large number of categorical variables. The big advantage of JM-MI is that it involves an explicit formulation of the joint imputation model (so it avoids issues of compatibility between the univariate models) and it extends naturally to the multilevel setting.