Abstract
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees’ latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box–Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested.






. Then the hazard function can be written as


is the speed parameter for test taker i. Hj represents the monotone transformation for item j, and this item-level transformation implies that different types of RT transformations will be possible for different items. βj is a discrimination-like parameter. Negative βj means examinees with higher speed will tend to have shorter RTs. The ɛij are independent and identically distributed with distribution F and independent of the τi. Because the τi are latent variables, they are sometimes referred to as frailties. We assume F is known and the same across different items. The three distributions we consider in the simulations are normal, extreme value and logistic.
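As a rough illustration of the model just described, the sketch below simulates RTs under the assumption that the model takes the form Hj(tij) = βjτi + ɛij, as outlined in the abstract. The function name `simulate_rts`, the choice Hj = log for every item, and the use of the minimum extreme value distribution (the log of a unit exponential) for the extreme value case are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def simulate_rts(n_persons, betas, error="normal", seed=0):
    """Simulate RTs from H_j(t_ij) = beta_j * tau_i + eps_ij, assuming H_j = log."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    # Latent speeds with mean 0 and variance 1 (the identification constraints).
    tau = rng.standard_normal(n_persons)
    shape = (n_persons, betas.size)
    if error == "normal":
        eps = rng.standard_normal(shape)
    elif error == "logistic":
        eps = rng.logistic(size=shape)
    elif error == "extreme_value":
        # Log of a unit exponential: the minimum extreme value distribution.
        eps = np.log(rng.exponential(size=shape))
    else:
        raise ValueError(f"unknown error distribution: {error}")
    # Linear predictor on the transformed scale, one column per item.
    h = tau[:, None] * betas[None, :] + eps
    # Invert the assumed transformation H_j = log to get times.
    return tau, np.exp(h)

tau, rts = simulate_rts(2000, betas=[-0.5, -1.0], error="normal")
```

With negative βj, test takers with larger τi receive smaller values on the transformed scale and therefore shorter simulated RTs.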
are assumed to be randomly drawn from a multivariate normal distribution, that is,


. The mean of τ is fixed to identify the centre of the non-parametric transformation H, and the variance of τ is fixed to remove the trade-off between βj and τi. The scale of H is determined by the fixed distribution of the error term ɛij. For instance, when the error term ɛij follows a normal distribution,
, we restrict it to be standard normal with σ² = 1. This is because σ can easily be absorbed into the monotone transformation and the item parameters β on both sides of
is equivalent to the original 
, which implies that
. The item time intensity parameter, in this case, is represented by
.
as well as the non-parametric transformations Hj. In particular, we would like our estimation technique to allow for data obtained by computerized adaptive testing, in which every test taker is given different items, based on his or her adaptively estimated θ level. So the random sampling of θ (or τ) from a common distribution cannot be assumed. Consequently, the usual marginal likelihood approaches used in latent variable modelling are no longer appropriate. Also because we have a latent frailty term and an unknown transformation H(·), the methods of 
is equivalent to knowing the ranks of the ti and τ1, …, τN. Therefore, it is reasonable to use the marginal likelihood of
, which does not depend on Hj, to make inference about βj without any loss of information (
, which resonates with the model assumption described earlier. By the definition of
as

and ζ = βτi. The key is to obtain an approximation for 
, let
be the order statistics of a sample of size N from the standard normal distribution. Then
, and
where
. Details of the derivation are given in
.
from the observed response times. So if ξik is the (i, k)th element in the variance–covariance matrix for the standard normal order statistics, then
where ri and rk are the ranks for the ith and kth observations. Because the response times are continuous, we assume that the ranks are uniquely assigned to each observation without ties. But when there are ties, it is easier to break them by adding a very small number to one of the tied observations. More elegantly, the rank-based likelihood can be adjusted slightly, as described in 
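A minimal sketch of assigning unique ranks to response times is given below. Instead of literally adding a small number to tied observations, it breaks ties with a random secondary sort key, which has the same effect without having to choose a jitter size. The function name `unique_ranks` is hypothetical, and this is not the adjusted rank-based likelihood mentioned above.

```python
import numpy as np

def unique_ranks(times, seed=0):
    """Return ranks 1..N for the observations, breaking ties at random."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    # lexsort uses the last key as the primary key: sort by time first,
    # then by a random number that only matters among tied times.
    order = np.lexsort((rng.random(t.size), t))
    ranks = np.empty(t.size, dtype=int)
    ranks[order] = np.arange(1, t.size + 1)
    return ranks

ranks = unique_ranks([3.0, 1.0, 3.0, 2.0])
```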


i= 1, …, N, are the estimates in the first step. The solution
to (
is obtained by solving

) in the proportional hazards model, and by virtue of the one-to-one connection between the transformation (H(t)) and the baseline hazard (
), we can estimate H(t) directly from the Breslow estimator, and the results will be comparable to the estimate calculated from (
, where
and
A normal prior with mean 0 and variance 10 is chosen for each regression parameter βj. Here we deliberately selected a large variance to make the prior less informative. The correlation term ρθτ is also chosen to have a vague normal prior as in
. This treatment of restricting the prior region of the covariance (in our case the correlation) parameter to the area that supports a positive definite covariance is discussed in
. The gamma prior was chosen in
(or ρθτ), θ, τ and β. The details of the Metropolis–Hastings within Gibbs sampler are presented below.
, and
, respectively. To perform the sampling for parameters with support on the entire real line, we use normal proposal distributions with mean equal to the current value of the parameter and variance chosen to give a Metropolis acceptance rate of between 25 and 50 per cent. For parameters with support not on the real line, we first transform them to the real line and then sample them from a normal proposal distribution.
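The proposal scheme described above can be sketched as a generic random-walk Metropolis update. This is an illustration for a single scalar parameter with a user-supplied log posterior, not the paper's full sampler; the function name `rw_metropolis` and the toy standard normal target are assumptions for the example.

```python
import numpy as np

def rw_metropolis(log_post, x0, step, n_iter, seed=0):
    """Random-walk Metropolis with a normal proposal centred at the current value."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    lp = log_post(x)
    draws = np.empty(n_iter)
    accepted = 0
    for m in range(n_iter):
        # Symmetric proposal: normal with mean x and standard deviation `step`.
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        draws[m] = x
    return draws, accepted / n_iter

# Toy target: a standard normal log density, up to a constant.
draws, rate = rw_metropolis(lambda x: -0.5 * x * x, x0=0.0, step=2.4, n_iter=5000)
```

In practice one would adjust `step` during a pilot run until the returned acceptance rate falls in the 25 to 50 per cent range.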
,
, and
by
,
and
, respectively.
is the maximum likelihood estimator (MLE), maximizing the likelihood function formed by
is calculated by the sample variance of
. The initial value of
is obtained differently depending upon the specific test conditions. The details are given in
,
is obtained by maximizing the approximation of the rank-based likelihood in (
is the sample mean of the θs. Set the iteration counter iter = 1.
,
,
,
and
,
. Sample each parameter sequentially as follows.
(variance of
). Because the variance is always non-negative, we draw
from
with acceptance probability

,
and variance–covariance matrix
is the inverse-gamma prior for the variance term.
and
). Because −1 ≤ ρθτ ≤ 1, we need to first transform ρθτ to the real line; the transformation we adopt is
. Then draw ϕ* from
, and the acceptance probability of the corresponding
is
is the truncated normal prior density of the correlation term, with
being the one-to-one transformation of ϕ*. The Jacobian matrix
is involved due to the transformation of
, and one will notice that it is the same as the transition matrix as if drawing
from a non-symmetric distribution (instead of drawing ϕ* from a symmetric normal distribution).
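To make the role of the Jacobian concrete, the sketch below updates a correlation parameter by proposing on the real line and correcting the acceptance ratio. The transformation used here, Fisher's z with ϕ = arctanh(ρ), is an assumption chosen for illustration and may differ from the transformation adopted in the paper; `update_rho` and `log_post_rho` are hypothetical names.

```python
import numpy as np

def log_jacobian(phi):
    """log |d rho / d phi| for rho = tanh(phi), which equals log(1 - tanh(phi)^2)."""
    return np.log1p(-np.tanh(phi) ** 2)

def update_rho(rho, log_post_rho, step, rng):
    """One Metropolis step for rho in (-1, 1), proposed on the phi = arctanh(rho) scale."""
    phi = np.arctanh(rho)
    phi_star = phi + step * rng.standard_normal()  # symmetric proposal in phi
    rho_star = np.tanh(phi_star)
    # The density of phi is the density of rho times the Jacobian, so the
    # Jacobian terms enter the acceptance ratio for both current and proposed values.
    log_ratio = (log_post_rho(rho_star) + log_jacobian(phi_star)
                 - log_post_rho(rho) - log_jacobian(phi))
    if np.log(rng.uniform()) < log_ratio:
        return rho_star
    return rho

rng = np.random.default_rng(0)
rho, chain = 0.0, []
for _ in range(20000):
    # Flat log posterior on (-1, 1), so the draws should look uniform.
    rho = update_rho(rho, lambda r: 0.0, step=1.0, rng=rng)
    chain.append(rho)
chain = np.array(chain)
```

With a flat target on (−1, 1), the chain should have mean near 0 and variance near 1/3; dropping the Jacobian terms would instead push the draws towards ±1.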
from
with acceptance probability
is again a product of multivariate normal densities, and
is the normal prior density.
from a bivariate normal distribution with mean
and variance–covariance matrix

.
is a bivariate normal with mean
and variance–covariance matrix
. IRT(·) is calculated from
from a normal distribution
with the acceptance probability defined as


, in which M is the number of iterations of the algorithm, and
.
, for i= 1, …, N, from which we can estimate its density by kernel smoothing as

, with λ∼U(0.5, 1.5). To ensure H(t) was on the real line, we subtracted 5 from the transformation, although some other arbitrary value could be chosen. When λ < 1, it is a monotone concave function, and when λ > 1, it is a monotone convex function. The third transformation, Hj(t) = λj(t − 5)³ with λ∼U(0.5, 2), has an inflection point in the middle. Again, 5 is chosen because it is the median of the response times, but other arbitrary values could also be chosen. The fourth transformation was a mixture of the previous three, with 7, 7, and 6 items belonging to the log, Box–Cox, and inflection transformations, respectively. These transformations and corresponding parameters were chosen to produce realistic response time distributions.
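The transformations used in the simulation can be sketched as follows. Only the cubic form λj(t − 5)³ and the ranges of λ are taken from the text; the plain logarithm for the first transformation and the standard Box–Cox form (t^λ − 1)/λ shifted by 5 are assumptions, as are all function names.

```python
import numpy as np

def h_log(t):
    """Assumed form of the log transformation."""
    return np.log(t)

def h_box_cox(t, lam):
    """Assumed Box-Cox form, shifted down by 5 as described in the text."""
    return (t ** lam - 1.0) / lam - 5.0

def h_cubic(t, lam):
    """Inflection transformation lam * (t - 5)^3 from the text."""
    return lam * (t - 5.0) ** 3

def mixed_transformations(seed=0):
    """Build the 20-item mixture: 7 log, 7 Box-Cox and 6 inflection items."""
    rng = np.random.default_rng(seed)
    items = [h_log for _ in range(7)]
    for lam in rng.uniform(0.5, 1.5, size=7):
        # Bind lam as a default argument so each item keeps its own value.
        items.append(lambda t, lam=lam: h_box_cox(t, lam))
    for lam in rng.uniform(0.5, 2.0, size=6):
        items.append(lambda t, lam=lam: h_cubic(t, lam))
    return items

t_grid = np.linspace(0.1, 10.0, 200)
transforms = mixed_transformations()
curves = [H(t_grid) for H in transforms]
```

Each curve is increasing on the grid, and the Box–Cox items are concave or convex according to whether their λ is below or above 1.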
.
, only bias was calculated.
(MLE of θ estimated from responses only) and the final estimate of
. The recovery of the non-parametric transformation was evaluated by the standardized version of the integrated absolute difference between the true Hj(t) and its estimate
for the jth item,
was often estimated with large positive bias. Considering that the ability variance will not affect our interpretations of the RT information as well as its relationships with responses, the results are still acceptable. The non-parametric transformation is also accurately recovered, as shown by a small standardized integrated difference. Only the log transformation showed slightly larger differences between the true and estimated transformations, and this is because the log transformation has a long and nearly flat tail that is relatively hard to capture, especially bearing in mind that only a few examinees will have extremely long RTs (see
and σθτ. The bivariate normal distribution with mean μ= (μθ, 0) and covariance matrix
was chosen as the prior for (θ, τ) for each examinee.