Journal of the Royal Statistical Society: Series A (Statistics in Society)


Edited By: H. Goldstein and L. Sharples


Online ISSN: 1467-985X



The role of tobacco taxes in starting and quitting smoking: duration analysis of British data, by M. Forster and A. M. Jones, Journal of the Royal Statistical Society, Series A, Statistics in Society, Volume 164 (2001), part 3, pages 517 - 547

Most of the estimates reported in our paper are computed using standard commands from Stata v.6.0. However, estimation of the split population model of starting and of the gamma model of quitting with heaping effects requires custom programs:


Estimation of the split population model is done with Stata programs written for the maximum likelihood "d0" and "lf" routines. To deal with time-varying covariates (TVCs), the dataset is "expanded" by the age of starting. The data are stset using id(.) and, in the case of method d0, the subroutine mlsum is used to allow for the repeated observations on each respondent. The Stata program is contained in an accompanying ASCII file.
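The expansion step can be illustrated with a minimal Python sketch. It is not the authors' Stata program: the function name expand_spell, the record fields and the covariate values are all hypothetical, and it assumes one record per year at risk, with the event flagged only on the final record of a completed spell.

```python
def expand_spell(resp_id, duration, failed, tvc_by_year):
    """Return one record per year at risk for a single respondent.

    duration    -- number of years at risk (assumed here to be the age of
                   starting, or the age at interview for a censored spell)
    failed      -- True if the spell ends in the event (starting smoking)
    tvc_by_year -- hypothetical time-varying covariate, one value per year
    """
    records = []
    for year in range(1, duration + 1):
        records.append({
            "id": resp_id,        # links the repeated observations
            "t0": year - 1,       # start of the one-year interval
            "t1": year,           # end of the one-year interval
            "event": int(failed and year == duration),
            "tvc": tvc_by_year[year - 1],
        })
    return records


# Usage: a respondent observed for three years whose spell ends in the event.
rows = expand_spell(resp_id=7, duration=3, failed=True,
                    tvc_by_year=[1.0, 1.2, 1.5])
```

Each record carries the covariate value for its own year, and the shared id lets the per-record likelihood contributions be summed back to one contribution per respondent.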


Torelli and Trivellato (1993) propose a solution to the "heaping effect" based on an explicit measurement model. This model is superimposed on the underlying duration model, leading to a reformulation of the log-likelihood function. Torelli and Trivellato compare four methods of dealing with heaping:

i. Re-formulating the likelihood to allow for the measurement model. This requires specifying a parametric model of the measurement errors.
ii. The ad hoc approach of adding dummy variables for the heaped observations.
iii. Smoothing the data prior to estimation by using random draws from a uniform distribution to spread the actual heaped observations. This means that the results are contingent on the random numbers that are generated.
iv. Ignoring the heaping and estimating the underlying duration model.

Method i: We have programmed ML estimation of the gamma model using the "lf" routine in Stata. We assume that the heaped observations are those where EXFAGAN is a multiple of 5 or 10. Because heaping is due to EXFAGAN, the problem relates only to complete spells, i.e. those who have quit smoking. For these observations, the usual contribution to the likelihood, f(ti), is replaced by F(uti) - F(lti), where lti is the lower limit and uti the upper limit of an interval of length 5 around ti. The Stata program is contained in an accompanying ASCII file.
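The replacement of f(ti) by F(uti) - F(lti) can be sketched in Python as follows. This is an illustration under stated assumptions, not the accompanying Stata program: it uses a two-parameter gamma distribution with integer shape (an Erlang distribution) purely so that the CDF has a closed form, it omits covariates, and it assumes the interval is symmetric, running from ti - 2.5 to ti + 2.5. The function names are hypothetical.

```python
import math


def erlang_cdf(t, shape, scale):
    """CDF of a gamma distribution with integer shape (Erlang)."""
    if t <= 0:
        return 0.0
    x = t / scale
    tail = sum(x ** n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-x) * tail


def erlang_pdf(t, shape, scale):
    """Density of a gamma distribution with integer shape (Erlang)."""
    if t <= 0:
        return 0.0
    x = t / scale
    return x ** (shape - 1) * math.exp(-x) / (math.factorial(shape - 1) * scale)


def loglik_contribution(t, heaped, censored, shape, scale, half_width=2.5):
    """Log-likelihood contribution of one spell of duration t."""
    if censored:
        # Censored spells are unaffected by heaping: survivor function S(t).
        return math.log(1.0 - erlang_cdf(t, shape, scale))
    if heaped:
        # Heaped complete spell: probability mass of the interval around t.
        # The symmetric interval of length 5 is an assumption of this sketch.
        lower = max(t - half_width, 0.0)
        upper = t + half_width
        return math.log(erlang_cdf(upper, shape, scale)
                        - erlang_cdf(lower, shape, scale))
    # Non-heaped complete spell: the usual density contribution f(t).
    return math.log(erlang_pdf(t, shape, scale))


# Usage: a complete spell reported as 10 years, treated as heaped.
ll = loglik_contribution(10.0, heaped=True, censored=False, shape=2, scale=5.0)
```

Summing these contributions over respondents gives the reformulated log-likelihood, which would then be maximised over the distribution's parameters.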

Method iii: The problem of heaping relates to EXFAGAN rather than to the other components of the dependent variable, so we apply the smoothing method to this variable. For each of the potentially heaped values (5, 10, ...), the actual observation is smoothed using pseudo-random integers (the Stata command generates EXSMOOTH = EXFAGAN - 3 + int(5*uniform()), with the seed set at 123456789). No adjustment is required for the censored observations, whose durations do not depend on EXFAGAN.
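The smoothing rule can be mirrored in a short Python sketch. The function name smooth_heaped is hypothetical, and Python's random number generator differs from Stata's uniform(), so using the same seed will not reproduce the authors' draws. The sketch assumes that only positive multiples of 5 are treated as potentially heaped.

```python
import random


def smooth_heaped(exfagan, rng):
    """Apply EXSMOOTH = EXFAGAN - 3 + int(5*uniform()) to heaped values."""
    if exfagan > 0 and exfagan % 5 == 0:
        # int(5 * u) with u in [0, 1) is an integer from 0 to 4.
        return exfagan - 3 + int(5 * rng.random())
    # Values that are not potentially heaped are left unchanged.
    return exfagan


# Usage: smooth a few reported values with a seeded generator.
rng = random.Random(123456789)
smoothed = [smooth_heaped(v, rng) for v in [5, 7, 10, 12, 15]]
```

As written, the expression spreads each heaped value over the five integers from EXFAGAN - 3 to EXFAGAN + 1.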

Torelli, N. and Trivellato, U. (1993) Modelling inaccuracies in job-search duration data. Journal of Econometrics, 59:187-211.

Contact details:

Professor Andrew Jones
Department of Economics and Related Studies
University of York
YO10 5DD
United Kingdom
Fax: +44 1904 433759

Dataset (11kb)