The paper that G. U. Yule read to the Royal Statistical Society in 1899 is, by virtue of its application of multiple regression to observational data, a landmark in social statistics. It is also an illustration of the value of relating a change in an explanatory variable to a change in the response when wishing to draw causal conclusions. This paper returns to Yule's data and analysis from a 21st-century perspective. A range of multilevel and fixed effects models are fitted to the reconstructed data set and his conclusions are re-examined. The social and political contexts of Yule's work are also considered.

The objective of this analysis was to explore temporal and spatial variation in teen birth rates (TBRs) across counties in the USA, from 2003 to 2012, by using hierarchical Bayesian models. Prior examination of spatiotemporal variation in TBRs has been limited by the reliance on large-scale geographies such as states, because of the potential instability in TBRs at smaller geographical scales such as counties. We implemented hierarchical Bayesian models with space–time interaction terms and spatially structured and unstructured random effects to produce smoothed county-level TBR estimates, allowing for examination of spatiotemporal patterns and trends in TBRs at a smaller geographical scale across the USA. The results may help to highlight US counties where TBRs are higher or lower and to inform efforts to reduce birth rates to adolescents in the USA further.

We propose several parsimonious models for higher order Markov chains, applied to the study of municipal rating migrations in credit risk. In fully parameterized Markov chain models, the number of parameters increases very rapidly as the order of the Markov chain grows and this can yield biased estimates when certain sequences of states are rare. For some processes, as in the case of credit ratings, this problem is accentuated because transitions between distant states are unlikely (*persistent* transitions). We introduce the *short* and *long persistence* models and compare them with the fully parameterized Markov chain, achieving a better fit with a lower number of parameters. Furthermore, downgrade *momentum* effects are found in the rating process, which are consistent with recent empirical findings.
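The parameter explosion described above is easy to quantify: a fully parameterized chain with k states and order l needs one transition distribution (k − 1 free probabilities) for each of the k^l possible state histories. A minimal sketch of that arithmetic, with the state count k = 8 chosen purely for illustration (the actual number of rating classes is not stated here):

```python
def full_markov_params(k: int, order: int) -> int:
    """Free parameters in a fully parameterized Markov chain: (k - 1)
    free transition probabilities for each of the k**order histories."""
    return (k ** order) * (k - 1)

# Illustrative growth for an assumed k = 8 rating classes:
for order in (1, 2, 3):
    print(order, full_markov_params(8, order))
    # order 1 -> 56, order 2 -> 448, order 3 -> 3584
```

The persistence models in the paper restrict these parameters; this sketch only shows why a restriction is needed.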

We consider the use of representativeness indicators to monitor risks of non-response bias during survey data collection. The analysis benefits from use of a unique data set linking call record paradata from three UK social surveys to census auxiliary attribute information on sample households. We investigate the utility of census information for this purpose and the performance of representativeness indicators (the *R*-indicator and the coefficient of variation of response propensities) in monitoring representativeness over call records. We also investigate the extent and effects of misspecification of the auxiliary covariate sets used in indicator computation, identify design phase capacity points in call records beyond which improvements to the survey data set are minimal, and ask whether such points are generalizable across surveys. Given our findings, we then offer guidance to survey practitioners on the use of such methods and on the implications for optimizing data collection and efficiency savings.
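The two indicators named above have simple sample versions: the R-indicator is one minus twice the standard deviation of the estimated response propensities, and the coefficient of variation is that standard deviation divided by the mean propensity. A minimal sketch (the propensity values are invented for illustration, and the choice between population and sample standard deviation is a methodological detail not settled here):

```python
from statistics import mean, pstdev

def r_indicator(propensities):
    """Sample R-indicator: 1 - 2 * s.d. of response propensities.
    Equals 1 when every unit is equally likely to respond."""
    return 1.0 - 2.0 * pstdev(propensities)

def cv_propensities(propensities):
    """Coefficient of variation of response propensities."""
    return pstdev(propensities) / mean(propensities)

rho = [0.5, 0.6, 0.7, 0.6]  # hypothetical estimated propensities
```

Tracking these two quantities call by call is how representativeness is monitored over the course of fieldwork.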

Several studies have shown that conversational interviewing (CI) reduces response bias for complex survey questions relative to standardized interviewing. However, no studies have addressed concerns about whether CI increases intra-interviewer correlations (IICs) in the responses collected, which could negatively impact the overall quality of survey estimates. The paper reports the results of an experimental investigation addressing this question in a national face-to-face survey. We find that CI improves response quality, as in previous studies, without substantially or frequently increasing IICs. Furthermore, any slight increases in the IICs do not offset the reduced bias in survey estimates engendered by CI.
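For a balanced design, the intra-interviewer correlation discussed above can be estimated with the usual one-way ANOVA formula, treating each interviewer's respondents as a group. A minimal sketch under that assumption (real surveys have unbalanced workloads and covariate adjustment, neither of which is handled here):

```python
from statistics import mean

def iic_anova(groups):
    """One-way ANOVA estimate of the intra-interviewer correlation for
    balanced groups: (MSB - MSW) / (MSB + (k - 1) * MSW), where each
    group holds one interviewer's responses and k is the group size."""
    k = len(groups[0])
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (len(groups) * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Values near zero indicate that interviewers add little clustering to the responses, which is the reassuring finding the abstract reports for conversational interviewing.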

Returns to education are variable both within and between educational groups. If uncertain pay-offs are a concern to individuals when selecting an education, wage variance is relevant. The variation is a combination of unobserved heterogeneity and pure uncertainty or risk. The first element is known to the individual, but unknown to the researcher; the second is unknown to both. As a result, the variance of wages observed in the data will overestimate the real magnitude of educational uncertainty and the effect that risk has on educational decisions. We apply a semiparametric estimation technique to tackle the selectivity issues. This method does not rely on distributional assumptions about the errors in the schooling choice and wage equations. Our results suggest that risk is decreasing in schooling. Private information accounts for a share varying between 0% and 13% of the total wage variance observed, depending on the educational level. Finally, we conclude that the estimation results are very sensitive to the functional relation that is imposed on the error structure.

We examine the forecasting performance of the recent fractionally cointegrated vector auto-regressive (FCVAR) model. We use daily polling data of political support in the UK for 2010–2015 and compare with popular competing models at several forecast horizons. Our findings show that the four variants of the FCVAR model considered are generally ranked as the top four models in terms of forecast accuracy, and the FCVAR model significantly outperforms both univariate fractional models and the standard cointegrated vector auto-regressive model at all forecast horizons. The relative forecast improvement is higher at longer forecast horizons, where the root-mean-squared forecast error of the FCVAR model is up to 15% lower than that of the univariate fractional models and up to 20% lower than that of the cointegrated vector auto-regressive model. In an empirical application to the 2015 UK general election, the estimated common stochastic trend from the model follows the vote share of the UK Independence Party very closely, and we thus interpret it as a measure of Euroscepticism in public opinion rather than an indicator of the more traditional left–right political spectrum. In terms of prediction of vote shares in the election, forecasts generated by the FCVAR model in the run-up to the election appear to provide a more informative assessment of the current state of public opinion on electoral support than the hung Parliament prediction of the opinion polls.

We contribute to the small, but important, literature exploring the incidence and implications of misreporting in survey data. Specifically, when modelling ‘social bads’, such as illegal drug consumption, researchers are often faced with exceptionally low reported participation rates. We propose a modelling framework where firstly an individual decides whether to participate or not and, secondly, for participants there is a subsequent decision to misreport or not. We explore misreporting in the context of the consumption of a system of drugs and specify a *multivariate inflated probit model*. Compared with observed participation rates of 12.2%, 3.2% and 1.3% (for use of marijuana, speed and cocaine respectively), the true participation rates are estimated to be almost double for marijuana (23%), and more than double for speed (8%) and cocaine (5%). The estimated chance that a user would misreport their participation is a staggering 65% for a hard drug like cocaine, and still about 31% and 17% for the softer drugs marijuana and speed.

Recently, personalized medicine and dynamic treatment regimes have drawn considerable attention. Dynamic treatment regimes are rules that govern the treatment of subjects depending on their intermediate responses or covariates. Two-stage randomization is a useful set-up to gather data for making inference on such regimes. Meanwhile, the number of clinical trials involving competing risk censoring has risen, where subjects in a study are exposed to more than one possible failure and the specific event of interest may not be observed because of competing events. We aim to compare several treatment regimes from a two-stage randomized trial on survival outcomes that are subject to competing risk censoring. The cumulative incidence function (CIF) has been widely used to quantify the cumulative probability of occurrence of the target event over time. However, if we use only the data from those subjects who have followed a specific treatment regime to estimate the CIF, the resulting estimator may be biased. Hence, we propose alternative non-parametric estimators for the CIF by using inverse probability weighting, and we provide inference procedures including procedures to compare the CIFs from two treatment regimes. We show the practicality and advantages of the proposed estimators through numerical studies.
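To fix ideas, a cumulative incidence function at time t is just the (possibly weighted) proportion of subjects who have failed from the cause of interest by t. A minimal sketch for uncensored data, where the inverse probability weights would be one over the estimated probability of following the regime of interest (the paper's estimators additionally handle censoring and the two-stage design):

```python
def weighted_cif(times, causes, weights, cause, t):
    """Inverse-probability-weighted empirical cumulative incidence:
    the weighted share of subjects failing from `cause` by time t.
    Sketch only: assumes no censoring."""
    num = sum(w for T, c, w in zip(times, causes, weights)
              if T <= t and c == cause)
    return num / sum(weights)
```

With equal weights this reduces to the ordinary empirical CIF; the weights are what correct the bias from restricting attention to subjects who happened to follow a given regime.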

Before adjustment, low and high frequency data sets from national accounts are frequently inconsistent. Benchmarking is the procedure used by economic agencies to make such data sets consistent. It typically involves adjusting the high frequency time series (e.g. quarterly data) so that they become consistent with the lower frequency version (e.g. annual data). Various methods have been developed to approach this problem of inconsistency between data sets. The paper introduces a new statistical procedure, namely wavelet benchmarking. Wavelet properties allow high and low frequency processes to be jointly analysed and we show that benchmarking can be formulated and approached succinctly in the wavelet domain. Furthermore, the time and frequency localization properties of wavelets are ideal for handling more complicated benchmarking problems. The versatility of the procedure is demonstrated by using simulation studies where we provide evidence showing that it substantially outperforms currently used methods. Finally, we apply this novel method of wavelet benchmarking to official data from the UK's Office for National Statistics.
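For readers unfamiliar with benchmarking, the simplest classical method is pro-rata adjustment: scale each year's quarters so that they sum to the annual figure. A minimal sketch of that baseline (this is emphatically not the wavelet method of the paper, just the kind of procedure it is designed to improve on):

```python
def prorate(quarterly, annual):
    """Pro-rata benchmarking: multiply each year's four quarters by the
    factor that makes them sum to the annual benchmark. Introduces the
    well-known 'step' artefact between years that more refined methods
    (Denton-type, or the paper's wavelet approach) are built to avoid."""
    out = []
    for year, bench in enumerate(annual):
        q = quarterly[4 * year: 4 * year + 4]
        factor = bench / sum(q)
        out.extend(x * factor for x in q)
    return out
```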

The existing literature that estimates the incidence of arrears relies on either household survey data or administrative data derived from the lender's records of their borrowers. But these different sources give different estimates of arrears. Moreover, the estimates are not useful for policy analysis or for a bank's lending decision, since they ignore the fact that some households do not borrow. The paper discusses the selection issues that are involved in using either source of data and is the first to bound the estimate of the household's underlying propensity to repay. To demonstrate the methodology, it uses data from the European Union Survey of Income and Living Conditions for 2008 to estimate the factors that affect repayment among Eurozone households.

We analyse the effect of income on mortality in Austria by using administrative social security data. To tackle potential endogeneity concerns arising in this context, we estimate time invariant firm-specific wage components and use them as instruments for actual wages. Although we find quantitatively small yet statistically significant effects in our naive least squares estimations, instrumental variables regressions reveal a robust zero effect of income on 10-year death rates for workers aged 40–60 years, both in terms of coefficient magnitude and narrow width of confidence intervals. These results are robust to various sample specifications and both linear and non-linear estimation methods.

In sample selection models, a treatment can influence the observed outcome in two ways: by affecting the binary selection or participation decision and by affecting the latent outcome. The former is called the ‘extensive margin effect’, and the latter is called the ‘intensive margin effect’. Despite the popularity of these effects, the intensive margin effect does not have the traditional causal parameter interpretation because it is conditioned on the selection or participation decision, which is a post-treatment variable possibly affected by the treatment. The paper presents a causal framework for sample selection models and introduces various subpopulation effects. It is difficult to separate such effects in general; however, in certain popular models (nearly parametric sample selection models, semiparametric ‘independence models’, semiparametric zero-censored models and ‘polynomial approximation’ models) with linear latent equations, they are separately identified and easily estimable with probit and least squares estimators. An empirical analysis is provided to illustrate these causal effects in sample selection models.

It has previously been shown that, across three British birth cohorts, relative rates of intergenerational social class mobility have remained at an essentially constant level among men and also among women who have worked only full time. We establish the pattern of this prevailing level of social fluidity and its sources and determine whether it also persists over time, and we bring out its implications for inequalities in relative mobility chances. We develop a parsimonious model for the log-odds-ratios which express the associations between individuals’ class origins and destinations. This model is derived from a topological model that comprises three kinds of readily interpretable binary characteristics and eight effects in all, each of which does, or does not, apply to particular cells of the mobility table, i.e. effects of class hierarchy, class inheritance and status affinity. Results show that the pattern as well as the level of social fluidity are essentially unchanged across the cohorts, that gender differences in this prevailing pattern are limited and that marked differences in the degree of inequality in relative mobility chances arise with long-range transitions where inheritance effects are reinforced by hierarchy effects that are not offset by status affinity effects.

The paper investigates the group structure in a terrorist network through the latent class model and a Bayesian model comparison method for the number of latent classes. The analysis of the terrorist network is sensitive to the model specification. Under one model it clearly identifies a group containing the leaders and organizers, and the group structure suggests a hierarchy of leaders, trainers and ‘foot soldiers’ who carry out the attacks.

We investigate a long-debated question: how to create predictive models of recidivism that are sufficiently accurate, transparent and interpretable to use for decision making. This question is complicated as these models are used to support different decisions, from sentencing, to determining release on probation, to allocating preventative social services. Each case might have an objective other than classification accuracy, such as a desired true positive rate (TPR) or false positive rate (FPR). Each (TPR, FPR) pair is a point on the receiver operating characteristic (ROC) curve. We use popular machine learning methods to create models along the full ROC curve on a wide range of recidivism prediction problems. We show that many methods (support vector machines, stochastic gradient boosting and ridge regression) produce equally accurate models along the full ROC curve. However, methods that are designed for interpretability (classification and regression trees and C5.0) cannot be tuned to produce models along the full ROC curve that remain both accurate and interpretable. To handle this shortcoming, we use a recent method called supersparse linear integer models to produce accurate, transparent and interpretable scoring systems along the full ROC curve. These scoring systems can be used for decision making in many different use cases, since they are just as accurate as the most powerful black box machine learning models for many applications, but are completely transparent and highly interpretable.
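Each (TPR, FPR) pair mentioned above comes from thresholding a model's score. A minimal sketch of how a single ROC point is computed (the scores and labels are invented for illustration, nothing from the paper's recidivism data):

```python
def roc_point(scores, labels, threshold):
    """(FPR, TPR) when score >= threshold is classified positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    return fp / (len(labels) - pos), tp / pos
```

Sweeping the threshold traces out the full ROC curve, which is what the paper asks each learning method to cover.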

The aim of the paper is to assess how climate change is reflected in the variation of the seasonal patterns of the monthly central England temperature time series between 1772 and 2013. In particular, we model changes in the amplitude and phase of the seasonal cycle. Starting from the seminal work of Thomson, various studies have documented a shift in the phase of the annual cycle, implying an earlier onset of the spring season at various European locations. A significant reduction in the amplitude of the seasonal cycle is also documented. The literature so far has concentrated on the measurement of this phenomenon by various methods, among which complex demodulation and wavelet decompositions are prominent. We offer new insight by considering a model that allows for seasonally varying deterministic and stochastic trends, as well as seasonally varying auto-correlation and residual variances. The model can be summarized as containing a permanent and a transitory component, where global warming is captured in the permanent component, on which the seasons load differentially. The phase of the seasonal cycle, in contrast, appears to follow in a stable manner the trend that Thomson identified with the Earth's precession. We identify the reported fluctuations as transitory.
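The amplitude and phase changes discussed above are usually read off a harmonic regression: writing the annual cycle as a·cos(ωt) + b·sin(ωt), the amplitude is √(a² + b²) and the phase is atan2(b, a). A minimal sketch of that conversion (the regression itself, and the paper's unobserved-components model, are not reproduced):

```python
import math

def amplitude_phase(a, b):
    """Convert harmonic regression coefficients a*cos(wt) + b*sin(wt)
    into the equivalent A*cos(wt - phi): A = sqrt(a^2 + b^2),
    phi = atan2(b, a). A drift in phi over time means an earlier or
    later onset of the seasons."""
    return math.hypot(a, b), math.atan2(b, a)
```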

The relationship between aging, health and healthcare expenditures is of central importance to academics and public policy makers. Generally, it is observed that, with advancing age, health deteriorates and healthcare expenditures increase. This seems to imply that increases in life expectancy would strongly increase both the demand for healthcare expenditures and the number of years lived in poor health. Previous research has shown that such straightforward conclusions may be flawed. For example, it has been established that not age but ‘time to death’ is the main driver of increased healthcare expenditures at advanced ages. The paper extends this line of research by investigating the relationship between age, time to death and health, the last being longitudinally measured via a health-related quality-of-life questionnaire. We propose an approach for modelling the health-related quality-of-life outcome that accounts for both the non-standard nature of this response variable (e.g. bounded, left skewed or heteroscedastic) and the panel structure of the data. Analyses were performed within a Bayesian framework. We found that health losses are centred in the final phase of life, which indicates that future increases in longevity will not necessarily increase life years spent in poor health. This may alleviate the consequences of population aging.

The paper employs a recently developed instrumental variable approach for the estimation of dynamic quantile regression models with fixed effects to model the dynamics of health outcomes. Our proposed estimator not only allows us to control for individual-specific heterogeneity via fixed effects in the dynamic quantile regression framework but may also reduce the bias that exists in conventional fixed effects estimation of dynamic quantile regression models with small numbers of time periods. Using data on the children of the US National Longitudinal Survey of Youth 1979 cohort, we examine the extent of true state dependence in youth depression conditional on unobserved individual heterogeneity and family socio-economic status. Our results suggest that true state dependence in youth depression among the survey respondents is very low and the observed positive association between previous and current depression is mainly due to time invariant unobserved individual heterogeneity.

The paper extends the latent promotion time cure rate marker model of Kim, Xi and Chen for right-censored survival data. Instead of modelling the cure rate parameter as a deterministic function of risk factors, they assumed that the cure rate parameter of a targeted population is distributed over a number of ordinal levels according to probabilities governed by the risk factors. We propose to use a mixture of linear dependent tail-free processes as the prior for the distribution of the cure rate parameter, resulting in a latent promotion time cure rate model. This approach provides an immediate answer to perhaps one of the most pressing questions: ‘what is the probability that a targeted population has a high proportion (e.g. greater than 70%) of cured subjects?’. The approach proposed can accommodate a rich class of distributions for the cure rate parameter, while remaining centred at gamma densities. The algorithms that are developed in this work allow the fitting of latent promotion time cure rate models with several survival models for metastatic tumour cells.

Suppose that we have a historical time series with samples taken at a slow rate, e.g. quarterly. The paper proposes a new method to answer the question: is it worth sampling the series at a faster rate, e.g. monthly? Our contention is that classical time series methods are designed to analyse a series at a single and given sampling rate with the consequence that analysts are not often encouraged to think carefully about what an appropriate sampling rate might be. To answer the sampling rate question we propose a novel Bayesian method that incorporates the historical series, cost information and small amounts of pilot data sampled at the faster rate. The heart of our method is a new Bayesian spectral estimation technique that is capable of coherently using data sampled at multiple rates and is demonstrated to have superior practical performance compared with alternatives. Additionally, we introduce a method for hindcasting historical data at the faster rate. A freeware R package, regspec, is available that implements our methods. We illustrate our work by using official statistics time series including the UK consumer price index and counts of UK residents travelling abroad, but our methods are general and apply to any situation where time series data are collected.
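The basic ingredient of any spectral estimation method is the periodogram. A minimal DFT-based sketch (the paper's Bayesian technique, implemented in regspec, refines this to combine data observed at several sampling rates, which this toy version does not attempt):

```python
import cmath

def periodogram(x):
    """Raw periodogram I(f_j) = |DFT of the mean-centred series|^2 / n
    at the Fourier frequencies j/n, j = 0..n//2."""
    n = len(x)
    xbar = sum(x) / n
    return [abs(sum((xt - xbar) * cmath.exp(-2j * cmath.pi * j * t / n)
                    for t, xt in enumerate(x))) ** 2 / n
            for j in range(n // 2 + 1)]
```

Data sampled at a slower rate only reveal the spectrum up to a lower Nyquist frequency; coherently pooling spectral information across rates is the problem the paper's method addresses.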

Hospital performance metrics, often in the form of risk-adjusted hospital mortality rates, are increasingly being made available in the public domain to compare hospitals. Despite the proliferation of these metrics, uncertainty remains regarding their validity and reliability given the noise surrounding their underlying measures. The paper considers a quality measure of hospital performance developed by McClellan and Staiger which smooths within hospitals and over time, while remaining computationally straightforward. The McClellan and Staiger method improves on others by incorporating different measures of outcome, eliminating systematic bias arising from the heterogeneous mix of hospital outputs and the noise that is inherent in other measures of quality. The technique also allows the forecasting of future quality. Using English hospital episode statistics for the years 2000–2005 for acute myocardial infarction and hip replacement, we use this technique to construct quality measures based on hospital fixed effects estimated from yearly cross-sectional patient level data, and vector auto-regressions estimated over time, which then combine information from different time periods and across conditions to produce robust hospital quality measures. Our results suggest that this method is well suited to measure and predict provider quality of care in the English setting.

The paper concerns the statistical modelling of emergency service response times. We apply advanced methods from spatial survival analysis to deliver inference for data collected by the London Fire Brigade on response times to reported dwelling fires. Existing approaches to the analysis of these data have been mainly descriptive; we describe and demonstrate the advantages of a more sophisticated approach. Our final parametric proportional hazards model includes harmonic regression terms to describe how the response time varies with the time of day and shared spatially correlated frailties on an auxiliary grid for computational efficiency. We investigate the short-term effect of fire station closures in 2014. Although the London Fire Brigade are working hard to keep response times down, our findings suggest that there is a limit to what can be achieved logistically: the paper identifies areas around the now closed Belsize, Bow, Downham, Kingsland, Knightsbridge, Silvertown, Southwark, Westminster and Woolwich fire stations in which there should perhaps be some concern about the provision of fire services.

We re-explore Abel-Smith and Townsend's landmark study of poverty in early post-World War 2 Britain. They found a large increase in poverty between 1953–1954 and 1960, which was a period of relatively strong economic growth. Our re-examination is a first exploitation of the data extracted from the recent digitization of the Ministry of Labour's ‘Enquiry into household expenditure’ in 1953–1954. First, we closely replicate their results. We find that Abel-Smith and Townsend's method generated a greater rise in poverty than other reasonable methods. Using contemporary standard poverty lines, we find that the relative poverty rate grew only a little at most, and the absolute poverty rate fell, between 1953–1954 and 1961, as might be expected in a period of rising real incomes and steady inequality. We also extend the poverty rate time series of Goodman and Webb back to 1953–1954.

Does universal preschool constitute an effective policy tool to promote the development and integration of children from minority groups? We address this question for the children of the Roma—the largest and most disadvantaged minority group in Europe. To tackle the issue of non-random selection into preschool, we exploit variation in the individual distance to the nearest preschool facility. Non-parametric instrumental variable estimations reveal significant short-term gains in terms of children's literacy. Preschool attendance also increases the prevalence of vaccinations but has no effect on other observed health outcomes. Overall, preschool also does not seem to enhance integration measured by children's language proficiency or social–emotional development, at least not in the short term.

Contactless credit cards and stored value cards are touted as a fast and convenient method of payment to replace cash at the point of sale. Cross-sectional approaches find a large effect of these retail payment innovations on cash usage (around 10%). Using a semiparametric panel model that accounts for unobserved heterogeneity and general forms of attrition, we find no significant effect for contactless credit cards and only a 2% reduction in cash usage stemming from single-purpose stored value cards. These results point to the uneven pace of payment innovation diffusion.

Cross-classified multilevel models deal with data pertaining to two different non-hierarchical classifications. It is unclear how much interpenetration is needed for a cross-classified multilevel model to work well and to estimate the two higher-level effects reliably. The paper investigates this question and the properties of cross-classified multilevel logistic models under various survey conditions. The effects of different membership allocation schemes, total sample sizes, group sizes, number of groups, overall rates of response and the variance partitioning coefficient on the properties of the estimators and the power of the Wald test are considered. The work is motivated by an application to separate area and interviewer effects on survey non-response which are often confounded. The results indicate that limited interviewer dispersion (around three areas per interviewer) provides sufficient interpenetration for good estimator properties. Further dispersion yields only very small or negligible gains in the properties. Interviewer dispersion also acts as a moderating factor on the effect of the other simulation factors (sample size, the ratio of interviewers to areas, the overall probability and the variance values) on the properties of the estimators and test statistics. The results also indicate that a higher number of interviewers for a set number of areas and a set total sample size improves these properties.
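One of the simulation factors above, the variance partitioning coefficient, has a closed form in the latent-variable formulation of the logistic model: a random effect's share of the total variance σ²(effect) + σ²(other effect) + π²/3. A minimal sketch of that arithmetic:

```python
import math

def vpc_logistic(var_effect, var_other=0.0):
    """Variance partitioning coefficient for one random effect (e.g.
    interviewer) in a cross-classified multilevel logistic model, using
    the standard level-1 latent residual variance pi**2 / 3."""
    return var_effect / (var_effect + var_other + math.pi ** 2 / 3)
```

For example, an interviewer variance equal to π²/3 with no area variance gives a VPC of 0.5, i.e. half the latent variation lies between interviewers.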

We propose a cross-classified mixed effects location–scale model for the analysis of interviewer effects in survey data. The model extends the standard two-way cross-classified random-intercept model (respondents nested in interviewers crossed with areas) by specifying the residual variance to be a function of covariates and an additional interviewer random effect. This extension provides a way to study interviewers’ effects on not just the ‘location’ (mean) of respondents’ responses, but additionally on their ‘scale’ (variability). It therefore allows researchers to address new questions such as ‘Do interviewers influence the variability of their respondents’ responses in addition to their average, and if so why?’. In doing so, the model facilitates a more complete and flexible assessment of the factors that are associated with interviewer error. We illustrate this model by using data from wave 3 of the UK Household Longitudinal Survey, which we link to a range of interviewer characteristics measured in an independent survey of interviewers. By identifying both interviewer characteristics in general, but also specific interviewers who are associated with unusually high or low or homogeneous or heterogeneous responses, the model provides a way to inform improvements to survey quality.
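The model's core idea can be written down compactly: the respondent-level mean receives an interviewer ‘location’ effect, while the log of the residual variance receives covariates and an interviewer ‘scale’ effect. A minimal sketch of the implied moments, with coefficients that are illustrative assumptions, not estimates from the paper:

```python
import math

def location_scale(x, u_location, u_scale, beta=1.0, lam=0.5):
    """Mean and residual s.d. implied by a mixed-effects location-scale
    model: mean = beta*x + u_location, and
    log(residual variance) = lam*x + u_scale."""
    mu = beta * x + u_location
    sd = math.exp(0.5 * (lam * x + u_scale))
    return mu, sd
```

An interviewer with a large positive u_scale elicits unusually heterogeneous responses even when the mean is unaffected, which is exactly the kind of interviewer the model is designed to flag.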

In randomized controlled trials with non-adherence, instrumental variable (IV) methods are frequently used to report the complier average causal effect. With binary outcomes, many of the available IV estimation methods impose distributional assumptions. We develop a randomization-inference-based method of IV estimation for binary outcomes. The method is non-parametric and is based on Fisher's exact test, and estimates can be easily calculated from a set of 2×2 or 2×2×2 tables. Although we retain the standard IV identification assumptions for confidence regions and point estimates, the IV estimand under randomization inference is sample specific and does not assume that the trial participants are a random sample from the target population. We illustrate the method with the ‘IMPROVE’ trial that compares emergency endovascular *versus* open surgical repair for patients with ruptured aortic aneurysms.
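For orientation, the estimand above can be computed from arm-level counts with the familiar Wald ratio: the intention-to-treat effect on the outcome divided by the effect of assignment on treatment received. A minimal sketch with invented counts (the paper's contribution is a randomization-inference procedure around this estimand, not reproduced here):

```python
def cace_wald(events_1, treated_1, n_1, events_0, treated_0, n_0):
    """Wald point estimator of the complier average causal effect from
    a 2x2x2 summary: outcome events, numbers treated and arm sizes in
    the assigned-to-treatment (1) and control (0) arms."""
    itt_outcome = events_1 / n_1 - events_0 / n_0
    itt_uptake = treated_1 / n_1 - treated_0 / n_0
    return itt_outcome / itt_uptake
```

With hypothetical counts of 30/100 events and 80/100 treated in the intervention arm against 20/100 and 10/100 in control, the estimator returns (0.3 − 0.2)/(0.8 − 0.1) = 1/7.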

We analyse exchange rate pass-through into import prices for a large group of 33 emerging and developed economies from 1980, quarter 1, to 2010, quarter 4. Our error correction models permit asymmetric pass-through for currency appreciations and depreciations over three horizons of interest: on impact, in the short run and in the long run. We find that depreciations are typically passed through more strongly than appreciations in the long run, suggesting that exporters may exert a degree of long-run pricing power. This asymmetry is stronger in economies which are more import dependent but is moderated by freedom to trade and a positive output gap. Given that this pass-through asymmetry is welfare reducing for consumers in the destination market, a key macroeconomic implication is that import-dependent economies, in particular, can benefit from trade liberalization.

Dropouts and delayed graduations are critical issues in higher education systems worldwide. A key task in this context is to identify risk factors associated with these events, providing potential targets for mitigating policies. For this, we employ a discrete time competing risks survival model, dealing simultaneously with university outcomes and their associated temporal component. We define survival times as the duration of the student's enrolment at university and possible outcomes as graduation or two types of dropout (voluntary and involuntary), exploring the information recorded at admission time (e.g. educational level of the parents) as potential predictors. Although similar strategies have been previously implemented, we extend the previous methods by handling covariate selection within a Bayesian variable selection framework, where model uncertainty is formally addressed through Bayesian model averaging. Our methodology is general; however, here we focus on undergraduate students enrolled in three selected degree programmes of the Pontificia Universidad Católica de Chile during the period 2000–2011. Our analysis reveals interesting insights, highlighting the main covariates that influence students’ risk of dropout and delayed graduation.
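Discrete-time competing-risks models are typically fitted by expanding each student's record into person-period rows and running a multinomial logit on the stacked data. A minimal sketch of that expansion step (the outcome codes are illustrative: 0 still enrolled, 1 graduation, 2 voluntary dropout, 3 involuntary dropout):

```python
def person_period(duration, final_outcome):
    """Expand one student into person-period rows: outcome 0 for every
    enrolled period before the last, and the terminal event code in the
    final period."""
    return [(t, 0) for t in range(1, duration)] + [(duration, final_outcome)]
```

Admission-time covariates would be attached to every row of a student, which is where the Bayesian variable selection described above operates.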

A pair of municipalities may consolidate services if they are contiguous. Traditional estimation methods assume that each voting process is independent. Instead we propose a new estimation procedure that allows the probability of consolidation to be influenced by neighbouring decisions. We extend a model of local interaction by allowing the consolidation efforts of neighbours to be either strategic substitutes or strategic complements. We disentangle direct effects arising from a change in one's own characteristics from indirect or spillover effects associated with a change in the other municipalities’ characteristics. Results reveal that the endogenous peer effect coming from neighbours is a primary determinant of willingness to consolidate.

In a health context, dependence is defined as a lack of autonomy in performing the basic activities of daily living and requiring care giving or significant help from another person. However, this contingency, if present, changes over one's lifetime. Empirical evidence shows that, once this situation occurs, a return to the previous state is almost impossible and in most cases the intensity of dependence increases. In the paper, the evolution of the intensity of this situation is studied for the Spanish population affected by this contingency. Evolution in dependence can be seen as sparsely observed functional data, where we obtain a curve for each individual that is observed at only those points where changes in his or her condition of dependence occur. We use functional data analysis techniques, such as curve registration, functional data depth and distance-based clustering, to analyse this type of data. This approach proves to be useful in this context because it considers the dynamics of the dependence process and provides more meaningful conclusions than simple pointwise or multivariate analysis. We use the sample statistics obtained to predict the future evolution of dependence. The database analysed originates from the ‘Survey on disability, personal autonomy and dependence situations’ in Spain in 2008. The survey is the largest and most complete survey to be made available in Europe for the study of disability. In addition, the Spanish legislation is one of the most recent in Europe and provides a detailed quantitative scale to assess dependence. In the paper, the scale value according to this legislation has been calculated for each individual included in the survey. Differences by sex, age and time of first appearance were considered, and a prediction of the future evolution of dependence is obtained.