Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, *M*-quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.
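To make the *M*-quantile idea concrete, the sketch below computes a location-only *M*-quantile of a single sample. It assumes Huber's influence function with tuning constant c = 1.345 and a MAD-based scale. It omits covariates and random effects, so it is not the random-effects regression proposed here. The *q*th *M*-quantile solves an estimating equation in which positive residuals are weighted by q and non-positive residuals by 1 − q. With q = 0.5 it reduces to an ordinary Huber location estimate.

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber influence function: linear in the centre, bounded at +/- c."""
    return np.clip(u, -c, c)

def mq_psi(u, q, c=1.345):
    """Asymmetrically weighted Huber psi that defines the q-th M-quantile."""
    weight = np.where(u > 0, q, 1.0 - q)
    return 2.0 * weight * huber_psi(u, c)

def m_quantile(y, q, c=1.345, tol=1e-10, max_iter=200):
    """Location-only M-quantile of a sample (no covariates, no random effects).

    Solves sum_i psi_q((y_i - m) / s) = 0 for m by bisection,
    with s a MAD-based robust scale estimate.
    """
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    s = np.median(np.abs(y - med)) / 0.6745
    if s == 0:
        s = 1.0  # degenerate sample: fall back to unit scale
    lo, hi = y.min(), y.max()
    # The estimating function is non-increasing in m, non-negative at min(y)
    # and non-positive at max(y), so [min(y), max(y)] brackets the root.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mq_psi((y - mid) / s, q, c).sum() > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For an asymmetric sample such as `[1, 2, 3, 4, 5, 20]`, the 0.25 and 0.75 *M*-quantiles bracket the centre. Because the Huber function bounds each residual's influence, the outlying 20 pulls them far less than it would pull an expectile.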

We use data from the four sweeps of the UK Millennium Cohort Study of children born at the turn of the 21st century to document the effect that poverty, and in particular persistent poverty, has on their cognitive development in their early years. Using structural equation modelling, we show that children born into poverty have significantly lower test scores at age 3, age 5 and age 7 years, and that continually living in poverty in their early years has a cumulative negative effect on their cognitive development. For children who are persistently in poverty throughout their early years, their cognitive development test scores at age 7 years are almost 20 percentile ranks lower than children who have never experienced poverty, even after controlling for a wide range of background characteristics and parental investment.

The paper uses data from the consumer expenditure surveys to demonstrate that the mode of collection is important for the analysis of consumption data. We first show that population figures obtained with diaries markedly differ from figures obtained by using recall questions. We then exploit multiple measurements of food expenditure to identify the effects of the mode of collection on the distribution of reported consumption. Finally, we show how to combine information from multiple reports to obtain a single measure of total expenditure in consumer expenditure surveys. The paper concludes by offering guidelines for empirical analyses based on these data, and by providing an application of the methods proposed to the measurement of inequality and wellbeing.

A new semiparametric and robust approach to small area estimation for discrete outcomes is proposed. The methodology represents an efficient and easily computed alternative to prediction by using a generalized linear mixed model and is based on an extension of *M*-quantile regression. In addition, two estimators of the prediction mean-squared error are described: one based on Taylor linearization and another based on the block bootstrap. The methodology proposed is applied to UK annual Labour Force Survey data for estimating the proportion of the unemployed in local authorities in the UK. The properties of the estimators are further assessed empirically in model-based simulations.

A mismatch between the timescale of a structural vector auto-regressive model and that of the time series data used for its estimation can have serious consequences for identification, estimation and interpretation of the impulse response functions. However, the use of mixed frequency data, combined with a proper estimation approach, can alleviate the temporal aggregation bias, mitigate the identification issues and yield more reliable responses to shocks. The problems and possible remedy are illustrated analytically and with both simulated and actual data.

Numerous studies have investigated the relationship between the built environment and physical activity. However, these studies assume that these relationships are invariant over space. In this study, we introduce a novel method to analyse the association between access to recreational facilities and exercise allowing for spatial heterogeneity. In addition, this association is studied before and after controlling for crime, a variable that could explain spatial heterogeneity of associations. We use data on 781 adults aged 46 years and over from the Chicago site of the ‘Multi-ethnic study of atherosclerosis’. A spatially varying coefficient tobit regression model is implemented in the Bayesian setting to allow the association of interest to vary over space. The relationship is shown to vary over Chicago, being positive in the south but negative or null in the north. Controlling for crime weakens the association in the south with little change observed in northern Chicago. The results of this study indicate that associations of environmental factors with health may vary over space and deserve further exploration.
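The tobit part of the model can be illustrated with its log-likelihood for an outcome left-censored at zero. Reported exercise time, which is zero for many respondents, is one example; that censoring point is an assumption here. A spatially varying coefficient enters only through the linear predictor, so `xb` would be built with a location-specific slope for access to recreational facilities. The Bayesian prior that ties neighbouring slopes together is not shown in this minimal sketch.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, xb, sigma, lower=0.0):
    """Log-likelihood of a tobit model left-censored at `lower`.

    y:     observed outcomes (values <= lower are treated as censored)
    xb:    linear predictors, e.g. intercept + beta(s_i) * access_i, where
           beta(s_i) is a hypothetical location-specific slope
    sigma: standard deviation of the latent normal outcome
    """
    ll = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            # Censored: probability that the latent outcome is below the limit
            ll += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # Uncensored: normal density of the observed value
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll
```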

The paper reports results from the first ever study of the effect of short-term weather and long-term climate on self-reported life satisfaction that uses longitudinal data. We find robust evidence that day-to-day weather variation impacts self-reported life satisfaction. Utilizing two sources of variation in the cognitive complexity of satisfaction questions, we present evidence that weather effects arise because of the cognitive challenge of reporting life satisfaction. We do not detect a relationship between long-term climate and self-reported life satisfaction by using an individual fixed effects specification, which identifies climate impacts through individuals moving location.

A common problem is to compare two cross-sectional estimates for the same study variable taken on two different waves or occasions, and to judge whether the change observed is statistically significant. This involves the estimation of the sampling variance of the estimator of change. The estimation of this variance would be relatively straightforward if cross-sectional estimates were based on the same sample. In practice, however, the samples overlap only partially because of the rotation used in repeated surveys. We propose a simple approach based on a multivariate (general) linear regression model. The variance estimator proposed is not a model-based estimator. We show that the estimator proposed is design consistent when the sampling fractions are negligible. It can accommodate stratified and two-stage sampling designs. The main advantage of the approach proposed is its simplicity and flexibility. It can be applied to a wide class of sampling designs and can be implemented with standard statistical regression techniques. Because of its flexibility, the approach proposed is well suited for the estimation of variance for the European Union Statistics on Income and Living Conditions surveys. It allows us to use a common approach for variance estimation for the different types of design. The approach proposed is a useful tool, because it involves only modelling skills and requires limited knowledge of survey sampling theory.
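The estimator of change is the difference of the two cross-sectional estimators, so its variance has the standard decomposition below. The notation $\hat{t}_1$, $\hat{t}_2$ for the estimators at the two occasions is introduced here for illustration. The covariance term is the part driven by the sample overlap: it vanishes when the two samples are independent, and a positive covariance lowers the variance of change.

$$
\widehat{\Delta} = \hat{t}_2 - \hat{t}_1, \qquad
\operatorname{Var}(\widehat{\Delta}) = \operatorname{Var}(\hat{t}_1) + \operatorname{Var}(\hat{t}_2) - 2\,\operatorname{Cov}(\hat{t}_1, \hat{t}_2).
$$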

We consider the estimation of the number of severely disabled people by using data from the Italian survey on ‘Health conditions and appeal to Medicare’. In this survey, disability is indirectly measured by using a set of categorical items, which consider a set of functions concerning the ability of a person to accomplish everyday tasks. Latent class models can be employed to classify the population according to different levels of a latent variable connected with disability. The survey is designed to provide reliable estimates at the level of administrative regions (‘*Nomenclature des unités territoriales statistiques*’, level 2), whereas local authorities are interested in quantifying the number of people who belong to each latent class at a subregional level. Therefore, small area estimation techniques should be used. The challenge is that the variable of interest is not observed. Adopting a full Bayesian approach, we base small area estimation on a latent class model in which the probability of belonging to each latent class changes with covariates and the influence of age is learnt from the data by using penalized splines. Demmler–Reinsch bases are shown to improve speed and mixing of Markov chain Monte Carlo chains used to simulate posteriors.

Large sporting events affect criminal behaviour via three channels: fan concentration, self-incapacitation and police displacement. I exploit information on football matches for London teams linked to detailed recorded crime data at the area level to estimate these effects empirically. I find that property crime, but not violent offences, increases in the communities hosting matches. There is a negative away game attendance effect on crime, which is due to voluntary incapacitation of potential offenders attending a match. Police displacement during home games increases property crime by 7 percentage points for every extra 10 000 supporters in areas that are left underprotected.

The study of temporal and spatial trends in large databases, such as behavioural risk factor surveillance data, can be a great challenge, especially when the intent is to study the time-related effects of multiple independent variables; this issue is not usually addressed in trend analysis in epidemiological studies. This study demonstrates the use of varying coefficient models using non-parametric techniques, which can show how coefficients vary in time or space; it is a useful statistical tool that is applied here for the first time to health surveillance data. A varying coefficient model with obesity as the outcome measure is constructed from the US ‘Behavioral risk factor surveillance system’ data. Odds ratio plots and probability maps illustrate the temporal or spatial changes in coefficients of the independent variables; these results can be used to identify changes in at-risk subgroups of the population for the odds of obesity.

Statistics Netherlands applies a design-based estimation procedure to produce road transportation figures. Frequent survey redesigns caused discontinuities in these series which obstruct the comparability of figures over time. Reductions in the sample size and changes in the sample design resulted in variance breaks and unacceptably large sampling errors in the recent part of the series. Both problems are addressed and solved simultaneously. Discontinuities and small sample sizes are accounted for by using a multivariate structural time series model that borrows strength over time and space. The paper shows increased precision when we move from univariate models to a multivariate model where the domains are jointly modelled. This increase is especially significant in the most recent period when sample sizes become smaller, with standard errors of the design-based estimator of the target variables being reduced by 40–70% with the model-based approach.
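As a much simplified, univariate illustration of how a structural time series model borrows strength over time, the sketch below runs the Kalman filter for a local level model, $y_t = \mu_t + \varepsilon_t$, $\mu_{t+1} = \mu_t + \eta_t$. A smaller sample after a redesign would appear as a larger sampling-error variance `var_eps`. The multivariate structure, the discontinuity terms and the cross-domain correlations described above are not included in this sketch.

```python
def local_level_filter(y, var_eps, var_eta, a0=0.0, p0=1e7):
    """Kalman filter for a univariate local level model.

    var_eps: variance of the sampling (observation) error
    var_eta: variance of the level disturbance
    p0:      large initial variance, i.e. an approximately diffuse start
    Returns a list of (filtered mean, filtered variance) of the level.
    """
    a, p = a0, p0  # predicted mean and variance of the level
    out = []
    for yt in y:
        f = p + var_eps          # variance of the one-step prediction error
        k = p / f                # Kalman gain: weight given to the new observation
        a_filt = a + k * (yt - a)
        p_filt = p * (1.0 - k)
        out.append((a_filt, p_filt))
        a, p = a_filt, p_filt + var_eta  # predict the level at t + 1
    return out
```

With `var_eta = 0`, the filtered variance falls as var_eps / t after t observations. This is the sense in which pooling over time reduces the standard error relative to a single-period direct estimate.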

Passing the ball is one of the key skills of a football player, yet the metrics commonly used to evaluate passing ability are crude and largely limited to various forms of a pass completion rate. These metrics can be misleading for two general reasons: they account neither for the difficulty of the attempted pass nor for the various levels of uncertainty involved in empirical observations based on different numbers of passes per player. We address both these deficiencies by building a statistical model in which the success of a pass depends on the skill of the executing player as well as other factors including the origin and destination of the pass, the skill of his teammates and the opponents, and proxies for the defensive pressure put on the executing player as well as random chance. We fit the model by using data from the 2006–2007 season of the English Premier League provided by Opta, estimate each player's passing skill and make predictions for the next season. The model predictions considerably outperform a naive method of simply using the previous season's completion rate as a predictor of the following season's completion rate. In particular, we show how a change in the difficulty of passes attempted in both seasons explains a significant proportion of the shift in the observed performance of some players, a fact that is ignored if the raw completion rate is used to evaluate player skill.
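The point about pass difficulty can be seen in a toy version of such a model. It assumes, hypothetically, that the log-odds of success are simply the player's skill minus the difficulty of the pass. The model in the paper has many more terms. Under this assumption, a player with unchanged skill records a lower completion rate whenever the passes attempted become harder.

```python
import math

def pass_success_prob(skill, difficulty):
    """Hypothetical logit model: success depends on skill minus pass difficulty."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))

def expected_completion_rate(skill, difficulties):
    """Expected completion rate of a player over a set of attempted passes."""
    probs = [pass_success_prob(skill, d) for d in difficulties]
    return sum(probs) / len(probs)

# Same skill in both seasons, harder passes in the second (illustrative values)
season_1 = expected_completion_rate(1.0, [0.0, 0.0, 0.0, 0.5])
season_2 = expected_completion_rate(1.0, [0.5, 1.0, 1.5, 2.0])
```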

New cervical cancer screening guidelines in the USA and many European countries recommend that women are tested for human papilloma virus (HPV). To inform decisions about screening intervals, we calculate the increase in precancer or cancer risk per year of continued HPV infection. However, both time to onset of precancer or cancer and time to HPV clearance are interval censored, and onset of precancer or cancer strongly informatively censors HPV clearance. We analyse these bivariate informatively interval-censored data by developing a novel joint model for time to clearance of HPV and time to precancer or cancer by using shared random effects, where the estimated mean duration of each woman's HPV infection is a covariate in the submodel for time to precancer or cancer. The model was fitted to data on 9553 HPV positive and negative women undergoing cervical cancer screening at Kaiser Permanente Northern California: data that were pivotal to the development of US screening guidelines. We compare the implications for screening intervals of this joint model with those from population-average marginal models of precancer or cancer risk. In particular, after 2 years the marginal population-average precancer or cancer risk was 5%, suggesting a 2-year interval to control *population-average* risk at 5%. In contrast, the joint model reveals that almost all women exceeding 5% individual risk in 2 years also exceeded 5% in 1 year, suggesting that a 1-year interval is better to control *individual* risk at 5%. The example suggests that sophisticated risk models that can predict individual risk may have implications that are different from those of population-average risk models that are currently used for informing medical guideline development.
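The basic building block for interval-censored times is that an event known only to lie in (L, R] contributes S(L) − S(R) to the likelihood, where S is the survival function. The sketch assumes an exponential survival function with a single rate and independent censoring. It therefore ignores exactly the informative censoring and shared random effects that motivate the joint model described above.

```python
import math

def interval_censored_loglik(intervals, rate):
    """Log-likelihood of interval-censored event times under an exponential model.

    intervals: list of (left, right) pairs with left < right; right may be
               math.inf for a right-censored observation
    rate:      exponential hazard rate (> 0), an assumption of this sketch
    """
    ll = 0.0
    for left, right in intervals:
        s_left = math.exp(-rate * left)
        s_right = 0.0 if right == math.inf else math.exp(-rate * right)
        ll += math.log(s_left - s_right)  # P(left < T <= right)
    return ll
```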

We examine the average cost function for property and casualty insurers. The cost function describes the relationship between a firm's minimum production cost and outputs. A comparison of cost functions could shed light on the relative cost efficiency of individual firms, which is of interest to many market participants and has been given extensive attention in the insurance industry. To identify and compare cost functions, current practice is to assume a common functional form between costs and outputs across insurers and then to rank insurers according to the centre of the cost distribution. However, the assumption of a common cost–output relationship could be misleading because insurers tend to adopt different technologies that are reflected by the cost function in their production process. The centre-based comparison could also lead to biased inference, especially when the cost distribution is skewed with a heavy tail. To address these issues, we model the average production cost of insurers by using a Bayesian quantile regression approach. Quantile regression enables the modelling of different quantiles of the cost distribution as opposed to just the centre. The Bayesian approach helps to estimate the cost-to-output functional relationship at a firm level by borrowing information across firms. In the analysis of US property–casualty insurers, we show that better insights into efficiency are gained by comparing different quantiles of the cost distribution.
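Quantile regression replaces squared error with the check (pinball) loss $\rho_\tau(u) = u\{\tau - I(u < 0)\}$. Bayesian formulations commonly use an asymmetric Laplace working likelihood whose kernel is $\exp\{-\rho_\tau(u)\}$ up to scale. The sketch below shows only the loss, together with the fact that, with no covariates, minimizing it over a sample recovers the empirical τth quantile. It is not the firm-level hierarchical model used in the paper.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_by_check_loss(y, tau):
    """Empirical tau-th quantile as the minimizer of the summed check loss.

    With no covariates a minimizer is attained at one of the data points,
    so searching over the observations is enough for this illustration.
    """
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))

# A skewed, heavy-tailed toy sample: the median ignores the extreme value
costs = [1.0, 2.0, 3.0, 4.0, 100.0]
```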

During redesigns of repeated surveys, the old and new approaches are often conducted in parallel to quantify discontinuities that are initiated by modifications in the survey process. Because of budget constraints, the sample size allocated to the alternative approach is often considerably smaller than that of the regular survey used for official publication. In this paper, small area estimation techniques are considered to improve the accuracy of domain estimates obtained under the alternative approach. Besides auxiliary information that is available from administrations, direct domain estimates available from the regular survey are useful auxiliary variables to construct model-based small area estimators. These methods are applied to a redesign of the Dutch Crime Victimization Survey.

We compare three major UK surveys, the British Household Panel Survey, Family Resources Survey and the English Longitudinal Study of Ageing, in terms of the picture that they give of the relationship between disability and receipt of the *attendance allowance* benefit. Using the different disability indicators that are available in each survey, we use a structural equation approach involving a latent concept of disability in which probabilities of receiving attendance allowance depend on disability. Despite major differences in design, once sample composition has been standardized through statistical matching, the surveys deliver similar results for the model of disability and receipt of attendance allowance. Provided that surveys offer a sufficiently wide range of disability indicators, the detail of disability measurement appears relatively unimportant.

In line with recent developments in the statistical analysis of functional data, we develop the semiparametric functional auto-regressive modelling approach to the density forecasting analysis of national rates of inflation by using sectoral inflation rates in the UK over the period January 1997–September 2013. The pseudo-out-of-sample forecasting evaluation and test results provide overall support for the statistical validity of our proposed models and their superior performance relative to aggregate auto-regressive models. The fan chart analysis and the probability event forecasting exercise provide further qualitative support for our approach, revealing that the modified functional auto-regressive models can provide a complementary tool for generating the density forecast of inflation, and for analysing the performance of a central bank in achieving announced inflation targets. As inflation targeting monetary policies are usually set with recourse to medium-term forecasts, our proposed work may provide policy makers with a richer information set.
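A probability event forecast can be read directly off simulated draws from a density forecast. The sketch assumes such draws are already available and that the event of interest is inflation falling inside a band around the target. The 1–3% band used in the test is illustrative only.

```python
def event_probability(draws, lower, upper):
    """Share of density-forecast draws falling in [lower, upper]."""
    if not draws:
        raise ValueError("need at least one forecast draw")
    return sum(lower <= d <= upper for d in draws) / len(draws)
```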

Knowledge of the current state of the art in information and communication technology of businesses (ICTB) is an important issue for governments, markets and policy makers, because information technology improves access to information and plays an important role in firms' competitiveness. Statistical agencies use normalized surveys to provide harmonized statistics about the use of technology in enterprises. Classical design-based estimators are appropriate for large domains, because direct estimates are consistent and easy to obtain by using sampling weights. However, to supply estimates for unplanned domains, where the sample size is random, model-based estimators are usually required. In this paper, alternative logistic model-based estimators are suggested to derive small area estimates from ICTB surveys. Final estimates are benchmarked to achieve coherence with direct estimates in larger domains, and standard errors are given by using bootstrap techniques. A Monte Carlo simulation study is conducted to compare the performance of the small area estimators proposed and to evaluate the behaviour of the mean-squared error estimator. Results are illustrated with the 2010 ICTB survey of the Basque country (Spain).
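One simple way to make small area estimates coherent with a direct estimate for the larger domain is ratio benchmarking, which rescales every area estimate by a common factor. The abstract does not say which benchmarking scheme is used, so this is only an illustrative choice.

```python
def ratio_benchmark(area_estimates, direct_total):
    """Rescale small area estimates so that they add up to the direct estimate."""
    model_total = sum(area_estimates)
    if model_total == 0:
        raise ValueError("model-based estimates sum to zero")
    factor = direct_total / model_total  # common adjustment factor
    return [est * factor for est in area_estimates]
```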

The original version of Bayesian reconstruction, which is a method for estimating age-specific fertility, mortality, migration and population counts of the recent past with uncertainty, produced estimates for female-only populations. Here we show how two-sex populations can be similarly reconstructed and probabilistic estimates of various sex ratio quantities obtained. We demonstrate the method by reconstructing the populations of India from 1971 to 2001, Thailand from 1960 to 2000 and Laos from 1985 to 2005. We found evidence that, in India, the sex ratio at birth exceeded its conventional upper limit of 1.06, and, further, increased over the period of study, with posterior probability above 0.9. In addition, almost uniquely, we found evidence that life expectancy at birth, $e_0$, was lower for females than for males in India (posterior probability for 1971–1976 equal to 0.79), although there was strong evidence for a reversal of the gap through to 2001. In both Thailand and Laos, we found strong evidence for the more usual result that $e_0$ was greater for females and, in Thailand, that the difference increased over the period of study.

We propose a model-based strategy for ranking scientific journals starting from a set of observed bibliometric indicators that represent imperfect measures of the unobserved ‘value’ of a journal. After discretizing the available indicators, we estimate an extended latent class model for polytomous item response data and use the estimated model to cluster journals. We illustrate our approach by using the data from the Italian research evaluation exercise that was carried out for the period 2004–2010, focusing on the set of journals that are considered relevant for the subarea statistics and financial mathematics. Using four bibliometric indicators (IF, IF5, AIS and the *h*-index), some of which are not available for all journals, and the information contained in a set of covariates, we derive a complete ordering of these journals. We show that the methodology proposed is relatively simple to implement, even when the aim is to cluster journals into a small number of ordered groups of a fixed size. We also analyse the robustness of the obtained ranking with respect to different discretization rules.

The statistical analysis of observational data for fair lending purposes relies on the assumption that, at the firm level, racial discrimination (or the lack thereof) is stable across time. Using data from a mortgage lender during the period 1998–2006, we examine this crucial assumption for the case of pricing differentials for black applicants in household mortgage lending, effectively evaluating possible dynamics in aggregate discrimination patterns. We offer evidence that these estimated pricing differentials may vary substantially across time.

The paper investigates the relationship between fertility and women's education in Italy, using data from the 2009 Household Multipurpose Survey of Family and Social Subjects. We use event history models, adopting a Bayesian approach for inference to study the association between fertility and women's education in the presence of a time varying unobserved component. Our analysis shows that either disregarding the unobserved component or assuming a time constant unobserved heterogeneity can lead to misleading results, at least in the context studied.

We analyse whether female athletes differ from male athletes in their risk-taking behaviour in a competitive setting. Data from high jump and pole vault competitions allow us to identify risky strategies. We estimate whether female athletes use risky strategies as often as male athletes and whether or not their returns to risky strategies differ. Returns to risky strategies are identified via an instrumental variable approach where we use competitive pressure to instrument individual risk taking. Female athletes take fewer risky decisions than men and could improve their outcomes by incurring more risk. We show that competitive pressure results in more risky decisions by both men and women; however, men react more strongly to competitive pressure.

More than 1100 abandoned mines, milling sites and waste piles from the uranium mining period are scattered across the Navajo Nation, resulting in exposures to environmental metals, including uranium. The Diné Network for Environmental Health project began in response to concerns regarding the community health effects of these environmental exposures on chronic disease. The paper presents the results of the initial Diné Network for Environmental Health survey of 1304 individuals living on the Navajo Nation. We examine the relationship between uranium mine waste exposure and kidney disease, diabetes and hypertension. These chronic diseases are found at high prevalences in the study population, present major public health risks and have been linked to metals exposures in other studies. We model the exposure–outcome relationship by using a multivariate model for the three binary responses. We implement a Bayesian multivariate *t*-model, which has marginal log-odds ratio parameter interpretations and is computationally efficient. In examining environmental exposures, appropriately adjusting for potential confounders is pivotal to obtaining policy relevant effect estimates. We use Bayesian model averaging to account for uncertainty in the functional form for confounding adjustment within a small set of measured confounders. Using this multivariate framework, we find evidence of associations between these chronic diseases and both historic mining era and legacy mining exposures.
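Bayesian model averaging over a small set of confounder-adjustment models can be approximated by weighting each model's estimate by its approximate posterior probability. The sketch below swaps in the common BIC approximation with equal prior model probabilities for illustration. The fully Bayesian weights used in the study may differ.

```python
import math

def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values (equal priors)."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]  # shift for numerical stability
    total = sum(raw)
    return [r / total for r in raw]

def bma_estimate(estimates, bics):
    """Model-averaged estimate of a common parameter, e.g. an exposure log-odds ratio."""
    return sum(w * e for w, e in zip(bic_weights(bics), estimates))
```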

Two-phase study designs are appealing since they allow for the oversampling of rare subpopulations, which improves efficiency. We describe a Bayesian hierarchical model for the analysis of two-phase data. Such a model is particularly appealing in a spatial setting in which random effects are introduced to model between-area variability. In such a situation, one may be interested in estimating regression coefficients or, in the context of small area estimation, in reconstructing the population totals by strata. The gains in efficiency of the two-phase sampling scheme are compared with standard approaches by using 2011 birth data from the research triangle area of North Carolina. We show that the method proposed can overcome small sample difficulties and improve on existing techniques. We conclude that the two-phase design is an attractive approach for small area estimation.

The paper develops a method for producing current quarter forecasts of gross domestic product growth with a (possibly large) range of available within-the-quarter monthly observations of economic indicators, such as employment and industrial production, and financial indicators, such as stock prices and interest rates. In light of existing evidence of time variation in the variances of shocks to gross domestic product, we consider versions of the model with both constant variances and stochastic volatility. We use Bayesian methods to estimate the model, to facilitate providing shrinkage on the (possibly large) set of model parameters and conveniently generate predictive densities. We provide results on the accuracy of nowcasts of real-time gross domestic product growth in the USA from 1985 through 2011. In terms of point forecasts, our proposal improves significantly on auto-regressive models and performs comparably with survey forecasts. In addition, it provides reliable density forecasts, for which the stochastic volatility specification is quite useful.

Consistent negative correlations between sibship size and cognitive performance (as measured by intelligence quotient and other mental aptitude tests) have been observed in past empirical studies. However, parental decisions on family size may correlate with variables affecting child cognitive performance. The aim of this study is to demonstrate how selection bias in studies of sibship size effects can be adjusted for. We extend existing knowledge in two aspects: as factors affecting decisions to increase family size may vary across the number and composition of current family size, we propose a sequential probit model (as opposed to binary or ordered models) for the propensity to increase family size; to disentangle selection and causality we propose multilevel multiprocess modelling where a continuous model for performance is estimated jointly with a sequential probit model for family size decisions. This allows us to estimate and adjust for the correlation between unmeasured heterogeneity affecting both family size decisions and child cognitive performance. The issues are illustrated through analyses of scores on Peabody individual achievement tests among children of the US National Longitudinal Survey of Youth 1979. We find substantial between-family heterogeneity in the propensity to increase family size. Ignoring such selection led to overestimation of the negative effects of sibship size on cognitive performance for families with 1–3 children, when known sources of selection were accounted for. However, the multiprocess modelling proposed could efficiently identify and control for such bias due to adverse selection.
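In a sequential probit model, each transition to a larger family has its own probit equation. The probability of ending at a given size is the product of the probabilities of the earlier transitions times the probability of not continuing. The sketch assumes the parity-specific linear predictors have already been computed, starting from a family with one child. It omits the family-level random effect that links this model to the cognitive performance equation.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def family_size_probs(linear_predictors):
    """Sequential probit: probability of stopping at each family size.

    linear_predictors[j] is the (hypothetical) probit index for moving from
    j + 1 to j + 2 children; each transition has its own equation.
    Returns probabilities for stopping at 1, 2, ... children, with the last
    entry covering families that went beyond the last modelled transition.
    """
    probs = []
    reach = 1.0  # probability of reaching the current family size
    for eta in linear_predictors:
        p_continue = norm_cdf(eta)
        probs.append(reach * (1.0 - p_continue))
        reach *= p_continue
    probs.append(reach)
    return probs
```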

This study compares the extent of selection error (non-response and coverage error) evoked by the four major contemporary modes of data collection (face to face, telephone, mail and Web) and three sequential mixed mode designs (telephone, mail and Web with face-to-face follow-up) for the case of the Dutch Crime Victimization Survey. Sociodemographic characteristics and target variables from the survey serve as benchmark variables. A special two-wave experimental design makes it possible to study design differences in selection error on Crime Victimization Survey variables independently from differences in measurement error. Despite large differences in response rates, only small or no differences in selection error between the four single-mode designs are found on both types of variable. We observe cases when the error is enlarged or mitigated in the mixed mode designs despite the fact that the designs yielded large response increases. Our results question the use of response rates to motivate the choice of mode and use of mixed mode surveys.

Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between protection of confidentiality and quality of inference. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The US Census Bureau collects millions of interrelated time series microdata that are hierarchical and contain many 0s and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian generalized linear mixed models with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that, as the prior distributions of the variance components in the Bayesian generalized linear mixed model become more precise towards zero, protection of confidentiality increases and the quality of inference deteriorates. We evaluate our methodology by using a strict privacy measure (empirical differential privacy) and a newly defined risk measure, the probability of range identification, which directly measures attribute disclosure risk. We illustrate our results with the US Census Bureau's quarterly workforce indicators.

We propose a classical approach to estimate factor-augmented vector auto-regressive (FAVAR) models with time variation in the parameters. When the time varying FAVAR model is estimated by using a large quarterly data set of US variables from 1972 to 2012, the results indicate some changes in the factor dynamics, and more marked variation in the factors' shock volatility and their loading parameters. Forecasts from the time varying FAVAR model are more accurate, in particular over the global financial crisis period, than forecasts from other benchmark models. Finally, we use the time varying FAVAR model to assess how monetary transmission to the economy has changed.

The aim of the paper is to estimate small area labour force indicators such as totals of employed and unemployed people and unemployment rates. Small area estimators of these quantities are derived from four multinomial logit mixed models, including a model with correlated time and area random effects. Mean-squared errors are used to measure the accuracy of the proposed estimators, and they are estimated by analytic and bootstrap methods. The methodology is applied to real data from the Spanish Labour Force Survey for Galicia.
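
The step from a fitted multinomial logit mixed model to area indicators can be sketched as a plug-in calculation. The sketch assumes three categories (employed, unemployed and inactive, with inactive as the baseline), that each linear predictor already includes the area's predicted random effect, and that population counts are known for covariate cells within the area. The function names are hypothetical, and the paper's estimators (and their mean-squared error estimation) may differ.

```python
import math


def category_probs(eta_employed, eta_unemployed):
    """Multinomial logit probabilities with 'inactive' as the baseline category."""
    e_emp = math.exp(eta_employed)
    e_unemp = math.exp(eta_unemployed)
    denom = 1.0 + e_emp + e_unemp
    return e_emp / denom, e_unemp / denom, 1.0 / denom


def area_indicators(cells):
    """Plug-in totals and unemployment rate for one area.

    cells: list of (population_count, eta_employed, eta_unemployed), one per
    covariate cell, with the area's predicted random effects already added.
    """
    employed = unemployed = 0.0
    for count, eta_emp, eta_unemp in cells:
        p_emp, p_unemp, _ = category_probs(eta_emp, eta_unemp)
        employed += count * p_emp
        unemployed += count * p_unemp
    rate = unemployed / (employed + unemployed)  # unemployed over the labour force
    return employed, unemployed, rate


totals = area_indicators([(600, math.log(4.0), 0.0)])
print(totals)  # roughly (400.0, 100.0, 0.2)
```

A full empirical best predictor would average over the distribution of the random effects instead of plugging in their predicted values; the plug-in version is shown only because it is short.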

The assessment of patient-reported outcome measures (PROMs) is of central importance in many areas of research and public policy. Unfortunately, it is quite common for clinical studies to employ different PROMs, thus limiting the comparability of the evidence base to which they contribute. This issue is exacerbated by the fact that some national agencies are now explicit about which PROMs must be used to generate evidence in support of claims for reimbursement. The National Institute for Health and Care Excellence for England and Wales, for instance, has identified the EuroQol-5D (EQ-5D) as the PROM of choice, while accepting the use of a ‘mapping’ approach to predict EQ-5D from other PROMs when EQ-5D data have not been collected. Here we consider the problem of directly predicting EQ-5D responses from ‘Short form 12’, while recognizing both the likely dependence between the five dimensions of the EQ-5D responses at the patient level and the fact that the levels of each health dimension are naturally ordered. We carry out the analysis within a Bayesian framework. We also address the key problem of choosing an appropriate summary measure of agreement between predicted and actual results when analysing PROMs, with particular attention devoted to scoring rules.
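
One scoring rule that respects the ordering of levels is the ranked probability score, which compares the cumulative predicted and observed distributions. The sketch below computes it for a single ordinal response. Treating it as the measure the paper prefers is an assumption, the three-level example is hypothetical, and some authors divide the score by K − 1.

```python
from itertools import accumulate


def ranked_probability_score(probs, observed_level):
    """Ranked probability score for one ordinal response (lower is better).

    probs: predicted probabilities for ordered levels 1..K (must sum to 1).
    observed_level: the observed level, an integer in 1..K.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("predicted probabilities must sum to 1")
    cumulative = list(accumulate(probs))
    score = 0.0
    # The K-th cumulative terms are both 1, so only levels 1..K-1 contribute.
    for k in range(1, len(probs)):
        observed_cdf = 1.0 if observed_level <= k else 0.0
        score += (cumulative[k - 1] - observed_cdf) ** 2
    return score


prediction = [0.7, 0.2, 0.1]  # hypothetical prediction for one EQ-5D dimension
for level in (1, 2, 3):
    print(level, round(ranked_probability_score(prediction, level), 2))
# 1 0.1
# 2 0.5
# 3 1.3
```

For the same prediction, an observation at level 3 is penalized more than one at level 2. This sensitivity to the distance between levels is what a plain misclassification rate lacks.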

Disability and dependence (lack of autonomy in performing common everyday actions) affect health status and quality of life; therefore they are significant public health issues. The main purpose of this study is to use classical multi-dimensional scaling techniques to design dependence profiles for Spanish children between 3 and 6 years old. The data come from the Survey about Disabilities, Personal Autonomy and Dependence Situations, 2008. Two distance (or dissimilarity) functions between individuals are considered: the classical approach using Gower's similarity coefficient and weighted related metric scaling. Both approaches can cope with different types of information (quantitative, multistate categorical and binary variables). However, the Euclidean configurations obtained via weighted related metric scaling explain a higher percentage of the variability and are more stable.
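
Gower's similarity coefficient handles mixed variable types by averaging per-variable similarities. The sketch below follows the usual convention: range-scaled absolute differences for quantitative variables, simple matching for multistate categorical variables, and exclusion of joint absences for binary variables. The function name and the example children are hypothetical, and weighted related metric scaling is not shown.

```python
def gower_similarity(a, b, kinds, ranges):
    """Gower's similarity between two individuals described by mixed variables.

    kinds[k] is 'quant', 'cat' or 'bin'; ranges[k] is the sample range of a
    quantitative variable (ignored for the other kinds).
    """
    total, counted = 0.0, 0
    for x, y, kind, value_range in zip(a, b, kinds, ranges):
        if kind == "quant":
            total += 1.0 - abs(x - y) / value_range
        elif kind == "cat":
            total += 1.0 if x == y else 0.0
        elif kind == "bin":
            if x == 0 and y == 0:
                continue  # joint absence is not counted as agreement
            total += 1.0 if x == 1 and y == 1 else 0.0
        else:
            raise ValueError(f"unknown variable kind: {kind}")
        counted += 1
    if counted == 0:
        raise ValueError("no comparable variables")
    return total / counted


kinds = ("quant", "cat", "bin")  # age in years, main limitation, needs help (0/1)
ranges = (3.0, None, None)       # ages 3 to 6 give a range of 3 years
child_a = (4.0, "speech", 1)
child_b = (6.0, "speech", 0)
print(round(gower_similarity(child_a, child_b, kinds, ranges), 3))  # 0.444
```

For multidimensional scaling, the similarity s is then converted to a dissimilarity, a common choice being the square root of 1 − s.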

Respondent-driven sampling is a widely used method for sampling hard-to-reach human populations by link tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher and partially implicitly defined. Therefore, it is not generally possible to compute the sampling weights for traditional design-based inference directly, and likelihood inference requires modelling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared with existing estimators, including adjustment for an initial convenience sample. We also apply the method, and an extension of it, to estimate the prevalence of human immunodeficiency virus in a high-risk population.
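
For comparison, a widely used existing estimator weights each respondent by the inverse of their reported network degree (the Volz–Heckathorn, or RDS-II, estimator). The sketch below shows only that baseline. It is not the model-assisted estimator introduced here, and the example data are hypothetical.

```python
def rds_ii_mean(outcomes, degrees):
    """Volz-Heckathorn (RDS-II) estimate of a population mean.

    Respondents with larger network degrees are more likely to be recruited,
    so each one is weighted by the inverse of the reported degree.
    """
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


hiv_status = [1, 0, 0, 1]   # hypothetical binary outcomes
degree = [2, 4, 4, 8]       # hypothetical reported network sizes
print(round(rds_ii_mean(hiv_status, degree), 4))  # 0.5556, against a sample mean of 0.5
```

The estimator's validity rests on assumptions about the recruitment process, such as random recruitment of network contacts. Reliance on such assumptions is the motivation for the model-assisted alternative described above.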

Maps of the distribution of epidemiological data often ignore surveillance error or possible correlations between missing information and outcomes. We analyse presence–absence data at the household level (12 050 points) of a disease-carrying insect in Mariano Melgar, Peru, collected as part of the Arequipan Ministry of Health's efforts to control Chagas disease. We construct a Bayesian hierarchical model to locate regions that are vulnerable to under-reporting due to surveillance error, accounting for variability in participation due to infestation status. The spatial correlation in the data allows us to identify relative inspector sensitivity and to elucidate the relationship between participation and infestation. We show that naive estimates of prevalence would be biased by surveillance error and by missing-at-random assumptions. We validate our results through simulations and show how randomized inspector assignments may improve prevalence estimates. Our results suggest that bias due to imperfect observation and to the missing-at-random assumption can be assessed and corrected in prevalence estimates of spatially auto-correlated binary variables.
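
The direction of the two biases can be seen in a simplified, non-spatial calculation. The sketch makes several assumptions. Specificity is perfect, so inspectors never record absent insects as present. Inspector sensitivity is below 1. Participation rates differ by infestation status. All numbers are hypothetical, and the hierarchical model described above estimates these quantities spatially rather than taking them as known.

```python
def naive_prevalence(true_prevalence, sensitivity, p_participate_infested, p_participate_clean):
    """Expected prevalence among inspected households when infestations can be
    missed (sensitivity < 1) and participation depends on infestation status."""
    infested = true_prevalence * p_participate_infested
    clean = (1.0 - true_prevalence) * p_participate_clean
    prevalence_among_participants = infested / (infested + clean)
    return sensitivity * prevalence_among_participants  # no false positives assumed


def corrected_prevalence(observed, sensitivity, p_participate_infested, p_participate_clean):
    """Invert naive_prevalence when sensitivity and participation rates are known."""
    among_participants = observed / sensitivity
    # Odds among participants = true odds * (infested / clean participation ratio).
    odds = among_participants / (1.0 - among_participants)
    true_odds = odds * p_participate_clean / p_participate_infested
    return true_odds / (1.0 + true_odds)


observed = naive_prevalence(0.20, 0.8, 0.5, 0.8)
print(round(observed, 3))                                       # 0.108
print(round(corrected_prevalence(observed, 0.8, 0.5, 0.8), 3))  # 0.2
```

Both sources push the naive figure below the true value of 0.2 in this example. If infested households participated more often, the participation bias would instead push it upwards, which is why the relationship between participation and infestation has to be estimated rather than assumed.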

A multivariate counting process formulation is developed for the quantification of association football event interdependences, which permits dynamic prediction as events unfold. We model data from English Premier League and Championship games from the 2009–2010 and 2010–2011 football seasons and assess predictive capacity by using a model-based betting strategy, applied prospectively to available live spread betting prices. Both the scoreline and the bookings status were predictive of match outcome. In particular, the award of a red card led to increased goal rates for the non-penalized team, and the home team's scoring rate decreased once it was ahead. Overall, the betting strategy was profitable, with gains made in the bookings markets.
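
A stripped-down version of dynamic prediction treats the goals still to come as independent Poisson counts with constant per-minute rates and adds them to the current score. This is only a sketch. The rates are hypothetical, and unlike the model described it ignores event interdependence (such as a team's rate changing once it leads), apart from letting the user change the rates after an event such as a red card.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(home_goals, away_goals, minutes_left,
                          home_rate, away_rate, max_extra=15):
    """Probabilities of (home win, draw, away win) from the current score.

    home_rate and away_rate are goals per minute, held constant for the rest of
    the match; extra goals beyond max_extra per team are ignored (negligible mass).
    """
    mean_home = home_rate * minutes_left
    mean_away = away_rate * minutes_left
    win = draw = loss = 0.0
    for extra_home in range(max_extra + 1):
        p_home = poisson_pmf(extra_home, mean_home)
        for extra_away in range(max_extra + 1):
            p = p_home * poisson_pmf(extra_away, mean_away)
            margin = (home_goals + extra_home) - (away_goals + extra_away)
            if margin > 0:
                win += p
            elif margin == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Level at 1-1 with 30 minutes left, then the same after an away red card,
# modelled (hypothetically) as a higher home rate and a lower away rate.
before = outcome_probabilities(1, 1, 30, home_rate=0.016, away_rate=0.012)
after = outcome_probabilities(1, 1, 30, home_rate=0.024, away_rate=0.010)
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

Updating the probabilities after each event by recomputing from the new score, the remaining time and the revised rates is the basic mechanism behind in-play prediction. The full model also lets the rates depend on the events themselves.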

The paper considers panel data methods for estimating ordered logit models with individual-specific correlated unobserved heterogeneity. We show that a popular approach is inconsistent, whereas some consistent and efficient estimators are available, including minimum distance and generalized method-of-moments estimators. A Monte Carlo study shows that an alternative estimator, not previously considered in econometric applications, performs well: it is simple to implement and almost as efficient. An illustrative application based on data from the German Socio-Economic Panel confirms the large negative effect of unemployment on life satisfaction that has been found in the previous literature.
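
One simple estimator for this setting, often described in the literature as "blow-up and cluster", works in three steps. It dichotomizes the ordered outcome at every cut-off, stacks the resulting copies and fits a conditional (fixed effects) logit with standard errors clustered by individual. Whether this is the alternative estimator studied here is an assumption. The sketch below shows only the data expansion step, with hypothetical names and data, and leaves the conditional logit fit to a standard routine.

```python
def blow_up(panel, n_levels):
    """Expand an ordered-outcome panel into stacked binary panels.

    panel: rows (person_id, period, y, x) with y in 1..n_levels.
    Returns rows (group, person_id, period, d, x) where d = 1 if y >= cutoff.
    Each (person_id, cutoff) pair is its own fixed-effect group in the
    conditional logit; person_id is kept for clustering standard errors.
    """
    rows = []
    for person_id, period, y, x in panel:
        for cutoff in range(2, n_levels + 1):
            rows.append(((person_id, cutoff), person_id, period, int(y >= cutoff), x))
    return rows


def informative_groups(rows):
    """Groups whose binary outcome varies over time; the others drop out of the
    conditional likelihood."""
    outcomes = {}
    for group, _, _, d, _ in rows:
        outcomes.setdefault(group, set()).add(d)
    return sorted(g for g, seen in outcomes.items() if len(seen) == 2)


panel = [(1, 1, 2, 0.5), (1, 2, 3, 0.7), (2, 1, 1, 0.1), (2, 2, 1, 0.4)]
stacked = blow_up(panel, n_levels=3)
print(len(stacked))                 # 8
print(informative_groups(stacked))  # [(1, 3)]
```

Because every copy of a person shares the same unobserved heterogeneity, the copies are correlated, which is why standard errors must be clustered by person rather than by group.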

The production of legislative acts is affected by multiple sources of latent heterogeneity, due to multilevel and multivariate unobserved factors that operate in conjunction with observed covariates at all the levels of the data hierarchy. We account for these factors by estimating a multilevel Poisson regression model for repeated measurements of bivariate counts of executive and ordinary legislative acts, enacted under multiple Italian governments, nested within legislatures. The model integrates discrete bivariate random effects at the legislature level and Markovian sequences of discrete bivariate random effects at the government level. It can be estimated by a computationally feasible expectation–maximization algorithm. It naturally extends a traditional Poisson regression model to allow for multiple outcomes, longitudinal dependence and multilevel data hierarchy. The model is exploited to detect multiple cycles of legislative supply that arise at multiple timescales in a case-study of Italian legislative production.
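
The expectation–maximization idea behind discrete random effects can be seen in its simplest form: a univariate Poisson model whose group-level rate takes one of K unknown support points with unknown masses. This sketch is deliberately much simpler than the model described, with no covariates, no bivariate outcomes and no Markov chain over governments. The function names and data are hypothetical.

```python
import math


def poisson_logpmf(y, rate):
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def em_poisson_mixture(counts, init_rates, n_iter=200):
    """Fit a K-point discrete mixing distribution for Poisson rates by EM.

    Returns (masses, rates): the estimated probability of each support point
    and the Poisson rate at that point.
    """
    rates = list(init_rates)
    masses = [1.0 / len(rates)] * len(rates)
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each support point.
        resp = []
        for y in counts:
            logs = [math.log(m) + poisson_logpmf(y, r) for m, r in zip(masses, rates)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]  # subtract the max for stability
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update masses and rates from the responsibility-weighted counts.
        for k in range(len(rates)):
            weight = sum(r[k] for r in resp)
            weight = max(weight, 1e-12)  # guard against underflow to zero
            masses[k] = weight / len(counts)
            rates[k] = max(sum(r[k] * y for r, y in zip(resp, counts)) / weight, 1e-12)
    return masses, rates


counts = [2, 3, 2, 3, 2, 20, 22, 21, 19, 20]  # hypothetical counts for ten groups
masses, rates = em_poisson_mixture(counts, init_rates=[1.0, 10.0])
print([round(m, 2) for m in masses], [round(r, 2) for r in rates])  # [0.5, 0.5] [2.4, 20.4]
```

In a model with Markovian random effects, the E-step would typically use forward–backward recursions over the hidden states instead of the independent per-group responsibilities shown here.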

Obesity is a rapidly growing public health problem, even among the elderly. Understanding the disabling consequences of obesity in the elderly will help us to design more effective intervention and management guidelines for obese elderly people. To examine the long-term health consequences of obesity in the elderly, we present a joint model consisting of two bivariate ordered responses observed at successive time points. The bivariate ordered responses are the subject's self-reported health status outcomes: self-rated health and functional status. The proposed joint model is generally suited to health and disease research in which ordered responses are observed at successive time points. We further extend it to address several challenges. First, we incorporate semiparametric features in the ordinal logistic model. Second, we model the underlying latent states of health that are associated with self-rated health. Third, we model the bivariate ordinal outcomes jointly to mitigate the variability of a single response. Fourth, we use a multinomial logit model to account for non-ignorable missing data arising for different reasons. The motivating data were obtained from the Second Longitudinal Study of Aging, a longitudinal survey covering 1994–2000 that provides useful information on the health status of elderly people. The parameters of the joint model were estimated in a Bayesian framework via Markov chain Monte Carlo methods. The results show differences in the longitudinal patterns of the health outcomes between the two weight groups, supporting our hypothesis that different management strategies should be employed for the obese elderly.

We use data on leg before wicket decisions from 1000 test cricket matches to quantify the systematic bias of officials (umpires) in favour of home teams. We exploit recent changes in the regulation of test cricket as a series of natural experiments to help to identify whether social pressure from crowds has a causal effect on home bias. Using negative binomial regressions, we find that home umpires favour home teams and that this effect is more pronounced in the later stages of matches.

Randomized controlled trials (RCTs) can provide unbiased estimates of sample average treatment effects. However, a common concern is that RCTs may fail to provide unbiased estimates of population average treatment effects. We derive the assumptions that are required to identify population average treatment effects from RCTs. We provide placebo tests, which formally follow from the identifying assumptions and can assess whether they hold. We offer new research designs for estimating population effects that use non-randomized studies to adjust the RCT data. We illustrate this approach with a cost-effectiveness analysis of a clinical intervention, pulmonary artery catheterization.
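
Suppose, as identifying assumptions of this kind often require, that treatment effect heterogeneity is fully captured by observed characteristics. Then a population average treatment effect can be obtained by reweighting stratum-specific trial effects to the population's composition. The sketch below is a minimal post-stratification illustration with hypothetical strata and data. It is not the paper's estimator and does not implement the placebo tests.

```python
from collections import defaultdict


def stratum_effects(records):
    """Difference in mean outcomes (treated minus control) within each stratum.

    records: iterable of (stratum, treated, outcome) from the trial.
    """
    sums = defaultdict(lambda: {"t_sum": 0.0, "t_n": 0, "c_sum": 0.0, "c_n": 0})
    for stratum, treated, outcome in records:
        s = sums[stratum]
        if treated:
            s["t_sum"] += outcome
            s["t_n"] += 1
        else:
            s["c_sum"] += outcome
            s["c_n"] += 1
    return {k: s["t_sum"] / s["t_n"] - s["c_sum"] / s["c_n"] for k, s in sums.items()}


def reweighted_effect(effects, shares):
    """Average the stratum effects using the given stratum shares."""
    total = sum(shares.values())
    return sum(effects[k] * share / total for k, share in shares.items())


trial = [("a", True, 5.0), ("a", True, 3.0), ("a", False, 2.0), ("a", False, 2.0),
         ("b", True, 1.0), ("b", False, 2.0)]
effects = stratum_effects(trial)  # stratum a: 2.0, stratum b: -1.0
print(round(reweighted_effect(effects, {"a": 0.8, "b": 0.2}), 2))  # trial mix: 1.4
print(round(reweighted_effect(effects, {"a": 0.3, "b": 0.7}), 2))  # population mix: -0.1
```

The same trial can imply effects of opposite sign in two populations when strata differ in both effect and prevalence. This is the gap between sample and population average treatment effects discussed above.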