Dual-frequency identification sonar delivers video-like underwater images that allow the investigation of fish behaviour even in cloudy and muddy water. Images are generally recorded at a rate of up to 10 frames per second, so that one effectively obtains a video of underwater movements. These videos allow ecologists to observe, count and investigate fish behaviour. We focus on the automatic classification of fish from such sonar videos. After appropriate preprocessing of the videos, we show how fish can be counted and classified into different species on the basis of their shape and movement. The procedures developed work in real time, i.e. data processing and classification of video sequences are faster than the duration of the video sequences themselves.

Civil unrest is a complicated, multifaceted social phenomenon that is difficult to forecast. Relevant data for predicting future protests consist of a massive set of heterogeneous sources of data, primarily from social media. Using a modular approach to extract pertinent information from disparate sources of data, we develop a spatiotemporal multiscale framework to fuse predictions from algorithms mining social media. This novel multiscale spatiotemporal model is developed to satisfy four essential requirements: be scalable to handle massive spatiotemporal data sets, incorporate hierarchical predictions, accommodate predictions of differing quality and uncertainty, and be flexible, allowing revisions to existing algorithms and the addition of new algorithms. The paper details the challenges that are posed by these four requirements and outlines the benefits of our novel multiscale spatiotemporal model relative to existing methods. In particular, our multiscale approach coupled with an efficient sequential Monte Carlo framework enables scalable rapid computation of richly specified Bayesian hierarchical models for spatiotemporal data.

Tissue samples from the same tumour are heterogeneous. They consist of different subclones that can be characterized by differences in DNA nucleotide sequences and copy numbers on multiple loci. Inference on tumour heterogeneity thus involves the identification of the subclonal copy number and single-nucleotide mutations at a selected set of loci. We carry out such inference on the basis of a Bayesian feature allocation model. We jointly model subclonal copy numbers and the corresponding allele sequences for the same loci, using three random matrices, **L**, **Z** and **w**, to represent subclonal copy numbers (**L**), the number of subclonal variant alleles (**Z**) and the cellular fractions (**w**) of subclones in one or more tumour samples respectively. The unknown number of subclones implies a random number of columns. More than one subclone indicates tumour heterogeneity. Using simulation studies and a real data analysis with next generation sequencing data, we demonstrate how posterior inference on the subclonal structure is enhanced with the joint modelling of both structure and sequencing variants on subclonal genomes. An R package is available from http://cran.r-project.org/web/packages/BayClone2/index.html.
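To illustrate how the three matrices combine, here is a minimal sketch (a deliberately simplified version of the kind of quantity such feature allocation models relate to sequencing read counts; the function name and the toy values are hypothetical): the expected variant allele fraction at a locus is the fraction-weighted number of variant alleles divided by the fraction-weighted total copy number.

```python
# Hypothetical sketch: combining subclonal copy numbers (L), variant allele
# counts (Z) and cellular fractions (w) into an expected variant allele
# fraction (VAF) per locus for one sample. Simplified for illustration.

def expected_vaf(L, Z, w):
    """L[t][c]: copy number of subclone c at locus t.
    Z[t][c]: number of variant alleles of subclone c at locus t.
    w[c]:    cellular fraction of subclone c in one sample.
    Returns the expected VAF at each locus."""
    vafs = []
    for l_row, z_row in zip(L, Z):
        variant = sum(wc * z for wc, z in zip(w, z_row))
        total = sum(wc * l for wc, l in zip(w, l_row))
        vafs.append(variant / total if total > 0 else 0.0)
    return vafs

# Two loci, two subclones: subclone 1 carries one variant copy at locus 0.
L = [[2, 2], [2, 3]]
Z = [[1, 0], [0, 0]]
w = [0.6, 0.4]
print(expected_vaf(L, Z, w))  # locus 0: 0.6 / 2.0 = 0.3; locus 1: 0.0
```

A random number of subclones corresponds to a random number of columns in these matrices, which is what the Bayesian feature allocation prior governs.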

In phase I clinical trials with cytostatic agents, the typical objective is to identify the optimal biological dose, which should be tolerable as well as achieving the highest effectiveness. Towards this goal, we consider binary toxicity and efficacy end points simultaneously and develop a two-stage Bayesian adaptive design. Stage 1 searches for the maximum tolerated dose by using a beta–binomial model in conjunction with a probit model, for which decision making is based on the model that fits the toxicity data better. Stage 2 identifies the optimal biological dose while still controlling the level of toxicity. We enumerate all the possibilities that each of the admissible doses may deliver the highest effectiveness so that the dose–efficacy curve is allowed to be increasing, decreasing or concave. We conduct simulation studies to examine the ability of the proposed method to pinpoint both the maximum tolerated dose and the optimal biological dose and demonstrate the design's satisfactory performance with the BKM120 and cetuximab phase I clinical trials.
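As a hedged sketch of the beta–binomial ingredient of stage 1 (not the paper's exact decision rules; the target level and prior are illustrative): with a Beta(1, 1) prior on a dose's toxicity probability, the posterior after y toxicities in n patients is Beta(1 + y, 1 + n − y), and one can estimate the posterior probability that the dose is too toxic by Monte Carlo.

```python
import random

# Illustrative beta-binomial toxicity monitoring (prior, target and cut-off
# are made up, not the trial design's elicited values).

def prob_too_toxic(y, n, target=0.30, draws=100_000, seed=1):
    """Posterior Pr(p_tox > target) under a Beta(1, 1) prior,
    estimated by Monte Carlo sampling from Beta(1 + y, 1 + n - y)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(1 + y, 1 + n - y) > target
               for _ in range(draws))
    return hits / draws

print(prob_too_toxic(1, 12))  # few toxicities: small probability
print(prob_too_toxic(7, 12))  # many toxicities: probability near 1
```

A rule of the form "de-escalate if this probability exceeds some cut-off" is the typical way such posterior quantities enter dose finding decisions.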

The paper develops a parametric variant of the Machado–Mata simulation methodology to examine quantile wage differences between groups of workers, with an application to the wage gap between native and foreign workers in Luxembourg. Relying on conditional-likelihood-based ‘parametric quantile regression’ in place of the standard linear quantile regression is parsimonious and cuts computing time drastically with no loss in the accuracy of marginal quantile simulations in our application. We find that the native worker advantage is a concave function of quantile: the advantage is small (possibly negative) for both low and high quantiles, but it is large for the middle half of the quantile range (between the 20th and 70th native wage percentiles).
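A minimal sketch of the Machado–Mata simulation idea under a parametric conditional model (all coefficients here are hypothetical, not the paper's estimates): repeatedly draw a quantile level u and a covariate value x, evaluate the conditional quantile Q(u | x) = b0 + b1·x + σ·Φ⁻¹(u), and treat the collected draws as a sample from the marginal wage distribution.

```python
import random
import statistics
from statistics import NormalDist

# Machado-Mata style marginal quantile simulation under a Gaussian
# conditional model (coefficients invented for illustration).

def machado_mata(xs, b0, b1, sigma, n_draws=50_000, seed=2):
    rng = random.Random(seed)
    nd = NormalDist()
    draws = []
    for _ in range(n_draws):
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # guard u = 0
        x = rng.choice(xs)                            # resample a covariate
        draws.append(b0 + b1 * x + sigma * nd.inv_cdf(u))
    return draws

xs = [0, 1]                        # e.g. a binary native/foreign indicator
sim = machado_mata(xs, b0=2.0, b1=0.5, sigma=0.3)
print(statistics.median(sim))      # close to 2.25, the marginal median
```

The parametric conditional quantile replaces the large set of linear quantile regressions in the standard Machado–Mata procedure, which is where the computational saving comes from.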

Motivated by an imaging study, the paper develops a non-parametric testing procedure for testing the null hypothesis that two samples of curves observed at discrete grids and with noise have the same underlying distribution. The objective is to compare formally white matter tract profiles between healthy individuals and multiple-sclerosis patients, as assessed by conventional diffusion tensor imaging measures. We propose to decompose the curves by using functional principal component analysis of a mixture process, which we refer to as *marginal functional principal component analysis*. This approach reduces the dimension of the testing problem in a way that enables the use of traditional non-parametric univariate testing procedures. The procedure is computationally efficient and accommodates different sampling designs. Numerical studies are presented to validate the size and power properties of the test in many realistic scenarios. In these cases, the test proposed has been found to be more powerful than its primary competitor. Application to the diffusion tensor imaging data reveals that all the tracts studied are associated with multiple sclerosis and the choice of the diffusion tensor image measurement is important when assessing axonal disruption.

The analysis of car crash output parameters such as firewall intrusion points assists the overall engineering process. Such data are nowadays collected from many numerical simulations and it is not possible for the engineer to analyse this growing amount of data by hand. Therefore, data mining and statistical methods are needed. Here, we propose to use the flexible class of regular vine (*R*-vine) copulas for modelling the dependence between such output variables. *R*-vine copulas are multivariate copulas constructed hierarchically from bivariate copulas as building blocks. We introduce the concept of such constructions and their graphical tree representation. Applied to simulated frontal crash data of a Ford Taurus, such graphs help us to illustrate the dependence structure among different firewall intrusion locations. The big advantage of *R*-vines compared with standard approaches such as the multivariate normal distribution or the multivariate Gaussian copula is the ability to model asymmetries and dependence in the tails. Our application demonstrates the strong potential of *R*-vines in the engineering context and opens further application areas.
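To make the pair-copula construction concrete, here is a minimal sketch of sampling from a three-dimensional D-vine whose building blocks are bivariate Gaussian copulas (the correlations are illustrative, not fitted to any crash data; in practice one would use non-Gaussian pair copulas to capture tail dependence, which is exactly the flexibility *R*-vines offer).

```python
import random
from statistics import NormalDist

ND = NormalDist()

def h(u, v, rho):
    """Gaussian-copula h-function: conditional distribution C(u | v)."""
    a, b = ND.inv_cdf(u), ND.inv_cdf(v)
    return ND.cdf((a - rho * b) / (1 - rho * rho) ** 0.5)

def h_inv(w, v, rho):
    """Inverse of the h-function in its first argument."""
    b = ND.inv_cdf(v)
    return ND.cdf(ND.inv_cdf(w) * (1 - rho * rho) ** 0.5 + rho * b)

def sample_dvine(n, r12, r23, r13_2, seed=3):
    """Sample n triples from a 3-dim D-vine with Gaussian pair copulas:
    pairs (1,2) and (2,3) in tree 1, pair (1,3 | 2) in tree 2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w1, w2, w3 = (min(max(rng.random(), 1e-12), 1 - 1e-12)
                      for _ in range(3))
        u1 = w1
        u2 = h_inv(w2, u1, r12)
        t = h_inv(w3, h(u1, u2, r12), r13_2)   # invert tree-2 copula
        u3 = h_inv(t, u2, r23)                 # then tree-1 copula (2,3)
        out.append((u1, u2, u3))
    return out

sample = sample_dvine(20_000, r12=0.7, r23=0.5, r13_2=0.2)
```

Each bivariate building block can be replaced by, say, a Clayton or Gumbel copula to model asymmetric tail dependence, without changing the hierarchical structure.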

Quantitative fitness analysis (QFA) is a high throughput experimental and computational methodology for measuring the growth of microbial populations. QFA screens can be used to compare the health of cell populations with and without a mutation in a query gene to infer genetic interaction strengths genomewide, examining thousands of separate genotypes. We introduce Bayesian hierarchical models of population growth rates and genetic interactions that better reflect QFA experimental design than current approaches. Our new approach models population dynamics and genetic interaction simultaneously, thereby avoiding passing information between models via a univariate fitness summary. Matching experimental structure more closely, Bayesian hierarchical approaches use data more efficiently and find new evidence for genes which interact with yeast telomeres within a published data set.
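A sketch of the logistic population-growth curve that underlies QFA-style fitness quantification (parameter values below are illustrative, not from the cited screens): hierarchical models place priors on the carrying capacity, growth rate and initial density of each culture and share strength across repeated measurements.

```python
import math

# Logistic growth x(t) = K * x0 * exp(r t) / (K + x0 * (exp(r t) - 1)),
# with carrying capacity K, growth rate r and initial density x0.

def logistic_growth(t, K, r, x0):
    e = math.exp(r * t)
    return K * x0 * e / (K + x0 * (e - 1.0))

K, r, x0 = 0.15, 2.5, 0.001
curve = [logistic_growth(t, K, r, x0) for t in (0.0, 1.0, 2.0, 4.0)]
print(curve)  # starts at x0 and saturates towards K
```

Modelling these dynamics jointly with genetic interactions, rather than first collapsing each curve to a univariate fitness summary, is the key difference from the previous two-stage approach.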

The paper investigates angular regressions that express the angles of an animal's motion in terms of time varying directions and distances to environmental features that could influence its displacement. The mean direction proposed is a compromise between several possible targets. Conditions for the identifiability of the regression parameters are provided. Maximum likelihood estimators for the parameters are derived under two von Mises error structures. Robust sandwich estimators of the parameter variance–covariance matrix are obtained. The statistical methodology proposed is first used to reanalyse a classical data set on periwinkle movement. A second application investigates how bison trails are shaped by meadows and canopy gaps in Saskatchewan's Prince Albert National Park.
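A hedged sketch of the "compromise" mean direction idea (weights and angles are made up): the expected angle of motion is a weighted circular mean of the directions towards several targets, with larger coefficients pulling the animal more strongly towards that target.

```python
import math

# Weighted circular mean: the compromise direction is the atan2 of the
# weighted sine and cosine resultants of the target directions.

def compromise_direction(angles, weights):
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    return math.atan2(s, c)

# Two targets: one due east (0 rad), one due north (pi/2), equal pull.
mu = compromise_direction([0.0, math.pi / 2], [1.0, 1.0])
print(mu)  # pi / 4: the compromise points north-east
```

In the regression the target directions and distances vary with time, and the observed angle is this compromise direction plus von Mises error.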

This study proposes a two-stage approach to characterize individual developmental trajectories of health risk behaviours and to delineate their time varying effects on short-term or long-term health outcomes. Our model can accommodate longitudinal covariates with zero-inflated counts and discrete outcomes. The longitudinal data of a well-known study of youths at high risk of substance abuse are presented as a motivating example to demonstrate the effectiveness of the model in delineating critical developmental periods of prevention and intervention. Our simulation study shows that the performance of the model proposed improves as the sample size or number of time points increases. When there are excess 0s in the data, the regular Poisson model cannot estimate either the longitudinal covariate process or its time varying effect well. This result, therefore, emphasizes the important role that the model proposed plays in handling zero inflation in the data.
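A sketch of the zero-inflated Poisson distribution that motivates the modelling choice (parameter values invented): with probability π the count is a structural zero, otherwise it is Poisson(λ), so zeros occur far more often than a plain Poisson model with the same rate predicts.

```python
import math

# Zero-inflated Poisson probability mass function:
# P(0) = pi + (1 - pi) * exp(-lam);  P(k) = (1 - pi) * Poisson(k; lam).

def zip_pmf(k, pi, lam):
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * pois

pi, lam = 0.4, 3.0
p0_zip = zip_pmf(0, pi, lam)
p0_pois = math.exp(-lam)      # zero probability without inflation
print(p0_zip, p0_pois)        # roughly 0.43 versus 0.05
```

Ignoring the inflation term forces the Poisson rate downwards to absorb the extra zeros, which is why the regular Poisson model misestimates both the covariate process and its time varying effect.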

Heatwaves are phenomena that have large social and economic consequences. Understanding and estimating the frequency of such events are of great importance to climate scientists and decision makers. Heatwaves are a type of extreme event which are by definition rare and as such there are few data in the historical record to help planners. Extreme value theory is a general framework from which inference can be drawn from extreme events. When modelling heatwaves it is important to take into account the intensity and duration of events above a critical level as well as the interaction between both factors. Most previous methods assume that the duration distribution is independent of the critical level that is used to define a heatwave: a shortcoming that can lead to incorrect inferences. The paper characterizes a novel method for analysing the temporal dependence of heatwaves with reference to observed temperatures from Orléans in central France. This method enables estimation of the probabilities for heatwave events irrespective of whether the duration distribution is independent of the critical level. The methods are demonstrated by estimating the probability of an event more severe than the 2003 European heatwave or an event that causes a specified increase in mortality.
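A sketch of the standard peaks-over-threshold ingredient of such an analysis (this is a generic extreme value step, not the paper's full intensity-duration model): exceedances over a critical level are modelled with a generalized Pareto distribution (GPD), whose shape ξ and scale σ can be estimated by the method of moments, ξ̂ = (1 − m²/s²)/2 and σ̂ = m(m²/s² + 1)/2, where m and s² are the sample mean and variance of the exceedances.

```python
import random
import statistics

# Method-of-moments estimation of the generalized Pareto distribution
# fitted to threshold exceedances (a standard peaks-over-threshold step).

def gpd_moments(exceedances):
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    return (1 - ratio) / 2, m * (ratio + 1) / 2   # (shape, scale)

# Exponential exceedances correspond to a GPD with shape xi = 0.
rng = random.Random(4)
exc = [rng.expovariate(1 / 2.0) for _ in range(20_000)]   # mean 2
xi_hat, sigma_hat = gpd_moments(exc)
print(xi_hat, sigma_hat)  # shape near 0, scale near 2
```

The paper's contribution concerns what this marginal model leaves out: the temporal dependence between consecutive exceedances, and hence the duration of events, as the critical level varies.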

We study the application of Bayesian spatial modelling to seismic tomography, a geophysical, high dimensional, linearized inverse problem that infers the three-dimensional structure of the Earth's interior. We develop a spatial dependence model of seismic wave velocity variations in the Earth's mantle based on a Gaussian Matérn field approximation. Using the theory of stochastic partial differential equations, this model quantifies the uncertainties in the parameter space by means of the integrated nested Laplace approximation. In resolution tests using simulated data and in inversions using real data, our model matches the performance of conventional deterministic optimization approaches in retrieving three-dimensional structure of the Earth's mantle. In addition it delivers estimates of the full parameter covariance matrix. Our model substantially improves on previous work relying on Markov chain Monte Carlo methods in terms of statistical misfits and computing time.

This work is concerned with understanding common population level effects of stroke on motor control while accounting for possible subject level idiosyncratic effects. Upper extremity motor control for each subject is assessed through repeated planar reaching motions from a central point to eight prespecified targets arranged on a circle. We observe the kinematic data for hand position as a bivariate function of time for each reach. Our goal is to estimate the bivariate function-on-scalar regression with subject level random functional effects while accounting for potential correlation in residual curves; covariates of interest are severity of motor impairment and target number. We express fixed effects and random effects by using penalized splines, and we allow for residual correlation by using a Wishart prior distribution. Parameters are jointly estimated in a Bayesian framework, and we implement a computationally efficient approximation algorithm using variational Bayes methods. Simulations indicate that the method proposed yields accurate estimation and inference, and application results suggest that the effect of stroke on motor control has a systematic component observed across subjects.

Evaluation of large-scale intervention programmes against human immunodeficiency virus (HIV) is becoming increasingly important, but impact estimates frequently hinge on knowledge of changes in behaviour such as the frequency of condom use over time, or other self-reported behaviour changes, for which we generally have limited or potentially biased data. We employ a Bayesian inference methodology that incorporates an HIV transmission dynamics model to estimate condom use time trends from HIV prevalence data. Estimation is implemented via particle Markov chain Monte Carlo methods, applied for the first time in this context. The preliminary choice of the formulation for the time varying parameter reflecting the proportion of condom use is critical in the context studied, because of the very limited amount of condom use and HIV data available. We consider various novel formulations to explore the trajectory of condom use over time, based on diffusion-driven trajectories and smooth sigmoid curves. Numerical simulations indicate that informative results can be obtained regarding the amplitude of the increase in condom use during an intervention, with good levels of sensitivity and specificity performance in effectively detecting changes. The application of this method to a real life problem demonstrates how it can help in evaluating HIV interventions based on a small number of prevalence estimates, and it opens the way to similar applications in different contexts.
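The likelihood-estimation engine inside particle MCMC is a bootstrap particle filter; a minimal sketch on a toy Gaussian state space model follows (the dynamics are illustrative, standing in for the HIV transmission model): x_t = 0.9·x_{t−1} + N(0, 1) with observations y_t = x_t + N(0, 1).

```python
import math
import random

# Bootstrap particle filter: propagate particles through the state
# dynamics, weight them by the observation density, accumulate the
# log-likelihood from the mean weight, then resample.

def particle_filter(ys, n_part=500, seed=5):
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_part)]
    loglik = 0.0
    for y in ys:
        xs = [0.9 * x + rng.gauss(0, 1) for x in xs]          # propagate
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]      # weight
        total = sum(ws)
        loglik += math.log(total / n_part) - 0.5 * math.log(2 * math.pi)
        xs = rng.choices(xs, weights=ws, k=n_part)            # resample
    return loglik

ys = [0.5, 1.2, 0.8, -0.3, 0.1]
print(particle_filter(ys))  # noisy estimate of the log-likelihood
```

Particle MCMC embeds such a likelihood estimate inside a Metropolis–Hastings sampler over the model parameters, which is what makes inference feasible when the transmission dynamics have an intractable likelihood.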

Dose finding methods aiming at identifying an optimal dose of a treatment with a given schedule may be at risk of misidentifying the best treatment for patients. We propose a phase I–II clinical trial design to find the optimal dose–schedule combination. We define schedule as the method and timing of administration of a given total dose in a treatment cycle. We propose a Bayesian dynamic model for the joint effects of dose and schedule. The model proposed allows us to borrow strength across dose–schedule combinations without making overly restrictive assumptions on the ordering pattern of the schedule effects. We develop a dose–schedule finding algorithm to allocate patients sequentially to a desirable dose–schedule combination, and to select an optimal combination at the end of the trial. We apply the proposed design to a phase I–II clinical trial of a *γ*-secretase inhibitor in patients with refractory metastatic or locally advanced solid tumours, and we examine the operating characteristics of the design through simulations.

Delivering radiation to eradicate a solid tumour while minimizing damage to nearby critical organs remains a challenge. For oesophageal cancer, radiation therapy may damage the heart or lungs, and several qualitatively different, possibly recurrent toxicities that are associated with chemoradiation or surgery may occur, each at two or more possible grades. We describe a Bayesian group sequential clinical trial design, based on total toxicity burden (TTB) and the duration of progression-free survival, for comparing two radiation therapy modalities for oesophageal cancer. Each patient's toxicities are modelled as a multivariate doubly stochastic Poisson point process, with marks identifying toxicity grades. Each grade of each type of toxicity is assigned a severity weight, elicited from clinical oncologists who are familiar with the disease and treatments. TTB is defined as a severity-weighted sum over the different toxicities that may occur up to 12 months from the start of treatment. Latent frailties are used to formulate a multivariate model for all outcomes. Group sequential decision rules are based on posterior mean TTB and progression-free survival time. The design proposed is shown to provide both larger power and smaller mean sample size when compared with a conventional bivariate group sequential design.
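The TTB statistic itself is a simple severity-weighted sum; a sketch follows (the weights below are invented for illustration; the paper's weights are elicited from clinical oncologists who are familiar with the disease and treatments).

```python
# Total toxicity burden (TTB): each recurrent toxicity event carries a
# grade, each (toxicity, grade) pair an elicited severity weight, and TTB
# sums the weights over all events observed within 12 months.

severity = {("pneumonitis", 2): 1.0, ("pneumonitis", 3): 2.5,
            ("pericarditis", 2): 1.5, ("pericarditis", 3): 3.0}

def total_toxicity_burden(events):
    """events: list of (toxicity_name, grade) observed up to 12 months."""
    return sum(severity[e] for e in events)

patient_events = [("pneumonitis", 2), ("pneumonitis", 2),
                  ("pericarditis", 3)]
print(total_toxicity_burden(patient_events))  # 1.0 + 1.0 + 3.0 = 5.0
```

The modelling effort lies in the doubly stochastic Poisson process generating these marked events and the latent frailties linking them to progression-free survival; the group sequential rules then compare posterior mean TTB between arms.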

The publication of a projected path of future policy decisions by central banks is a controversially debated method to improve monetary policy guidance. The paper proposes a new approach to evaluate the effect of the guidance strategy on the predictability of monetary policy. The empirical investigation is based on jump probabilities of Norwegian interest rates on announcement days of the Norges Bank *before* and *after* the introduction of quantitative guidance. Within the standard semimartingale framework, we propose a new methodology to detect jumps. We derive a representation of the quadratic variation in terms of a wavelet spectrum. An adaptive threshold procedure on wavelet spectrum estimates aims at localizing jumps. Our main empirical result indicates that quantitative guidance significantly improves the predictability of monetary policy.
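A hedged sketch of threshold-based jump localization with wavelets (a simplified stand-in for the paper's adaptive procedure on the wavelet spectrum): first-level Haar detail coefficients are small for continuous variation but large across a jump, so coefficients exceeding a universal threshold flag candidate jump locations.

```python
import math
import random
import statistics

# Haar detail coefficients d_k = (x_{2k} - x_{2k+1}) / sqrt(2), thresholded
# at sigma_hat * sqrt(2 log n), with sigma_hat a robust MAD noise estimate.

def haar_jump_flags(x):
    d = [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2)
         for k in range(len(x) // 2)]
    med = statistics.median(d)
    sigma = statistics.median(abs(c - med) for c in d) / 0.6745
    thr = sigma * math.sqrt(2 * math.log(len(x)))
    return [k for k, c in enumerate(d) if abs(c) > thr]

# Noisy path with one jump of size 5 inside the pair (x[100], x[101]).
rng = random.Random(6)
x = [0.1 * rng.gauss(0, 1) + (5.0 if i >= 101 else 0.0)
     for i in range(256)]
print(haar_jump_flags(x))  # flags the pair straddling the jump (index 50)
```

In the paper, a representation of the quadratic variation in terms of the wavelet spectrum justifies an adaptive, rather than universal, threshold.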

We compare two binary diagnostic tests when each subject is measured more than once with each test and with a gold standard. We introduce a new model that allows the correlation between two measurements on a single subject by the same test to be different from the correlation between two measurements by different tests. We show that moment estimators of the population parameters for the mean sensitivities and specificities are virtually identical to the maximum likelihood estimates from our random-effects model. We apply the model to data comparing two rapid malaria tests and provide guidance for choosing the number of subjects and repeated measurements.
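A sketch of the moment estimator idea (the data below are invented): with repeated binary results and a gold standard, the mean sensitivity of a test is estimated by the per-subject positive rate among truly diseased subjects, averaged across subjects.

```python
from statistics import fmean

# Moment estimator of mean sensitivity from repeated binary test results
# on subjects known (via the gold standard) to be diseased.

def mean_sensitivity(results_by_subject):
    """results_by_subject: per diseased subject, a list of 0/1 results."""
    return fmean(fmean(r) for r in results_by_subject)

diseased = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]  # 3 subjects, 3 repeats each
print(mean_sensitivity(diseased))             # (1 + 2/3 + 2/3) / 3 = 7/9
```

Mean specificity is estimated analogously from the negative rate among disease-free subjects; the random-effects model additionally separates within-test from between-test correlation on the same subject.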

Varying-coefficient models provide a useful tool to explore the dynamic pattern in various fields of science, such as epidemiology, medical research and ecology. Crucial issues arise in assessing the dose–response relationship from flexible dose clinical trial data: the current response is affected by not only the current dose level but also past dose levels, i.e. there is a time lag in the effectiveness of treatment, and there is also considerable variability between subjects. To address these issues, we propose a novel non-linear varying-coefficient model, called a mixed effects historical varying-coefficient model (MEHVCM), for estimating dose–response curves in longitudinal flexible dose trials. This model enables us to describe historical effectiveness curves and subject-specific curves. Unknown parameters included in the MEHVCM are estimated by the maximum penalized likelihood method along with the EM algorithm. Monte Carlo experiments are conducted to investigate the performance of the MEHVCM for evaluating dose–response relationships in flexible dose trials. We apply the proposed model to the analysis of data from a multiple-sclerosis clinical trial.
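A hedged sketch of the "historical effect" idea (the lag kernel, doses and subject effect below are illustrative, not estimated quantities from the MEHVCM): the current response depends on a weighted sum of current and past dose levels through a lag kernel, plus a subject-specific random effect.

```python
# Historical dose effect: response at time t is a lag-kernel-weighted sum
# of current and past doses plus a subject-specific random intercept.

def historical_response(doses, lag_weights, subject_effect=0.0):
    """Response at each t: sum_l lag_weights[l] * doses[t - l] + b_i."""
    out = []
    for t in range(len(doses)):
        effect = sum(w * doses[t - l]
                     for l, w in enumerate(lag_weights) if t - l >= 0)
        out.append(effect + subject_effect)
    return out

doses = [10, 10, 20, 20, 20]
lag_weights = [0.5, 0.3, 0.1]        # the current dose matters most
print(historical_response(doses, lag_weights, subject_effect=1.0))
# [6.0, 9.0, 15.0, 18.0, 19.0]
```

In the MEHVCM the lag kernel is not a fixed vector but a smooth function estimated by penalized likelihood, and the subject effects are random, which is what makes the coefficient "varying" and "mixed".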