The global historical radiosonde archives date back to the 1920s and contain the only directly observed measurements of temperature, wind, and moisture in the upper atmosphere, but they also contain many random errors. Most of the effort in cleaning these large datasets has focused on temperatures, yet winds are important inputs to climate models and to studies of wind climatology. The bivariate distribution of the wind vector does not have elliptical contours but is skewed and heavy-tailed, so we develop two methods for outlier detection based on the bivariate skew-*t* (BST) distribution, using either a distance-based or a contour-based approach to flag observations as potential outliers. We develop a framework for robustly estimating the parameters of the BST and show how to choose the tuning parameter that governs this estimation. In simulation, we compare our methods with one based on a bivariate normal distribution and with a nonparametric approach based on the bagplot. We then apply all four methods to the winds observed over more than 35,000 radiosonde launches at a single station and demonstrate differences in the number of observations flagged across eight pressure levels and through time. In this pilot study, the method based on the BST contours performs very well.
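For context, the bivariate-normal comparison method mentioned above amounts to flagging wind vectors whose squared Mahalanobis distance exceeds a chi-square(2) cutoff. A minimal sketch on simulated winds (the cutoff level and the data are illustrative, not the paper's radiosonde records or its skew-*t* procedure):

```python
import numpy as np
from scipy.stats import chi2

def flag_outliers_bvn(uv, alpha=0.01):
    """Flag (u, v) wind vectors whose squared Mahalanobis distance
    from the sample mean exceeds the chi-square(2) quantile cutoff."""
    uv = np.asarray(uv, dtype=float)
    mu = uv.mean(axis=0)
    cov = np.cov(uv, rowvar=False)
    diff = uv - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return d2 > chi2.ppf(1.0 - alpha, df=2)

rng = np.random.default_rng(0)
winds = rng.multivariate_normal([3.0, -1.0], [[4.0, 1.0], [1.0, 2.0]], size=500)
winds[:3] += 25.0                       # inject three gross errors
print(flag_outliers_bvn(winds)[:3])     # the three injected errors are flagged
```

The BST-based methods replace the chi-square contours with skew-*t* density contours, which is what handles the skewness and heavy tails of real winds.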

Several environmental phenomena are described by correlated variables that must be considered jointly to represent the nature of the phenomenon faithfully. For such events, identifying extremes through marginal analysis alone is inappropriate. Extremes have usually been linked to the notion of a quantile, an important tool for analyzing risk in the univariate setting. We propose to identify multivariate extremes and analyze environmental phenomena in terms of the directional multivariate quantile, which lets us consider all the variables involved in the phenomenon jointly, as well as examine the data in informative directions that can better describe an environmental catastrophe. Because many references in the literature propose the detection of extremes based on copula models, we also generalize the copula method by introducing the directional approach. Advantages and disadvantages of the nonparametric proposal that we introduce and of the copula methods are discussed in the paper. We show with simulated and real data sets how considering the first principal component direction improves the visualization of extremes. Finally, two case studies are analyzed: a synthetic case of flood risk at a dam (three variables) and a real case of sea storms (five variables).
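A toy illustration of looking along the first principal component direction, as in the visualization idea above: project the data onto that direction and flag points beyond an empirical quantile of the projections. This is a simplified stand-in for the directional multivariate quantile itself, whose construction is not reproduced here.

```python
import numpy as np

def directional_extremes(X, direction=None, p=0.95):
    """Flag observations whose projection onto a chosen direction exceeds
    the empirical p-quantile of the projections; by default the direction
    is the first principal component of the centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if direction is None:
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        direction = vt[0]               # first PC direction
    proj = Xc @ direction
    return proj > np.quantile(proj, p)

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 3)) @ np.diag([2.0, 1.0, 0.5])
flags = directional_extremes(data)
print(round(flags.mean(), 3))  # ≈ 0.05 by construction
```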

Measurements recorded over monitoring networks often possess spatial and temporal correlation, inducing redundancy in the information provided. For river water quality monitoring in particular, flow-connected sites are likely to provide similar information. This paper proposes a novel principal components analysis approach for reducing the dimensionality of spatiotemporal flow-connected network data in order to identify common spatiotemporal patterns. The method is illustrated using monthly observations of total oxidized nitrogen for the Trent catchment area in England. Common patterns are revealed that remain hidden when the river network structure and temporal correlation are not accounted for. Such patterns provide valuable information for the design of future sampling strategies.
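As a baseline for the approach above, classical PCA on a sites-by-months matrix looks like the following sketch; the paper's contribution is precisely the flow-connection and temporal-correlation adjustments that this plain version omits.

```python
import numpy as np

def temporal_pca(X, ncomp=2):
    """Plain PCA on a sites-by-months data matrix: site scores, component
    loadings over time, and the fraction of variance each explains."""
    Xc = X - X.mean(axis=0)                        # centre each month
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = u[:, :ncomp] * s[:ncomp]              # per-site scores
    explained = s[:ncomp] ** 2 / np.sum(s ** 2)    # variance fractions
    return scores, vt[:ncomp], explained

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 120))    # 40 sites, 120 monthly observations
scores, loadings, explained = temporal_pca(X)
print(scores.shape, loadings.shape)  # (40, 2) (2, 120)
```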

Land cover (LC) is a critical variable driving many environmental processes, so its assessment, monitoring, and characterization are essential. However, existing LC products, derived primarily from satellite spectral imagery, each have different temporal and spatial resolutions and different LC classes. Most effort is focused on either fusing a pair of LC products over a small space-time region or on interpolating missing values in an individual LC product. Here, we review the complexities of LC identification and propose a method for fusing multiple existing LC products to produce a single LC record for a large spatial-temporal grid, referred to as spatiotemporal categorical map fusion. We first reconcile the LC classes of different LC products and then present a probabilistic weighted nearest neighbor estimator of LC class. This estimator depends on three unknown parameters that are estimated using numerical optimization to maximize an agreement criterion that we define. We illustrate the method using six LC products over the Rocky Mountains and show the improvement gained by supplying the optimization with data-driven information describing the spatial-temporal behavior of each LC class. Given the massive size of the LC products, we show how the optimal parameters for a given year are often optimal for other years, leading to shorter computing times.
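The core of a probabilistic weighted nearest neighbor class estimator can be sketched as a distance-weighted vote over neighboring labels. The exponential kernel, the parameter names `theta_s`/`theta_t`, and the product weights below are illustrative stand-ins for the estimator's three tuned parameters, not the paper's fitted values.

```python
import math
from collections import defaultdict

def fused_class(neighbors, theta_s=1.0, theta_t=1.0):
    """Probabilistic weighted vote over neighboring land-cover labels.
    Each neighbor is (label, spatial_dist, temporal_dist, product_weight);
    votes decay exponentially with space-time distance (assumed kernel)."""
    votes = defaultdict(float)
    for label, ds, dt, w in neighbors:
        votes[label] += w * math.exp(-ds / theta_s - dt / theta_t)
    total = sum(votes.values())
    probs = {lab: v / total for lab, v in votes.items()}
    return max(probs, key=probs.get), probs

label, probs = fused_class([
    ("forest",    0.0, 0.0, 1.0),   # co-located cell, product A
    ("forest",    1.0, 0.0, 0.8),   # adjacent cell, product B
    ("grassland", 1.0, 1.0, 1.0),   # adjacent cell, previous year
])
print(label)  # → forest
```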

The last few years were particularly volatile for the insurance industry in North America and Europe, bringing a record number of claims due to severe weather. According to a 2013 World Bank study, annual average losses from natural disasters have increased from $50 billion in the 1980s to about $200 billion today. Adaptation to such changes requires early recognition of vulnerable areas and of the extent of future risk due to weather factors. Despite the well-documented impact of climate change on the insurance sector, relatively few studies address the effect of so-called “normal” extreme weather (i.e., higher-frequency events with lower individual but high cumulative impact) on insurance dynamics. To reduce the financial repercussions of such weather events, we develop a nonlinear attribution analysis of integer-valued insurance claims and atmospheric variables. Using data-driven nonparametric procedures, we identify triggering thresholds, or tipping points, leading to an increase in the number of claims. We develop a new data-adaptive method to compare tails of observed and projected weather variables and employ its outcomes to assess future dynamics of insurance claims. We illustrate our approach by application to modeling and forecasting of flood-related home insurance claims in Norway.

Understanding energy consumption patterns of different types of consumers is essential in any planning of energy distribution. However, obtaining individual-level consumption information is often either not possible or too expensive. Therefore, we consider data from aggregations of energy use, that is, from sums of individuals' energy use, where each individual falls into one of *C* consumer classes. Unfortunately, the exact number of individuals of each class may be unknown due to inaccuracies in consumer registration or irregularities in consumption patterns. We develop a methodology to estimate both the expected energy use of each class as a function of time and the true number of consumers in each class. To accomplish this, we use B-splines to model both the expected consumption and the individual-level random effects. We treat the reported numbers of consumers in each category as random variables with distribution depending on the true number of consumers in each class and on the probabilities of a consumer in one class reporting as another class. We obtain maximum likelihood estimates of all parameters via a maximization algorithm. We introduce a special numerical trick for calculating the maximum likelihood estimates of the true number of consumers in each class. We apply our method to a data set and study our method via simulation.
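A simplified version of the B-spline idea above, with the class counts assumed known (the paper also estimates them): each aggregate load is a count-weighted sum of per-class curves, and the curves' B-spline coefficients can be recovered by stacked least squares. The knot layout, class curves, and consumer counts are illustrative choices.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline basis on [0, 24] hours of day (illustrative knot layout)
k = 3
knots = np.concatenate([[0.0] * k, np.linspace(0.0, 24.0, 8), [24.0] * k])
nb = len(knots) - k - 1                 # number of basis functions

def design(t):
    # identity coefficient matrix evaluates every basis function at once
    return BSpline(knots, np.eye(nb), k)(t)

t = np.linspace(0.0, 23.5, 48)
B = design(t)

# "True" per-consumer load curves for two classes (evening vs. midday peak)
f1 = 1.0 + 0.8 * np.exp(-0.5 * ((t - 19.0) / 3.0) ** 2)
f2 = 0.5 + 1.2 * np.exp(-0.5 * ((t - 13.0) / 2.0) ** 2)

rng = np.random.default_rng(2)
counts = rng.integers(20, 100, size=(10, 2))   # known consumers per class
Y = counts @ np.vstack([f1, f2])               # ten aggregate load profiles
Y = Y + rng.normal(scale=1.0, size=Y.shape)    # metering noise

# y_j(t) = n_j1 * B(t) @ b1 + n_j2 * B(t) @ b2  ->  stacked least squares
X = np.vstack([np.hstack([n1 * B, n2 * B]) for n1, n2 in counts])
beta, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
f1_hat = B @ beta[:nb]
print(float(np.max(np.abs(f1_hat - f1))) < 0.2)   # class curve recovered
```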

In the analysis of most spatiotemporal processes in environmental studies, observations present skewed distributions. Usually, a single transformation of the data is used to approximate normality, and stationary Gaussian processes are assumed to model the transformed data. The choice of transformation is key for spatial interpolation and temporal prediction. We propose a spatiotemporal model for skewed data that does not require the use of data transformation. The process is decomposed as the sum of a purely temporal structure and two independent components that are considered to be partial realizations from independent spatial Gaussian processes, for each time *t*. The model has an asymmetry parameter that may vary with location and time, and when this parameter equals zero, the usual Gaussian model results. The inference procedure is performed under the Bayesian paradigm, and uncertainty in parameter estimation is naturally accounted for. We fit our model to different synthetic data and to monthly average temperatures observed between 2001 and 2011 at monitoring stations in the south of Brazil. Different model comparison criteria and analysis of the posterior distribution of some parameters suggest that the proposed model outperforms standard ones used in the literature.
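The role of an asymmetry parameter that recovers the Gaussian model at zero can be illustrated with the classical Azzalini-type skew-normal construction (a standard device in this literature, used here for intuition rather than as the paper's exact spatiotemporal model):

```python
import numpy as np
from scipy.stats import skew

def skew_normal(delta, size, rng):
    """delta*|Z0| + sqrt(1-delta^2)*Z1 follows a skew-normal law whose
    asymmetry is governed by delta; delta = 0 recovers N(0, 1)."""
    z0 = np.abs(rng.standard_normal(size))
    z1 = rng.standard_normal(size)
    return delta * z0 + np.sqrt(1.0 - delta ** 2) * z1

rng = np.random.default_rng(4)
x = skew_normal(0.9, 100_000, rng)
print(skew(x) > 0.0)   # → True: positively skewed when delta > 0
```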

No abstract is available for this article.

Stream water temperature is an important factor in determining the impact of climate change on hydrologic systems. Near-continuous monitoring of air and stream temperatures over large spatial scales is possible thanks to inexpensive temperature recorders. However, water temperature data are commonly missing due to the failure or loss of equipment. Missing data create difficulties in modeling relationships between air and stream water temperatures and pose challenges for subsequent analyses, for example, clustering streams by the effect of changes in water temperature. In this work, we propose a novel spatial–temporal varying coefficient model to impute missing water temperatures. Modeling the relationship between air and water temperature over time and space increases the effectiveness of the imputation. A parameter estimation method is developed that exploits the temporal covariation in the relationship, borrows strength from neighboring stream sites, and is useful for imputing sequences of missing data. A simulation study examines the performance of the proposed method in comparison with several existing imputation methods. The proposed method is then applied to 156 streams, clustering streams with missing water temperatures into groups with meaningful interpretations.
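At its simplest, imputing water temperature from air temperature looks like the following single-site regression fill-in. This is a deliberately reduced stand-in for the varying coefficient model, which additionally lets the intercept and slope vary over time and space and borrows strength from neighboring sites.

```python
import numpy as np

def impute_water_temp(air, water):
    """Fill missing water temperatures from a site-level linear fit
    water ≈ a + b * air (a much-simplified stand-in for the paper's
    spatial-temporal varying coefficient model)."""
    air, water = np.asarray(air, float), np.asarray(water, float)
    obs = ~np.isnan(water)
    a, b = np.polynomial.polynomial.polyfit(air[obs], water[obs], 1)
    filled = water.copy()
    filled[~obs] = a + b * air[~obs]
    return filled

air = np.array([5.0, 8.0, 12.0, 15.0, 18.0, 20.0])
water = np.array([6.0, 8.0, np.nan, 12.0, np.nan, 15.0])
print(impute_water_temp(air, water))   # gaps filled from the air-water fit
```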

The problem of choosing spatial sampling designs for investigating an unobserved spatial phenomenon arises in many contexts, for example, in identifying households to select for a prevalence survey to study disease burden and heterogeneity in a study region. We study randomized inhibitory spatial sampling designs to address the problem of spatial prediction while taking account of the need to estimate covariance structure. Two specific classes of design are *inhibitory designs* and *inhibitory designs plus close pairs*. In an inhibitory design, any pair of sample locations must be separated by at least an inhibition distance *δ*. In an inhibitory plus close pairs design, *n* − *k* sample locations in an inhibitory design with inhibition distance *δ* are augmented by *k* locations, each positioned close to one of the randomly selected *n* − *k* locations in the inhibitory design, uniformly distributed within a disk of radius *ζ*. We present simulation results for the Matérn class of covariance structures. When the nugget variance is non-negligible, inhibitory plus close pairs designs demonstrate improved predictive efficiency over designs without close pairs. We illustrate how these findings can be applied to the design of a rolling Malaria Indicator Survey that forms part of an ongoing large-scale, 5-year malaria transmission reduction project in Malawi.
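The two-stage construction described above (inhibition by rejection, then close pairs uniform on a disk) can be sketched directly; the unit-square study region and the particular *n*, *k*, *δ*, *ζ* values are illustrative.

```python
import numpy as np

def inhibitory_plus_close_pairs(n, k, delta, zeta, rng=None):
    """Draw an inhibitory-plus-close-pairs design on the unit square:
    n - k primary locations with pairwise separation at least delta,
    then k extra locations, each uniform on a disk of radius zeta
    around a randomly chosen primary location."""
    rng = np.random.default_rng(rng)
    pts = []
    while len(pts) < n - k:              # rejection sampling for inhibition
        p = rng.uniform(0.0, 1.0, 2)
        if all(np.hypot(*(p - q)) >= delta for q in pts):
            pts.append(p)
    pts = np.array(pts)
    centers = pts[rng.choice(n - k, size=k, replace=False)]
    r = zeta * np.sqrt(rng.uniform(size=k))      # uniform on the disk
    theta = rng.uniform(0.0, 2.0 * np.pi, k)
    close = centers + np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    return np.vstack([pts, close])

design = inhibitory_plus_close_pairs(n=50, k=10, delta=0.08, zeta=0.02, rng=0)
print(design.shape)  # (50, 2)
```

The `sqrt` in the radius draw is what makes the close-pair locations uniform over the disk's area rather than clustered at its center.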

We consider geostatistical regression models to predict spatial variables of interest, with likelihood-based methods used to estimate model parameters. It is known that the parameters of the Matérn covariogram cannot be estimated well, even when increasing amounts of data are collected densely in a fixed domain. Although a best linear unbiased predictor is available when the model parameters are known, a predictor with estimated parameters is nonlinear and may not be the best in practice. We therefore propose an adjusted procedure for the likelihood-based estimates to improve the predictive ability of the nonlinear spatial predictor. The adjusted parameter estimators, obtained by minimizing a corrected Stein's unbiased risk estimate, tend to have less bias than the conventional likelihood-based estimators, and the resulting spatial predictor is more accurate and more stable. Statistical inference for the proposed method is justified both theoretically and numerically. To verify the practicality of the proposed method, a groundwater data set from Bangladesh is analyzed.
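The objects involved — the Matérn covariogram and the best linear predictor with known parameters — can be written down compactly. This sketch assumes known parameters and a known constant mean (simple kriging); the paper's contribution, the adjusted likelihood-based estimation of those parameters, is not reproduced here.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(h, sigma2=1.0, phi=0.2, nu=1.5, nugget=0.0):
    """Matérn covariogram C(h); h may be a matrix of distances."""
    h = np.asarray(h, dtype=float)
    hs = np.where(h == 0.0, 1.0, h)          # avoid 0 in the Bessel term
    scaled = np.sqrt(2.0 * nu) * hs / phi
    body = sigma2 * 2.0 ** (1 - nu) / gamma(nu) * scaled ** nu * kv(nu, scaled)
    return np.where(h == 0.0, sigma2 + nugget, body)

def simple_krige(X, y, x0, mu=0.0, **pars):
    """Best linear predictor at x0 given the (assumed known) parameters."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    d0 = np.linalg.norm(X - x0, axis=-1)
    w = np.linalg.solve(matern(D, **pars), matern(d0, **pars))
    return mu + w @ (y - mu)

rng = np.random.default_rng(3)
X = rng.uniform(size=(30, 2))
C = matern(np.linalg.norm(X[:, None] - X[None, :], axis=-1), nugget=1e-6)
y = rng.multivariate_normal(np.zeros(30), C)
print(simple_krige(X, y, X[0], nugget=1e-6))  # ≈ y[0]: interpolates at an observed site
```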

This work presents a case study evaluating water quality dynamics in each of the 56 major catchments in Scotland over a period of 10 years. Data are obtained by monthly sampling of water contaminants in order to monitor discharges from the land to the sea. We are interested in the multivariate time series of ammonia, nitrate, and phosphorus. These time series present features that make their analysis complex: non-linearity, non-normality, weak dependency, seasonality, and missing values. The goals of this work are the classification of the observations into a small set of homogeneous groups representing ordered categories of pollution, the detection of change-points, and the modeling of data heterogeneity. These aims are pursued by developing a novel spatio-temporal hidden Markov model whose hierarchical structure is motivated by the data set under study: the observations are displayed on a cylindrical lattice and driven by an anisotropic and inhomogeneous hidden Markov random field. Four hidden states were selected, showing that catchments can be grouped spatially, with a strong relationship to the dominant land use. The method gives water managers a useful nationwide picture of water quality together with its temporal dynamics.

Bayesian networks (BNs) have been widely applied in environmental modelling to predict the behavior of an ecosystem under conditions of change. However, this approach does not take time into consideration. To address this limitation, an extension of BNs, the dynamic Bayesian network (DBN), has been developed in mathematics and computer science but has scarcely been applied in environmental modelling. This paper presents an application of DBNs to water reservoir systems in Andalusia, Spain. The aim is to predict changes in the percent fullness of the reservoirs under the irregular rainfall patterns of Mediterranean watersheds. In comparison to static BNs, DBNs provide results that can be extrapolated to a particular time, so that a climate change scenario can be studied in detail over time. Because results are expressed as density functions rather than single values, several metrics can be derived from them, including the probability of particular values. This allows the probability that the water level in a reservoir reaches a given level to be computed directly.
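The time-slice mechanics of a DBN can be illustrated with a minimal discrete example: propagate a distribution over reservoir-fullness states forward one transition per month, switching transition matrices with the rainfall regime. The three-state discretization and all probabilities below are hypothetical, not the paper's fitted network.

```python
import numpy as np

# Hypothetical 3-state discretization of fullness: low / medium / high.
# Row-stochastic transition matrices under dry vs. wet conditions.
T_dry = np.array([[0.80, 0.18, 0.02],
                  [0.30, 0.60, 0.10],
                  [0.05, 0.45, 0.50]])
T_wet = np.array([[0.40, 0.50, 0.10],
                  [0.10, 0.60, 0.30],
                  [0.02, 0.28, 0.70]])

def forward(p0, regimes):
    """Propagate the state distribution through one DBN slice per month."""
    p = np.asarray(p0, dtype=float)
    for r in regimes:
        p = p @ (T_wet if r == "wet" else T_dry)
    return p

p = forward([0.0, 1.0, 0.0], ["wet", "wet", "dry", "dry", "dry"])
print(f"P(high) after 5 months = {p[2]:.3f}")
```

This is exactly the "probability that the level reaches a certain value" computation: read it off the propagated distribution instead of a point forecast.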
