## 1. Introduction

Prior to the current era of satellite data acquisition, the main source of information on sea-surface temperatures (SST) was the logs of ships of opportunity. These records stretch back to the mid-19th century, making them a tantalizing source of information about climate variability on interannual and decadal time-scales. However, the temporal and spatial inhomogeneity of these data makes them difficult to use in standard statistical analysis procedures. Gridded fields of SST are also needed for initialization and verification of ocean models and as time-dependent boundary conditions in atmospheric models. As a result, interpolation schemes for infilling these sparse data are tremendously important in climate research.

One popular approach to the interpolation of historical datasets is reduced-space estimation (Shriver and O’Brien, 1995; Smith *et al.*, 1996; Kaplan *et al.*, 1998, 2000; Rayner *et al.*, 2003). One of its advantages over more conventional methods such as simple kriging (which uses a stationary, localized covariance function) is its emphasis on the reconstruction of the largest and most energetic spatial scales over the entire domain of interest. This is a natural advantage for modelling climate variables such as SST, because the dynamics of the climate system often result in global-scale coherency.

It is worth considering why reduced-space estimation has been so useful in climate applications. For climate variables that possess a large spatial dimension, the relative temporal ‘shortness’ of the reliable observational data record that can be used for computing a sample covariance matrix often leads to rank-deficiency. Assuming a lower dimensionality of the system via truncation of the less energetic eigenvectors of the covariance matrix can circumvent this problem. Another advantage of reduced-space techniques becomes evident when the data used for reconstruction are clustered in limited areas, leaving large regions completely unobserved. Under these circumstances, inference in the interiors of the unsampled regions would be imprudent using methods that rely solely on local spatial estimation.
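The rank-deficiency argument can be illustrated with a minimal NumPy sketch. The array sizes, the synthetic random anomalies and the number of retained modes (`n_modes`) are illustrative assumptions and are not taken from the reconstruction discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: many grid boxes, comparatively few time steps.
n_space, n_time = 200, 30

# Synthetic anomalies (time x space) with the time mean removed at each grid box.
anomalies = rng.standard_normal((n_time, n_space))
anomalies -= anomalies.mean(axis=0)

# Sample spatial covariance; its rank cannot exceed n_time - 1.
cov = anomalies.T @ anomalies / (n_time - 1)
rank = np.linalg.matrix_rank(cov)

# Eigen-decomposition, sorted from most to least energetic mode.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]

# Retain only the leading modes (the truncation level is a subjective choice).
n_modes = 10
patterns = eigvecs[:, order[:n_modes]]
variances = eigvals[order[:n_modes]]

# Reduced-space approximation of the covariance matrix.
cov_reduced = (patterns * variances) @ patterns.T

print(f"full dimension: {n_space}, sample rank: {rank}, retained modes: {n_modes}")
```

In this toy setting the 200 × 200 sample covariance has rank at most 29, so most of its eigenvectors carry no information. Working in the space of a few leading modes avoids relying on them, at the cost of discarding the smaller-scale structure in the truncated modes.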

The disadvantage of using a reduced-space technique for interpolation is that there is no guarantee that the patterns of covariability that dominate within smaller subregions of the global domain will be well represented. The truncation of trailing eigenvectors necessarily excludes some structures that are better suited to local estimation techniques. Ideally, a reconstruction methodology would draw from the strengths of both types of interpolation, with the aim of representing behaviour over a range of spatial scales.

We restrict our focus to the statistical modelling and reconstruction of SST anomalies in the northern hemisphere Atlantic Ocean. We present a method to augment an existing historical SST reconstruction that uses a reduced-space Kalman smoother (Kaplan *et al.*, 1998) to capture what we will term the ‘global-scale’ or ‘large-scale’ modes of variability. The contribution of this work is to model and reconstruct what we will term ‘mid-scale’ variability. For the remainder of this article, we will use the terms global and mid-scale to distinguish between variability captured by the reduced-space technique and the more locally dominant variability on which we are focused.

The separation into global and mid-scales is not based on physical processes. No objective criterion is used for parsing between these covariance models, nor do we mean to imply that a given length-scale of covariability will be uniquely contained in either model. In the context of this study, mid-scales can be interpreted as the most dominant local variability not captured by the globally based reduced-space reconstruction.

Section 2 describes the historical temperature data, extending back to 1850, that are used in this reconstruction. Section 3 gives a brief description of the reduced-space Kalman smoother that was used in the published reconstruction of the large-scale SST anomalies. As we discuss, there are subjective choices that go into reduced-space techniques and our definition of mid-scales is implicitly impacted by these. Given this caveat, it is still instructive to note that the mid-scales tend to have geographic coherency of the order of 500–1300 km.

There are two main areas of emphasis in this work. They are (1) the statistical modelling of our prior knowledge of the mid-scale variability not present in the established reduced-space reconstruction and (2) the description of the mid-scale reconstruction in terms of the mean, covariance and samples from the posterior distribution. Section 4 outlines the statistical procedure that we use to form the posterior distribution for our mid-scale reconstruction. In section 5 we present our model for the covariance of the mid-scale variability. We employ a novel covariance parametrization developed by Paciorek and Schervish (2006) that allows for non-stationarity in the length-scales and anisotropy of the spatial correlation functions. This parametrization gives our model the flexibility to capture geographic variation in the underlying covariability of SST anomalies while still ensuring a positive-definite covariance matrix defined over the entire domain. This is a useful feature for analyses of SST in the northern Atlantic Ocean basin, where the dominant physical processes vary over the domain.
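To make the idea of a non-stationary, anisotropic covariance concrete, the following is a minimal sketch of the general kernel-based form given by Paciorek and Schervish (2006), in which each location carries its own 2 × 2 kernel matrix. The exponential correlation function, the planar coordinates in kilometres, the toy grid and the way the length-scales and orientation vary across it are all assumptions made for illustration. The helper names `kernel_matrix` and `nonstationary_cov` are hypothetical, and nothing here should be read as the specific model of section 5.

```python
import numpy as np

def kernel_matrix(length_major, length_minor, angle):
    """2x2 kernel matrix with the given axis length-scales (km) and orientation (rad)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([length_major**2, length_minor**2]) @ rot.T

def nonstationary_cov(points, kernels, sigma2=1.0):
    """Covariance matrix built from location-dependent kernel matrices."""
    n = len(points)
    dets = np.array([np.linalg.det(k) for k in kernels])
    cov = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            avg = 0.5 * (kernels[i] + kernels[j])
            d = points[i] - points[j]
            # Quadratic-form distance under the averaged kernel (dimensionless).
            q = max(d @ np.linalg.solve(avg, d), 0.0)
            prefactor = (dets[i] * dets[j]) ** 0.25 / np.sqrt(np.linalg.det(avg))
            # Exponential correlation: an assumed choice, valid in any dimension.
            cov[i, j] = sigma2 * prefactor * np.exp(-np.sqrt(q))
    return cov

# Toy 5 x 5 grid spanning 2000 km in each direction (illustrative only).
side = np.linspace(0.0, 2000.0, 5)
xx, yy = np.meshgrid(side, side)
points = np.column_stack([xx.ravel(), yy.ravel()])

# Major-axis length-scale grows from 500 to 1300 km eastward; orientation turns northward.
kernels = [kernel_matrix(500.0 + 0.4 * x, 500.0, np.pi * y / 8000.0) for x, y in points]

cov = nonstationary_cov(points, kernels)
min_eig = np.linalg.eigvalsh(cov).min()
print(f"smallest eigenvalue: {min_eig:.3e}")
```

Because the kernel matrices enter only through pairwise averages and a normalizing prefactor, the length-scales and orientation can change from point to point while the resulting matrix remains symmetric with a constant variance on the diagonal. Its eigenvalues can be checked numerically, as in the last two lines.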

We verify the statistical model in section 6, and section 7 presents a selection of the resulting reconstructions. Because the quantification and representation of uncertainty have become an area of increased interest within the climate research community (Rayner *et al.*, 2009), we pay special attention to the uncertainty estimates implied by the posterior distribution. Specifically, we note the temporal evolution of the uncertainty due to changes in data availability through time and the spatial correlations inherent in the posterior distributions. We conclude in section 8 with a discussion of some of the broader issues relevant to this work, some of its limitations and prospects for its extension.