Multivariate spatio-temporal modelling for assessing Antarctica's present-day contribution to sea-level rise

Antarctica is the world's largest fresh-water reservoir, with the potential to raise sea levels by about 60 m. An ice sheet contributes to sea-level rise (SLR) when its rate of ice discharge and/or surface melting exceeds accumulation through snowfall. Constraining the contribution of the ice sheets to present-day SLR is vital both for coastal development and planning, and climate projections. Information on various ice sheet processes is available from several remote sensing data sets, as well as in situ data such as global positioning system data. These data have differing coverage, spatial support, temporal sampling and sensing characteristics, and thus, it is advantageous to combine them all in a single framework for estimation of the SLR contribution and the assessment of processes controlling mass exchange with the ocean. In this paper, we predict the rate of height change due to salient geophysical processes in Antarctica and use these to provide estimates of SLR contribution with associated uncertainties. We employ a multivariate spatio-temporal model, approximated as a Gaussian Markov random field, to take advantage of differing spatio-temporal properties of the processes to separate the causes of the observed change. The process parameters are estimated from geophysical models, while the remaining parameters are estimated using a Markov chain Monte Carlo scheme, designed to operate in a high-performance computing environment across multiple nodes. We validate our methods against a separate data set and compare the results to those from studies that invariably employ numerical model outputs directly. We conclude that it is possible, and insightful, to assess Antarctica's contribution without explicit use of numerical models. Further, the results obtained here can be used to test the geophysical numerical models for which in situ data are hard to obtain. © 2015 The Authors. Environmetrics published by John Wiley & Sons Ltd.


INTRODUCTION
Antarctica is the largest fresh-water resource on Earth and, potentially, the largest contributor to global sea-level rise (SLR). Its ice sheet is widely believed to be losing on the order of 50 Gt year 1 (Shepherd et al., 2012;Gunter et al., 2014;Rignot et al., 2008), and this rate is likely to increase (Rignot et al., 2011b). SLR has global implications and accurate assessment of present-day rates, sources and their uncertainties are essential for prediction, and hence the development and implementation of effective urban, eco-preservation and flood defence strategies .
A review of the difficulties associated with estimating the SLR contribution from Antarctica and the assumptions of current approaches that tackle this problem is given in Zammit-Mangion et al. (2014). The primary difficulty is that of under-determination (lack of mixture diversity), because the number of observed linear combinations (using satellite and global positioning system (GPS) instruments) is less than the number of processes influencing the instruments. Zammit-Mangion et al. (2014) showed that under-determination, for this problem, can be tackled using a hierarchical modelling framework (Cressie and Wikle, 2011). Their approach, which as a proof of concept focused on only a part of the Antarctic ice sheet, took advantage of differing spectral characteristics to improve the separation of the individual processes

HIERARCHICAL MODELLING
We will follow a hierarchical modelling approach to the problem, as exemplified in works such as Berliner et al. (2000) and Wikle and Berliner (2007). The interested reader is referred to Cressie and Wikle (2011) for a comprehensive review. In summary, the model we will define is structured with (i) an observation layer detailing the interactions between the instruments and the processes under investigation; (ii) the process layer containing information on the spatio-temporal nature of the processes; and (iii) the parameter layer encoding prior beliefs on unknowns, which appear in any of the first two layers.
The nature of the observations in this application are sufficiently complex to warrant a split of the top layer into two sublayers. The first layer identifies interactions that may occur between the observations themselves. This is primarily included to cater for averaging effects occurring in the observations with large spatial footprint. The second layer describes the measurement mapping between the process layer and the instrument itself. The 'process' we consider is a superposition of a multivariate spatio-temporal model and a fine-scale effect. Although the fine-scale effect is not of direct interest, it is clearly present in the altimetric observations and thus needs to be considered.
In this work, we will adopt a hybrid approach where we take advantage of the geophysical numerical models to estimate some of the parameters and infer other parameters from data using a Markov chain Monte Carlo (MCMC) scheme. The fourth layer describes prior distributions over these latter parameters, which appear both in the first observation layer and in the component describing the fine-scale variation. A graphical model showing all parameters (both estimated and random), processes and data sets is given in Figure 1 for use as reference throughout this section.

Data preprocessing and the data model
There are, broadly, two satellite-based techniques used when assessing the SLR contribution of the Antarctic ice sheet. The first, satellite altimetry, is a technique for measuring ice-sheet height change, while the second, gravimetry, is a technique for measuring changes in the Earth's gravity field, the geoid, which can be used to infer mass changes. While the former produces point-referenced data, the latter is a potential field that generally produces large spatial footprints (often expressed as a spherical harmonic expansion). In this work, we made use of the most informative data sets available to date, from the Variables on the left are parameters pertaining to the underlying spatio-temporal processes. Â ı denotes the fine-scale variation (ı.s/) parameters, O Â S , parameters relating to the surface and firn processes (X S;t .s/ and X F;t .s/), O R the interaction between these two processes,ˇI the weightings of covariates associated with the model for height change due to ice dynamics (X I;t .s/) and O Â GIA parameters appearing in the model for glacio-isostatic adjustment (X GIA .s/). Variables in the middle denote the spatio-temporal processes, while variables on the right denote the observations (filled circles). The parameters Â and G are parameters appearing in the observation equation for z G . Rectangles denote groupings: the altimetry data sets z E and z I are treated identically (same observation characteristics), while X S;t .s/ and X F;t .s/ are, jointly, a multivariate spatiotemporal process. Dashed circles enclose the parameters estimated using maximum likelihood from data/numerical models prior to computing the Bayesian update. The pseudo observations are omitted for clarity  Figure 2, and the role of the data sets used (excluding the pseudo-data) is summarised in Table 1.

First data layer
We use the first data layer to describe the mapping between the observations and the processes. Denote the ICESat rates at year t as z I;t , the EnviSat rates as z E;t , the GRACE rates as z G;t , the GPS rates as z GPS;t  i T . Then, we can write the first data layer as z t D P.Â /y t C e t ; t D 0 : : : T where y t D h y T I;t ; y T E;t ; y T G;t ; y T GPS;t i T is defined in Section 2.1.2 and P.Â / D bdiag.I I ; I E ; P G .Â /; I GPS /, where bdiag. / places its arguments along a block diagonal. We assume that the altimetry and GPS observations are point-referenced independent records of the first layer, hence the use of the identity matrix I i . On the other hand, it is well known that the effective resolution of GRACE is less than that supplied (Luthcke et al., 2013), and the role of P G .Â / is to cater for this observation interaction. P G .Â / is thus defined as Table 1. Observation data sets employed, together with type of spatial footprint, the processes which they are able to detect changes in and the number of values available for the temporal horizon under consideration (2003( , GRACE 2004 (2) where .#i/ denotes the number of neighbours of observation i and denotes 'neighbour of'. Equation (2) describes the proportion of signal .#i/Â that should be attributed to the spatial regions associated with the neighbouring observations. If .#i/Â exceeds 0.9 (indicative of poor localisation), the observation is assumed to be an equal average of itself and its neighbours. Two GRACE observations are marked as neighbours if their geometric centres are distanced by less than 450 km, see Figure 2 (right).
For the GRACE signal, preprocessing techniques might also induce an under-estimation on the variance of e G;t (Zammit-Mangion et al., 2014). To allow for this, we redefine e G;t WD e G;t where e G;t is the original supplied error and G is an inflation parameter, which, together with Â (the strength of neighbourhood interaction), needs to be estimated.

Second data layer
The second data layer models the relationship between the processes, fine-scale variation ı t .s/ and the data. There are four dominating physical processes that need to be considered when estimating SLR contribution. (i) The first is a solid-Earth process known as glacioisostatic adjustment (GIA), the response of the lithosphere to past changes in ice loading that took place, primarily, at the end of the last glacial, around 12 KYr ago; (ii) on top of the bedrock sits the ice, which constantly flows from the interior to the ocean; (iii) between the surface snow and the ice at depth lies a firn layer that has a density between that of snow and ice and that is constantly compacting; (iv) at the surface, the mass balance is affected mostly by changes in precipitation, sublimation and (limited) melt-water run-off. The role of these layers and the way they interact with the instrumentation are discussed in greater detail in Zammit-Mangion et al. (2014). In the current geophysical setting, the fine-scale variation is due to localised (fine-scale) precipitation anomalies (Isaksson and Karlen, 1994), which are most evident in areas of rapidly varying topography (Richardson et al., 1997).
The model for the second layer is as follows where the fX i;t g denote the change in height (in m year 1 ) due to processes that contain some spatio-temporal structure (as opposed to ı t ); the overall surface process anomaly (S ), firn densification (F ), ice dynamics/subshelf melting (I ) and GIA (GIA), respectively. Note that because we are operating in a multivariate setting, we have not modelled fine-scale variation to be part of a particular process of interest (e.g. , so that fine-scale effects from all processes, if present, are captured in ı t .s/. Each element in the vectors of linear operators A i;t maps the physical process X i;t .s/ or the fine-scale variation into the observation space. Each element in A i;t depends on the type and the location of the observation. For the data sets described earlier, it is sufficient to let each element in A i;t say A j i;t Á to be an integral operator of the form for i 2 fS; F; I; GIA; ıg, where j is the observation footprint, f j i;t .s/ / 1 or 1=j j j in the case of a purely accumulating or averaging observation, respectively, and f j i;t .s/ D ı.s s j;t / in the case of a point-referenced observation at location s j at time t . For the altimetry data sets, pseudo-data sets and GPS, because their spatial support is relatively small, we assumed point-referenced observations and thus let j D (the entire domain) and f j i;t .s/ D ı.s s j;t / for all processes. For the GPS and pseudo observations, we let f j ı;t .s/ D 0 for each j because these observations are not affected by the fine-scale process.
For GRACE, we let f j i;t .s/ D i .s/ for each i 2 fI; S; F; GIA; ıg, where each i .s/ is a spatially varying density map used to convert the height change to the detected mass change. I .s/ WD I is the density of ice (917 kg m 3 ), S .s/ is the density at which surface process fluctuations occur (spatially varying between 300 and 600 kg m 3 ) obtained from the Regional Atmospheric Climate Model (RACMO, Lenaerts et al., 2012), F .s/ WD F D 0 because firn compaction is a mass-preserving process and GIA .s/ WD GIA is the effective mantle density (3800 kg m 3 ) (RACMO, used to set S .s/, is a regional climate model that reconstructs, at a resolution of 27 km, various surface processes from global climate reanalysis data. It will also be used in Section 3 to provide information on the spatial-temporal chatacteristics of the surface processes). For GRACE and i 2 fS; F; I g, we also define j as the intersection between the observation footprint and the continent's grounding line, because height loss by all processes (except GIA) do not contribute to changes in the geoid beyond this line; see Zammit-Mangion et al. (2014) for details. We evaluate the integral for GRACE by fine gridding the footprint domain although Monte Carlo integration can also be used (Katzfuss and Cressie, 2011).
Rewriting A j ı;t ı t .s/ as a Wiener integral, it is straightforward to show that the observed fine-scale variation at the j th observation is distributed as where .s/ is the spatially varying variance of the uncorrelated process. For homogeneous fine-scale variation, when averaging over the footprint area, var A j i;t ı t .s/ Á scales as 1=j j j while when accumulating (as is the case with GRACE), as j j j. The latter implies, at first sight, that this variation can have a large impact on the GRACE readings. However, placing realistic values for the fine-scale variation, which roughly correspond to the maximum a posterior estimates in Section 3.2, yielded a variation that was three orders of magnitude lower than that of the GRACE signal itself var e i G . Because of this parameter insensitivity, we omitted the effect of fine-scale variation on the GRACE readings by setting f j ı;t .s/ D 0 for these data. This assumption simplifies estimation of the fine-scale variation parameters that are, in any case, mostly informed through the altimetry data.

The process layer
The third layer describes the evolution of the multivariate spatio-temporal model. Three of the variates are spatio-temporal, while the fourth, GIA, is spatial (repeatedly observed at each time point). For all the three spatio-temporal processes, we assume a spatio-temporal auto-regressive process of the general form where a i .s/ is a possibly spatially varying auto-regressive parameter, w t .s/ GP .0; k i .s; rI Â i //, where GP. . /; k. ; // denotes a Gaussian process (Rasmussen and Williams, 2006) with mean . / and kernel k. ; /, where k is not necessarily spatially invariant, d T t D OEd 1;t ; d 2;t : : : are covariates andˇ.s/ the respective spatial weightings, which need to be estimated. We now discuss the models for each of the spatio-temporal processes in turn.

Ice dynamics/subshelf melting
Height changes due to ice dynamics, and subshelf melting is assumed to be relatively smooth over the short time scales considered, that is they follow a linear trend. This assumption follows from studies that show approximately linear temporal variation in ice discharge (Rignot et al., 2011b). The ensuing model for this process is whereˇI .s/ 2 R 2 represents the spatially varying mean change and accelerating change in height, respectively, due to ice dynamics. We assume that bothˇI .s/ and the disturbance w I;t .s/ have small spatial scales and are assumed spatially uncorrelated following discretisation. Prior distributions forˇI .s/ are given in Section 2.3. Fully specifying the statistics of w I;t is crucial to avoid confounding with the fine-scale variation (which is mostly due to precipitation anomalies). We thus take cov.w I;t .s/; w I;t 0 .s// D 0:01 2 (i.e. we allow for confounding on scales of 1 cm year 1 if t D t 0 and 0 otherwise. This choice also reflects the prior assumed (temporal) smoothness of height change due to ice dynamics and subshelf melting.

Surface processes
Changes in height due to surface processes fluctuate rapidly at large (ice sheet) scales (Rignot et al., 2011b). To check whether the same holds at small (20 km grid) scales, we fitted an auto-regressive(1) model to each spatial point from anomalies with respect to the 30-year mean for the output of the forward model RACMO. The auto-regressive coefficient was found to be positive in some places and negative in others. This variation, in addition to spatially varying amplitudes and length scales evident from RACMO, suggested that a heterogeneous model needs to be fitted to this process. As in Zammit-Mangion et al. (2014), we thus adopted the technique of fitting local homogeneous models on a coarsely gridded domain and then allowing the parameters to vary smoothly between the grid cells.
Locally, for each coarse-gridded cell, we assume that the surface process follows the local interactions model where a S serves to define a random-walk space-time process (Stroud et al., 2001;Zammit-Mangion et al., 2012). Note that this is the autoregressive interpretation of a space-time Gaussian field with separable covariance function k.u; / D k 1 .u/k 2 ./ where u D ks rk, D jt t 0 j and k 2 ./ D a S . It can be shown (Storvik et al., 2002) that in this case var.w S;t .s// D .1 a 2 S /k 1 .u/. For each cell, we let k 1 . / (the spatial covariance function) be a Matérn function given by where 2 S is the marginal variance, Ä S is the scaling parameter, S is the smoothness parameter and K S is the modified Bessel function of the second kind. Use of the Matérn is motivated by the dimensionality reduction methods employed in Section 2.4. The unknown parameters

Firn processes
Changes in height due to surface and firn processes are highly anticorrelated (more precipitation implies more compaction and vice versa). We capture this interaction by extending the spatial coregionalisation approach in Finley et al. (2007); Zammit-Mangion et al. (2014) to its spatio-temporal equivalent, termed the spatio-temporal linear model of coregionalisation by Choi et al. (2009). As in the spatial case, we model X F;t .s/ as We can thus express the joint distribution as where R is a 2 2 correlation matrix composed of scalars b and c defining the point-wise interactions. Note that this definition of R is a scaled version of that in Zammit-Mangion et al. (2014) because here the marginal variance of the process is included in the definition of k 1 . /. It can be shown, by comparing second moments, that the Gaussian field (12) is expressed in auto-regressive form as where .w S .s/; w F .s// GP.0; .1 a 2 S /OER˝k 1 . // It is apparent here that coregionalisation does not induce off-diagonal elements in the propagation matrix. Interactions are induced solely through the correlated added disturbances. R is unknown and needs to be estimated.

Glacio-isostatic adjustment
Glacio-isostatic adjustment is a slowly varying process with time scales on the order of thousands of years. For the 7-year period considered, we thus model it as a spatial process (which is repeatedly observed at each time point): where k GIA is a Matérn kernel, parameterised by some unknown parameter vector Â GIA D Ä GIA ; 2 GIA ; GIA T .

Fine-scale variation
We model the fine-scale variation ı t .s/ as a stochastic volatility model as in Katzfuss and Cressie (2012) where b.s/ is a basis function used to explain some of the fine-scale variation. Because we envision fine-scale variation to be present in high-relief regions, we use terrain roughness as a basis function b.s/. The roughness function was derived from a high-pass channel image from the Moderate Resolution Imaging Spectroradiometer Mosaic of Antarctica Image map at 750 m resolution (Haran et al., 2005). The image was binned in 20 km boxes and the spread (standard deviation) of the image intensity in each box found. The logarithm of this spread incremented by 1 (to have a minimum roughness of 0), seen in Figure 3, left, was then used as the basis function for fine-scale variation.
For identifiability reasons, we assumed that the statistics of fine-scale variation are temporally invariant. The parameters Â ı D OEÁ 1 ; Á 2 T are unknown and need to be estimated.

The parameter layer
Because of identifiability concerns, we adopt a hybrid approach to parameter estimation, in which some process parameters, ‚ 1 D .R; Â GIA ; Â S /, are estimated from numerical models as in Calder et al. (2011), and the rest, ‚ 2 D .Â ı ; 2 ; Â;ˇI .s//, are estimated from the data in an MCMC scheme. The estimation procedure for the former group is described in Section 3.1. For the latter, we apply the following prior beliefs: Â ı D OEÁ 1 ; Á 2 T : These parameters describe the second-order statistics of the fine-scale variation and can take values on the real line. For this reason, we let where the hyper-parameters . Á 1 ; Á 2 ; 2 Á 1 ; 2 Á 2 / are set following a preliminary study employing a simplified process model and only one of the data sets (Section 3.2). Â : This parameter describes the extent of averaging within a neighbourhood of GRACE observations and should be positive. However, from (2), observe that P G is invariant to Â when (#iÂ) exceeds 0.9, which occurs at Â 0:14 for the observation arrangement shown in Figure 2. For this reason, we constrain Â < 0:14 by equipping it with a uniform prior where U denotes the uniform distribution and Â p D 0:14.
G : This parameter allows for variance inflation of the GRACE signal. To obtain a tractable conditional distribution, we let where IG is the inverse-Gamma distribution and the hyper-parameters are configured such that the 95 percentile q 0:95 D 400 and the 5 percentile q 0:05 D 1, reflecting our belief that we do not expect the variance inflation to be outside the range [1,400]. Suitable values arę G D 0:59 andˇ G D 2:15. ˇI .s/: The spatially distributed parametersˇ1.s/;ˇ2.s/ define the linear and accelerating change in height due to ice sheet processes, respectively. The marginal variances of these are set to be high for regions where the ice is flowing fast, and low (with a shrinkage to zero) otherwise, to reflect the fact that height change due to ice dynamics at very low speeds (< 10 m s 1 ) is unlikely. We thus let where we set sat;1 D 15 m year 1 , sat;2 D 0:5 m year 2 for one sector of Antarctica, the Amundsen Sea Embayment (which is known to exhibit large accelerations, see Scott et al., 2009) and sat;2 D 0:1 m year 2 elsewhere. Here, v.s/ is the ice velocity obtained from synthetic aperture radar data (Rignot et al., 2011a), supplemented with balance velocities (Bamber et al., 2000) where data are missing. The reconstructed velocity map used is shown in Figure 3, right. Note that although we have a spatially continuous specification for these parameters, they will be estimated following discretisation (together with the processes) on a triangulation within a Bayesian setting, see Section 2.4.

Gaussian field approximation
While convenient for modelling, Gaussian field spatio-temporal process models are not suitable for large-scale inference because the computational complexity of a required Bayesian update scales as O.m 3 / where m is the number of data points. The dependence on m 3 Figure 4. Meshes used to reconstruct the spatio-temporal processes. From left to right: (i) X I;t .s/;ˇI ;1 .s/;ˇI ;2 .s/; (ii) X S;t .s/; X F;t .s/; and (iii) X GIA .s/ is prohibitive in several applications, including the one considered here. One remedy is to project functions from the original, infinitedimensional space O onto some lower-dimensional space O n through a set of basis functions f i g i2N . A natural way to approximate Gaussian fields is by noting that if these are of the Matérn type, they can be treated as solutions to stochastic partial differential equations (SPDEs) of the Laplace type (Lindgren et al., 2011). It can be shown that the projection onto O n of such an SPDE with compactly supported basis functions yields a Gaussian Markov Random Field (GMRF) with favourable computational properties (Rue and Held, 2005). The basis functions we employ are 'tent' functions constructed on the triangulations shown in Figure 4. The GMRFs are obtained by decomposing the spatio-temporal field at each time point and the spatially distributed parameters (ˇ1.s/;ˇ2.s/) as a weighted sum of the tent basis functions, that is we let X i;t .s/ i .s/ T x i;t andˇI ;i .s/ i .s/ TˇI ;i . This decomposition allows us to rewrite the second data layer as where C t D OEA S;t S .s/ T ; A F;t F .s/ T ; A I;t I .s/ T ; A GIA;t GIA .s/ T , ı t D A ı;t ı t .s/ and the multivariate spatio-temporal model (ignoring GIA for now) is where I is the identity matrix of appropriate size. Here, A S Á A F describe the temporal propagation of the processes and are diagonal under the local separability condition assumed.ˇI ;1 andˇI ;2 are discretised versions ofˇI ;1 .s/ andˇI ;2 .s/, and w i;t N c .0; Q w;i / is added random forcing, where N c .h; Q/ denotes the normal distribution with canonical parameters h and Q. For the surface field, Q w;S (which describes heterogeneous fields) is obtained by assuming that w S;t is the discretised solution to an SPDE with spatially varying parameters, as discussed in detail in Zammit-Mangion et al. (2014) and Lindgren et al. (2011) Section 3.2. The approach has the neat property that, locally, w S;t can be interpreted as a discretised solution to a standard Matérn field. Details on how we obtain the spatially distributed parameters from RACMO are given in Appendix A.1. Following coregionalisation between the surface and firn fields, jointly we have that Finally, as discussed in 2.2.1, we set Q w;I diagonal, with precisions given by 1=0:01 2 .
We next combine the auto-regressive processes by defining The multivariate process can then be written in block form as The full process describes Gaussian trajectories, so that we can represent the third layer as a GMRF over .x;ˇI / where x D OEx T 0 ; : : : ; x T T ; x T GIA T . Let ‚ D f‚ 1 ; ‚ =ˇI 2 g, that is the set of all unknown parameters exceptˇI . Then, The full precision structure Q is obtained by considering the joint latent process, which decomposes as follows: p.x;ˇI j‚/ D p.x 0WT ; x GIA ;ˇI j‚/ (26a) D p.x 0WT jx GIA ;ˇI ; ‚/p.x GIA jˇI ; ‚/p.ˇI j‚/ (26b) D p.x 0WT jˇI ; ‚/p.x GIA j‚/p.ˇI j‚/ (26c) p.x t jx t 1 ;ˇI ; ‚/ 1 A p.x 0 jˇI ; ‚/p.ˇI /p.x GIA j‚/ where (26c) follows from (i) x 0WT being conditionally independent of x GIA given .ˇI ; ‚/ and (ii) x GIA being conditionally independent of I given ‚. Equation 26d follows fromˇI being independent of ‚. The prior p.ˇI / D N c .0; Qˇ/ (where Qˇfollows directly from (20)), while the initial process distribution is defined as We choose the precision Q 0 WD Q w A T Q w A, and not that of the respective stationary distribution, as it is sparse for sparse Q w and diagonal A, thus preserving computational tractability. Further, the implied covariance matrix is representative for diagonal A and Q w ; in particular, if A D cI for c 2 . 1; 1/, the induced covariance is the same as that of the stationary distribution (see Hamilton, 1994, Section 10.2). It can be shown that for this model the full precision matrix over .x;ˇI / is where Q 1 is the precision matrix without exogenous inputs (see Rue and Held, 2005, Ch. 1 for a univariate equivalent) Q 1 D 2 6 6 6 6 6 6 6 6 4 : : : and Q 2 ; Q 3 are defined as Q 2 D 2 6 6 6 6 6 6 4 Because of the tractability of the linear state-space model, we have combinedˇI with x, although the former are 'parameters' in the conventional sense. Because we have represented .x;ˇI / jointly, by concatenating across time we can rewrite the observation and process layers as follows:

Estimation of parameters from geophysical models
Because we have three classes of data (altimetry, gravimetry and GPS) and four processes to infer, the problem is under-determined, rendering process-related parameter estimation from observations particularly difficult. We hence make use of numerical (geophysical) models for estimating the process parameters, namely ‚ 1 . Specifically, denote the vector of physical observations as z and numerical model output as x M (we use the symbol x to emphasise that we treat numerical model outputs as directly observed data). x M is the output of a deterministic model g. / run with suitably defined parameters and boundary conditions BC . The first assumption we make is that the true (unobserved) process x and x M D g. ; BC / are conditionally independent, which we model as p.
x; x M jÂ/ D p.xjÂ/p.x M jÂ/ for some parameter Â 2 ‚ 1 , see Figure 5. The second assumption is that x M is highly informative on Â so that we can safely restrict the posterior parameter density to a point estimate:

Markov chain Monte Carlo implementation
In order to carry out inference over x and ‚ 2 , we need to complete the model specification by eliciting prior beliefs over Â ı . We do this by using information from the log-likelihood of these parameters, evaluated at several points on a much simplified model. The log-likelihood reveals that that Á 1 and Á 2 are highly correlated because of the large flat surface in the continent's interior; we therefore also transform these parameters to aid mixing. Details for this step are given in Appendix B.
The sheer size of the problem under study devotes special attention. The combined dimensionality of .x;ˇI / is over 80 000, we have over 300 000 observations, crucially some of which have a large spatial support. The time taken to compute the Cholesky factor of Q following an approximate minimum degree permutation, and the marginal variance using the SuiteSparse package (Davis, 2011), required about 5 h on a mid-end server. This is prohibitive even for the vanilla implementation of the popular integrated nested Laplace approximations (INLA, Rue and Martino, 2009) package. We deal with this by implementing a scheme that can be parallelised and implemented on a distributed system.
We divide the state-space x 0 into d contiguous spatial blocks as in Zammit-Mangion et al. (2014). Denote the indices of the ith subset of x 0 , x 0i , as I i and let I D S i I i . The conditional posterior distribution for the i th block is given by where Q ij ; i; j 2 OE1 : : : d is the matrix composed of the rows in Q with indices in I i and the columns of Q with indices in I j and where C 0i is the matrix composed of the columns in C 0 with column indices in I i . Here, Q eı D Q 1 e C Q 1 ı 1 , Q e D prec.e/ and Q ı D prec.ı/.
In this work, we extend the approach by blocking with a view to parallelisation over a multinode architecture. Parallelisation is possible in Gibbs sampling if one can find groups of fx 0i g that are independent when conditioned on the rest of the state space and the data. Groups because groups within this set, say J j , can then be updated in any order desired (or, indeed, even in parallel) for a given deterministic sweep strategy. As an example, consider the following scenario, depicted in Figure 6, left. Here, we have divided the entire state space into a raster of 8 8 blocks. Assume each block is conditionally independent of every other block given the neighbouring blocks and the data, then we can obtain a sample of x 0 by sampling all nonadjacent blocks in parallel. For this simple case, we only need four cycles, one for each block set J 1 : : : J 4 ; a potential speed-up of 64=4 D 16 is achievable. More generally, a simple parallel Gibbs sampling proceeds as follows: first, instruct computational nodes to sample from blocks with index in J 1 in parallel, wait for the cycle to complete, update and distribute the latest sample and repeat for J 2 ; : : : ; J n j . This sampling strategy is known as chromatic sampling (Gonzalez et al., 2011), because the assigning of conditional independent groups is akin to the problem of graph colouring (Molloy and Reed, 2002): the problem of finding a colour arrangement such that no two neighbours in a graph have the same colour.
We cannot employ a simple raster blocking method and a four-colouring scheme (which is possible for all graphs on a plane, see Gonthier, 2008) for two reasons. First, the key drawback with raster partitioning is the unequal load distribution when employing variable mesh density (as in this problem). Blocks of equal size will, in general, contain drastically different numbers of vertices. This, in turn, will cause computer nodes in a high-performance environment to 'wait' unnecessarily for the block containing the largest number of vertices to be completed before the next cycle can begin. We remedy this problem by dividing the graph into subgraphs of roughly equal size using the Kernighan-Lin algorithm (Kernighan and Lin, 1970). In this study, we use the implementation by Ladroue (2011) seven times in order to subdivide the graph into 128 roughly equal-sized partitions. The final partitioning scheme we used in our problem is shown in Figure 6 (right): with this blocking strategy, we obtained a median block size of 609 vertices with a minimum of 590 and a maximum of 750, which is indicative of a good load balance.
The second problem is that, conditioned on the data, the adjacency graph over the blocks is no longer a sufficient constraint for conditional independence. This is made apparent from the second term within the brackets of (33b). This term implies that because of dense rows in C 0 , O x 0i may even depend on states that are in blocks not immediately adjacent to block i . Thus, when colouring the graph, in addition to the constraint that no two adjacent blocks have the same colour, we have to apply the constraint that no observation can be informative of more than one block of a given colour. This is particularly relevant to the present problem because GRACE has a support that spans multiple blocks. We coloured the graph using a greedy algorithm on a breadth-first-search ordering of the vertices. The final arrangement with the lowest amount of colours, 15 in this case, is shown in Figure 6, right.
We grouped the GIA field on its own (i.e. assigned it a colour of its own) because it is highly spatially correlated. With 1832 vertices, the GIA block is about three times larger than any other block so that the theoretical maximum speed-up is thus 129=.15 C 3/ D 7:2. In practise, we saw a fivefold speed-up over use of a single thread when sampling the state space only, which is remarkably close given the communication overhead, notable in this case because of message passing of the updated samples. Sampling of all the other parameters (‚ =ˇI 2 , see Section 3.3) needs to be done sequentially; the final speed-up we obtained for one whole MCMC cycle was of about three; however, this is still significant given the time needed to generate one sample. Moreover, this approach extends nicely to larger state spaces, if required. We generated 16 000 samples on a high-performance computer (having 16 2.6 GHz SandyBridge cores per node and 4 GB memory per core) using four nodes and 24 parallel threads with RMpi in 42 h.

Sampling the parameters and diagnostics
Following an updated sample of x andˇI , we sample from the conditionals of the remaining parameters, defined as follows where n G is the number of GRACE observations, Q eG D prec.e / (recall that e is the error prior to inflation), C 0 G is the submatrix of C 0 corresponding to the GRACE observations and C 0 A is that corresponding to the altimetry (ICESat and EnviSat). Note that (34c) is independent of Â and as a result of the assumption f j ı .s/ D 0 for the GRACE observations in (5). From this note that only 2 can be sampled from directly. For Â and Â ı , we initially carried out a Metropolis-within-Gibbs step by proposing from a Gaussian distribution centred at zero with variance 0:1 2 . However, the resulting effective sample size from this scheme was remarkably low, prompting us to use univariate slice sampling instead (Neal, 2003), which considerably decreased autocorrelation. We implemented the stepping-out methods and adapted the interval width used for the first 50 samples, following which the interval was fixed. These samples were then discarded.
Visual inspection of the traceplots confirm good mixing of the chains. Geweke diagnostics and autocorrelation plots on these traces also support convergence. To assess convergence in x andˇI , we took 2000 variables at random from x 0 and carried out Geweke diagnostics on each of the traces. The z scores from the diagnostics were then verified to be normally distributed (expected under the assumption of trace independence, a reasonable one given the continent's size and correlation lengths). Geweke diagnostics were carried out using the R package coda.

VALIDATION, VISUALISATION AND RESULTS
In this section, we summarise the main inferential results. In Section 4.1, we test the model using an independent data set and use the results to expose some limitations of the approach, while in Section 4.2, we focus on the marginal fields. Finally, in Section 4.3, we concentrate on mass balance estimates, that is we determine where and by how much Antarctica is losing or gaining mass. In all Antarctica plots presenting our results, we use bilinear interpolation to reconstruct the mesh surface, overlay the triangulation to convey the native process resolution and stipple with green dots where the absolute posterior mean is greater than one posterior standard deviation.

Diagnostic check using a separate remote sensing data set with pivoted Cholesky residuals
In this section, we check the model using data from an altimeter on EnviSat's predecessor, the European Remote-sensing satellite-2 (ERS-2). Because of process smoothness, we compute residuals that are 'corrected' for correlation through the pivoted Cholesky residuals, which are derived from the covariance matrix of the residuals (Bastos and O'Hagan, 2009). The 'pivot' in the pivoted Cholesky decomposition of the matrix sorts the residuals as follows: the first element is that with the highest variance, the second element is the largest variance conditioned on the first element and so on. In a spatio-temporal setting, locations in the vicinity of each other should be far apart in the ordering, even if they have similar marginal variances, that is the variance of the first point in a close pair should be low when conditioned on the second point if correlations are correctly captured. The advantage of this approach is that proximity and spatio-temporal correlation are taken into account when visualising the residuals. As such little correlation should be visually apparent when inspecting plots of the pivoted Cholesky residuals (as opposed to when visualising the standardised residuals). We carry out this diagnostic check by choosing 500 validation points from the ERS-2 data. The pivoted Cholesky residuals are seen in the centre panel of Figure 7 where only some correlation is apparent, indicating that the chosen spatial structure of the model is not implausible. A pivoted Cholesky residual plot can be put into perspective by carrying out a 'Turing test'. This involves sampling validation data points (at the space-time locations) from the predictive distribution, re-evaluating the pivoted Cholesky residuals for this pseudo data and comparing the resulting plots to that of the true data. This informal test checks whether we can spot the real-data case from the pseudo-data case. Overall, from Figure 7, we find it relatively easy to pick out the pseudo-data cases from the real-world case, the latter exhibiting more extreme residuals, representative of overconfidence. A density plot of the pivoted Cholesky residuals (not shown) indicates that the residuals are centred at zero but also confirms an overall overconfidence. It is not clear whether this is a result of model mis-specification or an issue with the residuals in the ERS-2 data set; however, it does hint at the possibility of our posterior estimates being slightly overconfident.

Posterior fields
In Figure 8, we show results for the mass rates due to ice dynamics for the 7 years under consideration together with uncertainty estimates. These results corroborate the spatial results of Zammit-Mangion et al. (2014) where only West Antarctica was considered. The most prominent features include the negative, accelerating rates in the Pine Island Glacier and Thwaites Glacier (inset) and an overall ice build-up (with no significant acceleration) in the Kamb area (inset). In East Antarctica, several expected features are also apparent, most notably the considerable mass loss occurring in the Totten Glacier (Velicogna and Wahr, 2013, inset) and areas surrounding the Frost glacier (Rignot, 2006), both of which are grounded below sea level and are hence considered potentially unstable (for basin definitions and landmarks, see Figure 9). Some glaciers, such as Shirase glacier towards the top of the plot (reported to be largely in equilibrium by (Pattyn and Derauw, 2002)), are shown to be thickening. We note that glacier thickening at large rates is physically improbable (except in the Kamb area, Ng and Conway, 2004) and that here there could be confounding between changes due to ice dynamics and persistent positive local precipitation anomalies. We omit changing height thickness over the ice shelves in the plots, as this is not a contributor to SLR.
In Figure 10, we provide a comparison of posterior estimates for height changes due to surface processes and those provided by the regional climate model, RACMO. When considering aggregates (right), the positive-negative trend is clearly followed by RACMO although, for East Antarctica, a bias is evident, possibly because of a systematic underestimation of the snowfall anomaly by RACMO for 2004(van Wessem et al., 2014. It should be noted that East Antarctica is one of the most poorly sampled places on the planet in terms of meteorological observations and therefore also relatively poorly constrained in RACMO. Although in most cases outside our 95% credibility intervals, we find it encouraging that the shape and phase of the anomaly patterns could be reconstructed. Essentially, this result shows that RACMO reconstructs the temporally high frequency, spatially mid-frequency signals evident in satellite altimetry data but with differences in the amplitude of the signal. In Figure 11, we provide our reproduction for the GIA field and compare it to a forward numerical model, IJ05R2, which we used to set the parameters on x GIA . Notably, as in IJ05R2, we observe a positive trend (uplift) of 2-4 mm year 1 GIA over West Antarctica and an overall negative signal (subsidence) towards the pole and over much of East Antarctica. The main discrepancy apparent in Figure 11 (right)  (Fretwell et al., 2013). Height changes over ice shelves (enclosed between the dashed and solid lines and grayed out in the insets) are omitted as these do not contribute to sea-level change is the large positive rate observed in our results in basins 1, 17-19, which correspond to the upper part (in the figure) of the East Antarctic ice sheet. We caution that GPS readings used here have not been corrected for the elastic response of the lithosphere and that we are using release 1 of the GRACE mass-concentration data set (work on a second release is underway). Consequently, the GIA results (and, to a lesser extent, the SLR contribution estimates of Section 4.3) are not definitive and presented as illustrative of the framework capability.

Estimates of sea-level rise contribution
Multivariate spatio-temporal modelling allows us to discriminate between the processes, in order to obtain mass-balance estimates from assumed density profiles, the same profiles that are used to translate the observed height changes to the mass changes observed by GRACE. As discussed in Zammit-Mangion et al. (2014), several authors have provided SLR contribution estimates for Antarctica. Here, we compare to the GRACE-based method of Luthcke et al. (2013) (where the IJ05R2 model is subtracted from the GRACE signal) and the so-called input-output (IO) method, where mass balance is found by deducting the rate of ice discharge (obtained through satellite readings of horizontal ice velocity) from the surface mass balance, obtained from a regional climate model, in this case RACMO. We obtained these latter results from Shepherd et al. (2012), which are similar to those of Rignot et al. (2011a) but updated to include more recent data, particulary for measured ice thickness. We note that the mass imbalance is not strictly equivalent to SLR contribution when accounting for grounding line migration (Rignot et al., 2011a), and thus, we can expect results from the IO method to be slightly more negative (although it is unclear whether the IO results had been corrected for this or not). To use similar basin definitions as these studies, we have allocated the Foundation Glacier basin to the East Antarctic ice sheet, see Figure 9.
Estimates of SLR contribution for the West, East Antarctic ice sheets and the Antarctic Peninsula are given in Figure 12. Again, there is a broad agreement between the results. In the West Antarctic ice sheet, the positive anomaly in 2005 is clear for the three methods, as is the apparent negative trend in the latter years of the study. For the East Antarctic ice sheet, the conclusions are similar, in that the overall positive-negative pattern along the years is preserved, despite our solutions being temporally more smooth, probably because of the imposed auto-regression coefficients on the surface process prior which are mostly positive. The pattern we observe in the Antarctic Peninsula is reproduced in the other solutions; however, the large mass loss indicated in the other methods is not within our credibility intervals. This Figure 12. Mass-balance estimates (black) compared to those from the Gravity Recovery and Climate Experiment-based method of Luthcke et al. (2013) (green) and the IO method (pink) as reported in Shepherd et al. (2012). The 1 and 2 levels are denoted by the dark and light shadings, respectively. Left: West Antarctic ice sheet. Centre: East Antarctic ice sheet. Top right: Antarctic Peninsula. Bottom right: Estimated cumulative mass change throughout the period of study (black) compared to the estimate obtained when averaging over multiple methods (Shepherd et al., 2012). could be due to an optimistic prior belief in the definition overˇI or a result of inaccurate representation of the field by the chosen mesh. It could also be due to overestimation of mass loss in the other methods for a region that is particularly challenging for both GRACE and the IO method because of the small scale at which ice dynamic takes place. Finally, we compare the cumulative mass change obtained using our method with Shepherd et al. (2012), which is an unweighted mean estimate (using a meta-analysis) from three approaches (GRACE-based methods, IO-methods and altimetry + RACMO methods) and multiple data preprocessing methods. Our cumulative mass change follows the unweighted mean estimate but is less negative. We note, however, that a more recent version of the RACMO model, v2.3, produces a more positive snowfall rate by over 100 Gt year 1 compared with the version used in Shepherd et al. (2012) (van Wessem et al., 2014. This difference would increase the integrated IO method results by the same amount, resulting in a lower mass loss rate for the unweighted mean in Shepherd et al. (2012). Our estimated SLR contribution of Antarctica during this period (2003)(2004)(2005)(2006)(2007)(2008)(2009)) is of 0.13˙0.08 mm year 1 . This lies, as expected, between the the ICESat-based estimate of 0:06˙0:23 mm year 1 and the GRACE-based estimate of 0:16˙0:14 mm year 1 as reported in Shepherd et al. (2012) between October 2003 and December 2008; however, it is considerably less than the GRACE-based method of Luthcke et al. (2013), which reported 0.230 .07 mm year 1 for 2003-2010. The discrepancy with the latter is possibly due to the inclusion of 2010, a year in which the West Antarctic ice sheet has continued to witness increased ice loss (McMillan et al., 2014).

CONCLUSION
In this paper, we have shown that spatio-temporal modelling is a powerful tool in assessing Antarctica's mass balance and contribution to SLR. Unlike other methods, its use of forward numerical models is up to the assumption of conditional independence (used only to estimate parameters) and is thus less restrictive and bias-prone than standard data assimilation procedures. Overall, we find that our results conform quite well with those making explicit use of numerical models, indicating that this approach may be used in the future to validate the models with remote sensing data. This is particularly useful in Antarctica where in situ data are difficult and sometimes impossible to obtain.
There are several ways in which the framework can be improved. (i) More work needs to be done on mesh generation, with special care taken in high-relief and narrow areas such as the North Antarctic Peninsula. A coarse mesh, in these regions, constitutes an inaccurate model representation, which may be the cause of the optimistic results obtained in this region. (ii) A comparison with the auxiliary data set from ERS-2 revealed that the fields are unbiased and that the modelled spatial correlation is suitable. However, an overall overconfidence was observed, which might be the cause of the relatively narrow credibility intervals seen in Section 4.2. This could be a result of model mis-specification, but it is also possible that the supplied errors with the preprocessed data are, in fact, optimistic. (iii) Process parameters are treated as fixed and assumed to be fully informed by geophysical models. In an MCMC scheme, parameter uncertainty can be easily included, although this would require the provision of posterior densities over the unknown parameters. To remedy this, one could still assume conditional independence but not restrict the posterior parameter density to a point estimate as we do here. (iv) The assumption of imposed (local) spatio-temporal stationarity might be a fair assumption for some of the processes considered here; however, this does not hold for ice dynamics, which is directional and exhibits spatio-temporal interactions (Hurkmans et al., 2012). It would be possible within the current framework to consider a simplified transport model for this process (e.g. Wikle et al., 2001) although this will come at some computational expense.
The key novelties of this work lie in the level of complexity and sheer size of the spatio-temporal model employed, the way in which numerical models were incorporated with large-scale multivariate spatio-temporal systems and the tools developed for inference over multiple-process architectures. Because the work is multifaceted, we have implemented an R package MVST that facilitates most operations described in this paper and provides a simplified analysis of that described here as a vignette. One notable useful feature of this package is the ease with which multiple observations, with spatially varying observation operator, can be considered with various types of multivariate spatio-temporal systems, including those decomposed on triangulations such as those considered here. It is envisioned that this package can be used to help set-up latent Gaussian multivariate spatio-temporal models in a variety of settings. Once the Gaussian model is set-up, it can be passed on for inference with other packages, such as INLA, although we also provide code for the elementary Gaussian case.