We present a set of idealised model experiments that investigate the impact of assimilating different amounts of ocean and atmosphere data on decadal climate prediction skill. Assimilating monthly average sub-surface temperature and salinity data successfully initialises the meridional overturning circulation and produces skillful predictions of global ocean heat content. However, when sea surface temperature data are assimilated alone the predictions have much less skill, particularly in the extra-tropics. The temperature and salinity observations of the upper 2000m currently provided by the Argo array of floats are therefore potentially well suited to initialising decadal climate predictions, although we note that we do not attempt to simulate the actual distribution of Argo floats. Assimilating data beneath 2000m always reduces the root mean squared error (RMSE), with the most significant improvements in the Southern Ocean. Furthermore, assimilating six-hourly atmospheric observations significantly improves the forecast skill within the first year, but has little impact thereafter.
 Decadal climate prediction aims to predict natural internal variability in addition to the response of the climate system to anthropogenic forcing. To achieve this it is necessary to start from the current state of the climate system. So-called ‘perfect model’ predictability experiments have been carried out to assess potential predictability on decadal timescales [e.g., Griffies and Bryan, 1997; Grötzner et al., 1998; Boer, 2000; Collins and Sinha, 2003; Pohlmann et al., 2004]. Most of these studies found multi-decadal predictability in the meridional overturning circulation (MOC) and/or the sea surface temperatures (SST) in the North Atlantic, Nordic Seas and the Southern Ocean. The MOC has been linked to Atlantic Multi-Decadal Variability, which is thought to influence many regional climate phenomena, including North American and European summer climate, north-eastern Brazilian and African Sahel rainfall, and Atlantic hurricanes [Sutton and Hodson, 2005; Knight et al., 2006; Zhang and Delworth, 2006]. Unlike perfect model experiments, real decadal forecasts do not have complete, instantaneous knowledge of all climate variables at all locations, and so we need to address the forecast skill obtainable from more realistic observations.
Zhang et al.  show the potential of current sub-surface ocean observations to initialise the MOC; however, they do not assess the forecast skill. Initialised decadal predictions have recently been made, but with conflicting results. Smith et al.  predicted that natural variability would offset the anthropogenic warming signal in global surface temperature from 2005 to 2009, after which temperature would continue to rise. In contrast, Keenlyside et al.  forecast no increase in global average temperature over the next decade (which we take to mean 2005 to 2015, based on their study), and a cooling of North Atlantic SST and of European and North American surface temperatures over the same period. These studies use different amounts of ocean and atmosphere data for initialisation. Smith et al.  assimilate both sub-surface temperature and salinity (T&S) and six-hourly atmospheric observations, whereas Keenlyside et al.  use SST and no atmospheric data. A third study [Pohlmann et al., 2009] uses sub-surface T&S observations but no atmospheric data.
 Assessing the relative merits of different decadal prediction systems via real-world hindcasts is hampered by a number of factors. In the last decade the Argo array [Roemmich and Owens, 2000] of ocean profiling floats has led to a step-change in the number and coverage of sub-surface T&S data. Since 2007 over 3000 operational floats have provided near-global coverage of T&S profiles to depths of up to 2000m. It is therefore reasonable to expect that forecasts made today will be more accurate than the hindcasts, initialised from sparser historical observations, that are used to assess decadal climate prediction systems. Assessment of skill in decadal predictions is further complicated by the background global warming trend, model bias and unpredictable natural forcings. To overcome these difficulties we perform a set of idealised model experiments and use them to compare the relative merits of the different decadal prediction systems.
 In section 2 we describe our experimental set-up. Section 3 presents results from the assimilation phase, section 4 compares the skill of the forecasts, and section 5 discusses our results and presents conclusions.
2. Experimental Set-up
 Our experiments are based on the control integration of HadCM3 [Gordon et al., 2000], which has no inter-annually varying external forcings. We base our experiments on a 50-year section of this run, to mimic the typical length of real-world hindcast periods. This section was not chosen for any specific reason other than that it qualitatively represents an average period with typical MOC variability.
 Our primary focus is to assess the impact of assimilating different amounts of data on decadal forecast skill. The Argo array now provides global coverage of T&S data to depths of up to 2000 m. Two important questions are: how does this compare with assimilating full-depth T&S data, and how does it compare with assimilating only SST? To address these questions we performed four experiments:
 1. ‘Full Depth’ - ocean T&S are assimilated to full depth, together with atmospheric variables as in Smith et al. .
 2. ‘2000m’ - as Full Depth, but T&S are assimilated only down to a depth of 2000 m.
 3. ‘2000m NoAt’ - as 2000m, but with no atmospheric assimilation.
 4. ‘SST-6hr and SST-96hr’ - only SST is assimilated, with a 6 or 96 hour relaxation constant respectively, and there is no atmospheric assimilation.
 In all experiments we start the assimilation runs from initial conditions taken from a point 150 years later in the model control run. This ensures that, before data assimilation begins, the experiments have no information regarding the phase of decadal natural variability.
 All experiments assimilate complete monthly fields of ocean data and, where appropriate, six-hourly average atmospheric variables (surface pressure, three-dimensional u and v winds and potential temperature) from the HadCM3 control run output. For the three experiments that assimilate sub-surface ocean information we follow Smith et al.  and use a globally uniform relaxation constant of six hours. For the SST experiments we follow the assimilation procedure of Keenlyside et al.  (described further in section 3).
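The ‘relaxation constant’ refers to Newtonian relaxation (nudging), in which the model state is pulled towards the target field at a rate set by a timescale τ. The sketch below is a minimal illustration only: it assumes a simple explicit update with a hypothetical 1-hour timestep, holds the target fixed, and ignores the model's own dynamics between updates. The function names are illustrative and are not taken from HadCM3.

```python
def nudge(state, target, tau_hours, dt_hours):
    """One Newtonian relaxation step: move `state` towards `target`
    by the fraction dt/tau of the current misfit."""
    alpha = dt_hours / tau_hours
    if not 0.0 < alpha <= 1.0:
        raise ValueError("dt must be positive and no larger than tau")
    return state + alpha * (target - state)


def remaining_misfit(tau_hours, dt_hours=1.0, hours=24):
    """Fraction of a unit initial misfit left after `hours` of nudging
    towards a fixed target (model dynamics deliberately ignored)."""
    state, target = 1.0, 0.0
    for _ in range(int(hours / dt_hours)):
        state = nudge(state, target, tau_hours, dt_hours)
    return state


print(remaining_misfit(6.0), remaining_misfit(96.0))
```

Under these assumptions about 1.3% of an initial misfit remains after one day with τ = 6 h, compared with about 78% with τ = 96 h, which illustrates why the weaker relaxation constrains the ocean state far less tightly.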
3. Assimilation Phase
 During the assimilation phase we calculate the MOC at a depth of 1000m and a latitude of 30° N (the location of maximum overturning in the HadCM3 control run) for each of the experiments, and plot the timeseries in Figure 1. The Full Depth experiment agrees very well with the original control run (the ‘truth’), illustrating that assimilating monthly T&S and atmospheric observations is sufficient to reproduce the MOC successfully. The 2000m experiment also shows a high level of agreement with the truth, both in the five-year smoothed timeseries and in the annual MOC (dashed line). The MOC timeseries for the 2000m NoAt experiment shows that, in the absence of the wind forcing provided by atmospheric assimilation, the annual MOC is not reproduced faithfully. Crucially, however, the low frequency (five-year smoothed) MOC follows the truth well.
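The five-year smoothed curves can be obtained from an annual series with a running mean. The smoothing method is not specified in the text, so the sketch below assumes a centred five-point boxcar that keeps only complete windows; the annual MOC values in it are made up for illustration and are not model output.

```python
import numpy as np


def running_mean(annual, window=5):
    """Centred boxcar running mean of an annual series.

    Only complete windows are kept, so the result has len(annual) - window + 1
    values; element i is centred on year i + window // 2 of the input.
    """
    annual = np.asarray(annual, dtype=float)
    if window < 1 or window % 2 == 0 or window > annual.size:
        raise ValueError("window must be odd, positive and no longer than the series")
    kernel = np.ones(window) / window
    return np.convolve(annual, kernel, mode="valid")


# Hypothetical annual MOC values (Sv) at 30° N, 1000 m.
annual_moc = [18.2, 17.9, 18.5, 19.1, 18.8, 18.0, 17.6]
smoothed = running_mean(annual_moc)  # three five-year means
```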
 We explore different methods of assimilating only SST. We first attempted to assimilate SST in the same manner as in our other experiments (i.e., globally and with six-hourly relaxation). After just a year of assimilation the MOC had increased by over 7 Sv (not shown), and it continued to increase to over 35 Sv. This is due to the ocean stratification at high latitudes, where cold surface water lies above warmer deep water. At these latitudes, assimilating a cold anomaly without adjusting salinity induces strong sinking. The resulting mixing brings warmer water to the surface, so at the next assimilation timestep an even stronger cold anomaly needs to be applied. This positive feedback produces unrealistic density anomalies.
Keenlyside et al.'s  SST assimilation is carried out in a latitude-dependent manner. Uniform-strength assimilation with six-hourly relaxation is carried out between 30° S and 30° N. Between latitudes of 30° and 60° the relaxation factor is reduced linearly, reaching zero at 60°. At higher latitudes there is no data assimilation, so the model remains freely coupled. When we implemented this scheme in our model we found a more modest MOC increase of 5 Sv during the first couple of years (Figure 1b). A similar result was obtained with different initial conditions, showing that agreement between different initialisations does not necessarily imply skill, but could instead be caused by a robust, but incorrect, response to the assimilated data.
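The latitude dependence described above can be written as a weight w(φ) on the full-strength nudging rate: w = 1 for |φ| ≤ 30°, falling linearly to w = 0 at |φ| = 60° and beyond. This sketch assumes that the same taper applies in both hemispheres and that the weight scales the relaxation rate 1/τ, so that the effective timescale is τ/w; both function names are hypothetical.

```python
import numpy as np


def sst_nudging_weight(lat_deg, inner=30.0, outer=60.0):
    """Weight on the SST relaxation rate: 1 equatorward of `inner`, 0 poleward
    of `outer`, and a linear taper in between (applied to |latitude|)."""
    lat = np.abs(np.asarray(lat_deg, dtype=float))
    return np.clip((outer - lat) / (outer - inner), 0.0, 1.0)


def effective_timescale_hours(lat_deg, tau_hours=6.0):
    """Effective relaxation timescale tau / w; infinite where w = 0 (no nudging)."""
    w = sst_nudging_weight(lat_deg)
    with np.errstate(divide="ignore"):
        return np.where(w > 0, tau_hours / w, np.inf)
```

For example, at 45° the weight is 0.5, so a 6 hour relaxation constant acts like a 12 hour one, while poleward of 60° the model runs freely.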
 Another SST assimilation experiment was carried out using a weaker (96 hour) relaxation constant, again from two different initial conditions. This avoids the spurious spin-up of the MOC. However, these two SST assimilation runs follow neither the truth nor each other: for approximately the first eight years, they follow the trajectory that the MOC would have taken had no assimilation taken place. We use the initial conditions provided by the first SST assimilation runs (labelled ‘v1’) to initialise the forecasts described in the next section.
 Using initial conditions provided by the assimilation runs, we start forecasts from seven start dates, chosen to sample a range of different MOC initial states. Each forecast is a nine-member ensemble, created by applying a small random perturbation (5 × 10⁻⁴ K) to each initial SST grid point, and is run for 16 years.
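A minimal sketch of how such an ensemble could be generated. The text gives only the perturbation size, so the sketch assumes that each member adds independent noise drawn uniformly from ±5 × 10⁻⁴ K at every SST grid point; a Gaussian with that standard deviation would be an equally plausible reading. The 4 × 5 field is a toy array, not the HadCM3 ocean grid, and the function name is hypothetical.

```python
import numpy as np


def perturbed_sst_ensemble(sst0, n_members=9, amplitude_k=5e-4, seed=None):
    """Return an (n_members, *sst0.shape) array of perturbed initial SST fields.

    Each member adds independent uniform noise in [-amplitude_k, amplitude_k)
    at every grid point; the rest of the initial state would be left unchanged.
    """
    rng = np.random.default_rng(seed)
    sst0 = np.asarray(sst0, dtype=float)
    noise = rng.uniform(-amplitude_k, amplitude_k, size=(n_members,) + sst0.shape)
    return sst0[np.newaxis, ...] + noise


# Toy 4 x 5 SST field at 280 K (hypothetical).
members = perturbed_sst_ensemble(np.full((4, 5), 280.0), seed=0)
```

Because the perturbations are tiny compared with natural variability, the members initially differ only through chaotic growth of these differences, which is what the ensemble is designed to sample.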
4.1. Predictability of the MOC
 For each forecast start date we calculate the ensemble mean annual MOC and plot the resulting five-year smoothed MOC timeseries in Figure 2. The Full Depth, 2000m and 2000m NoAt experiments show a similar level of skill in predicting the evolution of the MOC, as illustrated by the small spread in their root mean squared error (RMSE) (Figure 2), averaged over all start dates and lead times. However, both SST experiments fail to capture the correct evolution, resulting in a much larger RMSE than the other experiments (2.26 and 0.88 Sv, compared to 0.4 Sv). The lack of skill in these forecasts clearly arises from the poor initial conditions shown in Figures 2d and 2e.
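The MOC RMSE quoted above pools errors over start dates and lead times. One plausible reading, assumed here rather than stated in the text, is to form the ensemble mean for each start date, take its error against the truth at each lead time, and compute a single root mean square over all start dates and lead times. The function name and array layout are hypothetical.

```python
import numpy as np


def moc_rmse(forecasts, truth):
    """RMSE (Sv) of the ensemble-mean MOC, pooled over start dates and lead times.

    forecasts: array (n_start, n_member, n_lead) of forecast MOC values
    truth:     array (n_start, n_lead) of control-run ('truth') MOC values
    """
    forecasts = np.asarray(forecasts, dtype=float)
    truth = np.asarray(truth, dtype=float)
    ens_mean = forecasts.mean(axis=1)  # average over ensemble members
    if ens_mean.shape != truth.shape:
        raise ValueError("forecasts and truth differ in start dates or lead times")
    return float(np.sqrt(np.mean((ens_mean - truth) ** 2)))


# Shapes matching the set-up: 7 start dates, 9 members, 16 annual lead times.
example = moc_rmse(np.zeros((7, 9, 16)), np.ones((7, 16)))  # every error is 1 Sv
```

Averaging members before computing the error means that member-to-member noise partly cancels, so the ensemble mean is usually a better forecast than any single member.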
4.2. Forecasting Regional Ocean Heat Content
 Forecast skill is further assessed by examining regional forecast errors. We show the RMSE of forecasts of ocean temperature averaged over the upper 360m; similar results are obtained for anomaly correlations. Averaging over the upper 360m improves the signal-to-noise ratio of the statistics, but qualitatively similar results are obtained for SST.
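Averaging over the upper 360 m requires weighting each model level by its thickness. A minimal sketch, assuming that level boundaries are known and that a level straddling 360 m contributes only the part above that depth (the text does not say how partial levels are treated). The level depths used in the test are hypothetical, not the HadCM3 vertical grid.

```python
import numpy as np


def upper_ocean_mean(temp, level_bottoms_m, max_depth_m=360.0):
    """Thickness-weighted mean temperature over the top `max_depth_m` metres.

    temp:            array (..., n_levels), one value per model level
    level_bottoms_m: increasing depths (m) of the bottom of each level; level k
                     spans level_bottoms_m[k-1] (0 for k = 0) to level_bottoms_m[k]
    A level that straddles max_depth_m contributes only its upper part.
    """
    bottoms = np.asarray(level_bottoms_m, dtype=float)
    tops = np.concatenate(([0.0], bottoms[:-1]))
    thickness = np.clip(np.minimum(bottoms, max_depth_m) - tops, 0.0, None)
    if thickness.sum() <= 0.0:
        raise ValueError("no levels lie above max_depth_m")
    temp = np.asarray(temp, dtype=float)
    return (temp * thickness).sum(axis=-1) / thickness.sum()
```

With levels ending at 100, 300, 500 and 800 m, the weights are 100, 200 and 60 m for the first three levels and zero for the fourth.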
Figure 3 shows the RMSE as a function of lead time for the global ocean and, separately, for the Pacific, North Atlantic (north of the equator) and Southern (south of 30° S) Oceans. The RMSE is computed as the average over all start dates of the spatial RMSE for a given region, calculated from the ensemble mean five-year average forecast errors at all grid points within the region. The spatial average of the temporal RMSE computed over all start dates at each grid point gives very similar results (not shown). We also plot the RMSE obtained by persisting the five-year anomaly immediately prior to each start date. 5–95% confidence intervals on the difference between each experiment and the 2000m experiment are shown, such that when the 2000m experiment (green line) is inside these intervals the difference in skill is not significant at the 5% level. The intervals are calculated by bootstrap resampling of the forecast errors over both ensemble members and start dates. This approach assumes that the forecast errors are independent, so these intervals are slightly narrower than intervals that account for serial correlation. However, this is unlikely to affect the major conclusions of this work.
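The verification statistics described above can be sketched as follows, under several stated assumptions. Area weights (for example cos(latitude)) are supplied by the caller; the spatial RMSE of the ensemble-mean five-year-mean error is computed for each start date and then averaged over start dates; persistence is treated as a one-member forecast; and the 5–95% interval on an RMSE difference is estimated by resampling start dates (paired across the two experiments) and ensemble members (independently for each experiment). The pairing and resampling details are our assumptions, and all function names are hypothetical.

```python
import numpy as np


def regional_rmse(fc_members, truth, weights):
    """Mean over start dates of the area-weighted spatial RMSE of the ensemble mean.

    fc_members: (n_start, n_member, n_points) five-year-mean forecasts in a region
    truth:      (n_start, n_points) verifying five-year means
    weights:    (n_points,) area weights, e.g. cos(latitude)
    """
    err = np.asarray(fc_members, float).mean(axis=1) - np.asarray(truth, float)
    w = np.asarray(weights, float)
    w = w / w.sum()
    spatial_rmse = np.sqrt((err ** 2 * w).sum(axis=-1))  # one value per start date
    return float(spatial_rmse.mean())


def persistence_rmse(prior_means, truth, weights):
    """RMSE of persisting the five-year mean immediately preceding each start date."""
    # A persistence forecast is treated as a single-member ensemble.
    return regional_rmse(np.asarray(prior_means, float)[:, None, :], truth, weights)


def bootstrap_diff_interval(fc_a, fc_b, truth, weights, n_boot=1000, seed=0):
    """5th and 95th percentiles of RMSE(a) - RMSE(b) under bootstrap resampling."""
    rng = np.random.default_rng(seed)
    fc_a, fc_b = np.asarray(fc_a, float), np.asarray(fc_b, float)
    truth = np.asarray(truth, float)
    n_start = truth.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        s = rng.integers(0, n_start, n_start)  # same start dates for both experiments
        ma = rng.integers(0, fc_a.shape[1], fc_a.shape[1])  # members of experiment a
        mb = rng.integers(0, fc_b.shape[1], fc_b.shape[1])  # members of experiment b
        diffs[i] = (regional_rmse(fc_a[s][:, ma], truth[s], weights)
                    - regional_rmse(fc_b[s][:, mb], truth[s], weights))
    lo, hi = np.percentile(diffs, [5, 95])
    return float(lo), float(hi)
```

In this formulation, an interval on RMSE(experiment) minus RMSE(2000m) that contains zero corresponds to the 2000m curve lying inside the plotted band. Resampling treats start dates and members as independent, which is the same simplification the text notes makes the intervals slightly too narrow.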
 The 2000m NoAt experiment is not shown in Figure 3 because it is never significantly different from the 2000m experiment in predicting five-year means. However, for annual mean forecasts (not shown) the 2000m experiment is significantly more skillful than the 2000m NoAt experiment in the first year. The Full Depth experiment has a smaller RMSE than the other experiments in all ocean basins, with the most significant improvements in skill in the Southern Ocean. In the North Atlantic the SST experiments are initially less skillful than persistence, whereas in the Pacific Ocean both SST experiments show skill above persistence.
5. Discussion and Conclusions
 Assimilating atmospheric variables produces significantly improved skill in forecasting ocean heat content in year one, but thereafter there appears to be little benefit. This likely follows from the fact that assimilating monthly ocean variables alone was sufficient to capture the low frequency variability of the MOC (Figure 1). However, atmospheric assimilation may give improved skill in situations where the sub-surface information is less well constrained than it is in these idealised experiments.
 At all forecast lead times, assimilating SST alone gives significantly lower skill than assimilating sub-surface temperature and salinity. In particular, the SST experiments appear to introduce systematic errors in the North Atlantic, with the result that they never achieve skill significantly above that of a persistence forecast. We find a similar situation initially in the Southern Ocean, although performance is better in the Pacific Ocean. We note that we have used only a single climate model in this study, and that both the mechanisms of multi-decadal variability and the sensitivity of ocean-atmosphere teleconnections between high-latitude and tropical oceans could be different in other models. Furthermore, the use of model rather than observed SST data could also explain some of the differences between our results and those of Keenlyside et al. .
 Our results suggest that SST-only assimilation schemes suffer from one of two undesirable outcomes. If SST alone is assimilated globally, or if the latitude-dependent scheme of Keenlyside et al.  is adopted together with relatively strong relaxation, then the MOC spins up to spuriously high levels (as discussed in section 3). This can be avoided with weaker relaxation, but then the variability of the MOC is not initialised. Either way, the decadal predictions have poor skill in the extra-tropics. The implication is that sub-surface T&S data are necessary to initialise decadal climate predictions. However, it may be possible to generate T&S anomalies by forcing an ocean-only model with observed atmospheric winds and buoyancy fluxes [Griffies, 2009]. These anomalies could then be assimilated into a coupled model to generate initial conditions. Initial experiments with this approach appear to show some promise (D. Matei et al., manuscript in preparation, 2009), and the approach should be investigated further with idealised experiments.
 Assimilating monthly average temperature and salinity in the upper 2000m produces forecasts with similar skill to full-depth assimilation. Furthermore, skill in ocean heat content remains after 10 years, particularly in the North Atlantic and Southern Ocean. This is encouraging, as it suggests that the data currently provided by the Argo array have the potential to initialise decadal predictions successfully. However, this needs to be confirmed by further experiments using pseudo-observations sampled at actual Argo locations. We note that more sophisticated assimilation methods [e.g., Stammer et al., 2002; Zhang et al., 2007] could potentially produce skill beyond that of the simple nudging technique used in this work.
 Assimilating data beneath 2000m always reduces the RMSE, with the most significant improvements in the Southern Ocean. This improvement is present from year one of the forecast (not shown), suggesting that it is caused locally, most likely by the strong upwelling in the Southern Ocean, which can quickly bring the deep ocean into contact with the surface layers. The Southern Ocean could therefore be a region where observations deeper than those currently provided by the Argo array may be useful in the future.
 This study is highly idealised in that we have assimilated perfect, full-field data as our pseudo-observations. This introduces two simplifications: firstly, there is no model bias, and secondly, observations are available everywhere. In particular, observations near boundaries could be important for initialising the MOC [e.g., Kanzow et al., 2007]. Our experiments therefore compare the relative skill of assimilating different types of observations, but do not necessarily show the skill to be expected in reality. Future experiments that sub-sample the ocean observations to more realistic distributions, and that simulate model bias, are needed.
 This work was supported by the Joint DECC and Defra Integrated Climate Programme - DECC/Defra (GA01101). We would like to thank both reviewers for their very helpful comments.