Impact of Proxies and Prior Estimates on Data Assimilation Using Isotope Ratios for the Climate Reconstruction of the Last Millennium

Climate reconstructions by data assimilation need to accommodate for sensitivities to proxies and prior estimates because models are uncertain and proxies are spatiotemporally limited. This study examines these sensitivities using multiple climate model simulations and different combinations of proxies (i.e., corals, ice cores, and tree‐ring cellulose). Experiments were conducted using an offline data assimilation approach; the results showed annual variations in the global distribution of surface air temperature and precipitation amount from 850 to 2000. Standard deviations of surface air temperature and precipitation amount during the entire period differed by up to 36% due to prior estimates. Experiments with different types of proxies showed that the El Niño‐like distribution of positive anomalies in the central to eastern tropical Pacific may only be adequately reproduced in experiments with corals, and not experiments without corals. The correlation coefficient of the NINO.3 index from 1971 to 2000 between experiments with corals and the Japanese 55‐year Reanalysis (JRA‐55) was 0.81 at maximum. By contrast, the correlation coefficient between experiments without corals and JRA‐55 was a maximum of 0.19.

This study uses multiple climate model simulations based on the method proposed by Okazaki and Yoshimura (2017). Annual variations in climate variables were reconstructed over the past 1000 years, and differences due to prior estimates were investigated. We also tested the impact of proxies with experiments using different proxy combinations. In this data assimilation method, reconstructed fluctuations were dependent on the impact of proxies as prior estimates were constant throughout the entire target period. To improve climate reconstruction by data assimilation, the impact of proxies needs to be evaluated alongside differences due to prior estimates. This paper was structured into four additional sections. Section 2 introduces the models, proxy data, climate data sets, data assimilation, and experimental design. Section 3 presents the results of proxy data assimilation. Section 4 discusses differences by climate models and the impact of proxies, while Section 5 presents the conclusions.

Models
To validate the influence of differences due to prior estimates, we used two isotopes-incorporated atmospheric general circulation models to make the prior estimates. One is the isotopes-incorporated model for interdisciplinary research on climate (MIROC5-iso; Okazaki & Yoshimura, 2019) based on the atmospheric component of MIROC5 (Watanabe et al., 2010). The other is the isotopes-incorporated global spectral model (IsoGSM; Yoshimura et al., 2008) based on the global spectral model (GSM) of the Scripps Experimental Climate Prediction Center (Kanamitsu et al., 2002). For climate model simulations, the sea surface temperature (SST) and sea ice data sets were utilized; these were provided by the historical run of the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor et al., 2012). One of the data sets was from the atmosphere-ocean coupled version of MIROC5, while the other was from the Community Climate System Model (CCSM4; Gent et al., 2011). The simulation period was 130 years from 1871 to 2000 (129 years from 1872 to 2000 in MIROC5-iso); the method for making the prior estimates is detailed in Section 2.4. This study utilized proxy models for corals (Liu et al., 2014) and tree-ring cellulose (Roden et al., 2000). Isotope ratios of corals and tree-ring cellulose were calculated from climatic variables of climate model simulations. In Okazaki and Yoshimura (2019), proxy models were evaluated from 1950 to 2007. They showed that annual variations of the isotope ratios of corals and tree-ring cellulose were well simulated, while annual variations of the isotope ratios of ice cores were difficult to simulate. Hoshina et al. (2014) mentions that the isotope ratios of ice cores change after snow deposition and that the governing postdepositional processes may vary by region. This study assumed that the isotope ratios of ice cores were equal to those in precipitation without considering any postdepositional processes when assimilating the isotope ratios of ice cores. The proxy models of corals and tree-ring cellulose are described here.
The equation to calculate the isotope ratios of corals is where a is uniformly set to −0.22 ‰ • C −1 for all corals, as per Thompson et al. (2011) where P , E , D , and Q are precipitation, evaporation, mixing from the lower layer, and the depth of the upper layer, respectively. Here, the depth of the upper layer Q and the damping factor D∕Q are set to 20 m and 0.4 month −1 . The subscripts p, e, and d refer to precipitation, evaporation, and mixing from the lower layer, respectively. The oxygen isotope ratio 18 Or is 18 Osw at a previous time step. In calculating 18 Osw , 18 Od was based on the oxygen isotope ratio data in seawater studied by LeGrande and Schmidt (2006), and P , E , 18 Op, and 18 Oe were calculated from the climate model simulation. Prior to data assimilation, monthly isotope ratios were averaged annually.
A model developed by Roden et al. (2000) was used to obtain the isotope ratios of tree-ring cellulose and is expressed as where fo is the fraction of carbon-bound oxygen and o is the fractionation factor; these values are 0.42 and 27‰ , respectively. The subscripts wx and wl refer to xylem water (same as source water) and leaf water, respectively. Following Okazaki and Yoshimura (2017), 18 Owx was arbitrarily assumed to be the moving average of the isotopic composition in precipitation at the site, traced 3 months backward. The values of 18 Owl were calculated based on air temperature, surface pressure, and relative humidity from the climate model simulation. To estimate the annual isotope ratios, monthly isotope ratios were weighted by climatological net primary production (NPP) provided by the National Aeronautics and Space Administration (NASA).

Proxy Data Used in This Study
A total of 129 isotopic proxy data (i.e., 65 corals, 43 ice cores, and 21 tree-ring cellulose) were used; these data were archived at the National Oceanic and Atmospheric Administration (NOAA; https://www.ncdc.noaa. gov/data-access/paleoclimatology-data), and the 2k Network of the International Geosphere-Biosphere Program Past Global Changes (PAGES 2k; Ahmed et al., 2013). The spatiotemporal distributions of proxies are shown in Figure 1. We assimilated the annual mean oxygen isotopic ratios of corals, ice cores, and tree-ring cellulose from 850 to 2000. Assimilated variables were normalized prior to data assimilation, and observation errors were uniformly set to 0.50 as per Okazaki and Yoshimura (2017). This value was based on the observation errors of oxygen isotope ratio of corals, ice cores, and tree-ring cellulose. Although the observation errors were not always available for all data on each proxy, a uniform value of 0.5 was used for all proxies, and the corresponding signalto-noise ratio (SNR) was 2.0.

Reference Climate Data Sets
Two types of climate data sets were utilized to validate climate reconstruction, of which one was provided by the Climatic Research Unit (CRU; Harris et al., 2020). The data cover the entire land area, with the exception of Antarctica from 1901 with a spatial resolution of 0.5°. The surface air temperature (°C) and precipitation (mm/ day) of CRU were used for validation. The second climate data set was sourced from the Japanese 55-year Reanalysis (JRA-55; Kobayashi et al., 2015) by the Japan Meteorological Agency (JMA); they produce global climate variables at intervals of 1.25° and analysis data from 1958 were available. We used temperature (K) from the surface analysis fields and total precipitation (mm/day) from two-dimensional average diagnostic fields. Prior to comparing with the results, each climate variable was averaged annually because annual analyses were computed for this data assimilation. Figure 2 shows a schematic of the proxy data assimilation; we followed Okazaki and Yoshimura (2017) using the offline data assimilation approach (e.g., Bhend et al., 2012;Steiger et al., 2014), based on the sequential ensemble square root filter (EnSRF; Whitaker & Hamill, 2002); this was a variant of the ensemble Kalman filter (EnKF; Evensen, 2003). Notably, the offline data assimilation did not ensure the temporal continuity of analyses as prior estimates were constructed from existing climate model simulations.

Data Assimilation Method
A 130-year single run from 1871 to 2000 (129 years from 1872 to 2000 in MIROC5-iso) to construct the prior estimates was used. Annual means were calculated from the simulation, and each mean of 130 years was considered an ensemble member. The same ensemble members were used for each year of the target period from 850 to 2000; this meant that the prior estimates used for data assimilation remained constant throughout the entire period.
The equations for the Kalman filter were Figure 2. Schematic of this proxy data assimilation.
where X a is the state vector of the analysis and X b is that of the background estimate. Eight variables including surface air temperature, precipitation amount, evapotranspiration, relative humidity, surface pressure, and three types of isotope ratios (i.e., corals, ice cores, and tree-ring cellulose) were included in the state vector. Vector y consisted of proxy data observations, while function H represented the observation operator that converts the model state to the observation state. Innovation, − ( ) , denotes the difference between observations and background estimates. Matrix K, the Kalman gain, adds a weighting to the innovation and transforms it into state space. Matrix B represents the background error covariance matrix, while R is the observation error covariance matrix. Matrix H is a linearized H around the background mean.
The update equations of the EnKF are expressed as In the EnKF, ̃ is equal to , while in the EnSRF, ̃ is An ensemble mean was denoted by an overbar, and the deviation from the mean was denoted by a prime. For the offline data assimilation in this study, B was constant over time, and R was also constant because the SNR was uniformly set to 2.0, as described in Section 2.2. Following Steiger et al. (2014), a localization function (Gaspari & Cohn, 1999) scaled at 12,000 km was used and applied to gain K.

Experimental Design
We tested whether the results were dependent on the background fields with three simulations (Table 1). The first simulation was based on the MIROC5-iso forced by the SST of MIROC5; the second was based on IsoGSM forced by the SST of MIROC5, while the last was based on IsoGSM forced by SST of CCSM4. The simulation periods were from 1871 to 2000 (from 1872 to 2000 in MIROC5-iso). Each 130 annual means were considered an ensemble member. Anthropogenic global warming following the nineteenth century was considered a major change in the last millennium. By using climate model simulations in the period as ensemble members, large variances were applied for data assimilation in these experiments. Note that model biases were not corrected.
This study used three types of proxies (corals, ice cores, and tree-rings), and three types of climate model simulations for MIROC5-iso-m, IsoGSM-m, and IsoGSM-c. Experiments with different combinations of proxies were conducted to understand the impact of using different types of proxies. Additionally, experiments using different simulations were also conducted to show the impact due to differences in prior estimates. These experiments may be considered a sensitivity analysis to evaluate the impact of proxies and prior estimates.
First, experiments that reconstructed climate fields from 850 to 2000 were conducted using all 129 proxies (ALL). Then, we limited the proxies used for data assimilation to investigate the impact of proxies on the analysis. The impact of proxies was validated based on experiments using corals (C), ice cores (I), tree-ring cellulose (T), corals and ice cores (CI), corals and tree-ring cellulose (CT), and ice cores and tree-ring cellulose (IT). This study focused on experiments C and IT as the differences in the impact of proxies were clear between these two types of experiments.

Experiments Using Different Prior Estimates Over the Last Millennium
This section focuses on the results of the ALL experiment. Figure 3 shows the annual variations in the global mean surface air temperature (°C) and precipitation amount (mm/day) from 850 to 2000. The fluctuations in the surface air temperature and precipitation amount were similar; during the tenth century and latter halves of the twelfth to sixteenth centuries, fluctuations in the precipitation amount were relatively large. These fluctuations were represented in three experiments (i.e., MIROC5-iso-m, IsoGSM-m, and IsoGSM-c). Table 2 lists the averages and standard deviations of the global mean surface air temperature and precipitation amount. In the three experiments, the standard deviations in the surface air temperature and precipitation amount in the tenth century were lower than those in the twentieth century, as shown in Figure 3. However, differences in standard deviations of surface air temperature and precipitation amounts between the tenth and twentieth centuries varied in each experiment. Standard deviations of surface air temperature and precipitation amount in the tenth century were 26%-29% and 44%-58%, respectively, compared with those in the twentieth century.
During the twentieth century, results were evaluated using the CRU data. For the CRU, the standard deviations of the global mean surface air temperature and precipitation amount were 0.26°C and 0.04 mm/day, respectively. Compared with the standard deviations of CRU, those of the three experiments were 30%-43% and 49%-76%, respectively, for surface air temperature and precipitation amount; as such the fluctuation range of this study was lower than that of CRU. However, the warming trend during the twentieth century was clearly apparent in the three experiments and in the CRU. Figures 4 and 5 show the spatial distribution of the standard deviations of surface air temperature and precipitation amount. In Figure 4, the three experiments show high variability in the Arctic and Antarctic regions and North America during the tenth century. In particular, high variations were also observed in the tropical Pacific. During the twentieth century, variability in the tropical Pacific and Arctic and Antarctic regions was higher than the tenth century. In addition, high value areas spread from the Arctic regions to most of the Northern Hemisphere in the three experiments. In Figure 5, the high value areas centered on the tropical Pacific were wider in the IsoG-SM-m and IsoGSM-c than the MIROC5-iso-m. In the three experiments, values were higher during the twentieth century than the tenth century.

Validation With JRA-55
To validate the climate field reconstructed in this study, we focused on the El Niño event from 1971 to 2000 (Table 3). Figure 6 shows    of IsoGSM-m were more similar to that of JRA-55. Compared with each experiment, the fluctuations of the ALL and C experiments were more similar to that of JRA-55. Table 4 shows the root mean square differences (RMSD) between each result and JRA-55, the correlation coefficients (r) with JRA-55, and standard deviations ( ). These values were based on annual variations in surface air temperature anomalies from 1971 to 2000. Compared with simulations for prior estimates of each experiment, the correlation coefficient increased in the data assimilation results of MIROC5-iso-m, IsoGSM-m, and IsoGSM-c. The correlation coefficients changed from 0.10 to 0.74, 0.10 to 0.77, and 0.47 to 0.69 with the MIROC5-iso-m, IsoGSM-m, and IsoGSM-c, respectively. The positive impact of the data assimilation approach was evident in the three experiments; Table 4 shows that the RMSD of experiment C was the lowest, while that of experiment IT was the largest for each result. Comparing the results of the ALL and IT experiments, each RMSD differed by 28%-34%. Each correlation coefficient of experiment C was higher than those of the ALL and IT experiments. The ratios of standard deviations (this study/JRA-55) were approximately 39%-99% in MIROC5-iso-m. These ratios were 29%-92% for IsoGSM-m and IsoGSM-c. Standard deviations of the ALL and C experiments were higher than those of experiment IT.    Figure 7 shows the spatial distribution of surface air temperature anomalies in the tropical Pacific in 1997. Anomalies were defined based on 30 years from 1971 to 2000. Figure 7j for the JRA-55 shows a positive anomaly in the central to eastern tropical Pacific and a negative anomaly in the western tropical Pacific. A similar spatial distribution was observed in each result of the ALL and C experiments, although the positive anomaly was lower than that of JRA-55. By contrast, the results of experiment IT did not show a positive anomaly in the tropical Pacific and a negative anomaly in the western tropical Pacific.    1975, 1976, 1992, 19931975, 1976, 1992, 19931976, 1986, 1992, 1993ALL 1987, 1991, 1992, 1993, 1997, 19981987, 1991, 1992, 1993, 1997, 19981991, 1993, 1997, 1998C 1987, 1991, 1992, 1993, 1997, 19981987, 1991, 1992, 1993, 1997, 19981991, 1993, 1997, 1998IT 1980, 1988, 1996, 1997, 19981988, 1991, 1992, 1996, 1998, 19991972, 1988, 1991, 1997, 1998, 1999JRA-55 1972, 1982, 1983, 1987, 1997 Note. The underlined years imply El Niño years. The cases of climate model simulations (Sim) for prior estimates are also shown.

Comparison With Other Reconstructions
We compared this study with other reconstructions based on data assimilation. The Paleo Hydrodynamics Data Assimilation product (PHYDA, Steiger et al., 2018) and the Last Millennium Reanalysis (LMR, Tardif et al., 2019) were used here. Figure 8 shows the annual variations of the surface air temperature anomaly in NINO.3 from 850 to 2000. Here, anomalies were based on the 1951 to 1980 reference period mean as the LMR values were anomalies from the 1951 to 1980 mean. In the results of this and other studies, similar fluctuations were represented; for example, during the tenth century, temperature anomaly declines were reconstructed. This study used only oxygen isotope ratios of coral, ice cores, and tree-ring cellulose although PHYDA and LMR additionally use tree-ring width, maximum density, speleothem, lake sediment, and marine sediment. In addition, available proxies were sparse in the tenth century; however, these declines may be represented in this and other studies. Table 6 shows the standard deviations of surface air temperature anomalies in NINO.3 for each century.
The PHYDA values were their highest from the tenth to eighteenth century. The MIROC5-iso-m values were lower than those of IsoGSM-m and IsoGSM-c from the tenth to sixteenth century, while those of MIROC5-iso-m were higher than the IsoGSM-m and IsoGSM-c values from the seventeenth to twentieth century. For each case of this and other studies, the values in the twentieth century were the highest.

Differences Due To Prior Estimates
Although there were differences due to model bias, these differences were mainly due to isotopes-incorporated climate models and proxy models, and the SST data used for the coral proxy model or climate model forcing.  Biases were not corrected in this study, although they were corrected in PHYDA (Steiger et al., 2018). However, if climate model biases are corrected, the differences in prior estimates for isotope ratios of corals and tree-rings will remain. Compared with the isotope ratios of prior estimates for MIROC5-iso-m, IsoGSM-m, and IsoGSM-c (not shown), the prior estimates for corals were more dependent on differences in SST data than climate models, and prior estimates for tree-rings varied for each of the three climate model simulations.
The differences due to prior estimates are shown in Table 2. During the entire period, standard deviations of surface air temperature and precipitation amount differed by up to 36% due to prior estimates. As prior estimates were constant for the entire target period using this method, fluctuations were dependent on the impact of proxies, while the fluctuation range was dependent on the impact of proxies and differences due to prior estimates.
In Figure 4, the spatial distribution of the high value areas in the MIROC5-iso-m was different from IsoGSM-m and IsoGSM-c. Differences due to these prior estimates are also shown in Figure 5. In the tropical Pacific, the spatial distribution of MIROC5-iso-m differed from that of IsoGSM-m and IsoGSM-c. Compared with MIROC5iso-m, the results from IsoGSM-m and IsoGSM-c represented higher values in the tropical Pacific where coral proxies were available. This suggests that coral proxies have a strong impact on the spatial distribution of IsoG-SM-m and IsoGSM-c.

Impact of Proxies
We compared the results of experiments ALL, C, and IT and verified the impact of proxies. The fluctuations in the NINO.3 index (e.g., JRA-55) were reproduced by the ALL and C experiments ( Figure 6). In particular, the values for 1997 and 1998 exceeded + in eight of nine experiments. The El Niño E6 was one of the largest of the twentieth century and was the best represented because its influence spread widely and reflected strongly on coral proxies in the tropical Pacific. Coral proxies play an important role in reproducing previous El Niño events. Cobb et al. (2003) revealed fluctuations in oxygen isotope ratios over the last 1100 years based on coral proxies in the central tropical Pacific. They analyzed the strength and frequency of El Niño events in the past, indicating that coral proxies in the tropical Pacific record information on these previous El Niño events; information on coral proxies were reflected in the results of the current study. In Table 4, each RMSD was high in experiment IT, and low in the ALL and C experiments. The decrease in RMSD was largely dependent on coral proxies. In addition, correlation coefficients were high in the ALL and C experiments and low in experiment IT. Thus, coral proxies were considered an effective proxy to reconstruct previous El Niño events using this method.
The impact of coral proxies on the MIROC5-iso, IsoGSM-m, and IsoGSM-c results is also shown in Table 5. Although El Niño events did not occur, the positive anomaly was higher than + in 1993. Confirming the spatial distribution of JRA-55, a positive anomaly was observed within 180° to 150°W in the tropical Pacific in 1993. This influence may be reflected in the MIROC5-iso, IsoGSM-m, and IsoGSM-c results.   NINO.3 From 1971 and From 850 to 2000 (Bottom) The importance of coral proxies for past El Niño reconstruction is also indicated in Figure 7. The results of the ALL and C experiments reproduced the positive anomalies in the central and eastern tropical Pacific; while in the experiment IT, the positive anomaly was small and the spatial distribution of past El Niño events could not be shown.

Conclusions
This study presented annual variations of the global distribution of surface air temperature and precipitation amount from 850 to 2000 using proxy data assimilation. We used isotopes-incorporated climate and proxy models to assimilate the oxygen isotope ratios of corals, ice cores, and tree-ring cellulose. Experiments based on different prior estimates and proxies were conducted to evaluate sensitivity.
Differences due to prior estimates were illustrated spatiotemporally. Although fluctuations in global mean surface air temperature and precipitation amount were similar in the three experiments, the standard deviations of the surface air temperature and precipitation amount differed by up to 36% due to prior estimates during 850-2000.
Additionally, the areas with the standard deviations of the precipitation amount in the tropical Pacific high were wider in the IsoGSM-m and IsoGSM-c than the MIROC5-iso-m. These results suggest that coral proxies significantly influence the results of IsoGSM-m and IsoGSM-c.
The impact of proxies was highlighted in the results of reproduced El Niño events. In experiments ALL and C, the El Niño-like positive anomaly of surface air temperature in the tropical Pacific was reproduced well and fluctuations of surface air temperature anomalies in NINO.3 were similar to the JRA-55. The results indicated that coral proxies were effective in reconstructing the previous El Niño patterns.

Data Availability Statement
The results of this study are available from https://doi.org/10.5281/zenodo.5760209.