Assessment of a New Global Ocean Reanalysis in ENSO Predictions With NOAA UFS

As an update to the current NOAA/NCEP operational ocean reanalysis systems, a new system named the GLobal Ocean Reanalysis (GLORe) was recently built based on the JEDI-SOCA 3DVar scheme. In this study, the quality of GLORe is assessed in initializing ENSO predictions with the NOAA Unified Forecast System (UFS). Specifically, initialized by GLORe, 9-month ensemble hindcasts are conducted from each May/November during 1982-2021. The ENSO prediction skill is compared with that of the current NOAA operational system, CFSv2, showing that UFS initialized with GLORe has improved skill in ENSO predictions. By conducting another set of hindcasts with UFS and the same initializations as CFSv2, it is found that the skill improvement is largely attributable to the ocean initialization with GLORe, with some contributions from model improvements as well. The effect of ocean initializations is further confirmed by the superiority of GLORe over CFSR when validated against an objective analysis.


Introduction
Seasonal climate predictions are now routinely made using coupled dynamical models at many operational centers worldwide. In the development of these prediction systems, the skill of ENSO predictions is commonly used as a benchmark for evaluating progress (e.g., Johnson et al., 2019; Saha et al., 2006, 2014). Since the initial attempt by Cane et al. (1986) about 40 years ago, the capability of dynamical models to predict ENSO has improved significantly, despite a drop in ENSO predictability around 2000 (e.g., Barnston et al., 2012; Hu et al., 2020; McPhaden, 2012). The progress has greatly benefitted from the adoption of ocean data assimilation (ODA) (e.g., Balmaseda et al., 2009; Rosati et al., 1997), which assimilates oceanic surface and subsurface observations to provide ocean initial conditions (ICs) for the predictions. A more accurate estimation of the ocean ICs (e.g., the thermocline depth) through ODA is expected to be a key factor for more reliable ENSO forecasts.
A number of global ODA systems have been developed to synthesize various observations with the physics described by ocean general circulation models (OGCMs), to represent the time-evolving, three-dimensional state of the ocean. Studies (e.g., Xue et al., 2012; Zhu, Huang, & Balmaseda, 2012) suggest that the ODA-based analyses are capable of achieving consistent estimates of the upper-ocean heat content in the tropical Pacific Ocean. However, further hindcast experiments (Zhu, Huang, Marx, et al., 2012) indicate that quantitative differences among these analyses contribute to a large skill spread in ENSO predictions, with the anomaly correlation metric differing by as much as 0.2 for the Niño-3.4 index. The experiments suggest that concurrent development of ODAs and coupled models is required for improved ENSO predictions.
At the National Centers for Environmental Prediction (NCEP)/NOAA, two ODA systems are running operationally: the Global Ocean Data Assimilation System (GODAS; Behringer & Xue, 2004) and the Climate Forecast System Reanalysis (CFSR; Saha et al., 2010). GODAS became operational in 2003; it was used in the initialization of the Climate Forecast System, version 1 (CFSv1; Saha et al., 2006), and has also been used for real-time monitoring and prediction support products [for example, the "Monthly Ocean Briefing" (Hu et al., 2022) and the "ENSO Diagnostic Discussion" (https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_advisory/ensodisc.shtml) at the Climate Prediction Center/NOAA]. CFSR became operational in 2011 and has been used in the initialization of the Climate Forecast System, version 2 (CFSv2; Saha et al., 2014). As two highly related systems, GODAS and CFSR share some common weaknesses and currently lag behind the latest operational requirements. For example, both systems lack realistic salinity variability because they do not assimilate observed salinity information, instead utilizing synthetic salinity profiles constructed from temperature and local climatological T-S relationships. Also, there is no sea ice analysis in GODAS, and sea ice in CFSR contains large biases (e.g., Collow et al., 2015).
The weaknesses of GODAS and CFSR have prompted NOAA to develop their replacements with updated model systems and data assimilation (DA) algorithms that can assimilate enhanced observational data sets. The Next Generation Global Ocean Data Assimilation System (NG-GODAS; Kim et al., 2022) is a recent effort along this path. It is built on the Joint Effort for Data assimilation Integration (JEDI) DA package [Trémolet & Auligné, 2020; developed at the Joint Center for Satellite Data Assimilation (JCSDA)] with the Modular Ocean Model version 6 (MOM6) ocean and the Los Alamos Community Ice Code version 6 (CICE6) sea ice models. Despite the great successes of the NG-GODAS development, its configuration prevents it from being implemented in operations. For example, the World Ocean Database (WOD; Boyer et al., 2018) in situ data set assimilated in NG-GODAS is not available in real time. As a result, a new ODA system built upon NG-GODAS, that is, GLORe, has been developed with modifications and improvements to meet the operational requirements (see Section 2.1).
As a first assessment of the quality of the GLORe ocean analysis, this study focuses on its performance in initializing dynamical ENSO predictions. Specifically, a set of hindcast experiments is carried out with the NOAA UFS with its ocean and sea ice components initialized by GLORe. The hindcasts are compared with those of the current seasonal prediction system at NOAA, CFSv2, and with another set of UFS hindcasts whose ocean is initialized by CFSR.

GLORe
The GLORe system is configured based on the MOM6 ocean and CICE6 sea ice models with the JEDI-Sea Ice Ocean Coupled Assimilation (SOCA; Holdaway et al., 2020) 3D variational (3DVar) scheme (developed by JCSDA). The MOM6 configuration is the same as the GFDL SPEAR (Lu et al., 2020) 1-degree configuration, which uses a nominal horizontal resolution of 1° with refinement to 1/3° in the meridional direction in the tropics, and has 75 layers in the vertical, with layer thickness as fine as 2 m near the surface and 30 layers in the top 100 m. A similar ODA configuration was used in an observing system simulation experiment study (Zhu et al., 2021) of the tropical Pacific observing system.
GLORe is an extension of NG-GODAS (Kim et al., 2022), but with modifications to meet the operational requirements for climate monitoring and diagnostics. The main modifications include: (a) for the B-matrix, the 3-dimensional Background error on Unstructured Mesh Package (BUMP; Ménétrier, 2020) is applied (vs. 2-D BUMP in NG-GODAS); (b) atmospheric forcing is changed to the Conventional Observation Reanalysis (CORe; Ebisuzaki et al., 2020) [versus the Climate Forecast System Reanalysis (CFSR; Saha et al., 2010) prior to 2000 and the Global Ensemble Forecast System (GEFS) afterward in NG-GODAS]; (c) some of the assimilated observations are changed to ones available in real time and/or with better quality, for example, the Fleet Numerical Meteorology and Oceanography Center (FNMOC; https://nrlgodae1.nrlmry.navy.mil/ftp/outgoing/fnmoc/data/ocn/profile/) in situ data set after 2004 (versus WOD in NG-GODAS), the Operational Sea Surface Temperature and Ice Analysis (OSTIA; Donlon et al., 2012) sea surface temperature (SST) analysis (vs. a mixed usage of ESACCI and NESDIS L3 satellite SST retrievals in NG-GODAS), and the Climate Data Record version 4 (CDRv4; Meier et al., 2021) sea ice concentration (vs. a mixed usage of NSIDC L3 and EMC L2 sea ice concentration in NG-GODAS); and (d) a new sea ice deaggregation scheme (Lindsay & Zhang, 2006) is applied for an improved sea ice analysis. In both GLORe and NG-GODAS, the observed salinity data are explicitly assimilated, in contrast to GODAS and CFSR, which assimilate synthetic salinity profiles.

UFS, Hindcasts, and Validation Data Sets
The UFS (https://ufscommunity.org) is the next-generation modeling infrastructure under development for NOAA's operational numerical weather/climate predictions. It consists of the FV3 (Finite-Volume cubed-sphere dynamical core) atmospheric component with the Noah land surface model with multiparameterization options (Noah-MP; Niu et al., 2011), the MOM6 oceanic component, and the CICE6 sea ice component. The UFS also includes wave and aerosol components, but they were turned off in our experiments. The components are coupled through the Community Mediator for Earth Prediction Systems (CMEPS). The UFS has been applied in some seasonal forecasting practices. For example, Zhu et al. (2023) used the UFS Prototype 5 for the seasonal outlook of Arctic sea ice, which is currently maintained in real time (https://www.cpc.ncep.noaa.gov/products/people/jszhu/seaice_seasonal/index.html); Ray et al. (2023) tested the same UFS prototype in the hindcast of the 2015/16 El Niño event. In our study, the UFS version used is Prototype 8 (UFSp8); the FV3 resolution is C96 (∼1°) horizontally with 64 vertical levels, and MOM6 and CICE6 have a nominal horizontal resolution of 1° (with the same configurations as in GLORe).
Using UFSp8 as the forecast model, hindcasts (referred to as ufsGLORe hereafter) are conducted with the ocean and sea ice initialized from GLORe, and the atmosphere and land initialized from CFSR. The May and November starts are chosen in this study to evaluate the forecast skill for the onset and decay phases of ENSO events, respectively. Specifically, the hindcasts start from the 21st-25th of each May/November during 1982-2021, with one forecast from each day, forming a five-member ensemble for each initial month, and cover 9 full target months. Taking the May starts as an example, in this paper predictions for June are defined as the 0-month lead, those for July as the 1-month lead, and so on. It is noted that while five ensemble members can capture most of the SST predictability, larger ensemble sizes are needed for fields with low inherent predictability such as precipitation (e.g., Kumar & Hoerling, 2000).
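The lead-time convention above can be encoded in a small helper. This is an illustrative sketch (the function name and interface are our own, not from the paper's processing code): hindcasts start on the 21st-25th of the initial month, so the first full target month (June for May starts) is the 0-month lead.

```python
# Illustrative helper (not from the paper) for the lead-time convention:
# the 0-month lead is the first full month after the late-month start dates.
def target_month(start_month: int, lead: int) -> int:
    """Return the target calendar month (1-12) for a given lead (0-8)."""
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be in 1..12")
    # month after the start month, advanced by `lead`, wrapping past December
    return (start_month + lead) % 12 + 1

# For May (month 5) starts, the 9 target months run June through February:
print([target_month(5, lead) for lead in range(9)])  # [6, 7, 8, 9, 10, 11, 12, 1, 2]
```

For the November starts the same convention yields December (0-month lead) through August (8-month lead).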
The ENSO prediction skill in ufsGLORe is compared mainly to that of the current NOAA seasonal prediction system CFSv2 (Saha et al., 2014), but also to systems from the North American Multi-Model Ensemble (NMME; Kirtman et al., 2014). The operational CFSv2 is initialized by CFSR (Saha et al., 2010) with a lagged ensemble approach. Specifically, for each year, 4 predictions were produced every 5 days with ICs for all model components taken from CFSR. In this study, five predictions from ICs at 06Z, 12Z and 18Z on May 21st and 00Z and 06Z on May 26th (00Z, 06Z, 12Z and 18Z on November 22nd and 00Z on November 27th) during 1982-2021 are used for the May (November) forecast starts, a selection minimizing as much as possible the effect of differences in ensemble sizes and lead times. This group of hindcasts is referred to as cfsCFSR hereafter.
To understand the skill difference between ufsGLORe and cfsCFSR, we conduct another set of hindcasts, that is, ufsCFSR. These hindcasts use the same forecast model (i.e., UFSp8) as ufsGLORe and the same ocean initialization (i.e., CFSR) as cfsCFSR. Cross comparisons among the three sets of hindcasts assess the roles of the ocean initialization and the forecast model in the skill difference between ufsGLORe and cfsCFSR. The ufsCFSR hindcasts are conducted for the May starts only.
The predicted SST anomalies (SSTAs) in ufsGLORe, ufsCFSR and cfsCFSR are derived with respect to their respective lead time-dependent SST climatologies based on the entire hindcast period (1982-2021). For cfsCFSR and ufsCFSR (both initialized from the CFSR ocean states), another set of SSTAs (referred to as cfsCFSR_2CLM and ufsCFSR_2CLM, respectively, hereafter) is derived by removing each model's climatology for the periods 1982-1998 and 1999-2021 separately. The two-period approach was recommended by Xue et al. (2013) when evaluating ENSO predictions in CFSv2. It was motivated by a clear shift around 1999 in the trade winds and equatorial subsurface temperature in CFSR. Barnston et al. (2019) suggested that CFSv2 using the two-period climatology approach is one of "the top two performing individual models" in terms of the deterministic skill of ENSO predictions among eight models from the NMME (Kirtman et al., 2014). The finding seems still valid when compared with more recent NMME models (L'Heureux et al., 2023; Figure S4 in Supporting Information S1).
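The single- versus two-period climatology treatments above can be sketched with NumPy. Array names and shapes here are illustrative assumptions, not the paper's code: `sst` holds, say, the predicted Niño-3.4 SST for each start year and lead month.

```python
import numpy as np

# Sketch of the anomaly definitions described above (illustrative names).
# sst: (n_years, n_leads) predicted SST per start year and lead month.
def anomalies_one_clim(sst):
    """Remove a single lead-dependent climatology over the whole period."""
    sst = np.asarray(sst, dtype=float)
    return sst - sst.mean(axis=0, keepdims=True)

def anomalies_two_clim(sst, years, split=1999):
    """Remove separate lead-dependent climatologies before/after `split`,
    following the two-period approach of Xue et al. (2013)."""
    sst = np.asarray(sst, dtype=float)
    early = np.asarray(years) < split
    out = np.empty_like(sst)
    out[early] = sst[early] - sst[early].mean(axis=0, keepdims=True)
    out[~early] = sst[~early] - sst[~early].mean(axis=0, keepdims=True)
    return out
```

A uniform warm shift after the split year contaminates the single-climatology anomalies but is removed entirely by the two-period treatment, which is the motivation given above for the _2CLM variants.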
In this study, SSTs from the NOAA OISSTv2.1 (Huang et al., 2021) are used to validate the predicted SSTAs. The EN4.2.2 (hereafter EN4; Good et al., 2013), an objective monthly analysis based on in situ ocean observations from the Met Office Hadley Centre, is used for a comparison between CFSR and GLORe.

Results
We start the assessment with spatial distributions of SSTA prediction skill. Figure 1 shows the skill maps measured by anomaly correlation for lead times of 2, 5, and 8 months for the May starts. It is evident that higher correlations are mostly present in the central and eastern equatorial Pacific for all predictions, a region where SSTAs are dominated by ENSO. For hindcasts initialized from the CFSR ocean states, the adoption of two climatologies improves the skill measure not only with the CFSv2 model (Xue et al., 2013), but also with the UFSp8 model. The skill gain is more evident at the longer leads (e.g., at the 8-mon lead in Figure 1c vs. Figures 1a and 1b). This is expected, as the CFSR shift around 1999 is mainly related to surface winds that result in the shift in subsurface temperature (Kumar et al., 2012), and it takes months for the shift to appear at the surface when initialized with CFSR.
Compared to cfsCFSR, ufsCFSR presents a higher prediction skill. For example, at the 8-mon lead, while in cfsCFSR (the first row of Figure 1c) the anomaly correlation is below 0.7 over almost all of the tropical Pacific, there is a sizable region in the central Pacific with skill higher than 0.7 in ufsCFSR (the third row of Figure 1c). The higher skill of ufsCFSR suggests benefits from an improved model system in ENSO predictions. This is particularly encouraging for UFSp8, as the use of CFSR initial conditions was expected to favor the CFSv2 hindcasts through a weaker initial shock, because the same model is used for the CFSR assimilation and the CFSv2 forecasts. On the other hand, the superiority of UFSp8 over CFSv2 seems less evident when two climatologies are used to compute the anomalies.
Compared to hindcasts initialized from the CFSR ocean states (whether one or two climatologies are applied), ufsGLORe presents consistently higher correlations at all leads longer than 0-mon. The skill enhancement appears not only in the central and eastern tropical Pacific (the ENSO region), but in other regions as well. For example, in the tropical western North Pacific, while the correlation skill in ufsGLORe is well above 0.7 at all leads, it is much lower in the other hindcasts. In addition, it is noted that the SST prediction skill is slightly lower in ufsGLORe than in cfsCFSR and ufsCFSR over most of the tropical Pacific (except the central and eastern equatorial Pacific) at the 0-month lead (Figure S1 in Supporting Information S1). This is likely because the OISST SSTs, which are used in all our forecast verifications, are assimilated in CFSR, while the OSTIA SSTs are assimilated in GLORe.
For the November starts (Figure S2 in Supporting Information S1), the skill at long leads is clearly lower than for the May starts, but a higher skill with ufsGLORe than cfsCFSR/cfsCFSR_2CLM is still evident for both 2-mon and 5-mon leads.
Figure 2 evaluates the prediction skill of the Niño-3.4 index as a function of lead months, a standard metric of the overall ENSO prediction skill. For the May starts, the comparison of the anomaly correlation measure (Figure 2a) indicates that ufsGLORe demonstrates the highest skill, with its correlation above 0.85 for all lead times. In contrast, cfsCFSR shows the overall lowest skill, with its correlation around or below 0.8 at all leads except the 0-mon lead. The skill superiority of ufsGLORe over cfsCFSR shows up immediately after the forecasts start (e.g., at the 0-mon lead), and their correlation skill difference tends to grow as the lead time increases, exceeding 0.2 at the 8-mon lead.
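For concreteness, the anomaly correlation skill plotted as a function of lead can be computed as below. This is a minimal sketch with illustrative variable names (in practice the ensemble-mean prediction would be supplied as `pred`), not the paper's verification code.

```python
import numpy as np

# Minimal sketch of the anomaly correlation metric as a function of lead.
def anomaly_correlation(pred, obs):
    """Correlation between predicted and observed anomalies at each lead.

    pred, obs: (n_years, n_leads) arrays of Nino-3.4 anomalies.
    Returns an array with one correlation value per lead month.
    """
    p = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    p = p - p.mean(axis=0)  # center over start years, per lead
    o = o - o.mean(axis=0)
    num = (p * o).sum(axis=0)
    den = np.sqrt((p**2).sum(axis=0) * (o**2).sum(axis=0))
    return num / den
```

Because the metric is normalized, it is insensitive to a uniform offset or rescaling of the predictions; it rewards getting the ENSO phase right rather than the amplitude.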
Relative to cfsCFSR, ufsCFSR presents a slight skill enhancement, mostly less than 0.05 in correlation. The skill improvement reflects the contribution of model improvements. When two climatologies are used in the CFSR ocean-related hindcasts, their ENSO prediction skill is improved at lead times longer than 1 month, and the benefit seems more evident for the CFSv2 hindcast. As a result, cfsCFSR_2CLM and ufsCFSR_2CLM exhibit an overall comparable skill when measured by anomaly correlations.
The comparison between ufsGLORe and ufsCFSR/ufsCFSR_2CLM demonstrates the value of ocean initializations in enhancing ENSO prediction skill. The correlation skill enhancement from GLORe can be larger than 0.1 at most leads. In addition, the superiority of applying GLORe can also be noted by comparing the skill drop over the first two lead months in the hindcasts. In ufsGLORe, the correlation skill drops from 0.94 at the 0-mon lead to 0.92 at the 1-mon lead. In contrast, in cfsCFSR/cfsCFSR_2CLM, the correlation skill drops at a much larger rate, from ∼0.91 at the 0-mon lead to ∼0.75 at the 1-mon lead. Even with UFSp8, the CFSR ocean initialization also results in a much quicker skill drop (from ∼0.91 to ∼0.82).
Overall, the above cross-system comparisons indicate that the skill enhancement of ENSO predictions with ufsGLORe over the current operational system (CFSv2) is largely attributable to its ocean initializations with GLORe, with some contributions from the model advances in UFSp8 as well.
Figure 2b presents the root-mean-square error (RMSE) between the observed and predicted Niño-3.4 indices. Measuring RMSE is important when forecasting the intensity of ENSO (e.g., L'Heureux et al., 2019). From the perspective of operational ENSO forecasts, however, it is less important than the anomaly correlation metric, as predicting ENSO phases (measured with anomaly correlations) matters more than forecasting ENSO strengths/amplitudes. Also, intensity forecasts can be improved with some post-processing (e.g., rescaling with the observed climatological amplitudes), but it is more difficult to correct forecast errors regarding ENSO phases. Among all hindcasts, cfsCFSR_2CLM and ufsCFSR_2CLM exhibit the overall smallest RMSE, both featuring RMSE growth followed by decay after a peak during the boreal fall/winter. A similar RMSE evolution is seen in ufsGLORe. In cfsCFSR and ufsCFSR, however, the RMSEs evolve differently, exhibiting an overall monotonic growth as the lead time increases. A closer comparison of the RMSE evolutions also suggests a systematic bias in the UFSp8-related hindcasts: they tend to have large RMSEs during fall, a feature most distinct in ufsGLORe and discernible in ufsCFSR as well. The bias seems to be a model deficit of UFSp8, but determining which model components/modules contribute to it requires additional experiments that are beyond the scope of this study.
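The RMSE metric above is the complement of the anomaly correlation: a sketch, again with illustrative names rather than the paper's verification code, is

```python
import numpy as np

# Companion sketch for the RMSE metric of Figure 2b (illustrative names).
def rmse_by_lead(pred, obs):
    """Root-mean-square error over start years, one value per lead month.

    pred, obs: (n_years, n_leads) arrays of Nino-3.4 indices (degC).
    """
    err = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return np.sqrt((err**2).mean(axis=0))
```

A constant amplitude bias raises the RMSE at every lead while leaving the anomaly correlation unchanged, which is why the two metrics can rank systems differently, as in the discussion above.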
For the November starts, the prediction skill measured by anomaly correlations in all systems presents a remarkable decline during spring (consistent with the spring predictability barrier), and ufsGLORe and cfsCFSR/cfsCFSR_2CLM have generally comparable skill at leads longer than 3 months. For leads shorter than 3 months, it is encouraging that ufsGLORe shows a significantly higher correlation skill.
We also made a brief comparison between ufsGLORe and the component systems of the recent NMME with hindcasts back to 1982 (Figure S4 in Supporting Information S1).Even though our ufsGLORe hindcasts use a smaller ensemble size than most NMME systems, an encouraging ENSO prediction skill is noted for both May and November starts.
Figure 3 shows the predicted and observed Niño-3.4 indices during the period 1982-2021 for the May starts, with another angle of view presented in Figure S3 in Supporting Information S1. From Figure S3 in Supporting Information S1, it is clear that the Niño-3.4 indices in cfsCFSR and ufsCFSR present a shift around 1999, being warmer after 1999. This shift is not evident in the observations (Figure S3a in Supporting Information S1) or the other predictions (Figures S3c, S3d, S3f in Supporting Information S1). Overall, all hindcasts capture the onset and development of major warm and cold events, and their transitions, such as the 1982/83, 1997/98, 2009/10 and 2015/16 El Niño events, and the 1988/89, 1998/99/2000, 2007/08 and 2010/11 La Niña events. However, their amplitudes may have appreciable errors. A remarkable example is the 1997/98 El Niño event, whose peak amplitude was clearly overestimated in all hindcasts (e.g., by 2.5°C in ufsGLORe). For this event (and the 2009/10 one as well), consistent with Figure 2b, the UFSp8-based hindcasts tend to produce larger amplitude biases than CFSv2, reflecting a model error in UFSp8. On the other hand, there are also events that ufsGLORe predicts differently from the other hindcasts (although not always better). For instance, while ufsGLORe predicts the warm episode during 1986-1988 well, the other hindcasts initialized from the CFSR ocean states completely fail to predict it. This implies that the ocean initialization plays a more important role than the forecast model in predicting these events. In addition, ufsGLORe also seems to perform better in predicting some mild/near-neutral conditions, for example, the mild cold episode during 2001-2002. These skillful hindcasts, however, may not be reflected in the anomaly correlation metric (Figure 2a), as major events with large amplitudes have larger contributions.
In addition, the ENSO prediction skill in ufsGLORe also shows a drop after 2000 for both May and November starts (Figure S5 in Supporting Information S1). This feature seems opposite to the CFSR-related hindcasts, but has been seen in many prediction systems in association with the decadal shift of ENSO properties around 2000 (e.g., Barnston et al., 2012; Hu et al., 2020; McPhaden, 2012).
The value of the GLORe initialization in ENSO predictions can be further affirmed by the superiority of its analysis over CFSR. Figure 4 presents a comparison of the quality of their ocean analyses validated against the EN4 (Good et al., 2013). It shows the respective RMSEs of monthly mean temperature and salinity relative to the EN4 in the equatorial Pacific during 2004-2021, a period chosen considering that the EN4 is more reliable after the advent of Argo, particularly for salinity.
For temperature (Figures 4a and 4b), both CFSR and GLORe feature large RMSEs in the far western Pacific and along the thermocline. The former corresponds to the Indonesian seas with a complex bathymetry, which might be a factor in the larger RMSEs. The thermocline state (or subsurface thermal condition), on the other hand, plays a vital role in the ENSO cycle (Jin, 1997; Meinen & McPhaden, 2000), and thus is of foremost importance in ENSO predictions. It is encouraging to note that GLORe demonstrates a clear improvement relative to CFSR: while the RMSEs in CFSR can easily exceed 1.6°C, the RMSEs in GLORe are mostly below 1.2°C (Figure 4b vs. Figure 4a). The improvement of salinity in GLORe relative to CFSR is also evident (Figures 4c and 4d). For example, the salinity RMSEs in CFSR are well above 0.2 psu in the western Pacific warm pool region, extending from the surface to the thermocline depth, whereas they are much smaller in GLORe. The salinity improvement is expected given the salinity assimilation procedure in CFSR: in both GODAS (Behringer & Xue, 2004) and CFSR (Saha et al., 2010), observed salinity information is not assimilated; instead, synthetic salinity profiles derived from climatological T-S relationships together with real temperature information are used in assimilation. Although salinity is not as important as temperature in ENSO evolution, it does demonstrate some beneficial effects on ENSO predictions (e.g., Zhu et al., 2014). Therefore, the better ENSO prediction performance with GLORe than with CFSR (Figures 1-3) could be attributed to the higher quality of both temperature and salinity.

Summary and Conclusions
In this study, hindcast experiments are conducted to explore the quality of a JEDI-based ocean analysis (GLORe) in ENSO predictions. The hindcasts are based on the NOAA UFSp8 and cover 40 years (1982-2021). The performance of the hindcasts is compared with CFSv2, the current NOAA seasonal prediction system (and briefly with the recent NMME models as well), and with a third hindcast experiment using the same UFSp8 but with its ocean initialized by CFSR. Skill comparisons indicate that the hindcast with UFSp8 and GLORe performs better in ENSO predictions than CFSv2. The anomaly correlation metric for the Niño-3.4 index is higher than that of CFSv2 for all lead times, and the correlation skill enhancement reaches as much as 0.2 at some leads. Further cross comparisons among the hindcasts suggest that the skill improvement is largely attributable to the ocean initializations (i.e., GLORe vs. CFSR), with contributions from model improvements as well (i.e., UFSp8 vs. CFSv2). The benefit of ocean initializations with GLORe is further affirmed by validations against an objective ocean analysis, which demonstrates an improved quality of both temperature and salinity in GLORe relative to CFSR.
The hindcasts with UFSp8 and GLORe are encouraging, as their configurations are highly relevant to a future operational Seasonal Forecast System (SFS) under development in many regards. First, SFS will be based on a version close to UFSp8, but with a higher resolution. Even though the resolution used in this study performs well for ENSO predictions, a higher resolution is expected to improve predictions in other respects; for example, a higher vertical resolution in FV3 will better resolve stratosphere-troposphere interactions, and a higher horizontal resolution in FV3 will also benefit the representation/forecasts of meteorological variables over continents with complex topography. Second, the initialization of SFS will be based on a coupled DA system under development, whose ocean component is also based on the JEDI package. With ongoing efforts in the development of the SFS, the hindcasts reported in this study can serve as a useful benchmark.

Acknowledgments
… for their insightful comments. We are grateful to the EMC Dynamics and Coupled Modeling Group for their help with the configuration of the UFS. All experiments were carried out on NOAA's R&D HPC system GAEA. We gratefully acknowledge Drs. Jong Kim (EPIC/NOAA) and Dominikus Heinzeller (JCSDA/UCAR) for maintaining the proper modules/stacks in running UFS and JEDI-SOCA on GAEA. The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect the views of NWS, NOAA, or the Department of Commerce.

Figure 1 .
Figure 1. Distribution of anomaly correlations between observed and predicted SST anomalies at (a) 2-, (b) 5-, and (c) 8-month lead times, shown in the first, second and third columns, respectively. Rows from top to bottom show cfsCFSR, cfsCFSR_2CLM, ufsCFSR, ufsCFSR_2CLM and ufsGLORe, respectively. The hindcasts start from the May initial conditions during 1982-2021.

Figure 2 .
Figure 2. (a), (c) Anomaly correlation coefficients and (b), (d) RMSEs (°C) of the Niño-3.4 index as a function of forecast lead months (x-axis) for hindcasts starting from the (a), (b) May and (c), (d) November initial conditions during 1982-2021. The solid red, blue and green curves are for cfsCFSR, ufsCFSR and ufsGLORe, respectively, in which anomalies are computed by removing the model climatology for the whole period (i.e., 1982-2021). The dashed red and blue curves are for cfsCFSR_2CLM and ufsCFSR_2CLM, in which anomalies are computed by removing the model climatology for separate periods (i.e., 1982-1998 and 1999-2021). The dots in (a), (c) indicate that the correlation skill is significantly lower than that of ufsGLORe at the 90% confidence level.