The Role of Air–Sea Interactions in Atmospheric Rivers: Case Studies Using the SKRIPS Regional Coupled Model

Atmospheric rivers (ARs) play a key role in California's water supply and are responsible for most of the extreme precipitation and major flooding along the west coast of North America. Given the high societal impact, it is critical to improve our understanding and prediction of ARs. This study uses a regional coupled ocean–atmosphere modeling system to make hindcasts of ARs up to 14 days. Two groups of coupled runs are highlighted in the comparison: (1) ARs occurring during times with strong sea surface temperature (SST) cooling and (2) ARs occurring during times with weak SST cooling. During the events with strong SST cooling, the coupled model simulates strong upward air–sea heat fluxes associated with ARs; on the other hand, when the SST cooling is weak, the coupled model simulates downward air–sea heat fluxes in the AR region. Validation data shows that the coupled model skillfully reproduces the evolving SST, as well as the surface turbulent heat transfers between the ocean and atmosphere. The roles of air–sea interactions in AR events are investigated by comparing coupled model hindcasts to hindcasts made using persistent SST. To evaluate the influence of the ocean on ARs we analyze two representative variables of AR intensity, the vertically integrated water vapor (IWV) and integrated vapor transport (IVT). During strong SST cooling AR events the simulated IWV is improved by about 12% in the coupled run at lead times greater than one week. For IVT, which is about twice more variable, the improvement in the coupled run is about 5%.

such important societal roles, improved understanding and accurate forecasting of ARs and AR-induced precipitation are critical (Martin et al., 2018;Ralph et al., 2010).
To better understand ARs, the forecast skill of ARs in numerical weather prediction models has been assessed over the last several decades (DeFlorio et al., 2018;Lavers et al., 2016;Martin et al., 2018;Nayak et al., 2014;Wick et al., 2013). For example, Wick et al. (2013) assessed the control forecasts of five global operational ensemble forecast systems, focusing on integrated water vapor (IWV). The models exhibited some usable skill in predicting the overall occurrence of ARs out to 10 days, although the landfall position error was roughly +/− 500 km at 5-days lead time and degraded to +/− 1,000 km at 10-days lead time. They also investigated the influence of model spatial resolution on forecasting ARs and found that the error in AR width is greater in coarser-resolution models. Lavers et al. (2016) investigated the global ensemble reforecasts of integrated vapor transport (IVT) and precipitation across 31 winters. Their results showed that IVT (used as proxy to represent AR conditions) has higher predictability than precipitation, suggesting that IVT may be used to provide early awareness of extreme AR events. They also found large interannual variability in predicting IVT and precipitation. Martin et al. (2018) compared the forecasts of global and regional models against the observations. They demonstrated that improving the water vapor transport accuracy can significantly reduce precipitation error in the regional model, while this was not observed in the global model. In addition, the recently created scale for AR intensity and impacts  uses IVT specifically to define the AR intensity, and IVT depends upon the amount of water vapor in the air (best represented here by the parameter IWV). These factors motivate the use of IWV and IVT in the analyses presented herein.
To extend the predictability of ARs by numerical weather prediction models, recent studies have focused on the connection between ARs and lower frequency synoptic-scale atmosphere features. This is because AR location, intensity, and frequency are strongly modulated by these lower frequency variabilities, such as the El Niño Southern Oscillation, the Madden-Julian Oscillation, the Pacific Decadal Oscillation, and the Pacific North America (PNA) teleconnection patterns (Baggett et al., 2017;Gershunov et al., 2017;Mundhenk et al., 2016;Payne & Magnusdottir, 2014;Zhou & Kim, 2018). These studies suggest that AR prediction skill can be potentially extended through the knowledge of these lower frequency signals. In addition, DeFlorio et al. (2018) studied the combined effect of lower frequency signals on AR prediction skill. They showed that (1) AR prediction skill was increased over the north Pacific/western United States at a 10-day lead when El Niño and positive PNA conditions occur concurrently; and (2) AR prediction skill was increased over the north Atlantic/United Kingdom at a 7-day lead when La Niña and negative PNA conditions occur concurrently.
Air-sea interactions can also impact ARs and their predictability. Recent studies emphasized the importance of AR-induced strong winds (Shinoda et al., 2019;Waliser & Guan, 2017). These winds are often associated with large pressure gradients between extratropical cyclones on the northwest sides of the ARs and anticyclones on the southeast sides (e.g., Newell et al., 1992;Newman et al., 2012;Shinoda et al., 2019). Large air-sea fluxes of momentum, heat, and moisture then result from the strong winds, generating substantial ocean responses. Neiman et al. (2013) investigated a few landfalling AR events and showed that the upward surface latent heat flux can be 200 W m −2 in the AR region, and even higher on the northwest side of AR at 550 W m −2 . The recent study of Shinoda et al. (2019) showed a dipole-like structure that cooler/ warmer sea surface temperature (SST) is observed on the northeast/southwest side of the AR center due to strong surface winds and air-sea heat fluxes. The AR-induced variations and air-sea fluxes could feedback on the ARs and play a critical role in their evolution. However, although there are many studies on AR dynamics and thermodynamics (e.g., Martin et al., 2018;Ralph et al., 2004Ralph et al., , 2010Shinoda et al., 2019), very little is known about the influence of air-sea interactions on modeling and forecasting ARs. There are still fundamental questions to be addressed: 1. How do ARs impact the ocean? 2. How does the ocean impact ARs? 3. Can a coupled ocean-atmosphere model better simulate AR events?
The goal of this work is to investigate the influence of air-sea interactions on AR events. To this end, we perform a series of coupled and uncoupled numerical simulations in the northeastern Pacific region, where SUN ET AL. 10.1029/2020JD032885 2 of 24 ARs have been well-studied. We first present the SST variations and the ocean surface heat fluxes in a series of AR events, aiming to show how ARs impact the ocean. Then, by comparing the coupled and uncoupled runs, we isolate the effect of SST variations to investigate how the AR-induced ocean response feeds back onto the ARs. Finally, we use observational and reanalysis data to quantify the difference in skill between the coupled and uncoupled simulations.
The rest of this study is organized as follows. The coupled model, the design of the experiments, and the data used in this work are introduced in Section 2. An overview of the AR events is presented in Section 3. Section 4 details the impact of air-sea interactions on modeling AR events. Section 5 discusses IWV and IVT skill, and assesses sources of errors. The last section concludes the paper.

Coupled Model
In this case study, the Scripps-KAUST Regional Integrated Prediction System (SKRIPS, version 1.0) is used (Sun et al., 2019). The SKRIPS is a regional coupled ocean-atmosphere model: the oceanic model component is the MIT general circulation model (MITgcm) (Marshall et al., 1997) and the atmospheric model component is the Weather Research and Forecasting (WRF) model (Skamarock et al., 2019). The Earth System Modeling Framework (ESMF) (Hill et al., 2004) is used as the coupler to drive the coupled simulation. The National United Operational Prediction Capability (NUOPC) layer in the ESMF is also used to simplify the implementations of component synchronization, execution, and other common tasks in the coupling (Hill et al., 2004;Sitz et al., 2017). The schematic description of the coupled model is shown in Figure 1. In the coupling process, the MITgcm sends SST and ocean surface velocity to ESMF, and ESMF sends them SUN ET AL.  The schematic description of the SKRIPS regional coupled ocean-atmosphere model. The yellow block is the ESMF/NUOPC coupler; the white blocks are the ocean and atmosphere components; the red blocks are the implemented MITgcm-ESMF and WRF-ESMF interfaces. Although the regridding capability is implemented in SKRIPS, it is not used in the simulations because MITgcm and WRF use identical horizontal grids. ESMF, Earth system modeling framework; MITgcm, MIT general circulation model; NUOPC, National United Operational Prediction Capability; SKRIPS, Scripps-KAUST Regional Integrated Prediction System; WRF, Weather Research and Forecasting. to WRF as the bottom boundary conditions. WRF sends surface fields to ESMF, including (1) net surface longwave and shortwave radiative fluxes, (2) surface latent and sensible heat fluxes, (3) 10-m wind speed, (4) precipitation, and (5) evaporation. The MITgcm uses these variables to prescribe surface forcing, including (1) total net surface heat flux, (2) surface wind stress, and (3) freshwater flux. The total net surface heat flux is computed by adding surface latent heat flux, sensible heat flux, net shortwave radiation flux, and net longwave radiation flux. The MITgcm computes the 10-m neutral wind speed based on the 10-m winds from WRF and then computes the surface wind stress (Large & Yeager, 2004). The freshwater flux is the difference between precipitation and evaporation. The surface latent and sensible heat fluxes are computed using the COARE 3.0 bulk algorithm in WRF (Fairall et al., 2003).

Experimental Design
The AR events in the northeastern Pacific region are investigated. We perform 93 pairs of coupled and uncoupled hindcast simulations, which are initialized on each day in three Januaries from 2016 to 2018 (3 years × 31 days/year). We select these events because they capture different thermodynamic characteristics of ARs, which will be detailed in Section 4. Each simulation aims to examine the model skill up to 14 days and the ensemble of the runs allow us to examine the mean and spread of the hindcasts. In each simulation, a few ARs (about five AR events) can be observed throughout the domain, and the duration of ARs can be a few days.
The model domain extends from 18.16°N to 54°N and from 116°W to 180°. To generate the grids, we choose latitude-longitude (cylindrical equidistant) map projection for both MITgcm and WRF. The horizontal grid has 448 × 800 (lat × long) cells and the spacing is 0.08° in both directions. We use identical horizontal grids for both MITgcm and WRF to eliminate the issue of regridding winds near steep orography and complex coastlines (Seo et al., 2016). Although the regridding capability is implemented in SKRIPS, the interpolations are not performed in the coupling process. There are 40 sigma layers in the atmosphere model and 60 z-layers in the ocean model. The top of the atmosphere is at the 50 hPa pressure level.
The bathymetry of the ocean model is extracted from the 2-min Gridded Global Relief Data (National Geophysical Data Center, 2006). The time step of the ocean model is 120 s. The horizontal sub-grid mixing is parameterized using nonlinear Smagorinsky viscosities, and the K-profile parameterization is used for vertical mixing processes (Large et al., 1994). The time step for the atmospheric simulation is 30 s. The Morrison 2-moment scheme (Morrison et al., 2009) is used to resolve the microphysics; the updated version of the Kain-Fritsch convection scheme (Kain, 2004) is used for cumulus parameterization; the Mellor-Yamada-Nakanishi-Niino 2.5-order closure scheme (Nakanishi & Niino, 2004) is used for the planetary boundary layer (PBL); the Rapid Radiation Transfer Model for GCMs (Iacono et al., 2008) is used for longwave and shortwave radiation transfer through the atmosphere; the Noah land surface model is used for the land surface processes (Tewari et al., 2004).
In the present study, we perform the following simulations: 1. Run CPL: two-way coupled ocean-atmosphere (MITgcm-WRF) simulations 2. Run ATM.STA: stand-alone atmosphere (WRF) simulations with the initial SST kept persistent. This run serves as a benchmark to highlight the difference between the coupled and uncoupled runs. It allows assessing the atmospheric model behavior with realistic, but persistent SST Both CPL and ATM.STA are initialized using global analysis data. The initial conditions, boundary conditions, and forcing terms of the simulations are summarized in Table 1. In CPL, the ocean model uses the assimilated HYCOM/NCODA 1/12° daily global analysis data (the Global Ocean Forecast System, Version 3.0, https://www.hycom.org/dataserver/gofs-3pt0/analysis) as initial and boundary conditions for ocean temperature, salinity, and horizontal velocities (Chassignet et al., 2007). The boundary conditions for the ocean are updated based on linearly interpolating between the daily HYCOM/NCODA analysis data. A restoring layer with a width of 13 grid cells is applied at the lateral boundaries. The inner and outer boundary relaxation timescales are 10 and 0.5 days, respectively. The atmosphere is initialized using the NCEP FNL (Final) Operational Global Analysis data. The same data also provide the boundary conditions for air temperature, wind speed, and air humidity. The atmospheric boundary conditions are updated based on linearly interpolating between 6-h NCEP FNL data. The 'specified' zone in WRF prescribes the lateral boundary values, and the "relaxation" zone is used to nudge the solution from the domain toward the boundary condition value.
Here we use the default width of one point for the specific zone and four points for the relaxation zone.
Importantly, both CPL and ATM.STA derive skill from boundary conditions (i.e. they are dynamically downscaled hindcasts). This better allows us to focus on highlighting the impacts of air-sea interactions on ARs. In CPL run, HYCOM/NCODA data is used for the oceanic initial and lateral boundary conditions. Thus in ATM.STA run, HYCOM/NCODA SST is used as the initial condition and is persistent throughout the run. The atmospheric initial and lateral boundary conditions in ATM.STA are the same as CPL. The coupling interval used for CPL is 20 min to allow capturing the diurnal cycle of air-sea fluxes (Seo et al., 2014). In this study, we do not compare the coupled run with atmosphere-only model driven by daily SST from HYCOM or other SST datasets. This is because (1) we aim to show the difference in IWV and IVT due to the coupling and (2) daily SST may not be available in a real-time forecast.

Validation of the Results
To evaluate the performance of CPL and ATM.STA, the model outputs are compared with validation data. For water vapor during the AR events we compare IWV and IVT with ERA5 reanalysis data (ECMWF, 2017). IWV is calculated from specific humidity q (kg kg −1 ) in the atmosphere: where g is the gravitational acceleration (equal to 9.81 m s −2 ); and p is the pressure (Pa). IVT is calculated from specific humidity and wind speed: where u and v are the zonal and meridional wind speeds (m s −1 ), respectively. Note that we integrate IWV and IVT from the surface pressure p surface to 300 hPa (Lavers et al., 2016). In the simulations, there are about 30 vertical levels below 300 hPa in WRF. To better illustrate the difference between model outputs and validation data, IWV and IVT are averaged on a daily basis (ending at 0000 UTC) using hourly instantaneous diagnostics (Hecht & Cordeira, 2017;Lavers et al., 2015). The simulated surface turbulent heat fluxes (THFs) are validated against the 1 ° × 1 ° daily OAFlux data (Yu et al., 2008). The simulated SST fields are validated against the 1/12° daily HYCOM/NCODA data. Here we use the same HY-COM/NCODA analysis data as the initial and boundary condition in the coupled model (shown in Section 2.2), aiming to show the increase of error from initial condition. We used bilinear interpolation to transfer the validation data onto the model grid to achieve a uniform spatial scale. When interpolating SST, only the data saved on ocean points are used. The validation data are summarized in Table 2. Because (1) ARs can be observed in the selected domain throughout the simulations (shown in Appendix A and (2) the differences in IWV/IVT, THFs, and SST are found outside the AR regions (e.g., pre-AR and post-AR regions), we analyze the simulations results for the entire domain.
The Brier skill score (BSS) is used to examine the skill difference between CPL and ATM.STA (Von Storch & Zwiers, 2001). Here, we use the modified version that simplifies the comparability of positive and negative scores (Winterfeldt et al., 2011): where  2 F and  2 R are the mean squared error (MSE) of the "forecast" and "reference," respectively. According to Equation 3, positive BSS means the forecast is more skillful than the reference, whereas negative BSS means the forecast is less skillful than the reference. In this study, we use the difference between the model outputs and the validation data to calculate the BSSs. We recognize the validation data is also an estimate with recognized uncertainty. Nevertheless, we here use it as truth and choose to refer to the difference between model outputs and validation data as "errors."

Overview of the AR Events
A series of AR events with different thermodynamic interactions are observed in the simulations. To illustrate the different characteristics of ARs, the results obtained in two representative coupled simulations are shown: CASE1 initialized at 0000 UTC January 09, 2018; and CASE2 at 0000 UTC January 25, 2018. The evolution of the ARs is shown in Figure 2 by plotting the daily-averaged IVT fields 4, 6, 8, and 10 days after initiation. Here, we use IVT > 250 kg m −1 s −1 to define the AR region (Rutz et al., 2014). It can be seen in Figure 2 that ARs are observed in the selected domain throughout the simulations. Figure 2a shows several west-east oriented ARs in CASE1, with a maximum IVT of about 1,250 kg m −1 s −1 , whereas Figure 2b shows CASE2 has several ARs with a more south-north orientation and with a maximum IVT of about 900 kg m −1 s −1 . Figure 3 displays the 14-day averaged IVT and the number of days under AR conditions. The mean IVT in CASE1 is higher than that of CASE2, but the AR events cover similar regions in both cases.
To demonstrate different AR thermodynamic interactions in CASE1 and CASE2, the 14-day averaged surface THFs, the 14-day time-integrated Q net (net surface heat flux), and the SST difference between day 14 and day 1 (dSST 14 ) are plotted in Figure 4. The surface THFs in Figure 4a indicate the ocean is losing energy from surface turbulent heat transfer in both cases. The domain mean energy loss in CASE1 (mean THFs: −130 W m −2 ) is more significant than that in CASE2 (mean THFs: −103 W m −2 ). Figure 4b shows the time-integrated Q net in the representative cases. In both cases, the mean Q net in the domain is negative, indicating the ocean loses energy. However, in CASE2 the ocean gains energy in the AR region of about 0.4 × 10 8 J m −2 between 160°-140° W and 18°-42° N. Compared with CASE1, the total surface energy loss in CASE2 is only about half of that in CASE1 (CASE1: 2.34 × 10 21 J; CASE2: 1.14 × 10 21 J). Figure 4c shows the SST difference between the start and the end of the simulations. In CASE1, because the ocean loses heat, SUN ET AL.

Case Study
The two representative cases in Section 3 demonstrate different ARs thermodynamic interactions. In CASE1, SST cools about 1 °C in the AR region, whereas in CASE2, SST warming is observed in parts of the AR region. Here, we first examine the SST evolution in all coupled simulations and use the statistics (e.g., mean, standard deviation, ensemble spread) to demonstrate how ARs impact the ocean. We then investigate how the ocean impacts ARs by comparing coupled and uncoupled simulation results. Due to the chaotic nature of the atmosphere (the differences of the snapshots are detailed in Appendix A), it is challenging to investigate the physical processes that impact the distribution of water vapor without detailed experiments and process-based diagnostics. Hence, we focus on a statistical comparison rather than individual simulations due to the chaotic nature of the atmosphere.

Sea Surface Temperature
The SST evolution in all 93 coupled simulations is summarized in Figure 5. It can be seen in Figure 5a that coupled simulations generally reproduce the evolution of domain-averaged SST in consistency with HY-COM (mean error < 0.2°C; root-mean-square error < 0.6°C). Figure 5b highlights two groups of simulations that each have 31 members (1/3 of all simulations): (1) strong cooling ARs and (2) weak cooling ARs. The strong cooling ARs include 31 runs that have more significant SST cooling (mean dSST 14 : −0.50°C); the weak cooling ARs include 31 runs that have less significant SST cooling (mean dSST 14 : −0.22°C). Compared with the average climatological SST cooling (−0.32°C), the SST cooling in "strong cooling" events is stronger; the SST cooling in "weak cooling" events is weaker. Because the AR conditions are observed throughout each 14-day run in the selected domain, shown in Appendix A, we are able to study the interactions between ARs and the ocean in these runs. Note that in the "weak cooling" runs, the SST may increase in parts of the AR region (example shown in Figure 4), but the domain-averaged SST is still cooling. Here, we use the magnitude of SST cooling to separate the simulations because (1) SST changes are determined by the surface heat fluxes that are important in ocean-atmosphere coupling, and (2)  condition in the atmospheric model. The 31 runs that have intermediate cooling are included in the "all AR" statistics presented, but are not shown in isolation.
The SST simulated with the coupled model is now compared with the validation data to demonstrate the skill improvement over assuming a persistent SST ( Figure 6). In Figure 6a, we plot the root-mean-square errors (RMSEs) of SST obtained in CPL as a function of lead time in red. The upper (lower) whiskers represent maximum (minimum) values; the upper (lower) box bounds represent upper (lower) quartile Q 1 (Q 3 ); the box center lines represent median RMSEs for the 93 coupled or uncoupled simulations. The interquartile range is IQR = Q 1 − Q 3 , and the values above (below) the upper (lower) fence Q 1 + 1.5IQR (Q 3 − 1.5IQR) are outliers. In comparison, the RMSEs of persistent SST are also plotted in gray. It can be seen that the median, the upper/lower quartiles, and the maximum/minimum RMSE SST in CPL are all smaller than persistence from day 1 to day 14. Because the persistent SST is used in ATM.STA, this demonstrates that the SST in CPL agrees better with the validation data than ATM.STA. In addition, we plot BSS SST to quantify the improved skill in CPL ( Figure 6b). Here,  2 F in Equation 3 is calculated between HYCOM SST data and the simulated SST obtained in CPL;  2 R is calculated between HYCOM SST data and the persistent SST used in ATM.STA. It can be seen in Figure 6b that the median BSSs for all AR events are about 20% from day 1 to day 14. The median BSSs for strong and weak cooling AR events are 26.8% and 18.7%, respectively, shown in Table 3. However, in the second week, the BSSs of strong cooling ARs (44.6%) are higher than those of weaker cooling events (6.4%) by about 40%, resulting from the combined effect of the coupled model being SUN ET AL.   able to skillfully simulate the stronger SST changes as well as persistence being less skillful during the strong cooling events.

Surface Turbulent Heat Fluxes
In the AR events associated with stronger SST cooling, stronger surface turbulent heat losses from the ocean can be observed; in the AR events associated with weaker SST cooling, there is much less surface turbulent heat transfer between ocean and atmosphere. This section aims to demonstrate how the coupled model better simulates the surface turbulent heat transfer during the AR events.
To demonstrate how the coupled model better simulates the surface THFs, the comparison between the daily-averaged THFs and daily OAFlux validation data is shown in Figure 7. In Figure 7a, the RMSEs of THFs are plotted as functions of lead time. It can be seen that the RMSEs of CPL are smaller than those of ATM.STA from day 1 to day 14. Note that the RMSEs do not increase significantly (less than 5 W m −2 ) for longer lead simulations. To quantify the improvement of the coupled model, the BSSs are shown in shows the total domain-averaged SST variation dSST 14 during the 14-days simulation. Here, we highlight two groups of simulations that each has 31 members (1/3 of all simulations). The strong cooling ARs include 31 events that have more significant SST cooling (mean dSST 14 : −0.50°C); the weak cooling ARs include 31 events that have less significant SST cooling (mean dSST 14 : −0.22°C). The dashed line in panel (a) is the daily climatology SST (Banzon et al., 2014), available at ftp://eclipse.ncdc.noaa.gov/pub/OI-daily-v2/climatology/. SST, sea surface temperature.
 2 R is calculated between the OAFlux data and the THFs obtained in ATM.STA. It can be seen in Figure 7 that the medians of BSSs are increasing from 0.06 to 0.17 with lead time. In week one, the difference of BSSs between strong and weak cooling events is about 0.01, outlined in Table 3. However, in week two, the BSSs of strong cooling events are much higher than weak cooling events (strong cooling events: 20%; weak cooling events: 11%). This indicates that the skill improvement of the coupled model is more significant for strong cooling AR events.

Improved Skills in Simulating ARs
Because of the improved skill of the coupled model in simulating SST and surface THFs, the question arises whether the coupled model can also better simulate the ARs. This section investigates how much skill is added by the coupled model in simulating ARs. The diagnosed IWV and IVT are used to demonstrate the influence of air-sea interactions on ARs.
The RMSEs of both IWV and IVT are shown in Figure 8  Week 1 Week 2  and uncoupled models have much better skills than persistence. In week one, the RMSE IWV and RMSE IVT of CPL do not differ much from those of ATM.STA. In week two, the differences of RMSEs are larger: the median RMSE IWV of CPL is smaller by about 0.1 mm and the median RMSE IVT of CPL is smaller by about 1 kg m −1 s −1 . It is noted that there are a few simulations that have more than twice larger RMSEs than the median, but the model outputs are still better than the persistent values.
To demonstrate the relative skill improvement of the coupled model, the BSSs of IWV and IVT are plotted as functions of lead time in Figure 9. Here,  2 F is calculated between the ERA5 and the results obtained in CPL;  2 R is calculated between the ERA5 and the results obtained in ATM.STA. In Figure 9, the mean RMSE IWV and RMSE IVT are shown; the standard error of the mean are also plotted as error bars (AR events last for several days, meaning daily-mean IWV and IVT are not independent. We find the errors of forecasts using SUN ET AL.   persistent values of IWV and IVT plateau after 5 days (Figure 8), implying decorrelation on this timescale. Hence, we use the number of days simulated divided by five to determine sample size for calculating a standard error. For all AR events, n = 18; for strong/weak cooling ARs, n = 6.); the median, the upper/ lower quartiles, and the maximum/minimum RMSEs are shown in the inset figures. It can be seen that the mean BSS IWV and BSS IVT are all positive from day 1 to day 14. The coupled model is even better at simulating strong cooling AR events for both IWV (about 12% in week two) and IVT (about 5% in week two), shown in Table 3. However, the skill improvement is much less in weak cooling AR events, where the air-sea heat exchanges are smaller. The skill improvement of IWV is higher than that of IVT, because IVT is more variable than IWV. This difference will be discussed further in Section 5. The standard deviations of the BSSs are also shown in Table 3. It can be seen that BSS IWV and BSS IVT are less statistically significant in week one compared with week two. The BSS IVT is also less statistically significant compared with BSS IWV for the strong cooling AR events. This is also because IVT is more variable than IWV.
To investigate the relationship between the SST variation and the improvement in model forecast skill, we plotted the BSSs as functions of SST changes in Figure 10. It can be seen that both IWV and IVT skills in CPL increase when SST cooling is stronger in the simulations. The predictions of IWV and IVT of CPL are similar to those of ATM.STA when the SST cooling is less than 0.2°C. On the other hand, when the SST cooling is stronger than 0.5 °C, the mean BSSs of IWV and IVT are 18% and 16%, respectively. The BSSs are also plotted as functions of the time-integrated net surface heat flux Q net in Figure 11. It can be seen that the skill of the coupled model increases with increasing Q net loss. When the mean surface energy loss is smaller than 0.6 × 10 8 J m −2 , the BSSs are smaller than 5%; when the mean surface energy loss is more than 1.2 × 10 8 J m −2 , the mean BSSs of IWV and IVT are 18% and 15%, respectively. Because the accumulated SST cooling and Q net loss in week two are higher than those in week one, we hypothesize that the skill improvement is better in week two because of stronger SST and more Q net loss.

BSSs at Different Atmospheric Levels
Although the changing SST influences the ARs in the simulations, its impact is height dependent. In this section we analyze two representative levels: the lower level is from the surface to 850 hPa; the upper level is from 850 hPa to 300 hPa. These levels are selected because each contains about 50% of the water vapor transport. SUN ET AL.

10.1029/2020JD032885
13 of 24 The comparison of the relative skill at lower and upper levels is shown in Figure 12. At the lower level BSS I-WV and BSS IVT are all positive from day 1 to day 14, suggesting the coupled model better captures the water vapor in this level. In the second week, the improvement in IWV and IVT is about 12% and 4%, respectively (Table 4). However, the median BSSs in the upper level are almost neutral (between −2% and +2%) for both IVT and IWV, indicating the average impact of the SST on forecast skill is insignificant for the upper level. However improved forecast skill is apparent when splitting the strong and weak cooling AR events. As shown in Figure 13, the BSSs in strong cooling events are higher than those in the weak cooling events, and relative skill improvement of IWV and IVT in week two is 19% and 6% for the lower level and 10% and 3% for the upper layer (Table 4).

Interpreting the Forecast Skill
The comparison between CPL and ATM.STA demonstrates that the SST obtained in coupled run agrees better with the validation data than the persistent SST used in uncoupled run. It is also shown that the surface THFs, IWV, and IVT in the coupled run also agrees better with the validation data. Here, we first examine the impact of SST variations on THFs, IWV, and IVT in the simulations. We then investigate the components contributing to the total BSS (e.g., mean bias and standard deviation). This section aims to interpret the different skill scores shown in Section 4. Week 1

Impact of the SST Cooling
Week 2 Note. The average of the median BSSs in Figures 12 and 13 are shown. The standard deviations of the BSSs are shown in the parentheses. AR, atmospheric rivers; BSS, Brier skill score; IVT, interpolated vapor transport; IWV, interpolated water vapor.

Table 4
Summary of Relative Skill Improvements at Lower and Upper Atmospheric Levels the system where changes in THFs can impact atmospheric humidity, which can further impact the THF response. Yet, our analysis shows that the SST difference does delineate the skill of the two sets of ARs (strong cooling and weak cooling).
The differences in IWV and IVT due to SST variations are shown in Figure 14b and 14b, respectively. It can be seen that both IWV and IVT in CPL are smaller than those in ATM.STA. It is noted that the percentage differences of IWV and IVT are generally consistent because the mean wind speed is not sensitive to SST variations in the simulations. To investigate the differences in IWV and IVT between the simulations, we plotted the differences in evaporation, precipitation, and E-P (evaporation minus precipitation) in Figure 15. Each point in the background represents the accumulated evaporation and precipitation from the start of the simulations. It can be seen that both evaporation and precipitation in CPL are less than those in ATM.STA, and they are both one order of magnitude higher than the differences in E-P. If we compare E-P with the decrease of IWV in Figure 14b, they are generally consistent. Because both CPL and ATM.STA use the same boundary condition for the water vapor, we conclude that the difference in IWV is mainly due to the differences in E-P.
SUN ET AL.

Components Contributing to BSS
Although Section 5.1 demonstrated that both IWV and IVT decreases by the same percentage in CPL compared with ATM.STA, the results in Section 4.3 and Section 4.4 demonstrated greater skill improvement by the coupled model in forecasting IWV than in forecasting IVT, especially in the lower atmosphere. Because both IWV and IVT are used to describe the ARs, we examined the difference in BSS IWV and BSS IVT by comparing different components contributing to the total BSSs.
The BSS is computed by comparing the mean squared error (MSE) σ 2 , which combines information of the "mean bias" and the "standard deviation": where BIAS is the mean bias between model outputs and validation data; STD is the standard deviation between model outputs and validation data. Table 5 summarizes the MSEs, the biases, and the standard deviations of IWV and IVT in Section 4.3. In CPL, the mean IWV and IVT are both smaller than ATM.STA by about 1%; the mean biases of IWV and IVT are also smaller than those of ATM.STA; the standard deviations of IWV are similar to that of ATM.STA. When comparing the contribution of mean bias and standard deviation, we found that the IVT is far more variable because it is the integral of the product between water vapor and wind speed. Hence, when computing the BSSs, the improvement of mean bias in IWV is more important compared with IVT. Although the impact of SST variation on mean IWV and IVT are very similar by percentage (shown in Figure 14), the BSSs of IWV are much higher than those of IVT.
To investigate the difference in BSSs between lower and upper atmosphere, we summarized the MSEs, the biases, and the standard deviations in Table 6. We found that the mean IWV in both lower and upper atmosphere obtained in CPL are smaller than those in ATM.STA by about 1%, suggesting the impact of SUN ET AL.   SST on mean IWV is generally consistent in the upper and lower atmosphere. In the lower atmosphere, the mean biases are larger and the standard deviations are smaller, and thus the improvement of the MSE is more significant than the upper atmosphere (lower atmosphere: 9.2%; upper atmosphere: 0.6%).

Summary and Conclusion
A series of atmospheric river events were simulated using a regional coupled ocean-atmosphere model (SKRIPS v1.0). The coupled simulation results were compared with those in uncoupled simulations to demonstrate the ocean and atmosphere interactions during AR events. We found that the SST cooling in different cases can be significantly different, hence we highlighted two groups of simulations: (1) strong cooling ARs and (2) weak cooling ARs. The strong cooling group had the 31 AR events that occurred with the most significant SST cooling and the weak cooling group had the 31 AR events that occurred with the weakest cooling. The 31 intermediate cooling events were analyzed as part of the "all AR" statistics, but not in isolation.
Two representative simulations were selected to analyze different thermal interactions of strong and weak cooling ARs. CASE1 was west-east oriented with a maximum IVT of about 1,250 kg m −1 s −1 ; CASE2 was almost south-north oriented with a maximum IVT of about 900 kg m −1 s −1 . CASE1 exhibited much stronger SST cooling and surface energy loss, suggesting the influence of ARs on the ocean can differ significantly according to the events and background ocean state. When performing coupled simulations, the Brier skill score showed that simulated SST was about 20% more accurate than persistent SST. The surface turbulent heat fluxes resulting from the coupled simulations were about 10% more accurate. The improvement of the coupled model was even more pronounced in strong cooling AR events.
In addition, we investigated the skill improvement of the coupled model in simulating ARs. Due to the chaotic nature of the atmospheric system, we compared the statistics of BSSs in all simulations instead of comparing the snapshots of each event. In the present case study, both coupled and uncoupled models realistically captured the general characteristics of the atmospheric vertical integrals. For the strong cooling AR events, the coupled model showed improved skill in predicting IWV and IVT by 12% and 5%, respectively, for lead times of longer than 7 days. The differences between coupled and uncoupled simulations in weak cooling AR events are less significant.
The results presented here motivate further studies evaluating the effect of ocean-atmosphere coupling on AR events. Here we used a regional model to show that for runs out to 14 days coupling to an ocean model improved the simulation of AR characteristics. The impact of coupling on forecast skill on longer timescales is an important research topic, but best investigated with global models. Future work will involve exploring the response of SST to the atmosphere and ocean state (e.g., heat fluxes, wind stress, mixed layer deepening), the impact of the annual SST cycle, and the other characteristics of AR (e.g., AR intensity, orientation) on the coupling. In addition, the sensitivity of the coupled model to different physics schemes in WRF will be investigated.  ATM.STA, and the RMSEs of CPL and ATM.STA compared with ERA5. The aim is to demonstrate that the direct comparison of daily-averaged IWV and IVT suffers from the chaotic nature of the atmosphere in the present simulations.

Appendix
The snapshots of daily-averaged IVT contours of ARs in ERA5 are shown in Figure A1. We select every other day in January and early February, aiming to demonstrate the AR conditions in our study. It can be seen in Figure A1 that AR conditions are observed every day, covering about 20% of the computational domain. The differences between the simulation results obtained from CPL and ATM.STA (CPL−ATM. STA) are shown in Figure A2. We select the same representative simulations as Section 3 and show the daily-averaged IWV and IVT as obtained from these simulations. Generally, it can be seen that the IWV is smaller in CPL compared with ATM.STA, especially in CASE1. This is because the CPL captures the SST cooling and the reduction of E-P, which is a source of the water vapor. The comparison of IVT in Figure A2 shows that IVT is also smaller in CPL. The difference of IVT is associated with the difference of IWV. It can be also seen that a dipole pattern is observed in CASE2 after 8 and 10 days, which indicates a shift of the AR front between two simulations. The comparison between CPL and ERA5 (CPL-ERA5) is shown in Figure A3. It can be seen that CPL-ERA5 is three times larger than CPL-ATM.STA. We did not show ATM. STA-ERA5 because it is similar to CPL-ERA5. In CASE1, the coupled model over-estimates the IWV in the warmer sector of AR, but under-estimates the IWV in the cooler sector. However, in CASE2 the difference is not significant at warmer and cooler sectors. It can be seen in the figures that the differences of IWV/IVT SUN ET AL.
10.1029/2020JD032885 20 of 24 Figure A3. The difference of IWV/IVT between CPL and ERA5 data. The black contours highlight the AR region where IVT > 250 kg m −1 s −1 . The results obtained in CASE1 and CASE2 in Section 3 are shown. AR, atmospheric rivers; IVT, integrated vapor transport; IWV, integrated water vapor.
are chaotic because of the nonlinearty of the atmosphere, and thus it is challenging to investigate the physical processes that impact the distribution of water vapor without detailed experiments and process-based diagnostics. Instead, we examined the statistics of the skill of coupled and uncoupled models and detailed them in Section 4.

Appendix B
SST evolution and AR events The evolution of SST in CPL run is shown in Figure 5a. To illustrate the main course of the SST cooling, the evolution of Q net and mixed layer depth (MLD) is shown in Figure B1. Here the mixed layer depth is determined based on the definition in Kara et al. (2000). Figure B1 aims to show that Q net and MLD are associated with the SST cooling in the simulations. It can be seen in Figure B1a that the MLD increases when SST cools down in the simulations, and Figure. B1b shows that the mean Q net loss decreases when SST cools down. This suggests that Q net and mixed layer depth are correlated with the SST evolution.
Although we have compared the IWV and IVT of ARs between CPL and ATM.STA in Figure 14. The impact of the SST evolution in CPL run on ARs is not shown. To demonstrate the impact of SST evolution on the ARs, we plotted IWV and IVT as functions of SST variations in Figure B2. It can be seen that both IWV and IVT get smaller in CPL when SST cools down. From Figure 15, when SST cools down, the E-P gets smaller and the total water vapor of the domain is decreasing.
SUN ET AL.

Appendix C
Comparison between early and late January cases In Figure 5, we used the SST cooling to group the ARs in the simulations. It can be seen that most strong/weak cooling ARs occurred in the simulations are initialized on early/late January. Hence, we compared the cases initialized on the first 10 days and last 10 days (about 1/3 of all simulations).
The BSSs are plotted as functions of lead time in Figure C1. Here,  2 F is calculated between the ERA5 and the results of CPL;  2 R is calculated between the ERA5 and the results of ATM.STA. The median, the upper/ lower quartiles, and the maximum/minimum RMSEs are plotted in the figure. It can be seen that the median BSSIWV in early January cases is slightly better than late January cases, especially in the second week of the simulations (early January cases: 9.7%; late January cases: 3.6%). On the other hand, the median BSSIVT in early January cases is still better than late January cases, but the improvement is much smaller (early January cases: 2.0%; late January cases: 0.5%). Compared with Figure 9, using the SST cooling can better show the differences in the AR events in this case study. SUN ET AL.