Catchment‐scale skill assessment of seasonal precipitation forecasts across South Korea

Climate change is expected to make droughts more frequent and severe across the world. South Korea is no exception and has already suffered extreme droughts, such as the one that lasted from 2013 to 2015 and caused nationwide damage. To mitigate drought damages, better management of existing water infrastructure is essential. A promising opportunity to improve operational decisions is to make use of seasonal weather forecasts, which are produced by general circulation models and can be downscaled to the catchment scale. This study hence assesses the skill of seasonal forecasts over the 20 catchments in South Korea where the largest reservoirs are located. Datasets from four weather forecasting centres (ECMWF, UK Met Office, Météo France and DWD) were evaluated over the period 2011–2020, and their skill quantified using the Continuous Ranked Probability Skill Score (CRPSS). We analyse how skill varies across seasons and years, and whether it can be linked to catchment characteristics. In doing so, we develop a methodology and a Python package to implement it, which is freely available for future applications to other regions. For our case study, the results show that among the four forecasting centres, ECMWF's forecasts were the most skilful in South Korea. In particular, seasonal forecasts outperform climatology for up to 2 months of lead time and are more skilful during the wet season and in dry years. Linear bias correction is found to be useful to correct systematic seasonal biases, whereas no significant correlation was found between catchment characteristics and forecast skill. We also investigated the possibility of anticipating dry years from seasonal forecasts and/or ENSO indices but found no significant link.

However, constructing new water infrastructure, such as reservoirs, is increasingly difficult in many countries (including South Korea); hence better management of existing infrastructure is essential (Gleick, 2002, 2003). Making better use of medium- and long-term weather forecasting information is a promising avenue to improve the performance of existing infrastructure and help mitigate drought damages. Quantitative seasonal forecasts generated from general circulation models (GCMs) have become increasingly available over the last decade (Bauer et al., 2015). GCMs rely on mechanistic representations of atmospheric processes to deliver numerical forecasts of variables such as precipitation, temperature and evaporation at a temporal and spatial resolution that is much higher than what can generally be achieved with statistical (data-driven) forecasting approaches (e.g., Jin et al., 2005; Kim & Kug, 2018). This higher resolution opens the possibility of using weather forecasts (possibly after downscaling) to force hydrological models at the catchment scale, which is the most useful scale for informing water management decisions. Unlike short-term forecasts, which predict individual events that may happen over a temporal horizon of 1 or 2 weeks ahead, seasonal forecasts aim at predicting weather variables up to 7 months ahead (Arnal et al., 2018; Coelho & Costa, 2010), a temporal horizon more adequate to support drought management decisions. As uncertainty in weather forecasting is substantial, ensemble forecasts are typically used (Coelho & Costa, 2010; Collazo et al., 2022). In this paper, we focus in particular on ensemble seasonal forecasts of precipitation, because lack of precipitation is a key trigger of droughts (Naumann et al., 2015). In order to establish the usability of these data products by water managers, the first step is to evaluate how reliable and skilful seasonal precipitation forecasts (SPFs) are.
Several studies have investigated the skill of seasonal precipitation forecasts in different regions of the world. All concur in finding that the skill is highest for the first (or second) month ahead and rapidly drops thereafter. This has been found in studies at the global scale (e.g., Roy et al., 2020) as well as at the regional scale (e.g., Ogutu et al., 2016 for East Africa; Tchinda et al., 2022 for Central Africa). Second, the skill can vary significantly in space and time. Skill is typically higher in tropical regions (e.g., Manzanas et al., 2014; Roy et al., 2020; Weisheimer & Palmer, 2014), and more variable elsewhere (e.g., Frías et al., 2010 for Spain; Bett et al., 2020 for China; Gubler et al., 2019 for South America). In addition, past research suggests that seasonal precipitation forecasts are more skilful during the wet season (e.g., Bett et al., 2020; Kolachian & Saghafian, 2019; Roy et al., 2020). A previous study also investigated the relationship between forecast skill and regional (or catchment) characteristics, and found no significant relationship between them (Kolachian & Saghafian, 2019 for Iran).
Generally, seasonal forecasts have a relatively coarse resolution (Crochemore et al., 2016), although still higher than that of statistical forecasts. For example, the seasonal forecast datasets available through the Copernicus Climate Data Store, which will be analysed in this paper, are provided at 1° × 1° horizontal resolution (~100 km grid). Since we aim to assess the skill of precipitation forecasts at the catchment scale, a downscaling method to correct systematic biases should be considered. Previous studies found that linear bias correction has potential for improving seasonal forecast skill (e.g., Charles et al., 2013; Crochemore et al., 2016), as do other more sophisticated bias correction methods (e.g., Kolachian & Saghafian, 2019; Manzanas & Gutiérrez, 2018; Zarei et al., 2021).
Lastly, there have also been many efforts to connect forecast skill to large-scale climate patterns such as the El Niño–Southern Oscillation (ENSO). It has been demonstrated that forecast skill deteriorates when moving away from the El Niño region (e.g., Weisheimer & Palmer, 2014; Shirvani & Landman, 2015 for Iran; Ferreira et al., 2022 for South America).
To our knowledge, whereas many studies have analysed the possibility of producing statistical seasonal forecasts for South Korea at either the national or regional scale (Kim et al., 2020a, 2020b; Kim & Kug, 2018; Lee & Julien, 2016; Noh & Ahn, 2022; Son et al., 2015), there has been no previous attempt to assess the skill of seasonal precipitation forecasts from GCMs across catchments in South Korea. A few studies (Hyun et al., 2020; Kim et al., 2021) described the GCM-based forecasting systems run by the Korea Meteorological Administration (KMA) and the APEC (Asia-Pacific Economic Cooperation) Climate Centre (APCC), but they mainly focused on comparisons between different versions of the operational forecasting system and on the skill of weather variables at the global scale.
This paper aims to fill this gap by examining the predictability of catchment-scale precipitation, as a first step towards subsequently investigating the predictability of river flows and the usefulness for drought mitigation. Specifically, we address the following research questions: (1) Which forecasting centre (among those represented in the Copernicus Climate Data Store) offers the most skilful precipitation forecasts, at which lead time, and in which season? (2) Are there systematic biases in the seasonal forecasts? (3) Is statistical downscaling (bias correction) useful? (4) How does skill change with catchment characteristics such as catchment size and mean annual precipitation? (5) How does skill change in wet and dry years? (6) Can we anticipate whether a year will be dry, based on ENSO indices or the seasonal forecasts themselves?
Another contribution of this study lies in developing and sharing a workflow, along with its Python implementation, for comprehensive assessment of seasonal forecasts produced by GCMs, which future users can easily apply to other regions. We hope that the availability of this package can contribute to lowering the barriers for the uptake of seasonal forecasts and expanding their use in South Korea as well as in other parts of the world.

| Data and study site
As of 2020, there are 1318 in situ precipitation measuring stations in South Korea, operated by the Ministry of Environment, the Korea Meteorological Administration, and the National Water Resources Agency K-water (Ministry of Environment, 2021). Daily precipitation data are published after calibration and validation by the Ministry of Environment. Our study utilizes these precipitation data and the Thiessen polygon method to calculate the areal precipitation over a given catchment.
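The Thiessen-weighted areal averaging can be sketched as follows. This is a minimal illustration with hypothetical station coordinates, not the exact implementation used in the study: the catchment is discretized into points, each point is assigned to its nearest station (which is equivalent to intersecting the catchment with Thiessen/Voronoi polygons), and the areal precipitation is the weighted sum of station values.

```python
import numpy as np

def thiessen_weights(stations, grid_points):
    """Approximate Thiessen weights: assign each point of a catchment
    grid to its nearest station; a station's weight is the fraction of
    the catchment assigned to it."""
    # pairwise distances, shape (n_grid_points, n_stations)
    d = np.linalg.norm(grid_points[:, None, :] - stations[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(stations))
    return counts / counts.sum()

def areal_precipitation(weights, station_precip):
    """Catchment-average precipitation as a Thiessen-weighted sum."""
    return float(np.dot(weights, station_precip))

# Hypothetical example: two stations, catchment discretized on a unit square
stations = np.array([[0.0, 0.0], [1.0, 1.0]])
xx, yy = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))
grid = np.column_stack([xx.ravel(), yy.ravel()])
w = thiessen_weights(stations, grid)
p = areal_precipitation(w, np.array([10.0, 30.0]))  # station values in mm/day
```

By symmetry the two stations receive near-equal weights here, so the areal value falls close to the station mean; in practice the grid would be the catchment boundary polygon rasterized at a suitable resolution.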
Naturally, South Korea is highly susceptible to both floods, because of its mountainous terrain and frequent intense rainfalls during the short rainy season, and droughts, because of its relatively small area and long dry season (Kyoung et al., 2011). Such spatial and temporal variability of precipitation represents a challenge for the management of surface water resources. To mitigate water-related disasters, many reservoirs have been constructed since the 1960s. Out of over 17,000 reservoirs currently present in the country, we selected 20 multi-purpose reservoirs that play a key role for water supply and flood control, providing 67% of the total storage volume and 95% of flood control capacity of the country. Their locations and the boundaries of the catchments drained by these reservoirs are illustrated in Figure 1. Some key characteristics are reported in Table 1.
Among these reservoirs, there exist considerable differences in the characteristics of the catchments they drain, in terms of their size (from 33 to 6648 km²), annual mean precipitation (from 960 to 1477 mm), and interannual variation of precipitation (standard deviation from 223 to 377 mm), as shown in Table 1. In general, reservoirs located in the south tend to receive higher annual precipitation with more variability, whereas reservoirs in the eastern inland receive less annual precipitation with less variability.
Another challenge for water management is the substantial variability in precipitation over the year and across years, as seen in Figure 2. Over 66% of the annual precipitation is concentrated in the wet season (June to September) due to extreme events such as intense rainfall, typhoons and the monsoon. In this season, therefore, there is not only a large amount of precipitation but also high interannual variability. On the other hand, the dry season (December–February) provides only 7% of the annual precipitation and exhibits smaller interannual variability. In order to explore how the skill of seasonal forecasts varies over the year, in this study we divide the year into four seasons: dry (December–February), dry-to-wet transition (March–May), wet (June–September) and wet-to-dry transition (October–November).
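The four-season classification above can be expressed as a small helper (a trivial sketch; the function name is ours, not from the study's package):

```python
def season(month):
    """Map a calendar month (1-12) to the four seasons used in the study."""
    if month in (12, 1, 2):
        return "dry"
    if month in (3, 4, 5):
        return "dry-to-wet"
    if month in (6, 7, 8, 9):
        return "wet"
    return "wet-to-dry"  # October, November
```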

| Seasonal precipitation forecasts
The Copernicus Climate Data Store provides seasonal forecast datasets at 1° × 1° horizontal resolution and daily temporal resolution for every month since 1993. These datasets come from eight forecasting centres: the European Centre for Medium-Range Weather Forecasts (ECMWF), the UK Met Office, Météo France, the German Weather Service (DWD), the Euro-Mediterranean Centre on Climate Change (CMCC), the US National Centers for Environmental Prediction (NCEP), the Japan Meteorological Agency (JMA) and Environment and Climate Change Canada (ECCC). Each forecasting centre has its own system to generate seasonal forecasts, and these systems use different forecast models (i.e., different combinations of atmosphere, land surface, ocean and sea-ice models) as well as different initialization and perturbation methods (Table 2).
The forecasting centres also differ in terms of dataset availability, ensemble size and forecasting lead time. In particular, CMCC, NCEP and JMA do not provide forecast datasets for several months in 2017–2018, and ECCC does not provide forecast datasets for April of every year from 2017 to 2020. In order to assess the forecast skill over a continuous time period, we chose not to use datasets from these centres. We focus instead on the datasets provided by ECMWF, the UK Met Office, Météo France and DWD. Specifically, we downloaded the datasets for the 28 years from 1993 to 2020 for a region covering the Korean Peninsula (longitude: 126°–129°E, latitude: 33°–40°N), extracted the time series of SPFs for each catchment shown in Table 1, and computed the catchment areal precipitation using the Thiessen polygon method.
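A retrieval of this kind can be sketched with the Climate Data Store's `cdsapi` client. The request below is an abridged illustration only: the dataset name and keys follow the CDS web form for `seasonal-original-single-levels`, but values such as the `system` number must be checked against each centre's entry in the CDS catalogue, and further keys (e.g., lead times) are omitted.

```python
# pip install cdsapi; requires CDS credentials in ~/.cdsapirc
# import cdsapi

# Bounding box covering the Korean Peninsula: [North, West, South, East]
AREA = [40, 126, 33, 129]

request = {
    "originating_centre": "ecmwf",
    "system": "5",  # assumption: e.g., SEAS5; check the CDS catalogue
    "variable": "total_precipitation",
    "year": [str(y) for y in range(1993, 2021)],
    "month": [f"{m:02d}" for m in range(1, 13)],
    "day": "01",
    "area": AREA,
    "format": "grib",
}

# c = cdsapi.Client()
# c.retrieve("seasonal-original-single-levels", request, "spf_korea.grib")
```

One request per centre (changing `originating_centre` and `system`) yields the four datasets compared in this study.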
Note that the KMA (Korea Meteorological Administration) also provides seasonal forecasts, but only categorical ones (Above, Near, Below Normal). The APCC (APEC Climate Centre) provides numerical forecasts at a monthly time scale, which is not suitable for forcing catchment-scale hydrological models. Therefore, we did not include forecasts from the KMA and APCC in our comparison.

| Methodology
First, we compiled ensembles of seasonal precipitation forecasts (SPFs) using the datasets provided by ECMWF, UKMO, Météo France and DWD for 10 years (2011–2020), and generated ensembles of climatology from historical data over the 20 catchments. To quantify the skill of the SPFs, we adopted the Continuous Ranked Probability Skill Score (CRPSS) (see section 2.3.1).
The observed annual precipitation between 1966 and 2020 for all 20 catchments is illustrated in Figure 3. Here, the black squares and triangles represent extremely wet and dry years, respectively. In order to assess the impact of bias correction (section 2.3.2), we divide the time period for which forecasts are available (1993–2020) into two subperiods, using the datasets from 1993 to 2010 to calculate the bias correction factors (blue line) and those from 2011 to 2020 to assess the bias-corrected forecasts (red line).

| Skill assessment
Since SPFs are composed of multiple ensemble members, single-valued evaluation metrics such as the root-mean-square error, although applicable for example to the ensemble mean, are not very informative. A probabilistic methodology should instead be adopted to evaluate the entire forecast ensemble (Gneiting, 2011). Several metrics have been developed to evaluate the skill of ensemble forecasts, such as Brier scores (Brier, 1950), relative operating characteristic curves (Mason, 1982) and the ranked probability score (Epstein, 1969). In particular, the continuous ranked probability score (CRPS) (Matheson & Winkler, 1976) measures the difference between the cumulative distribution function of the forecast ensemble and the reference dataset (i.e., the observations). Compared to other metrics, the CRPS has the advantage of being sensitive to the entire range of the forecast ensemble and of being clearly interpretable, since it is equal to the mean absolute error for a deterministic forecast (Hersbach, 2000); it is therefore a widely used metric to assess the skill of ensemble forecasts (Leutbecher & Haiden, 2020). The CRPS is calculated as

$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ F(x) - H(x \geq y) \right]^{2} \, dx \qquad (1)$$

where F(x) represents the cumulative distribution of the SPF ensemble, x and y are, respectively, the forecasted and observed precipitation, and H is the indicator function, equal to 1 when x ≥ y and 0 when x < y. If the SPFs were perfect, i.e., all ensemble members exactly matched the observations, the CRPS would be equal to 0. Conversely, the higher the CRPS, the lower the skill of the SPFs. The continuous ranked probability skill score (CRPSS) is a measure derived from the CRPS in which the forecast skill is expressed in relative terms with respect to a "reference" forecasting method. The precipitation climatology (i.e., an ensemble built from historical precipitation observations) is typically used as a reference, as it is known to be a "hard-to-beat" reference (Pappenberger et al., 2015; Peñuela et al., 2020). Here, we generated the climatology using the historical daily precipitation records available from 1966 to 2010, hence leading to an ensemble of 45 members for each catchment.

TABLE 1 Properties of the 20 multi-purpose reservoirs and the catchments they drain (K-water, Korea Water Resources Corporation, 2022)
The CRPSS can be calculated as

$$\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}_{Sys}}{\mathrm{CRPS}_{Ref}} \qquad (2)$$

where CRPS_Sys is the CRPS of the SPFs and CRPS_Ref is the CRPS of the reference (i.e., climatology in this study). The CRPSS can range from −∞ to 1. When the CRPSS is positive (i.e., between 0 and 1), the forecasting system is more skilful than the reference; when it is negative (i.e., between −∞ and 0), the system is less skilful. When the CRPSS is equal to zero, the forecasting system (SPFs) and the reference (climatology) have the same skill, and when it is equal to 1, the forecast is perfect. Because our study analyses multiple catchments over several years, we need to aggregate the CRPSS values into a measure of the overall skill across space and time. If we do this by taking the average of the CRPSS values obtained in different catchments in different years, we generally find negative average values, since the CRPSS distribution is skewed towards negative values, especially in the dry season (an example of these features is shown in Figure S1, Supporting Information). Moreover, the average CRPSS can be highly affected by a few abnormally high or low score values in specific catchments or years. This makes it difficult to appreciate how often SPFs are actually more or less skilful than climatology and to compare products from different providers. Therefore, instead of using the average CRPSS, we decided to measure the "overall skill" by calculating the chances of the SPFs being more skilful than climatology.

FIGURE 3 Observed annual precipitation (dots) from 1966 to 2020 in the 20 catchments feeding the reservoirs considered in this study. The black line represents the mean annual precipitation over the 20 catchments. The red line represents the period for assessing the seasonal forecasts (2011–2020), and the blue line represents the period used to compute the bias correction factors.
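For a finite ensemble, the CRPS above can be estimated directly from the members; a minimal sketch using the standard energy-form estimator (not necessarily the exact routine in our package):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble against a scalar observation:
    CRPS = E|X - y| - 0.5 * E|X - X'| over ensemble members X, X'."""
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - obs).mean()
    term2 = np.abs(members[:, None] - members[None, :]).mean()
    return term1 - 0.5 * term2

def crpss(crps_sys, crps_ref):
    """CRPSS relative to a reference (climatology): 1 is a perfect
    forecast, 0 matches the reference, negative values are worse."""
    return 1.0 - crps_sys / crps_ref

# A degenerate (single-valued) ensemble reduces the CRPS to the absolute error
assert np.isclose(crps_ensemble([5.0, 5.0, 5.0], 3.0), 2.0)
```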
Our overall skill is thus a probabilistic form of the CRPSS for multiple catchments and years, and it can be calculated as

$$\mathrm{Overall\ skill} = \frac{1}{N_c N_y} \sum_{c=1}^{N_c} \sum_{y=1}^{N_y} H\left(\mathrm{CRPSS}(c,y)\right) \times 100\% \qquad (3)$$

where N_c and N_y are the total numbers of catchments and years, respectively, and the indicator function H is equal to 1 when CRPSS(c,y) > 0 (the SPFs are more skilful than climatology in catchment c and year y) and 0 when CRPSS(c,y) ≤ 0 (climatology beats the SPFs). If the overall skill is greater (less) than 50%, we conclude that the SPFs are generally more (less) skilful than climatology across catchments and years.
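The overall skill defined above reduces to the percentage of positive CRPSS values over all catchment-year pairs, which can be sketched as:

```python
import numpy as np

def overall_skill(crpss_matrix):
    """Percentage of (catchment, year) pairs in which the SPFs beat
    climatology, i.e. CRPSS > 0. `crpss_matrix` has shape (N_c, N_y)."""
    crpss_matrix = np.asarray(crpss_matrix, dtype=float)
    return 100.0 * (crpss_matrix > 0).mean()
```

For example, with two catchments and two years where three of the four CRPSS values are positive, the overall skill is 75%, i.e., the SPFs are generally more skilful than climatology.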

| Statistical downscaling via linear bias correction
Downscaling is a technique to refine GCM output, and it can be broadly classified into two approaches: dynamical and statistical downscaling (Fan et al., 2021; Misra et al., 2017; Shao & Li, 2012). Statistical downscaling methods are commonly favoured due to their simplicity and cost-effectiveness (Fan et al., 2021). Among the various statistical downscaling methods currently in use, such as bias correction, quantile perturbation and event-based weather generators (Tabari et al., 2021), bias correction, which adjusts the output of climate models to make it more consistent with observed data, is widely used to enhance the performance of climate models (Keller et al., 2022; Maraun, 2016). However, there are contradictory arguments regarding the application of bias correction to ensemble forecasts. Some researchers argue that bias correction can hide rather than reduce the uncertainty contained in a climate model (Ehret et al., 2012; Hagemann et al., 2011). In contrast, there is also evidence supporting the usefulness of bias correction: for example, Crochemore et al. (2016) reported that the coarse resolution of seasonal forecasts can lead to systematic errors and that bias correction is therefore recommended in practice. The SPF datasets available through the Copernicus Climate Data Store have a resolution of 1° × 1°, which is too coarse for the catchment sizes considered here (see Table 1). Therefore, we adopt a bias correction method for statistical downscaling of the forecasts.
Although there are various bias correction methods, such as variance scaling and quantile mapping, we applied the linear bias correction (linear scaling) method in this study. This method is a simple statistical downscaling approach (Melesse et al., 2019) and has been shown to be effective for precipitation (Azman et al., 2022;Shrestha et al., 2017). Furthermore, a previous study has demonstrated that the linear bias correction can yield similar results to more sophisticated methods (Crochemore et al., 2016).
Linear bias correction is based on comparing observations (at the monthly timescale) with forecasts produced by climate models over the same historical period. The resulting adjustment terms, called "bias correction factors," are then applied to the seasonal forecasts to generate bias-corrected forecasts. A previous study suggests that additive correction is preferable for temperature, whereas multiplicative correction is preferable for variables such as precipitation, vapour pressure and solar radiation (Shrestha & Htut, 2016). Thus, the linear bias correction for precipitation can be expressed as

$$P^{*}_{forecasted} = b_m \times P_{forecasted}, \qquad b_m = \frac{\mu_m\left(P_{observed}\right)}{\mu_m\left(P_{forecasted}\right)} \qquad (4)$$

where P*_forecasted is the bias-corrected forecast of daily precipitation, P_forecasted is the original forecast before bias correction, b_m is the bias correction factor for month m, μ_m denotes the monthly mean over the historical period, and P_observed is the observed daily precipitation. In this study, the observation and forecast datasets from 1993 to 2010 were used to compute the monthly bias correction factors b_m (m = 1, …, 12), and these were then used to adjust the SPFs for the later years (2011–2020).
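The multiplicative correction above can be sketched as follows (a minimal illustration with two calendar months of hypothetical data; the study's package applies the same idea over all 12 months and all catchments):

```python
import numpy as np

def monthly_bias_factors(obs_monthly_means, fc_monthly_means):
    """Multiplicative factors b_m = mean_obs(m) / mean_forecast(m),
    one per calendar month, estimated over a common historical period."""
    return np.asarray(obs_monthly_means, dtype=float) / np.asarray(fc_monthly_means, dtype=float)

def bias_correct(daily_forecast, months, b):
    """Scale each daily forecast value by the factor of its calendar
    month. `months` holds the month (1-based) of each forecast day."""
    daily_forecast = np.asarray(daily_forecast, dtype=float)
    months = np.asarray(months)
    return daily_forecast * b[months - 1]

# Hypothetical two-month example: forecasts underestimate month 1
# (obs mean 100 vs forecast mean 80) and overestimate month 2.
b = monthly_bias_factors([100.0, 50.0], [80.0, 100.0])
corrected = bias_correct([8.0, 10.0], [1, 2], b)
```

Here b = [1.25, 0.5], so a day forecast at 8 mm in month 1 becomes 10 mm and a day forecast at 10 mm in month 2 becomes 5 mm.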

| Potential for anticipating dry and wet years
The last step in our analysis of SPFs will look at their skill in wet and dry years, and the possibility of anticipating if a year will be dry based on climate patterns such as El Niño-Southern Oscillation (ENSO) or the seasonal forecasts themselves.
First, we investigate whether the Multivariate ENSO Index (MEI) and the Southern Oscillation Index (SOI) can be used to identify dry years. The MEI, calculated by the National Oceanic and Atmospheric Administration (NOAA), is a time series of the leading combined empirical orthogonal function (EOF) of five variables (sea level pressure, sea surface temperature, zonal and meridional components of the surface wind, and outgoing longwave radiation) over the tropical Pacific basin (30°S–30°N and 100°E–70°W). The MEI was adopted in this study because it combines multiple oceanic and atmospheric variables characterizing ENSO. The SOI is calculated by the Climatic Research Unit (University of East Anglia) and is defined as the normalized pressure difference between Tahiti and Darwin. The SOI was adopted in this study because it has been used in previous studies on teleconnections over South Korea (e.g., Jin et al., 2005; Kawamura et al., 2005). Second, we consider the SPFs themselves as a trigger to identify dry years. Specifically, we consider the mean precipitation ratio (MPR) between the SPFs and climatology over all ensemble members, which is calculated as

$$\mathrm{MPR}_{entire\ ensembles} = \frac{\sum_{i=1}^{N} \mu_i\left(P^{entire\ ensembles}_{seasonal\ forecasts}\right)}{\sum_{i=1}^{N} \mu_i\left(P^{entire\ ensembles}_{climatology}\right)} \qquad (5)$$

where N is the number of days in the forecasting horizon, μ_i(P^entire ensembles_seasonal forecasts) is the daily mean precipitation across the members of the SPF ensemble on day i, and μ_i(P^entire ensembles_climatology) is the daily mean precipitation across the climatology ensemble. The ratio in Equation (5) gives an indication of the overall trend of the SPFs against climatology: a value greater (less) than 1 implies that, according to the SPFs, the coming season will be wetter (drier) than the historical average.
In addition, we also calculate the MPR over the 5 ensemble members with the lowest aggregate precipitation over the forecasting horizon, i.e.,

$$\mathrm{MPR}_{5\ lowest\ ensembles} = \frac{\sum_{i=1}^{N} \mu_i\left(P^{5\ lowest\ ensembles}_{seasonal\ forecasts}\right)}{\sum_{i=1}^{N} \mu_i\left(P^{5\ lowest\ ensembles}_{climatology}\right)} \qquad (6)$$

where P^5 lowest ensembles_seasonal forecasts and P^5 lowest ensembles_climatology are the precipitation in the 5 lowest ensemble members of the SPFs and climatology, respectively. The reason for introducing the MPR over the 5 lowest ensemble members is that the most extreme ensemble members may provide specific insights for identifying particularly dry years.
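Both MPR variants above can be computed with one function; a minimal sketch (array shapes and the argument `k` are our own conventions, not the package's API):

```python
import numpy as np

def mpr(spf_ensemble, clim_ensemble, k=None):
    """Mean precipitation ratio between SPFs and climatology over the
    forecast horizon. Inputs have shape (n_members, n_days). If k is
    given, only the k members with the lowest aggregate precipitation
    are used in each ensemble (k=5 gives the '5 lowest' variant)."""
    spf = np.asarray(spf_ensemble, dtype=float)
    clim = np.asarray(clim_ensemble, dtype=float)
    if k is not None:
        spf = spf[np.argsort(spf.sum(axis=1))[:k]]
        clim = clim[np.argsort(clim.sum(axis=1))[:k]]
    # daily ensemble means, accumulated over the horizon
    return spf.mean(axis=0).sum() / clim.mean(axis=0).sum()

# Hypothetical 2-member, 2-day example
spf = np.array([[1.0, 1.0], [3.0, 3.0]])
clim = np.array([[2.0, 2.0], [2.0, 2.0]])
```

With these numbers the full-ensemble MPR is 1.0 (the SPF mean matches climatology), while the MPR over the single driest member is 0.5, illustrating how the lowest members can signal drier-than-average conditions that the ensemble mean hides.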

| RESULTS
3.1 | Forecast skill from different forecasting centres before and after bias correction

Figure 4a shows the overall skill (defined as the chances of the SPFs being more skilful than climatology), before bias correction, for the four forecasting centres at lead times from 1 to 6 months (x-axis) in different seasons (y-axis). This overall skill is computed over the 20 catchments and the 10-year period (see Equation (3)). Blue (red) colours denote that the SPFs are more (less) skilful than climatology (on average).
The first panel in Figure 4a shows that the SPFs provided by ECMWF are the most skilful, beating climatology especially during the wet and wet-to-dry transition seasons, at lead times up to 2 or 3 months. However, ECMWF forecasts are much less skilful during the dry season (December–February). To check the robustness of these conclusions, we also repeated the skill calculations using the entire available dataset (from 1993 to 2020) and found consistent results (see Figure S2). Figure 4b illustrates the bias correction factors of each catchment (x-axis) in different seasons (y-axis), computed with the linear bias correction method (see Equation (4)). If this factor is greater (less) than 1, the SPFs underestimate (overestimate) the precipitation for a given month, and the darker (lighter) the colour, the stronger (weaker) the systematic bias. As can be seen from the figure, the SPFs underestimate (overestimate) precipitation during the wet (dry) season, whereas they have relatively smaller biases in the dry-to-wet and wet-to-dry transition seasons. The bias correction factors for ECMWF tend to be smaller than for the other centres, as expected given the generally higher skill of its SPFs seen in panel (a). Figure 4c shows the overall skill of the SPFs after applying bias correction, which is notably improved in all seasons. These improvements are evident for all four centres but are particularly marked for Météo France and DWD, which show obvious seasonal biases. Although the skill in the dry season after bias correction remains lower than in the other seasons, bias correction is most useful in this season.
Overall, the results in this section indicate that the SPFs provided by ECMWF are more skilful than those of the other centres, both before and after bias correction. Therefore, the remainder of this study focuses on the forecast datasets from ECMWF.

| Forecast skill and catchment characteristics
We now analyse the relationship between forecast skill and catchment characteristics such as catchment size, annual mean precipitation and precipitation standard deviation (see Table 1). Figure 5 represents the Pearson's correlation coefficients (defined as the covariance of the two variables divided by the product of their standard deviations) between the overall skill and each catchment characteristic at different lead times (x-axis), interpreted following conventional guidelines for correlation coefficients (Schober et al., 2018). If a correlation coefficient is larger (smaller) than zero, the analysed variables (catchment characteristic and forecast skill) are positively (negatively) correlated. In calculating the correlation coefficients, it is important to use a sufficiently large number of samples (e.g., David, 1938 recommended the use of Pearson's correlations only if the sample size is greater than 25). Hence, for this analysis we have utilized the entire SPF dataset of 28 years (1993–2020) instead of the shorter one (2011–2020) used in the previous section to analyse the effects of bias correction.
(Therefore, the results shown in this section are drawn using non-bias-corrected SPF data only.) Figure 5a indicates that there is no notable correlation between catchment size and forecast skill. Figure 5b,c instead shows strong positive correlations at specific lead times during the dry (red cross symbols) and wet-to-dry transition (yellow plus symbols) seasons between skill and the mean and standard deviation of annual precipitation. In other words, the wetter the catchment and the higher its interannual variability, the higher the forecast skill during the dry and wet-to-dry seasons. However, the annual mean correlation (black dots) generally remains at a moderate level. A comprehensive overview of the data used to calculate these correlations is provided in Figure S3. Figure 6 shows the overall skill of the SPFs from ECMWF at different lead times (x-axis) and in different seasons. The left column (panels a and b) refers to the average over 10 years (2011–2020), the middle column (c and d) refers to the skill in two wet years (2011 and 2020), and the right column (e and f) refers to the average of two dry years (2015 and 2017). The first row (a, c, e) is before bias correction, the bottom row (b, d, f) after bias correction. All data are means over the 20 catchments.

| Forecast skill in wet and dry years
The top-left panel (Figure 6a) confirms the findings presented in section 3.1 (and Figure 4), showing that, before bias correction, the SPFs are more skilful than climatology for up to 2 months of lead time in all seasons except the dry season. In wet years (Figure 6c), non-bias-corrected SPFs are less skilful, and this skill decrement is most apparent during the wet season. In dry years (Figure 6e), instead, the SPFs have the highest skill during the wet season, where they outperform climatology at all lead times. These results are confirmed when computing skill over a longer period (1993–2020) including 5 wet and 5 dry years (see Figure S4).
After bias correction (bottom row of Figure 6), forecast skill improves for most seasons. The impact of bias correction is more significant in dry years (Figure 6f) than wet years (Figure 6d). Also, the impact of bias correction is different across seasons: for example, the skill in the dry season (red lines) is remarkably improved after bias correction whereas in the wet season (blue lines) it is mostly unchanged or even slightly deteriorated in the dry years.

| Potential for anticipating dry years
Due to the potential impact of insufficient precipitation during the wet season (June–September) on various social and economic activities, such as agriculture and water supply for the following year, the ability to predict precipitation for this season is particularly important. Interestingly, the results in section 3.3 show that the SPF skill is particularly high during the wet season of dry years, suggesting that SPFs might be useful in these circumstances. On the other hand, SPFs seem to be less skilful than climatology in the same season of wet years, suggesting they should not be used then. Being able to discriminate in advance whether a year will be particularly dry or wet would thus be very important. In this section, we investigate if this is possible, using other sources of information (ENSO indices) and/or the SPFs themselves. Figure 7a,b shows the relationship between ENSO indices ((a) MEI and (b) SOI, x-axis) and the accumulated precipitation during the wet season (y-axis), averaged over the 20 catchments for the 28 years from 1993 to 2020. Here, the red circles indicate the 5 dry years in the study period and the green circles the other 23 years. The circle size is proportional to the magnitude of the accumulated precipitation during the wet season.
As shown in the figure, the red circles are widely distributed along the x-axis for both indices, suggesting that dry years can occur regardless of the status of ENSO (i.e., in years of both high and low MEI and SOI indices). The figure also shows that there is no significant correlation between the total precipitation (size of the green circles) and the magnitude of the ENSO indices. Given the possibility that the connection to ENSO may be affected by the location of the catchments, we produced similar plots by averaging catchments in the Northern, Central and Southern regions but still found no signals of dry years (see Figure S5).

FIGURE 5 Pearson's correlation coefficient between the overall skill of ECMWF SPFs and catchment characteristics ((a) catchment size, (b) annual mean precipitation and (c) standard deviation of annual mean precipitation; see Table 1) for the 28 years from 1993 to 2020, at different lead times (x-axis) and for different seasons (see legend for colour coding)
We now investigate whether SPFs provided by ECMWF can capture the signals of dry years in advance. Figure 7c,d shows the MPR (mean precipitation ratio between SPFs and climatology) calculated over the entire ensemble (c) and over the 5 ensemble members with the lowest aggregated precipitation (d) (see Equations (5) and (6)). Given that we are interested in signals for the wet season (June-September), we test forecasts with start months from March to June and 4 months of lead time. In Figure 7c,d, red triangles (grey crosses) represent the MPRs in dry (ordinary) years. Note that, for consistency, the results shown in Figure 7c,d are based on SPFs before bias correction. However, we also conducted the same analysis using bias-corrected forecasts (see Figure S6), which led to similar conclusions.
The MPR over the entire ensemble (Figure 7c) shows that SPFs tend to provide drier forecasts than climatology, whereas the MPR over the 5 lowest ensemble members (Figure 7d) shows that they provide slightly wetter forecasts than climatology. However, in both cases we could not find a threshold value of MPR that distinguishes dry years (red triangles) from ordinary years (grey crosses). In short, this result suggests that it is difficult to anticipate dry years from SPFs. Additional experiments using different lead times from 3 to 6 months were also carried out, yet we could not find threshold values there either (see Figure S7).
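For readers without access to Equations (5) and (6), the MPR diagnostic can be sketched as follows. This is our illustrative reading of the two variants (full ensemble versus the n driest members), with assumed function and argument names, not the paper's exact formulation.

```python
import numpy as np

def mpr(forecast_members, climatology, n_lowest=None):
    """Mean precipitation ratio between an SPF ensemble and climatology.

    forecast_members: accumulated precipitation per ensemble member
    climatology: climatological accumulation over the same window
    n_lowest: if given, average only the n driest members
              (the second MPR variant); otherwise use the full ensemble
    """
    members = np.sort(np.asarray(forecast_members, dtype=float))
    if n_lowest is not None:
        members = members[:n_lowest]  # keep the driest members only
    return members.mean() / climatology
```

An MPR below 1 means the forecast is drier than climatology; a usable dry-year signal would require dry and ordinary years to separate around some MPR threshold, which is precisely what Figure 7c,d fails to show.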

| The skill of seasonal precipitation forecasts in South Korea
This study provides an overall picture of the performance of SPFs in South Korea. Our results confirm a number of findings from previous research in terms of forecast skill in relatively small catchments, and at the same time they hint at some limitations of SPFs for anticipating dry years. Our key findings are briefly summarized in Figure 8.
In this study, we compared SPFs provided by four forecasting centres (ECMWF, UK Met Office, Météo France, DWD) and showed that they have considerably different skill. While worldwide data from different forecasting centres are readily available, this type of analysis is lacking in the literature, and generally a certain forecasting centre (or system) is selected without justification. Our results showed that ECMWF generally provides the most skilful forecasts (Figure 4). A possible explanation for these results is the significant difference in ensemble size used by these forecasting centres (25 or 51 for ECMWF, 7 or 2 for UK Met Office, depending on the year; see Table 2). Previous studies have shown that increasing the ensemble size has the potential to improve forecast skill (Mullen & Buizza, 2002; Smith et al., 2015). Further analysis of our datasets confirms this relationship: indeed, when the ensemble size of the ECMWF forecasts is reduced to that of the UK Met Office, the difference in skill almost disappears (Figure S8). It should be noted that, in this study, we only considered ensemble SPFs provided on the first day of each month, whereas the UK Met Office provides daily updates to their SPFs. Hence, future studies may look at whether including these updates, to progressively increase the UK Met Office ensemble size over the month, might improve their forecast skill. Another explanation for the difference in skill (for similar ensemble sizes, as in the comparison between ECMWF, Météo France and DWD) is the different underlying GCMs and initialization approaches (again see Table 2). For example, the higher horizontal resolution (36 km) of ECMWF's atmosphere and land surface model could be responsible for the better skill, given previous findings that finer resolution can improve forecasting performance (Mullen & Buizza, 2002).
However, other features of the model may be equally important, and it is very difficult to explain which specific features of the ECMWF system led to higher skill in a particular study area.
SPFs provided by ECMWF showed higher skill than climatology for up to 2 months of lead time. This result generally corroborates previous findings in the literature suggesting that SPFs are more skilful than climatology for the first (or second) month of lead time only, and that skill deteriorates as the lead time increases (e.g., Roy et al., 2020 for the global scale; Tchinda et al., 2022 for Central Africa). Furthermore, our results showed that forecast skill varies with season and is much higher in the wet season than in the dry season, which is consistent with the studies by Kolachian and Saghafian (2019) for Iran, Bett et al. (2020) for China and Roy et al. (2020) for the global scale.
Our results also confirm that there are systematic biases in SPFs, consistent with previous literature reporting regional systematic biases (e.g., Dunstone et al., 2016; Ehret et al., 2012; Weisheimer & Palmer, 2014). In addition, we found that bias characteristics vary with season, as also found in Ogutu et al. (2016), with systematic underestimation of precipitation during the wet season and systematic overestimation in the dry season, as illustrated in Figure 9. Note that, as a consequence of this underestimation during the wet season of dry years, forecasts in this season can actually be quite skilful.
Because of these systematic biases, forecast skill is generally improved by linear bias correction, as also found in previous studies (e.g., Charles et al., 2013; Crochemore et al., 2016; Ferreira et al., 2022). The exception is the wet season of dry years, where bias correction could be detrimental because it increases the forecasted precipitation away from the observations (see Figure 9). Therefore, assuming that forecasts should be particularly useful for water management during dry years, it would be beneficial not to apply bias correction during the wet season. We also attempted to link skill in specific catchments to catchment characteristics but found no significant correlation in general. This is consistent with previous studies that found no significant relationship between catchment characteristics (annual mean precipitation) and forecast skill (e.g., Kolachian & Saghafian (2019) for Iran). At the same time, however, we found a positive correlation between skill and annual precipitation in the dry and wet-to-dry seasons.

FIGURE 7 Two left panels: relationship between ENSO indices ((a) MEI, (b) SOI) from December to May and accumulated precipitation during the coming wet season (June-September). Two right panels: mean precipitation ratio (MPR) between SPFs (ECMWF) before bias correction and climatology in the 4 months prior to the wet season (over the entire ensemble (c) and over the 5 ensemble members having the lowest aggregated precipitation (d)). In all panels, data are averaged over the 20 catchments and each symbol represents one year from 1993 to 2020
Although SPFs provide skilful forecasts relative to climatology during the wet season of dry years, we found that a key limitation lies in anticipating dry years. A previous study investigated whether SPFs from ECMWF have the potential to predict extreme dry conditions over South America (Ferreira et al., 2022), and found that this potential is mainly confined to tropical sectors with a strong connection to ENSO (e.g., Weisheimer & Palmer, 2014; Shirvani & Landman, 2015 for Iran). Our results likewise show that both SPFs and ENSO indices failed to capture a reliable signal of a dry year in South Korea. Since the correlation between wet-season precipitation and ENSO has been shown to be insignificant in South Korea (e.g., Ho et al., 2016), it stands to reason that SPFs also have limitations for anticipating dry years, given the close relationship between ENSO and SPFs. Thus, applying SPFs to anticipate dry years will require forecasting improvements for regions with weak ENSO teleconnections.

| Limitations and perspective for future research
Our study yielded some meaningful findings on SPFs; however, we are also aware that our research has some limitations. First, in this study we adopted the CRPSS to assess the skill of SPFs, but skill assessment results can vary depending on the scoring method used. Therefore, using multiple skill scores instead of one may provide a better understanding of how good the forecasts are.
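As a concrete reference for the scoring method adopted here, a minimal CRPS/CRPSS computation for a single ensemble forecast might look as follows. This uses the standard energy-form CRPS estimator for a finite ensemble; function names are illustrative and this is a sketch, not the paper's implementation.

```python
import numpy as np

def crps(ensemble, obs):
    """Continuous Ranked Probability Score for one ensemble forecast
    against one observation (lower is better; 0 is perfect).

    Energy form: E|X - y| - 0.5 * E|X - X'|, with X, X' drawn
    from the ensemble.
    """
    x = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(x - obs))                    # accuracy term
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # spread term
    return term1 - term2

def crpss(crps_forecast, crps_reference):
    """Skill score relative to a reference such as climatology:
    1 = perfect, 0 = no improvement over the reference, <0 = worse."""
    return 1.0 - crps_forecast / crps_reference
```

In practice, CRPS values are averaged over many forecast dates (and here, catchments) before forming the skill score; a CRPSS above zero corresponds to the "more skilful than climatology" statements made throughout the paper.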
Second, our findings are confined to South Korea and cannot be directly transferred to other countries. However, along with this paper we are sharing an open-source Python package, named SEAsonal FORecasts Management (SEAFORM; https://github.com/uobwatergroup/seaform.git), and Jupyter Notebooks to enable others to replicate our skill assessment in other parts of the world. For example, it would be worth investigating whether bias-corrected SPF data are more skilful in regions with lower interannual variability, or where precipitation exhibits a stronger correlation with the ENSO signal. The package can also be applied to analyse weather variables beyond precipitation, such as temperature and evaporation, which are provided through the Copernicus Climate Data Store. Equally, the package facilitates future revisions of the forecast skill assessment as new forecast products become available.
Lastly, this study focuses primarily on the skill of precipitation forecasts and does not yet address their practical value, that is, the usefulness of forecasts in informing decisions (Anghileri et al., 2016). This is not a given, as skilful forecasts may not fully translate into improved operational performance (Boucher et al., 2012; Chiew et al., 2003). Our future work will thus concentrate on extending the skill assessment of SPFs to seasonal streamflow forecasts and on evaluating the value of seasonal forecasts for assisting water management decisions, such as reservoir operations.

| CONCLUSION
This work presents an analysis of the forecast skill of seasonal precipitation forecasts (SPFs) at the catchment scale over South Korea for the period 2011-2020. We found that ECMWF's SPFs are the most skilful, and generally outperform climatology for up to 2 months of lead time. Linear bias correction is generally useful to correct systematic seasonal biases, whereas we found no significant correlation between catchment characteristics and forecast skill. SPFs showed the highest skill during the wet season in dry years, where they outperform climatology at all lead times up to 6 months; nevertheless, a limitation lies in anticipating whether a year will be dry.
Although these specific results are only valid for the study area, we can also draw some more general conclusions. First, we found that forecast skill varies greatly across forecasting centres and that SPFs contain specific seasonal systematic biases. Hence, we suggest pre-evaluating skill using datasets from diverse forecasting centres, both to check which centre has the highest skill for the region of interest and to understand their systematic biases. Second, consistent with previous literature, we found that the skill of SPFs varies with lead time, season, dry and wet years, and bias correction. Thus, comprehensive consideration of the features and factors affecting skill is essential when applying SPFs.
At the outset of this paper, we posed six questions regarding the skill of SPFs. Addressing those questions can help determine the features of forecast skill and thus inform better use of SPFs in a given region. We hope that the workflow we developed to address these questions, as well as the open-source SEAFORM package that implements it, will prove useful for assessing and applying SPFs not only in South Korea but also in other countries.