Climate change projections and extremes for Costa Rica using tailored predictors from CORDEX model output through statistical downscaling with artificial neural networks

Despite intense research on climate change (CC), regional studies for Central America, which is considered a CC hot spot, remain scarce. The information provided by general circulation models (GCMs) is too coarse to accurately reproduce local‐scale climatic features, which are needed for impact assessment. Thus, downscaling techniques are employed to address this scale mismatch. Costa Rica is the present case study, for which suitable predictors were tailored for downscaling related to regional climatic characteristics, such as the Inter‐Tropical Convergence Zone, El Niño Southern Oscillation, the Caribbean Low‐Level Jet, and the Mid‐Summer Drought. Statistical downscaling models were calibrated for precipitation, maximum and minimum temperature, using the perfect prognosis methodology by means of station data, ERA‐INTERIM reanalysis and artificial neural networks, yielding satisfactory results. As found in several studies, the temperature models replicated more accurately the statistics of the observed datasets. However, here, through the implemented approach and the tailored predictors, the precipitation models conveyed an improvement compared to standard methods. Projected daily climate was obtained employing CORDEX data under the RCP8.5 scenario for the central region of the country. Overall, the changes in climate estimated by the end of the 21st century agree with coarser‐scale projections. Finally, projected climate extremes indices were calculated and rendered further details on the intensity of future CC by the end of the century.


| INTRODUCTION
Central America (CAM) has been identified as "the main emerging tropical hot-spot", with a reduction in total precipitation and an increase in its variability by the end of the 21st century, due to climate change (CC) (Giorgi 2006). The latter could exacerbate existing socioeconomic struggles and affect the high biodiversity present in the developing countries situated on this isthmus. Aguilar et al. (2005) compared data from more than 200 CAM stations between 1961 and 2003 and found that the area is certainly warming in recent decades. Temperature extremes also vary accordingly. Additionally, three countries in the region (Honduras, Nicaragua, and Guatemala) were among the top 10 in the long-term climate risk index (Kreft et al. 2016).
CAM is a narrow isthmus with a very spatially variable climate due to factors such as the steep mountain range that extends almost across its full length, the Inter-Tropical Convergence Zone (ITCZ), the Mid-Summer Drought (MSD), El Niño Southern Oscillation (ENSO), and the Caribbean Low-Level Jet (CLLJ). The convergence of the trade winds near the Equator results in a low-pressure zone, which coupled with almost constant solar radiation throughout the year causes humid air to rise in the region, triggering a great number of thunderstorms and precipitation along the ITCZ band (Barry and Chorley 2009).
The region possesses a well-defined rainy season between May and October (Hastenrath 1967), which is roughly when the ITCZ crosses the area. This period exhibits a bimodal distribution with two precipitation peaks during May-June and September-October, with a relative minimum between July and August (Magaña et al. 1999). This relatively dry period is called MSD.
There have been multiple efforts to explain and categorize ENSO (e.g., Walker, 1932;Troup, 1965;Bjerknes, 1966). Wolter (1989) found it to be the most influential circulation mode, even outside the Pacific Ocean. In CAM, El Niño is related with droughts, loss of crops and cattle, while the counterpart of the phenomenon, La Niña, is associated with heavy rainfall and floods (Intergovernmental Panel of Climate Change (IPCC) 1997).
The CLLJ has shown to be a dominant climatologic feature of the region (Amador 1998;Magaña et al. 1999;Whyte et al. 2007;Wang 2007;Amador 2008). It is defined as the maximum speed of easterly wind at the 925 hPa level in the Caribbean region. The CLLJ possesses a semi-annual variation with peaks in boreal winter and summer (January and July), and minimums in spring and fall. During July the Caribbean basin experiences a maximum of the CLLJ and sea level pressure (SLP), combined with a relative minimum of precipitation (MSD) and minimum tropical cyclogenesis (Wang 2007).
The study area is Costa Rica (CR), which is a country with several climate zones, complex terrain, different precipitation characteristics, high biodiversity contrasts within small distances and an energy matrix consisting of approximately~75% hydropower, thus highly dependent on precipitation. The focus region is the country's Great Metropolitan Area, where more than half of its population resides. Consequently, it is necessary to understand the multiple implications of CC in the region.
General circulation models (GCMs) are the most powerful tools for building future CC projections (Huebener et al. 2007). The GCMs employed in climate studies and projections are computed at a spatial resolution of few hundred kilometres and possess significant differences when compared to the observations (Flato et al. 2013), which renders their information unsuitable as input for impact models (von Storch 1995). Because of this coarseness, coupled with limitations in the representation of small-scale processes and the heterogeneity of their boundary characteristics, GCMs cannot precisely describe certain characteristics of regional and global climate (Hidalgo and Alfaro 2014). It is therefore necessary to resolve this spatial discrepancy through a downscaling methodology, named dynamical and statistical. The goal of downscaling is to project the synoptic and large-scale information embedded in low resolution variables to the regional and local scale (von Storch et al. 1993), whereby its result represents climate more accurately while accounting for regional characteristics such as local topography.
Dynamical downscaling (DD), or nested model downscaling, refers to the use of initial and boundary conditions from GCM output to drive high resolution regional climate models (RCMs) (Hallett 2001). Statistical downscaling (SD) uses the GCM or RCM output as predictors of the locally observed data (predictands) by applying statistically derived relationships to obtain the future climate projections (CPs). This linkage is obtained through a transfer function (TF) between scales (Hewitson and Crane 1996). SD requires lower computing power, compared to DD. Multiple SD methods have been described in order to tackle this scale mismatch issue (e.g., von Storch et al., 1993;Hewitson and Crane, 1996;Wilby and Wigley, 1997;Cavazos and Hewitson, 2005).
High resolution CPs are needed to accurately evaluate and rapidly adapt to CC and its likely consequences on crucial subjects such as water availability, agriculture, society, economy and biodiversity. Significant efforts have been made in the region to reduce the scale at which the CPs are available. The Coordinated Regional Climate Downscaling Experiment (CORDEX 2018) offers multiple RCM output variables based on CMIP5 (Taylor et al. 2012b) projections with a spatial resolution of 0.44 (~50 km) at the CAM domain. Hidalgo et al. (2013) used the Bias Correction and Spatial Downscaling (BCSD) method to downscale 30 monthly GCMs from the A1B emission scenario to a 0.5 × 0.5 grid in CAM. These models were used to project temperature, precipitation and runoff. This study found a median reduction in yearly precipitation between 5-10% and 10-30% in runoff for the period 2050-2099. It also revealed an average increase of~4 C in southern CAM (where CR is) during the same interval. Nevertheless, it was stated that the output of the study is not suitable to depict extreme weather or for impact assessment studies, due to the spatial and temporal resolution. Imbach et al. (2018) employed the Eta Regional Climate Model to downscale HADGEM2-ES projections for the period 2021-2050 assuming the representative concentration pathway (RCP) 4.5 scenario, to an 8 km resolution for CAM. An overall reduction in precipitation was found, especially during the rainy season. Precipitation extremes showed decreasing trends except in CR and Panamá. Increasing temperature was found throughout the region, being greater in the northern part of the isthmus. Stennett-Brown et al. (2017) performed station scale SD using the statistical downscaling model (SDSM) (Wilby et al. 2002) for Belize and several islands in the Caribbean. The explained variance (R 2 ) between modelled and observed predictands had values ranging from 4 to 60% for temperature and 1-11% for precipitation, which are common performance measurements for this type of model, nevertheless, they convey high uncertainty. Additionally, climate extreme indices (CEI) (Karl and Easterling 1999) were calculated and rendered an increase of consecutive dry days (CDD), increase (decrease) in warm (cool) days and nights. SDSM provides a set of predetermined predictors, from coarse GCM output, which are loaded to the program. This feature might exclude predictors that could transmit a physical forcing to the local climate. Therefore, there is a possibility to improve these models by including tailored predictors related to regional climatic characteristics. Ribalaygua et al. (2018) performed local SD at the highly biodiverse Gulf of Fonseca between El Salvador, Honduras, and Nicaragua, where an increase in precipitation and temperature was identified by the end of the 21st century. In this study, coarse reanalysis and GCM output were used to obtain the CPs. Different sets of predictors were analysed which contained mainly wind components (at the pressure levels 1,000, 700, 500, and 250 hPa), geostrophic wind components (1,000 and 500 hPa) and specific humidity at 700 hPa. To our knowledge, the last two mentioned examples are the only studies of local-scale CPs available in the region. Due to their particularities and distances, their outcomes cannot be extrapolated to CR.
The aim of this study is to apply SD at the focus region to generate local-scale CPs of temperature and precipitation for the 21st century at daily temporal resolution, using the CMIP5 climate models. In order to improve the overall performance of the methodology, a group of tailored predictors was proposed to describe the climatic particularities of the region, related to the ITCZ, MSD, ENSO, CLLJ, and convective precipitation. artificial neural networks (ANNs) were employed to build the TFs between scales, using reanalysis data to fit the observed station data. Several ANNs models were trained, under several conditions, for each predictand. The best-performing models were selected, according to their performance emulating the historical data, by means of statistical metrics and suitable skill scores.
The chosen TFs were coupled with the output from CORDEX to project the local climate. This specific set of CPs were employed because it has been found that RCMs reproduce, due to their higher spatial resolution, better than GCMs, the topographic features and the variability of observed temperature and rainfall (Karmalkar et al. 2011;Onyutha et al. 2016). The daily CPs allow a comprehensive survey of the forthcoming climate conditions, which combined with a CEI analysis, renders valuable information for decision makers and engineers regarding water availability, agriculture, energy production, and general impact assessment studies. This paper is structured as follows: Section 2 presents the employed station data, reanalysis and GCMs/RCMs. In Section 3 we describe the downscaling methodology, the selection of predictors, the model used to create the TFs, the tools employed to analyse the data and the experimental workflow. Section 4 presents the results and discussion related to the model calibration process, the choice of models and predictors, the CPs, the extreme climate analysis, and the advantages and constraints of the used methodology. Lastly, Section 5 renders a summary of the investigation, its conclusions and further research outlook.

| DATA
The observed station data used in this study were supplied by the Costa Rican National Meteorological Institute (IMN). Daily observations of maximum (Tmax) and minimum temperatures (Tmin) and precipitation were provided for various stations in the central region of the country, from which an excerpt was taken to apply the SD methodology (see Table 1).
Overall, the temperature shows relatively small changes throughout the year, which is characteristic of the tropics. Nevertheless, a minimum is observed during boreal winter (see Figure 1). For precipitation, a bimodal distribution is seen through the rainy season with a relative minimum around July and August (MSD) along with the dry season from November to April.
The reanalysis data used in this project is ERA-INTERIM (Dee et al. 2011), with a spatial resolution of 0.75 . It provides data from 1979 and will be continuously updated until 2019. Therefore, the observed station data employed was from 1979 until 2017, depending on the record length of each station (Table 1). Several climate variables were extracted for its use as predictors (described in Section 3.1.1) to generate the TFs.
The projected predictors were obtained from the dynamically downscaled models of the CORDEX initiative available for the CAM domain. Ten CORDEX CAM runs are available with daily temporal resolution, which were downscaled using the RCM models RCA4 and RegCM4-3 (CORDEX 2018) to a spatial resolution of 0.44 . This domain embodies the entire area of interest of this study. The boundary GCMs were selected due to their proven skill modelling near surface temperature, precipitation and ENSO teleconnections, according to Hidalgo and Alfaro (2014), taking into account their availability in the online databases (ESGF) (Cinquini et al. 2014).
Two CORDEX runs were selected, the first is based on the model developed by the EC-EARTH consortium, run label r12i1p1, overall ranked #14 (Hidalgo and Alfaro 2014). The second is based on MPI-ESM-LR, provided by the Max Planck Institute for Meteorology, run label r1i1p1, ranked #15 (Hidalgo and Alfaro 2014). Both GCMs were downscaled with the RCA4 model of the Swedish Meteorological and Hydrological Institute (SMHI). For further details on the run labels see Taylor et al. (2012a). The analysed RCP scenario is the 8.5 WÁm -2 . The SMHI provided additional variables not available in the databases, such as zonal wind at 925 hPa.

| Statistical downscaling
SD implies establishing quantitative relationships between large or mesoscale atmospheric variables (predictors) and local surface variables (predictands) (Wilby et al. 2004). Most SD methods create a link between the predictors x and the predictands y by a statistical model F (.) or TF (Maraun and Widmann 2018). The calibration of the model is done with the observed data y obs . The simulation of the local climate y sim is obtained through the use of the model F(.) and a simulated array of predictors x sim . Here, a perfect prognosis (PP) approach was used to generate the TFs, which requires temporal correspondence between predictors and predictands, therefore the need of reanalysis data.
In order to properly use the PP methodology, the following assumptions (Maraun and Widmann, 2018, p. 142) should be satisfied: (1) The predictors have to be realistically and bias free simulated. Also, under CC conditions, the response of the predictors to external drivers should be credibly represented.
(2) Predictors that explain a large portion of the local variability on the temporal scale of interest should be selected. For CPs, it is necessary to include all relevant predictors that determine future changes. (3) The influence of the predictors on the predictand, and also the interaction between predictors, needs to be incorporated in the downscaling model. Under CC conditions, it is likely that unseen climate will happen, thus, the model should allow for at least moderate extrapolation. Therefore, suitable predictors that explain the statistical climate aspects of interest are required.
The last two premises cover what is known as timeinvariance or stationarity assumption: "if all predictors relevant for CC are included, and their influence on the predictand is sensibly modelled, also beyond observed states, the model is valid in a future climate" (Maraun and Widmann 2018). In practice, these models are merely estimations of the mesoscale processes, resulting in an approximation of the time-invariance. Also, the assumption that future predictors should be in the range of the observed predictors has been proved to be invalid for key predictors (Wilby et al. 2004). Thus, the assumption that the model is appropriate for moderate extrapolations is made. The following subsections describe the selection of predictors and the TF.

| Predictor selection
The selection of predictors depends on the region and season to be downscaled, in order to properly convey the basic requirements for a predictor: informative and a highly predictive power (Maraun et al. 2010). Under CC conditions measures of humidity are needed to obtain the change of the water-holding capacity of the atmosphere due to global warming (Wilby and Wigley 1997). Another factor to take into account is the availability of the desired predictors in the GCMs/RCMs databases, since not all the desired variables at the different pressure levels can be obtained. This reduces the range of predictors to be selected. A group of predictors based on Cavazos and Hewitson (2005) was selected, where predictors for both temperature and precipitation for different regions in the world were analysed. Vertical wind was discarded in this study since it is not available in the CORDEX database. Also, since a large portion of the local precipitation is related to convective processes, tropospheric stability indices (TSI) will be employed as predictors. TSI are indices that provide simple metrics to characterize tropospheric conditions that promote convection (Barfus and Bernhofer 2015). In this study, three different TSI were calculated and used as predictors in the SD procedure. The indices are the total totals index (TT), the vertical total index (VT) (Miller 1972), and the severe weather index (SWEAT) (Miller et al. 1971).
Lastly, a group of predictors related to regional climatic characteristics, such as the CLLJ and ENSO, was tailored for the study region. To include the influence of the CLLJ, the averaged zonal wind at 925 hPa, sea surface temperature (SST) and sea level pressure (SLP) were calculated, within the area 12.5-17.5 N and 80-70 W, as defined by (Wang, 2007, Figure 2). Since this area is several hundreds of kilometres away from the study region, a predictor with a one-day delay in the zonal wind (925 hPa) was included to compensate the distance travelled by the CLLJ. Wolter and Timlin (1993) defined the multivariate ENSO index (MEI) as the first principal component of six observed fields: SLP, zonal and meridional surface wind components; SST, near-surface air temperatures; and total cloudiness. It is calculated on a bi-monthly basis between the coordinates 5 N-5 S and 170-120 W. Due to the MEI's time scale, its relative low percentage of total explained variance (Wolter and Timlin 1993), and the great distance from CR, a region between the Equator and the Pacific coast of CR was proposed to directly evaluate the influence of the six variables that compose MEI on the predictands. This region is defined by the coordinates 4-9 N and 89-84 W, shown in Figure 2 as Pacific. The purpose of this selected area is to obtain predictors that contain information about the large and mediumscale climatic drivers related with both ENSO and the ITCZ. Table 2 contains the details of the raw surface and atmospheric predictors, and in which set it was employed, hereafter known as predictor set one (PS1) F I G U R E 1 Yearly plots of daily observed station data. In case of precipitation, a square root transformation was applied to the y axis and two (PS2). The final selection of predictors, consisting of 38 different ones, is the union of the predictors shown in Table 2 with the lagged zonal wind at 925 hPa in the CLLJ region (used in PS1), the tropospheric stability indices (PS1) and the geometrical thickness between the 850 and 500 hPa levels (th8) used in both PS1 and PS2. PS1 contains 35 predictors and PS2, 24, which is based only on Cavazos and Hewitson (2005). A large number of the predictors in PS2 are also employed in many other studies (e.g., Goly et al., 2014;Ribalaygua et al., 2018).
In addition to the PSs described, a third set of predictors (PS3) was employed for each predictand. This set consists of the 12 highest correlated predictors, from the The columns 1 and 2 refer to the predictor sets PS1 and PS2, respectively. Abbreviations: T, temperature; U, zonal wind; V, meridional wind; C, averaged in the CLLJ area; P, averaged in the proposed region in the Pacific Ocean; L, locally interpolated to each station. overall set, for each individual predictand. Thus, there were numerous combinations of PS3.

| Transfer function
The predictor-predictand relationships were modelled using ANNs, which have demonstrated to adequately determine empirical relationships between atmospheric variables and surface climate parameters (e.g., Hewitson and Crane, 1994;Cavazos, 2000). Nevertheless, some mechanisms that favour precipitation belong to smaller scale processes than the ones captured by GCMs/RCMs, hence, a parametrization of these phenomena is performed. Therefore, the linkages derived with ANNs are affected by this generalization. This renders known issues reproducing the statistics and occurrence of precipitation (Coppola et al. 2006). The ANNs models ran in R language (R Core Team 2018) employing the "neuralnet" R package (Fritsch and Guenther 2016). The dataset was divided in 90% to calibrate the ANNs and 10% independent validation data, known as Cal/Val datasets. Since there are no general rules on ANN architecture, several combinations were tested to optimize the models.
The activation function employed was Sigmoid, which enables the linear architecture nature of ANNs to model nonlinear functions, such as precipitation. The function used to compute the error is the sum of squared errors (SSE). The ANN learning algorithm used was resilient backpropagation with weight backtracking (Rprop) by Riedmiller (1994); which has shown to yield faster convergence and higher accuracy than backpropagation (Prasad et al. 2013), due to its resilient weight updatevalues routine. This algorithm is not based solely on a learning rate, to avoid the possibility of skipping a local minimum, including increase and decrease parameters.
Scaling the input dataset to a common range reduces the convergence time of the learning algorithm. Numerous normalization methods are employed to optimize the ANNs, such as min-max, standardization, median, MAD (median absolute deviation), and tanh-estimators (Hampel et al. 1981). The tanh-estimators are reported to be robust, highly efficient and not sensible to outliers (Jain et al. 2005).
The normalization known as modified tanhnormalization was applied (Latha and Thangasamy 2011). It does not calculate the Hampel estimators, instead the mean and standard deviation of the variable are used, which is a simpler and faster method. This normalization does not oblige the maximum value to be equal to 1, which provides advantages when extrapolation is required by the TF, contributing to satisfy the third assumption of PP. Equation (1) represents the estimator applied to the input layers, where x is each individual value, μ the mean and σ the standard deviation of each layer.
ANNs have proved to possess good predictive power compared to traditional models. Yet, their interpretation remains challenging, requiring an understanding of the independent effects of each predictor on the predictand. Intrator and Intrator (2001) developed an approach to interpret the ANNs, coined as generalized weights (GWs), which are, in a simplified manner, comparable to the weights of a multiple linear regression. Thus, the GWs were calculated and utilized to explain the influence of the predictors on each model.

| Model evaluation
To evaluate the performance of the ANNs, several metrics were calculated to assist the model selection process. The selected metrics are: BIAS, mean absolute error (MAE), mean squared error (MSE), Pearson correlation coefficient (R), normalized standard deviation (σ ), pierce skill score (PSS, Pierce et al., 2009, Equation (2)), which was conceived to measure the similarity between two climate patterns, and the Taylor skill score (TSS, Taylor, 2001, Equation (3)). In the following equation m and o stand for the mean of the modelled and the observed values, respectively, while σ o represents the standard deviation of the observed data.
Taylor diagrams (TDs) were also employed to evaluate the behaviour of the calibrated models. TDs supply a concise statistical summary of the performance of the modelled values against the observed data, based on their correlation, root-mean-squared error and ratio of their standard deviations. For further details see Taylor (2001).

| Climate extreme indices (CEI)
The expert team on climate change detection and indices (ETCCDI) defined a set of climate indices that provide a detailed statistical summary of precipitation and temperature, centred mainly on their extremes (Karl and Easterling 1999). Table 3 presents the complete set of 27 indices, although some indices are irrelevant for the study region, such as frost and ice days. The calculations were done using the R package "climdex.pcic" (Bronaugh 2015).

| Workflow
The datasets were split by seasons according to the regional climate: the dry season from November to April (NA), the first peak of the rainy season from May to June (MJ), the MSD between July and August (JA), and the highest peak of the rainy season from September to October (SO). The models were also trained annually to evaluate the influence of the proposed predictors on both temporal aggregations. The divided datasets of observed data were then merged with the three PSs from the ERA-INTERIM reanalysis described in Section 3.1.1, independently, to train the ANNs under 10 random training datasets and six different architectures (as represented in Figure 3) for each variable and station.
Subsequently, employing the methods to evaluate the models (Section 3.2), a group of top performing models was selected. The TFs embedded in these models were coupled with the predictors extracted from CORDEX to project the climate at a daily level until the end of the 21st century. Then, further evaluation of the hindcast period, using the CORDEX datasets, was done to discern the most suitable models per predictand to calculate the CEI.

| Model calibration and selection
The final selection of models was done in a two-step approach, for both seasonal and yearly temporal aggregations. First, the models were ranked according to their PSS. Then, a revision using the methods described in Section 3.2 was done, to ensure the selection of the best model. The latter was modified only in the cases where a sensible difference in the results was expected, without sacrificing performance. The PSS ranking based selection predominantly found the overall best skilled TF. However, three model selections were changed after the inspection, being BIAS the main reason to change the selection. Usually, a decrease in the BIAS was coupled with an improvement of TSS, MAE and/orσ . This was done to remove any persistent differences from the future projections, hence, misleading results. Table 4 presents the metrics for the case of precipitation at station 84003 during MJ. In this example, the model first ranked as #2 was selected: PSS for #2 is slightly lower, 0.169 versus 0.166, which renders no practical difference, but its BIAS is considerably lower, 1.13 versus 0.18 mmÁday -1 , which can affect greatly the projected results.
Higher reliability is expected from unbiased models, therefore, the aforementioned model preselection was modified. Also, the TSS rendered a slightly better result, which was calculated using R 0 (Equation (3)) as the mean of all correlation values for each temporal aggregation per predictand. Observe how PS1, which includes the TSI and the proposed tailored predictors in the Pacific and the Caribbean, was present eight out of 10 times in the best-performing models for this example, which can be interpreted as an overall improvement of the skill when using the customized predictors. The latter applies for most of the seasonal calibrated models, where all three PSs were calibrated. The remaining model selection changes were done during SO for precipitation for station 84003 and for Tmax at station 73018. Additionally, it is worth pointing out that an averaging of the ANNs weights for the different runs under the same HLA was also evaluated. Nonetheless, in most cases this approach rendered noisy results with remarkably low skill when compared to the presented results. Table 5 comprises the characteristics and metrics of all the selected models. From the 36 seasonal models, 23 were trained with PS1, 10 with PS2 and 3 with PS3. Because of the latter and the aforesaid general improved performance with PS1, coupled with the fact that the training time of the yearly models was significantly longer, due to their substantial larger datasets, only PS1 was calibrated for these models. Furthermore, 29 of the selected seasonal models were trained using a single hidden layer, probably due to the simplification induced by grouping the predictands into similar climate patterns (seasons). Thus, the learning algorithm did not require intricate functions to accurately reproduce the data. In contrast, for six out of nine final yearly models the ANN architecture had two hidden layers, which could be interpreted as an effort of the algorithm to replicate the data by adding complexity to the TF, since the complete dataset was used for the calibration of these models.
Additionally, Figure 4 shows the TDs of the final selection of models by temporal aggregation and by station, simplifying the comparison between the models. In general, the models performed better for NA, for example, the PSS during NA was significantly higher than for the remaining seasons, specially for precipitation at station 84003 where its PSS was 0.516 and from 0.102 to 0.185 for the other seasons (see Table 5). This improved skill for precipitation during NA can also be appreciated in Figure 4c, as a shorter distance between the symbols representing these models and the reference value (È), when compared to the other seasons. The latter applies also for Tmax ( Figure 4a) and Tmin (Figure 4b), which is also reflected in the values of R andσ. These results suggest that the relative higher atmospheric stability of this period facilitates its modelling. Nevertheless, it is essential to note how for all models the variability of the predicted data was lower than that of its observed counterpart (allσ < 1). This perceptible deficiency of the models could affect the results of the CPs and the CEI. Typically in the cases of Tmax and Tmin the seasons related to the rainy period had similar PSS, R andσ , except for Tmax during JA at 84003 and 84018, and Tmin during SO at 84003, where a relative lower skill was found. For precipitation, SO-related models had considerably lower skill. SO overlaps with enhanced tropical cyclone activity in the Atlantic, which increases atmospheric instability. The latter translates to greater variability of the observed data, thus, challenging modelling conditions for the ANNs.
Some models selected for precipitation presented substantially high BIAS, for example, station 73018 during MJ and SO (−0.637 and 0.754 mmÁday -1 , respectively). Nevertheless, their selection was maintained because the models with significant improvement in BIAS had serious deficiencies in other metrics, for example, ranked #28 and #44 (BIAS = 0.093 and −0.173 mmÁday -1 , respectively). Ribalaygua et al. (2018) obtained also for CAM similar values of MAE and BIAS for the seasonal precipitation models, being MAE = 6.2 and BIAS = −1.80 mmÁday -1 the worst case scenarios. These metrics were only available for precipitation. However, the measurements presented in Table 5 for Tmax and Tmin revealed significant improvements compared to precipitation.
Despite their inaccuracy, the yearly models had a fair performance (see Figure 4d) compared to the analogous study in the region. Although CR is not within the studied zones, Stennett-Brown et al. (2017) achieved, for yearly models, a maximum R 2 of 53, 60 and 11% while the equivalent values obtained in the present study would be 61, 73 and 29% (Tmax, Tmin, and precipitation, respectively), as derived from Table 5. The mentioned paper presents only the ranges of BIAS,σ and R 2 , therefore, is difficult to make additional comparisons. Nevertheless, a perceptible improvement was obtained, especially for precipitation, with the proposed predictors and methodology employed. Moreover, the metrics of the yearly models were mostly better than for seasonal models.

Note:
The short names used for the predictors correspond to the ones in Table 2. Note that the pressure levels and/or region of the predictors (C and P) were concatenated to the short names.
GWs symbols based on the explained variance per predictor: However, it is crucial to further evaluate their ability to mimic their variability, extremes and temporal allocation, as discussed in Section 4.4.

| Predictors
Following the selection of the final TFs for the CPs, a ranking of the predictors with higher influence on each model was made using the GWs as shown in Table 6, which is complementary to Table 5. One of the most outstanding results of the seasonal precipitation models was the case of station 84018, during SO. This model was obtained with PS3 (only the 12 highest correlated predictors) and a rather simple ANN architecture consisting of one hidden layer with six neurons (see Table 5). This means that the precipitation process is, compared to other models, easily explained by few predictors and a straight-forward TF for this specific season and location. The top six predictors for this example are specific humidity at 500 hPa (q500 with 16.7% of averaged explained variance), zonal wind at 850 hPa (U850 ! 16.2%), near-surface temperature (T2m ! 12.7%), and meridional wind at 700 hPa (V700 ! 12.3%), all at the local level, and total cloud cover and zonal wind, at the proposed region in the Pacific Ocean in front of the coast of CR (TCC-P ! 10.5% and U-P ! 8.7%, respectively), accounting for~77.1% of the total variance. The last two predictors reveal the synoptic correlation between the humidity conveyed by clouds in the Pacific, their movement towards CR and precipitation events. Both specific humidity at the mid-troposphere and near-surface temperature are related to vertical motion and convective processes that ease precipitation (Cavazos and Hewitson 2005). Also, the most influential predictor, q500, gives a measure of the amount of precipitation for wet days (Holton and Hakim 2012), which is crucial in the rainy season. The influence of U850 can be interpreted as a conveyor of humidity from both oceans to mainland.
During NA it was observed for all variables that specific (q) and relative humidity at the surface (rh), together with tropospheric thickness (th8), play a major role in the models. The latter intensifies in the case of precipitation, or its absence during this season, where these three predictors are among the seven more influential for all stations. Relative humidity is suited to conclude if precipitation will occur on a given day (Ribalaygua et al. 2018). This was appropriately estimated by the ANNs with rh found within the top predictors in most TFs for precipitation. Among the common top predictors for the seasonal precipitation models were: SLP-P, and local: rh, q500, th8, and geopotential (z500, z700, and z850). SLP at the Pacific Ocean (SLP-P) was among the top four predictors during the rainy season for station 84003, which is the closest station to the Pacific coast. This finding confirms the relevance of including tailored regional predictors. Low-pressure systems tend to form in this region during this time of the year due to the shifting of the ITCZ (Hidalgo et al. 2015), hence the major role of SLP-P during the rainy season, as detected by the ANNs.
In the case of seasonal Tmax, specifically at station 73018, a great influence of the predictors calculated for the Caribbean region was found during all seasons. This station is situated in a watershed that flows into the Caribbean Sea and is also the closest to it. The latter ratifies the skill gain with the proposed predictors, specially the CLLJ index (U925-C) and SLP-C. In general, among the top predictors for seasonal Tmax were found: near surface maximum daily temperature (Tx2m), th8, SLP-P, geopotential height at different pressure levels and TT. Similar results for the most influential predictors were obtained for seasonal Tmin, exchanging Tx2m for its minimum counterpart (Tn2m). Moreover, another model belonging to PS3 was observed (MJ at 84003 for Tmin), where an HLA of only eight neurons was used, suggesting a simpler TF for this specific predictand.
The main reason for not using the principal components of grouped variables as predictors to train the ANNs is that in this study it was essential to be able to determine the added value, skill and/or influence of the proposed predictors. This traceability becomes blurred by the principal components analysis, since it summarizes the information in terms of explained variance per predictor, which turns fuzzier when coupled with ANNs. Nevertheless, for further research, we encourage to employ them and the hereby proposed predictors that demonstrated valuable skill.

| Climate projections
The CPs y sim are obtained after applying the simulated PS x sim to the TFs calibrated for each predictand. Two models were selected per each predictand: a seasonally (which contains four models) and a yearly calibrated one. There is a limited number of CORDEX RCMs available for CAM, therefore, a selection was made from the obtainable models according to Hidalgo and Alfaro (2014). The simulated outputs from two RCP8.5 RCMs were used to downscale the historical and future climate, until the end of the 21st century. Thus, a total of four projections per predictand were obtained. The projections were not averaged to represent the variability of the different combinations of TFs and RCMs. Figure 5 shows the boxplots of all the daily projected predictands grouped by period.
Most of the simulations possess an appreciable BIAS during the historical run period between observed and hindcast data. The selected ANNs models were chosen procuring an almost negligible BIAS. Therefore, this difference suggests a BIAS between the predictors of the RCM model and the reanalysis data and/or a variation of their ranges. Thus, the predictors were analysed during the historical period and sensible differences in the distribution of the datasets were found among them. Approximately only 15% of the predictors achieved a p value >.05 for both Student's t test and F test. This feature could be explained by the parametrizations of the RCMs, which could have produced data in a slightly different range than the reanalysis employed to train the ANNs. The latter could cause the noisy behaviour of some projections, where the predictand range was extended in comparison to the observed dataset, specially in the yearly models, such as the for precipitation at stations 84003 and 84018, Tmax at station 84003 and Tmin at 73018 (see Figure 5). Hence, a delta-change approach was used to compare the results of the projections.
Generally, the average temperature projections agree with coarser scale estimates. For example, Knutti and Sedláček (2013) found a mean intermodal surface temperature change of~4 C in the region at the end of the century, with high robustness for CMIP5, which agrees with the present results. However, we analysed only two different RCMs. Nevertheless, a similar warming trend, as the observed particularly for Tmin, is expected for an ensemble analysis, which in order to add reliability to the CPs, is encouraged for further research in CR and CAM. Additionally, the locally projected changes in temperature agree with the only antecedent SD studies found for the region (Stennett-Brown et al. 2017;Ribalaygua et al. 2018).
Precipitation is a much more complex variable to reproduce reliably. Knutti and Sedláček (2013) found intermodal inconsistencies in the late-century projections for precipitation during the boreal winter near the study area. CR is precisely in the confluence zone of several precipitation change ratios without intermodal robustness. Nevertheless, the study region is projected to have a yearly relative change in precipitation approximately between −10% and − 20%. The mentioned study does not show the variability during the local peak of the rainy  1979-2005, M to 2020-2050 and L to 2070-2100. In case of precipitation, a square root transformation was applied to the y axis season, therefore it is not possible to compare the annual change in precipitation. Yet, employing the seasonal models, which had a better performance and a less noisier behaviour than the yearly ones ( Figure 5), for stations 84003 and 84018, an overall change in precipitation of the same magnitude as in Knutti and Sedláček (2013) was found. At the regional level, Ribalaygua et al. (2018) found a considerable increase in precipitation during the dry season of approximately 60%, almost no difference during both peaks of the rainy season, and a slight decrease (around −10%) during the MSD at the Gulf of Fonseca. Although the distance and climatic differences between this area and the focus region are not substantial, the results for precipitation seem to be considerably divergent. Such different behaviour for precipitation in these adjacent areas is also seen in Knutti and Sedláček (2013) due to intermodal inconsistencies and low robustness, particularly during the dry season.
In the case of the yearly model-based projections for precipitation, a larger amount of extreme rainfall values is seen than in the observed data, especially with the EC-Earth model, which confirms the extrapolation capacity of the methodology. Seasonal precipitation models tend to model average daily precipitation well but lack the skill to mimic the amount of extreme events. All of the aforementioned statements should be taken into account when calculating the CEI and deeply scrutinized for future model improvements.

| Extreme climate
In order to provide reliable projected extreme climate and CEI, a detailed analysis of the performance of the ANN models for extremes was carried out for the hindcast period at station 84003, which more accurately represents the conditions where a large portion of the country's population lives. It is essential to distinguish between the noise or misrepresented past conditions and actual CC signal embedded in the projected predictors. Therefore, a set of statistical plots was arranged to summarize the model behaviour under extreme climate, as described in Figure 6. Even though the results presented there refer to the best-performing models, some deficiencies were detected, especially for the yearly models. According to their PDFs, the days with precipitation ranging from 1 to 10 mmÁday -1 and extreme values (>100 mmÁday -1 ), were greatly overestimated, unlike days without precipitation, in accordance with Dai (2006). Thus, the projections with these TFs will also lack the necessary skill to produce reliable CEI results.
On the other hand, the seasonal models yielded better skill modelling both the PDFs and the daily behaviour of F I G U R E 6 Statistical summary during the historical run period  for precipitation at station 84003. The top-left plot shows the violin plots for the observed and modelled values. The horizontal lines inside the violin plots correspond to the quantiles 25, 50, and 75%. The diamonds (♦) represent the mean values. The top-right plot displays the joint probability density function (PDF) for each of the datasets. The lower plots present the mean daily smoothed plot (thick lines) along with the 5 and 95% quantiles (thin lines). Lastly, the bottom-right plot overlaps the mean smoothed plots. A square root transformation was applied to the y axis climate. Especially, S.MPI resembles better the observed data, despite the underestimation during MJ. S.EC does not reproduce the MSD at all, while setting the peak of precipitation during these months (JA). Therefore, the model that reproduced better the statistics for this predictand is S.MPI, which was used to calculate the CEI. Since the seasonal models were calibrated separately, there were large jumps at the boundaries between seasons, which should be addressed in succeeding research under different machine learning algorithms, in order to account for these transitions.
An analogous summary is shown in Figure 7 for Tmax. Seasonal runs have higher BIAS and variability than yearly models, while the highest values are considerably lower (~32 C) for both compared to a maximum recorded value of~36 C. The yearly models yielded an overall better performance, yet, overestimate the range of Tmax. All models tend to underestimate the values during SO with a particularly noisy behaviour, which is reflected by a larger gap between the 5 and 95% quantile lines. Thus, the indices minimum and maximum Tmax (TXx and TXn) were discarded, since their results will not provide reliable information. The model that best replicates the observed statistics for Tmax is Y.EC. Nevertheless, to calculate the CEI with RCM output correspondence, Y.MPI was selected. This meant not using the best available model. Subsequently, the same summary was done for Tmin (not shown). The values obtained with the yearly models have a higher BIAS and a smaller variability than the observed data. The seasonal models were able to reproduce better the variability of the data. Abrupt changes or mismatches in the estimates nearby MJ were observed, especially in the EC-EARTH model, but also for MPI. Except for this trait, S.MPI successfully reproduces the observed dataset throughout the year. Thus, this model was used to calculate the CEI.
Hence, the CEI for station 84003 were calculated using the aforementioned models and a base-period spanning between 1979 and 2005. Figure 8 presents the boxplots of a selection of indices for the different periods analysed. There, the influence of certain traits of the selected models for the CEI calculation can be noticed, for example, the S. MPI model yielded a higher than observed average precipitation during JA and SO (see Figure 6). This is reflected in the average yearly precipitation (PRCPTOT), where the mean observed rate is 1941 mmÁyear -1 and the modelled one is 3,038 mmÁyear -1 . Please note: there was missing data in the observed dataset, which could have affected the indices.
Furthermore, the 5% quantile during SO ( Figure 6) suggests that the amount of days without precipitation decreases, which affected indices such as the number of wet days per year (R1mm with an average of 135.8 for the observed data and 166.4 for the historical run data) and simple daily intensity (SDII, 14.2 vs. 18.3 mmÁyear -1 ). Likewise, the 75% quantile of the observed data is below 10 mmÁday -1 , while the modelled one is~20 mm day -1 . Thus, the heavy (R10mm) and very heavy precipitation days (R20mm) per year indices were greatly overestimated. For instance, the observed R10mm has a yearly mean of 64.6 days and the modelled for the historical run 108 days, while the observed R20mm is 33.8 days and the modelled one is 63.7 days. Similarly, maximum 1 and 5 days precipitation are overestimated. For more details on the dispersion of the CEIs, see Figure 8.
F I G U R E 7 Same as Figure 6 but for maximum temperature at station 84003. No y axis transformation Although the mentioned precipitation indices in the historical run are mostly overestimated compared to those observed, the changes within the modelled indices among the analysed periods are noteworthy (deltachange approach). Overall, as mentioned above, lower amounts of yearly precipitation are expected. Almost all the precipitation related indices are expected to decrease over time, for example, an average 31% less PRCPTOT by the period 2070-2,100. These differences agree with Sillmann et al. (2013b) for CAM on a coarser scale at the end of the century for an ensemble under the RCP8.5 scenario.
As suggested by the projected CEI, at the end of the century there will be a significantly warmer climate in the region. The cold nights index (TN10p), which presents the ratio of days below the temperature related to the 10% quantile per year calculated from the historical run period, will be almost zero (0.05%). The cold days index (TX10p) will also decrease greatly to 1.6%, on average. While the warm nights (TN90p) and warm days indices (TX90p) will increase greatly to 90.1 and 43%, respectively.
Moreover, the amount of tropical nights (TR) reproduced by the models projects an extremely large change from an average value of 9-275.9 daysÁyear -1 at the end of the century. Also, the diurnal temperature range will decrease, due to greater warming of Tmin than Tmax, which leads to warmer overall climatic conditions. In summary, both daily Tmin and Tmax will increase, yet Tmin will have a more intense change than Tmax. These results agree with the values presented by other authors (e.g., Morak et al., 2013;Sillmann et al., 2013a,b), derived on a larger scale, both for historical and projected scenarios.

| SUMMARY AND OUTLOOK
The SD methodology under a PP approach was applied, to our knowledge, for the first time to project daily localscale climate in CR until the end of the 21st century, by means of dynamically downscaled CMIP5 models (CORDEX CAM domain) for the RCP8.5 scenario. Three different PSs, extracted from the ERA-INTERIM reanalysis, were employed to build the TFs between scales, in order to emulate observed data of precipitation, Tmin and Tmax at three locations. These functions were optimized through a large number of ANNs, employing F I G U R E 8 Boxplots of a selection of CEIs per period at station 84003 using the MPI model. Obs and H correspond to the historical run period, 1979-2005, M to 2020-2050 and L to 2070-2100. A square root transformation was applied to the y axis of the plots marked with an asterisk several combinations of architectures, temporal aggregations and random training subdatasets.
Considerable skill improvement was found with PS1, which included the proposed tailored predictors aiming to describe the regional climatic characteristics, such as ENSO, CLLJ, ITCZ, MSD and convective precipitation. Many of these predictors were calculated in sections of the oceans enveloping the study region, where synoptic and physical relationships with local-scale climate were expected and confirmed by the ANNs.
The performance of the TFs varied, although with satisfactory results. As expected, the ANNs trained for temperature had a better skill than those for precipitation. Accurate reproduction of the statistics, occurrence and amount of precipitation is a known issue, for both DD and SD, due to the parametrizations used for small-scale processes that favour convective precipitation and its associated BIAS, which affects the TFs. Nevertheless, the best-performing models provided valuable information on future climate.
Generally, drastic changes in the climate regime are expected by the end of the century, such as a considerable decrease in the yearly total precipitation and a significant increase in temperature, specially Tmin (~4 C), using a delta-change approach, which agrees with larger scale projections. In order to provide insightful data projected CEI were calculated at station 84003, located in La Argentina, Grecia, CR. The amount of cold days and nights per year at the end of the century will almost cease to exist while the warm days and nights will rise steeply by~80% and~33%, respectively. The mean number of tropical nights per year will increase by~267 days and the yearly average precipitation will decrease~31%. Since only one model at one station was analysed for the CEI, a multi-site local-scale ensemble approach is required to provide more robust and spatially detailed information for the region.
Further research employing vanguard machine learning algorithms is intended in order to enhance the performance of the TFs, particularly for extreme events of both precipitation and temperature, which are essential for CC impact evaluation and where there is potential for improvement. Moreover, additional DD efforts with higher spatial resolution and number of models, such as those available for other domains, are required to strengthen statistical robustness, awareness and mitigation capabilities towards CC in this and other overlooked regions.
The combined effects of the aforementioned changes of the three climatic variables analysed could lead to serious issues concerning water availability, food security, socioeconomics, energy production, and biodiversity conservation. The expected reduction in annual precipitation and greater evaporation due to rising temperatures will diminish the runoff and thus, the energetic potential of a country that relies heavily on hydropower. But primarily, it is imperative to assess the probable repercussions that anthropogenic CC imposes on the invaluable~6% of the global biodiversity contained in Costa Rican territory, in order to urgently implement adaptation schemes.